T800 Posted April 25, 2016
I checked my server the other day and noticed an "array health report [FAIL]" warning, 107 errors on the parity disk, and lots of reallocated sectors on disk 14. I ran a parity check, which came up fine, but the main screen still said there had been 107 errors. I've got a 2TB hot spare to replace disk 14, which keeps reallocating sectors nearly every time I log in. I'm on the 2nd of 3 cycles preclearing a 4TB disk to replace parity. I went on this afternoon to see how the preclear was going and the errors weren't there anymore; it now says 0. Does parity actually need replacing now that it says 0? If I have to replace both, which do I replace first, disk 14 or the parity disk? Thanks
Squid Posted April 25, 2016
There's a big difference between the errors column on the Main tab and the actual health of the drive(s) in question (although they are often related). The errors column is reset with every stop/start of the array and is merely a running counter of the number of read errors the drive has thrown that required reconstruction from the rest of the array drives. SMART reports are what really matter in cases like this. You should really post your diagnostics.
T800 (Author) Posted April 25, 2016
So should I run SMART tests on the parity and disk 14 and post here?
trurl Posted April 25, 2016
Quote: "So should I run SMART tests on the parity and disk 14 and post here?"
Quote: "You should really post your diagnostics"
Tools - Diagnostics. Post the complete zip.
T1000 Posted April 26, 2016
Here we go. tower-diagnostics-20160426-0729.zip
JorgeB Posted April 26, 2016
The parity errors were probably caused by this:

    Device Model: ST4000DM000-1F2168
    Serial Number: W300PBWN
    183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
    187 Reported_Uncorrect 0x0032 099 099 000 Old_age Always - 1

The error was recent:

    9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 11693
    Error 1 occurred at disk power-on lifetime: 11089 hours (462 days + 1 hours)

But it passed an extended test after that, so the disk should be OK for now:

    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Extended offline Completed without error 00% 11347 -

You have some disks with UDMA_CRC errors, two of them with very high counts. These could be old errors, but you should monitor them for a few weeks; a value increase of 2 or more usually means a bad SATA cable.

    Device Model: ST32000542AS
    Serial Number: 5XW0N2Q3
    199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 12721

    Device Model: ST32000542AS
    Serial Number: 6XW1QTW0
    199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 13573

    Device Model: ST32000542AS
    Serial Number: 5XW199NW
    199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 45

Regarding disk 14: reallocated sectors by themselves don't indicate a bad disk, but they could be the start of more issues to come. Many don't like having disks like that in the array, but it's up to you. You should at least do an extended SMART test. After that, you should also do a parity check.
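The timeline in the SMART output above can be checked with a little arithmetic. This is a minimal sketch that uses only the hour counts quoted in the post; the variable names are illustrative, and treating power-on hours as elapsed calendar time assumes the server runs around the clock.

```python
HOURS_PER_DAY = 24

# Values copied from the SMART output quoted in the post.
power_on_hours = 11693          # attribute 9, raw value
error_at_hours = 11089          # "Error 1 occurred at disk power-on lifetime"
extended_test_at_hours = 11347  # LifeTime(hours) of the extended offline test

# How long ago (in power-on time) the error was logged.
hours_since_error = power_on_hours - error_at_hours
days, hours = divmod(hours_since_error, HOURS_PER_DAY)

# The extended test only vouches for the disk if it ran after the error.
test_ran_after_error = extended_test_at_hours > error_at_hours

print(f"Error logged {days} days + {hours} hours of power-on time ago")
print(f"Extended test ran after the error: {test_ran_after_error}")
```

Because the extended test's lifetime stamp is higher than the error's, the clean test result postdates the logged error, which is what supports the "OK for now" reading.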
T800 (Author) Posted April 26, 2016
Thanks for that. I will swap disk 14, as the spare is just sat there and I have a few of them. Would it be worth replacing the SATA cables? To keep an eye on it, do I just run diagnostics in a few weeks and compare the numbers?
JorgeB Posted April 26, 2016
Quote: "To keep an eye on it do I just run diagnostics in a few week's and compare the numbers?"
Yes, these could be old errors, as the value is never reset, so check them once a week for a couple of weeks to confirm they are stable.
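The weekly comparison can be done by eye, but a small script makes it harder to misread a five-digit counter. The sketch below is only an illustration: it assumes the attribute rows have the ten-column layout shown in the SMART output earlier in the thread, the function names are made up, and the second reading is a hypothetical value, not data from this server. The threshold of 2 is the rule of thumb given in the thread.

```python
CRC_ATTRIBUTE = "UDMA_CRC_Error_Count"
CABLE_SUSPECT_INCREASE = 2  # rule of thumb from the thread


def raw_values(smart_text):
    """Map SMART attribute name -> raw value for each attribute row."""
    values = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return values


def crc_increase(older_text, newer_text):
    """Return how much the CRC error counter grew between two reports."""
    return raw_values(newer_text)[CRC_ATTRIBUTE] - raw_values(older_text)[CRC_ATTRIBUTE]


# First reading is the line quoted in the thread for serial 5XW0N2Q3.
first_report = "199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 12721"
# HYPOTHETICAL later reading, invented purely to exercise the comparison.
later_report = "199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 12724"

increase = crc_increase(first_report, later_report)
if increase >= CABLE_SUSPECT_INCREASE:
    print(f"CRC errors grew by {increase}: check or replace the SATA cable")
else:
    print(f"CRC errors grew by {increase}: counter looks stable")
```

If the counter stays flat across several weekly readings, the high totals are most likely old errors and the cables can be left alone.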
T800 (Author) Posted April 26, 2016
Thank you for everyone's help, much appreciated!