davehahn Posted February 8, 2017
I have 2 disks failing. One is full of data, the other is empty. See screenshot. I would like to remove the empty bad drive from the array without parity needing to be rebuilt - is that possible? The goal is to be able to rebuild the drive that has data, without the empty drive causing an issue that makes the rebuild fail. Any suggestions? I should mention I had a power failure, and when unRAID booted back up and started a parity check, these two drives started racking up errors like mad. The parity check is only at 30% and says it will take 72 days to complete... so I feel that at least one drive failing out of the array is imminent.
JorgeB Posted February 8, 2017
You have 4 disks with errors, not 2 - post your diagnostics.
davehahn Posted February 8, 2017
I have 4 failing disks - the disks are old junk disks, and a few read failures don't concern me; unRAID handles that nicely. But obviously 500K errors on a disk that's empty is no bueno - I want to eject that one from the array and replace the one that has data and 30K errors. The log leads me to believe it's a media failure and not a SATA cable. I had to break the diagnostics up into 2 files because they were over the 320KB forum attachment limit.
JorgeB Posted February 8, 2017
Of the disks showing read errors, 3 appear to be clearly bad; disk18 may still be fully readable.

Disk12:
Device Model: WDC WD20EADS-00S2B0
Serial Number: WD-WCAVY0135415
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 314
197 Current_Pending_Sector 0x0032 198 196 000 Old_age Always - 729
198 Offline_Uncorrectable 0x0030 198 198 000 Old_age Offline - 737

Disk15:
Model Family: Western Digital Caviar Green
Device Model: WDC WD20EADS-55R6B0
Serial Number: WD-WCAVY1357136
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 6
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 3

Disk18:
Device Model: WDC WD20EADS-00R6B0
Serial Number: WD-WCAVY0324118
5 Reallocated_Sector_Ct 0x0033 195 195 140 Pre-fail Always - 33
196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0

Disk21:
Device Model: WDC WD20EADS-00R6B0
Serial Number: WD-WCAVY0772324
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 770
197 Current_Pending_Sector 0x0032 199 196 000 Old_age Always - 397
198 Offline_Uncorrectable 0x0030 200 196 000 Old_age Offline - 155

A good rebuild is impossible; your best bet would be to try to copy all the data you can off those disks.

PS: I haven't checked SMART for the other disks yet - it's possible there are more.
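For anyone reading along, the attributes JorgeB quoted can be pulled out of `smartctl -A` output with a short filter. This is only a sketch - the helper name is made up, the device path is an example, and it assumes smartmontools is installed:

```shell
#!/bin/sh
# Sketch: extract the SMART attributes relevant here (5 Reallocated_Sector_Ct,
# 196 Reallocated_Event_Count, 197 Current_Pending_Sector, 198 Offline_Uncorrectable)
# from `smartctl -A` output supplied on stdin.
smart_summary() {
    # Print "id attribute_name raw_value" for the four attribute IDs of interest
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $1, $2, $NF }'
}
# Example usage (hypothetical device; requires smartmontools):
#   smartctl -A /dev/sdb | smart_summary
```

A non-zero raw value for 197 or 198 is the signal to watch - it means sectors the drive could not read and has not yet remapped.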
trurl Posted February 8, 2017
You clearly should have set up Notifications before you let yourself get into this mess.
JonathanM Posted February 8, 2017
Quoting davehahn: "the disks are old junk disks and a few read failures doesn't concern me, unraid handles that nicely."
I'm not sure what you mean by read failures being handled - I guess you are ok with losing the data and have good backups? After a bad experience early on, I no longer leave any questionable disks in the array. A known bad disk jeopardizes unRAID's ability to recover the data on any other failed disk.
davehahn Posted February 8, 2017
Quoting JonathanM: "I'm not sure what you mean by read failures being handled."
I mean this, from http://lime-technology.com/wiki/index.php/Troubleshooting:

"If your array has been running fine for days/weeks/months/years and suddenly you notice a non-zero value in the error column of the web interface, what does that mean? Should I be worried? Occasionally unRAID will encounter a READ error (not a WRITE error) on a disk. When this happens, unRAID will read the corresponding sector contents of all the other disks + parity to compute the data it was unable to read from the source. It will then WRITE that data back to the source drive. Without going into the technical details, this allows the source drive to fix the bad sector so next time, a read of that sector will be fine. Although this will be reported as an "error", the error has actually been corrected already. This is one of the best and least understood features of unRAID!"

Yes - I have offline, offsite backups, and I'm not terribly concerned about the data. I use old disks until they are marked dead; my critical data lives elsewhere. But with the intent of kicking this array down the road another day and not resorting to backups, I conclude that the course of action with the highest probability of success is to rsync the data on the failing full drive to one of the existing empty disks, then drop both the failing full drive and the failing empty drive out of the array.
JonathanM Posted February 8, 2017
Quoting the wiki: "Occasionally unRAID will encounter a READ error (not a WRITE error) on a disk. When this happens, unRAID will read the corresponding sector contents of all the other disks + parity to compute the data it was unable to read from the source."
This passage assumes ALL the other disks are OK at that sector. With multiple bad disks, there is a good chance the computed data will be wrong, corrupting the written data.
As for rsyncing the failing full drive to an empty disk and then dropping both failing drives from the array: sounds reasonable. Removing all drives with known issues gives you the best chance of a successful recovery should one of your "good" drives die a sudden death.
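The reconstruction being discussed is, for single parity, just an XOR across the same sector of every other disk plus parity - which is why every other disk must read correctly at that sector. A toy illustration with made-up byte values:

```shell
#!/bin/sh
# Toy illustration of single-parity reconstruction.
# d1, d2, d3 stand in for the byte at one sector offset on three data disks.
d1=170 d2=85 d3=204
parity=$(( d1 ^ d2 ^ d3 ))        # what the parity disk holds for that sector
# "Lose" d2 and recompute it from parity plus the surviving disks:
recovered=$(( parity ^ d1 ^ d3 ))
echo "$recovered"                 # prints 85, i.e. the lost d2
```

If any surviving disk returns a wrong byte here, the XOR silently produces a wrong "recovered" value - exactly the corruption risk JonathanM describes.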
davehahn Posted February 8, 2017
Quoting JonathanM: "With multiple bad disks, there is a good chance the computed data will be wrong, corrupting the written data."
This is true, there is a good chance - but I have a cronjob that runs on the first of the month that does

find /mnt/user -type f -print0 | xargs -0 md5sum > "/mnt/user/scripts/md5sums.$(date +%F_%R)"

and I keep a file of hashes from my offsite backups - both as an index, and as a reference to compare against if I encounter a file I suspect has become corrupt.
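A hash list saved in that md5sum format can later be checked in bulk with `md5sum -c`, which re-reads every file and reports any mismatches. A sketch, with a hypothetical hash-file path:

```shell
#!/bin/sh
# Sketch: verify files on disk against a previously saved md5sum list.
verify_hashes() {
    # --quiet suppresses the per-file "OK" lines, so only failures
    # are printed; exit status is non-zero if any file fails.
    md5sum --quiet -c "$1"
}
# Usage (hypothetical path, in the naming scheme from the cronjob above):
#   verify_hashes /mnt/user/scripts/md5sums.2017-02-01_03:00
```

Run from the same directory the list was generated in, since md5sum stores the paths as written.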