Mange Posted September 17, 2018 Hi, my Unraid has been running flawlessly for over a year. Yesterday I ran a parity check and got warnings for read errors on 2 drives. The array is still up, so I assume the bad sectors were rewritten. Today I got another warning of read errors. It's not a lot (today 4 on one drive, 22 on the other), but I still need help figuring out what to do next. Is this a sign of 2 failing disks, or is it something else? Should I swap out the faulty disks now before it is too late? I would really appreciate some help explaining the errors to me. Diagnostics attached: nas-diagnostics-20180917-1911.zip Edited September 17, 2018 by Mange
trurl Posted September 17, 2018 SMART looks OK. Run an extended SMART test on each disk.
JorgeB Posted September 17, 2018 "SMART looks OK" Allow me to disagree here: both disks are failing and need to be replaced. Both show recent UNC errors (read errors) on SMART and have a non-zero Raw_Read_Error_Rate and Multi_Zone_Error_Rate. You want both of these to be 0 on WD disks, like they currently are on disk1.
JorgeB Posted September 17, 2018 PS: you also need to replace the SATA cable on your SSD.
trurl Posted September 17, 2018 1 hour ago, johnnie.black said: "Allow me to disagree here: both disks are failing and need to be replaced. Both show recent UNC errors (read errors) on SMART and have a non-zero Raw_Read_Error_Rate and Multi_Zone_Error_Rate. You want both of these to be 0 on WD disks, like they currently are on disk1." Why are those not in the list of SMART attributes monitored by default in Unraid?
themaxxz Posted September 17, 2018 1 hour ago, johnnie.black said: "Allow me to disagree here: both disks are failing and need to be replaced. Both show recent UNC errors (read errors) on SMART and have a non-zero Raw_Read_Error_Rate and Multi_Zone_Error_Rate. You want both of these to be 0 on WD disks, like they currently are on disk1." OK, it seems that this log information is printed by the 'smartctl -l xerror' command. Perhaps 'SMART xerror log' could be added to the 'Self-test' menu for hard drives in the Unraid UI, since the xerror log seems to contain more valuable info that is not always shown in the 'error log'. Edited September 17, 2018 by themaxxz
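For anyone who wants to pull that log outside the UI, here is a minimal sketch in Python that wraps the 'smartctl -l xerror' call mentioned above. The function names and the device path /dev/sdb in the usage note are placeholders for illustration, not taken from the diagnostics, and smartctl normally has to be run as root.

```python
import subprocess


def xerror_command(device):
    """Build the smartctl call that prints the extended comprehensive error log.

    `device` is a placeholder such as "/dev/sdb"; use the path of your own disk.
    """
    return ["smartctl", "-l", "xerror", device]


def read_xerror_log(device):
    """Run smartctl and return whatever it printed to stdout.

    smartctl reports disk status through its exit code, so a non-zero
    exit status is not treated as a Python error here (check=False).
    """
    result = subprocess.run(
        xerror_command(device),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling read_xerror_log("/dev/sdb") returns the log as text, which you can then search for the UNC entries discussed in this thread.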
JorgeB Posted September 17, 2018 2 hours ago, trurl said: "Why are those not in the list of SMART attributes monitored by default in Unraid?" Because not all brands use them the same way: Raw_Read_Error_Rate is a multibyte value on Seagates, for example, and Multi_Zone_Error_Rate is mostly found on the majority of WD disks and some Samsung disks. But for WD disks both attributes are a very good indication of problems, and a value of high double digits or above on Raw_Read_Error_Rate is very bad news. That, together with the Multi_Zone_Error_Rate errors, the recent UNC errors on SMART, and the type of error in the syslog, leaves me no doubt they are failing. They will both very likely fail an extended SMART test, though there's a small chance one or both could pass, since these errors are not always repeatable, unlike pending sectors. But even if they do pass, they will almost certainly fail again in the very near future. Edited September 17, 2018 by johnnie.black
trurl Posted September 18, 2018 2 hours ago, johnnie.black said: "for WD disks both attributes are a very good indication of problems, and a value of high double digits or above on Raw_Read_Error_Rate is very bad news. That, together with the Multi_Zone_Error_Rate errors" So we should customize monitoring on WD disks to include these attributes?
JorgeB Posted September 18, 2018 "So we should customize monitoring on WD to include these attributes?" I do. You just need to keep in mind that although 0 is the ideal value for Raw_Read_Error_Rate, having a few errors, usually up to low double digits, doesn't necessarily mean you have a problem, unlike, say, getting 1 pending sector. But it's never a good sign, and if the attribute keeps climbing the disk will very likely have issues soon: in the best case some slow sectors, which with time usually turn into bad sectors, i.e., read errors.
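To make that rule of thumb concrete, here is a rough sketch in Python that reads the attribute table printed by 'smartctl -A' and sorts the raw values into "ok", "watch" and "bad". Everything specific in it is an assumption for illustration: the SAMPLE text is made up and is not from the attached diagnostics, and the cutoff of 50 is only one possible reading of "high double digits". It also applies only to WD disks, since other brands encode these attributes differently.

```python
# Attributes discussed in this thread. The names are the ones smartctl prints.
WATCHED = ("Raw_Read_Error_Rate", "Multi_Zone_Error_Rate", "Current_Pending_Sector")

# Made-up example of `smartctl -A` output, NOT taken from the poster's disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       87
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       12
"""


def parse_raw_values(smartctl_a_output):
    """Return {attribute name: raw value} for the watched attributes."""
    values = {}
    for line in smartctl_a_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have the raw value in column 10.
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            try:
                values[parts[1]] = int(parts[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return values


def classify(name, raw, bad_cutoff=50):
    """Apply the rule of thumb: 0 is ideal, a few errors are worth watching,
    a large count is bad, and any pending sector at all is a problem.
    bad_cutoff is an assumed number, not an official threshold."""
    if raw == 0:
        return "ok"
    if name == "Current_Pending_Sector" or raw >= bad_cutoff:
        return "bad"
    return "watch"


report = {name: classify(name, raw) for name, raw in parse_raw_values(SAMPLE).items()}
```

A single snapshot like this misses the other half of the advice, which is the trend: a value that keeps climbing between checks is a warning even while it is still small, so it is worth recording the numbers over time.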
trurl Posted September 18, 2018 4 hours ago, johnnie.black said: "I do. You just need to keep in mind that although 0 is the ideal value for Raw_Read_Error_Rate, having a few errors, usually up to low double digits, doesn't necessarily mean you have a problem, unlike, say, getting 1 pending sector. But it's never a good sign, and if the attribute keeps climbing the disk will very likely have issues soon: in the best case some slow sectors, which with time usually turn into bad sectors, i.e., read errors." Thanks. Most of my drives are WD Red. I have added those attributes to monitoring on those disks.
Mange Posted September 18, 2018 Thanks for the help and explanation, I really appreciate it. I will get the 2 REDs replaced. Is there a best practice for doing this? Should I replace the parity disk first, rebuild, and then change the 2nd disk, or should I do it the other way around? Another question: the 4TB RED has warranty until 2019. Is this something that WD will replace or not? Anyone had any experience with this? Thanks again for all the help and feedback!
JorgeB Posted September 18, 2018 Replace disk2 first, since it's in a worse state. You are still likely to get some read errors on parity during the rebuild, and so some corruption on the rebuilt disk, but maybe you'll get lucky. 6 minutes ago, Mange said: "Is this something that WD will replace or not? Anyone had any experience with this?" They should replace them without issues. Just say the problem is bad sectors (or "UNC at LBA" SMART errors). You can also run an extended SMART test, and if the disks fail it, that is another good reason to give for the replacement.