Read-errors on 2 disks


Mange

Hi

 

My Unraid server has been running flawlessly for over a year. Yesterday I ran a parity check and got warnings for read errors on 2 drives. The array is still up, so I assume the bad sectors were rewritten. Today I got another warning about read errors. It's not a lot (today 4 on one drive, 22 on the other), but I still need help figuring out what to do next. Is this a sign of 2 failing disks, or is it something else? Should I swap out the faulty disks now before it is too late?

 

Would really appreciate some help explaining the errors to me. 

Diagnostics attached...

nas-diagnostics-20180917-1911.zip

Edited by Mange
Link to comment
1 hour ago, johnnie.black said:

Allow me to disagree here: both disks are failing and need to be replaced. Both show recent UNC errors (read errors) in SMART and have a non-zero Raw_Read_Error_Rate and Multi_Zone_Error_Rate. You want both of these to be 0 on WD disks, as they currently are on disk1.

Why are those not in the list of SMART attributes monitored by default in Unraid?

Link to comment
1 hour ago, johnnie.black said:

Allow me to disagree here: both disks are failing and need to be replaced. Both show recent UNC errors (read errors) in SMART and have a non-zero Raw_Read_Error_Rate and Multi_Zone_Error_Rate. You want both of these to be 0 on WD disks, as they currently are on disk1.

 

 

Ok, it seems that this log information is printed by the 'smartctl -l xerror' command.

 

Perhaps 'SMART xerror log' could be added to the 'Self-test' menu for each hard drive in the Unraid UI, since the xerror log seems to contain valuable info that is not always shown in the 'error log'.
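
Until something like that exists in the UI, the log can be pulled from a script. Below is a minimal Python sketch that shells out to smartctl with the `-l xerror` option mentioned above. The helper names `xerror_command` and `read_xerror_log` and the device path `/dev/sdb` are placeholders made up for this example, not anything Unraid provides, and smartctl normally needs to run as root.

```python
import subprocess


def xerror_command(device):
    """Build the smartctl call that prints the extended (xerror) error log."""
    return ["smartctl", "-l", "xerror", device]


def read_xerror_log(device):
    """Run smartctl and return its text output.

    smartctl reports disk status through a bit-mask exit code, so a
    non-zero code does not necessarily mean the command itself failed;
    check=False keeps the output in that case.
    """
    result = subprocess.run(
        xerror_command(device),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Example (hypothetical device path, run as root):
# print(read_xerror_log("/dev/sdb"))
```

The command is built in its own function so it can be checked without touching a real disk.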

 

Edited by themaxxz
Link to comment
2 hours ago, trurl said:

Why are those not in the list of SMART attributes monitored by default in Unraid?

Because not all brands use them the same way: Raw_Read_Error_Rate is a multibyte value on Seagates, for example, and Multi_Zone_Error_Rate is mostly found on the majority of WD disks and some Samsung disks.

 

But for WD disks both attributes are a very good indication of problems. A value of high double digits or above on Raw_Read_Error_Rate is very bad news. That, together with the Multi_Zone_Error_Rate errors, the recent UNC errors in SMART, and the type of error in the syslog, leaves me no doubt they are failing. They will both very likely fail an extended SMART test, though there's a small chance one or both could pass, since these errors are not always repeatable, unlike pending sectors. Even if they do pass, they will almost certainly fail again in the very near future.
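
As a rough illustration of checking those two attributes on a WD disk, here is a small Python sketch that reads the attribute table printed by `smartctl -A` and reports which of the two have a non-zero raw value. The function names and the sample rows are invented for the example (the numbers are not taken from the attached diagnostics), and the check only makes sense for brands that report these attributes as plain counters.

```python
WD_WATCHED = ("Raw_Read_Error_Rate", "Multi_Zone_Error_Rate")


def parse_raw_values(smartctl_a_output):
    """Map attribute name -> raw value from `smartctl -A` table text."""
    raw = {}
    for line in smartctl_a_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                raw[parts[1]] = int(parts[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return raw


def nonzero_watched(raw):
    """Return the watched attributes whose raw value is above 0."""
    return {name: raw[name] for name in WD_WATCHED if raw.get(name, 0) > 0}


# Made-up sample rows for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       87
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       12
"""

print(nonzero_watched(parse_raw_values(SAMPLE)))
```

An empty result means both attributes are at 0, which is what you want to see on a healthy WD disk.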

 

 

Edited by johnnie.black
Link to comment
So we should customize monitoring on WD to include these attributes?

I do. You just need to keep in mind that although 0 is the ideal value for Raw_Read_Error_Rate, having just a few errors, usually up to low double digits, doesn't necessarily mean you have a problem, unlike, say, getting 1 pending sector. But it's never a good sign, and if the attribute keeps climbing the disk will very likely have issues soon: best case some slow sectors, which with time usually turn into bad sectors, i.e., read errors.
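
That rule of thumb can be written down as a short Python sketch. It takes raw Raw_Read_Error_Rate readings in chronological order and sorts them into the cases described above. The function name and the cut-off of 30 for "low double digits" are assumptions made for this example; no exact number is given here.

```python
def assess_raw_read_errors(readings, low_double_digits=30):
    """Classify chronological Raw_Read_Error_Rate raw values for a WD disk.

    `low_double_digits` is an assumed cut-off, not an official threshold.
    """
    if not readings:
        raise ValueError("need at least one reading")
    latest = readings[-1]
    if latest == 0:
        return "ideal"
    # A value that grew between the first and last reading is the worse sign.
    if len(readings) > 1 and latest > readings[0]:
        return "climbing, expect issues soon"
    if latest <= low_double_digits:
        return "not ideal, keep watching"
    return "high, treat as a problem"


# Hypothetical readings taken on different days.
print(assess_raw_read_errors([0, 0, 0]))
print(assess_raw_read_errors([4, 4, 4]))
print(assess_raw_read_errors([4, 9, 22]))
```

The trend matters more than a single small value, which is why the climbing check comes before the size check.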

 

 

Link to comment
4 hours ago, johnnie.black said:

I do. You just need to keep in mind that although 0 is the ideal value for Raw_Read_Error_Rate, having just a few errors, usually up to low double digits, doesn't necessarily mean you have a problem, unlike, say, getting 1 pending sector. But it's never a good sign, and if the attribute keeps climbing the disk will very likely have issues soon: best case some slow sectors, which with time usually turn into bad sectors, i.e., read errors.

 

 

Thanks. Most of my drives are WD Red. I have added those attributes to monitoring on those disks.

Link to comment

Thanks for the help and explanation. Really appreciate it.

 

I will get the 2 REDs replaced. Is there any best practice for doing this? Should I replace the parity disk first, rebuild, and then change the 2nd disk, or should I do it the other way around?

 

Another question, the 4TB RED have warranty till 2019. Is this something that WD will replace or not? Anyone had any experience with this?

 

Thanks again for all help and feedback!

 

 

Link to comment

Replace disk2 first, since it's in a worse state. You are still likely to get some read errors on parity during the rebuild, which would mean some corruption on the rebuilt disk, but maybe you'll get lucky.

 

6 minutes ago, Mange said:

Is this something that WD will replace or not? Anyone had any experience with this?

They should replace them without issues. Just say the problem is bad sectors (or UNC at LBA SMART errors). You can also run an extended SMART test, and if the disks fail it, that is another good reason to give for the replacement.
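
For the extended test, a minimal Python sketch follows. It builds the `smartctl -t long` command and reads the newest entry from the text printed by `smartctl -l selftest`. The function names, the device path and the sample log lines are invented for the example, and the status strings it looks for ("Completed without error", "Completed: ...") are based on typical smartctl output, so check them against what your version prints.

```python
def extended_test_command(device):
    """Build the smartctl call that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def latest_selftest_passed(selftest_log):
    """Inspect `smartctl -l selftest` text for the newest entry ("# 1").

    Returns True if it completed without error, False if it completed
    with a failure, and None if no finished test is found.
    """
    for line in selftest_log.splitlines():
        if line.startswith("# 1 "):
            if "Completed without error" in line:
                return True
            if "Completed:" in line:
                return False
            return None  # still running, aborted, or interrupted
    return None


# Made-up log lines for illustration only.
PASSED = "# 1  Extended offline    Completed without error       00%     21034         -"
FAILED = "# 1  Extended offline    Completed: read failure       90%     21040         123456789"

print(latest_selftest_passed(PASSED))
print(latest_selftest_passed(FAILED))

# To start a real test (hypothetical device path, run as root):
# import subprocess
# subprocess.run(extended_test_command("/dev/sdb"), check=False)
```

The test itself runs on the drive in the background and can take many hours on a 4TB disk, so the log has to be read again once it has finished.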

 

Link to comment
