Why do I get so many read errors?


zzgus


The server had been working with no apparent problems for some months.

A month ago I replaced an HD; actually, I shrank the array, removing one disk. That disk had given me a lot of read errors.

 

Now, one month after those problems, I'm getting read errors again, this time on another disk.

 

I haven't touched the server this month, I mean the cables, etc.

 

Can someone take a look at my diagnostics file to see if anything is wrong?

 

It would be greatly appreciated.

 

Thank you,
Gus

 

unraid-media-diagnostics-20180919-1323.zip


On healthy WD disks this attribute should be 0, or at least a very low value; raw values up to the low double digits can still be OK:

 

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    89
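A minimal sketch (not an official tool) of how you might pull that raw value out of a smartctl attribute line; the cutoff of 30 is a hypothetical stand-in for "low double digits can still be OK", not a WD specification:

```python
# Hypothetical helper: extract the RAW_VALUE column (the last field)
# from a smartctl attribute line like the one quoted above.
def raw_value(attr_line: str) -> int:
    """Return the RAW_VALUE (last whitespace-separated field)."""
    return int(attr_line.split()[-1])

line = "  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    89"
raw = raw_value(line)
# 30 is an illustrative cutoff for "low double digits", not a real spec
print(raw, "->", "probably OK" if raw <= 30 else "keep an eye on this disk")
```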

 

These, together with recent UNC at LBA errors, mean the disk had read errors recently and will likely have more in the near future. This is just the most recent one:

 

Error 9 [8] occurred at disk power-on lifetime: 35417 hours (1475 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 03 30 00 01 6c c3 8e 20 e0 00  Error: UNC 816 sectors at LBA = 0x16cc38e20 = 6119722528

Note that the current power-on hours are 36397, so these are recent errors.
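For concreteness, the arithmetic behind "recent", using the two numbers from the SMART output above:

```python
# Numbers from the SMART report above: error 9 was logged at 35417
# power-on hours, and the drive's current Power_On_Hours raw value
# is 36397.
error_hours = 35417
current_hours = 36397

elapsed = current_hours - error_hours
print(f"last error was {elapsed} power-on hours ago "
      f"(~{elapsed // 24} days of spin time)")
```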

 

You'll find out if it's OK during the parity sync.

  • 7 months later...

After some time with no apparent problems, I got some new errors. Time for an in-depth search to sort the good disks from the bad ones.

Some help will be greatly appreciated.

 

If we look at the SMART table in Unraid, we have the following attributes:

 

#1 - Raw read error rate

#3 - Spin up time

#4 - Start stop count

#5 - Reallocated sector count

#7 - Seek error rate

#9 - Power on hours

#10 - Spin retry count

#11 - Calibration retry count

#12 - Power cycle count

#192 - Power-off retract count

#193 - Load cycle count

#194 - Temperature Celsius

#196 - Reallocated event count

#197 - Current pending sector

#198 - Offline uncorrectable

#199 - UDMA CRC error count

#200 - Multi zone error rate

 

and for every attribute we have these fields:

 

Flag

Value

Worst

Threshold

Type

Updated

Failed

Raw Value

 

What are the most important attributes for judging whether a disk is reliable, and what values should I look at?
What's the difference between "Value" and "Raw Value"? I have seen that they differ from disk to disk.
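As far as I understand it (the exact scaling is vendor-defined, so treat this as a sketch, not a spec), "Value" is a normalized health score that counts down toward "Threshold" as the attribute degrades, while "Raw Value" is the vendor's actual counter, which is why raw numbers vary so much between disks. Using the numbers from the Raw_Read_Error_Rate line quoted earlier in this thread:

```python
# Sketch only: conventionally the normalized Value starts high
# (e.g. 100 or 200) and counts DOWN; the attribute is considered
# failing once Value <= Threshold. Raw Value is the underlying
# counter. Numbers from the line quoted earlier in the thread
# (VALUE 200, THRESH 051, RAW_VALUE 89).
value, thresh, raw = 200, 51, 89

failing = value <= thresh
print("normalized:", value, "threshold:", thresh, "failing:", failing)
print("raw counter (actual read errors):", raw)
```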

 

############################################################

 

As @johnnie.black said, one of the values to be aware of is:

 

#1 - Raw read error rate -> values from 0 up to the low double digits can be OK.

 

I have disks with raw values ranging from 0 to 45 to 228.
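A throwaway sketch applying that rule of thumb to the three raw values I mentioned; the cutoffs (0 / double digits / triple digits) are informal and illustrative, not from any specification:

```python
# Informal triage based on the rule of thumb from this thread;
# the cutoffs are illustrative, not an official WD specification.
def classify(raw: int) -> str:
    if raw == 0:
        return "healthy"
    if raw < 100:        # double digits: watch, may still be OK
        return "watch"
    return "suspect"     # triple digits: candidate for replacement

for raw in (0, 45, 228):
    print(raw, "->", classify(raw))
```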

 

############################################################

 

Thank you,
Gus

 

 

 

 

 

 
