Drive read errors


TimV


I just built a system with brand-new IronWolf 8TB drives, and several are throwing the occasional read error. One drive has been kicked out of the array twice, and I'm in the process of returning it, as I feel that several hundred read errors are a bit excessive. My first drive had 1 or 2 read errors and hasn't had any since. Right now, a third drive just threw 8 errors after behaving for almost 2 weeks. When I run the SMART tests on them, they find no errors. I find it hard to believe so many drives could be faulty right out of the box. My 10TB IronWolf parity drive has been a champ.

 

I'm running Unraid 6.9.0-rc2, and my 5 drives are connected through an HBA (the drives didn't come with SATA cables like I expected). Could that be the issue? Is there a setting to raise the threshold before a drive gets marked disabled? I'm expecting my new drives to arrive tomorrow and need to hold out until then. lol

 

Thanks.

 

Tim.


This is the result of the short test. The long test is still running. I attached a screenshot because the text below is poorly delimited.

 

Attributes

ID#  ATTRIBUTE NAME            FLAG    VALUE  WORST  THRESHOLD  TYPE      UPDATED  FAILED  RAW VALUE
1    Raw read error rate       0x000f  081    065    044        Pre-fail  Always   Never   126051424
3    Spin up time              0x0003  081    080    000        Pre-fail  Always   Never   0
4    Start stop count          0x0032  100    100    020        Old age   Always   Never   111
5    Reallocated sector count  0x0033  100    100    010        Pre-fail  Always   Never   0
7    Seek error rate           0x000f  074    060    045        Pre-fail  Always   Never   24871967
9    Power on hours            0x0032  100    100    000        Old age   Always   Never   456 (19d, 0h)
10   Spin retry count          0x0013  100    100    097        Pre-fail  Always   Never   0
12   Power cycle count         0x0032  100    100    020        Old age   Always   Never   20
18   Unknown attribute         0x000b  100    100    050        Pre-fail  Always   Never   0
187  Reported uncorrect        0x0032  100    100    000        Old age   Always   Never   0
188  Command timeout           0x0032  100    099    000        Old age   Always   Never   21475164165
190  Airflow temperature cel   0x0022  073    050    040        Old age   Always   Never   27 (min/max 23/40)
192  Power-off retract count   0x0032  100    100    000        Old age   Always   Never   15
193  Load cycle count          0x0032  100    100    000        Old age   Always   Never   435
194  Temperature celsius       0x0022  027    050    000        Old age   Always   Never   27 (0 19 0 0 0)
195  Hardware ECC recovered    0x001a  081    065    000        Old age   Always   Never   126051424
197  Current pending sector    0x0012  100    100    000        Old age   Always   Never   0
198  Offline uncorrectable     0x0010  100    100    000        Old age   Offline  Never   0
199  UDMA CRC error count      0x003e  200    200    000        Old age   Always   Never   0
240  Head flying hours         0x0000  100    253    000        Old age   Offline  Never   73 (157 82 0)
241  Total lbas written        0x0000  100    253    000        Old age   Offline  Never   7819716734
242  Total lbas read           0x0000  100    253    000        Old age   Offline  Never   68405468088
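A few of the raw values in this table (attributes 1, 7, 188 and 195) look alarming at first glance, but on Seagate drives they are commonly interpreted as several counters packed into one 48-bit number. The sketch below decodes them under those commonly cited interpretations; this layout is an assumption, not something confirmed in this thread or by an official Seagate spec, and the helper names (`split_fields`, `seagate_error_rate`, `lbas_to_tb`) are hypothetical. The LBA conversion also assumes 512-byte logical sectors.

```python
from typing import List, Tuple

# Assumption: Seagate packs several counters into one 48-bit raw SMART value.
# These decodings are commonly cited interpretations, not an official spec.


def split_fields(raw: int, bits: int = 16, count: int = 3) -> List[int]:
    """Split a raw value into `count` fields of `bits` bits, most significant first.

    Used here for attribute 188 (Command timeout), whose raw value is often
    read as three 16-bit counters.
    """
    mask = (1 << bits) - 1
    return [(raw >> (bits * i)) & mask for i in reversed(range(count))]


def seagate_error_rate(raw: int) -> Tuple[int, int]:
    """Assumed layout for attributes 1, 7 and 195: the upper 16 bits hold the
    error count, the lower 32 bits hold the number of operations."""
    errors = raw >> 32
    operations = raw & 0xFFFFFFFF
    return errors, operations


def lbas_to_tb(lbas: int, sector_size: int = 512) -> float:
    """Convert an LBA count to decimal terabytes (assumes 512-byte sectors)."""
    return lbas * sector_size / 1e12


if __name__ == "__main__":
    # Raw values copied from the SMART table above.
    print("1   Raw read error rate    :", seagate_error_rate(126051424))
    print("7   Seek error rate        :", seagate_error_rate(24871967))
    print("195 Hardware ECC recovered :", seagate_error_rate(126051424))
    print("188 Command timeout        :", split_fields(21475164165))
    print("241 Written (TB)           : %.2f" % lbas_to_tb(7819716734))
    print("242 Read (TB)              : %.2f" % lbas_to_tb(68405468088))
```

Under these assumptions, attributes 1, 7 and 195 decode to zero actual errors, while 188 decodes to three fields of 5, i.e. a handful of command timeouts rather than 21 billion of them. Command timeouts with no reallocated or pending sectors are often associated with the connection path (cables, backplane, HBA) rather than the media itself, though this table alone cannot prove that.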

 

[Attached screenshot: SMART attributes table]

Edited by TimV

Got the firmware updated. I opened the case to add an M.2 cache, and the SAS cable seemed to give a subtle click when I checked the connection. That might've been the problem.

 

Everything ran fine for about 16 hours, and then I got another set of 8 errors on the same drive as yesterday, as well as 1 error on a second drive, which got it kicked out immediately. Maybe it's the cables, maybe it's the LSI card; I'm kinda getting the idea it's the LSI HBA. I ordered new cables, and I'm about to switch to some unused LSI cables. Gotta wait for the pre-clear to finish on the new drives I added first.

 

Tim.

Edited by TimV