Drive Read errors

TimV · January 27, 2021

I just built a system with brand new Iron Wolf 8TB drives, and several are throwing the occasional read error. One drive has been kicked out of the share twice, and I'm in the process of returning it, as I feel that several hundred read errors are a bit excessive. My first drive had 1 or 2 read errors and hasn't had any since. Right now, a 3rd drive just threw 8 errors after behaving for almost 2 weeks. When I run the SMART tests on them, they find no errors. I find it hard to believe so many drives could be faulty right out of the box. My 10TB Iron Wolf parity drive has been a champ.

I'm running Unraid 6.9.0-rc2 and my 5 drives are going through an HBA (the drives didn't come with SATA cables like I expected). Could that be the issue? Is there a setting to raise the threshold for a drive getting marked disabled? I'm expecting my new drives to arrive tomorrow and need to hold out until then. lol

Thanks.

Tim.

JorgeB · January 27, 2021

Please post the diagnostics after there are errors and before rebooting.

TimV · January 27, 2021

This is the result of the short test. The long test is still running. I attached a screen print as the below text is poorly delimited.

Attributes

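| # | ATTRIBUTE NAME | FLAG | VALUE | WORST | THRESHOLD | TYPE | UPDATED | FAILED | RAW VALUE |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Raw read error rate | 0x000f | 081 | 065 | 044 | Pre-fail | Always | Never | 126051424 |
| 3 | Spin up time | 0x0003 | 081 | 080 | 000 | Pre-fail | Always | Never | 0 |
| 4 | Start stop count | 0x0032 | 100 | 100 | 020 | Old age | Always | Never | 111 |
| 5 | Reallocated sector count | 0x0033 | 100 | 100 | 010 | Pre-fail | Always | Never | 0 |
| 7 | Seek error rate | 0x000f | 074 | 060 | 045 | Pre-fail | Always | Never | 24871967 |
| 9 | Power on hours | 0x0032 | 100 | 100 | 000 | Old age | Always | Never | 456 (19d, 0h) |
| 10 | Spin retry count | 0x0013 | 100 | 100 | 097 | Pre-fail | Always | Never | 0 |
| 12 | Power cycle count | 0x0032 | 100 | 100 | 020 | Old age | Always | Never | 20 |
| 18 | Unknown attribute | 0x000b | 100 | 100 | 050 | Pre-fail | Always | Never | 0 |
| 187 | Reported uncorrect | 0x0032 | 100 | 100 | 000 | Old age | Always | Never | 0 |
| 188 | Command timeout | 0x0032 | 100 | 099 | 000 | Old age | Always | Never | 21475164165 |
| 190 | Airflow temperature cel | 0x0022 | 073 | 050 | 040 | Old age | Always | Never | 27 (min/max 23/40) |
| 192 | Power-off retract count | 0x0032 | 100 | 100 | 000 | Old age | Always | Never | 15 |
| 193 | Load cycle count | 0x0032 | 100 | 100 | 000 | Old age | Always | Never | 435 |
| 194 | Temperature celsius | 0x0022 | 027 | 050 | 000 | Old age | Always | Never | 27 (0 19 0 0 0) |
| 195 | Hardware ECC recovered | 0x001a | 081 | 065 | 000 | Old age | Always | Never | 126051424 |
| 197 | Current pending sector | 0x0012 | 100 | 100 | 000 | Old age | Always | Never | 0 |
| 198 | Offline uncorrectable | 0x0010 | 100 | 100 | 000 | Old age | Offline | Never | 0 |
| 199 | UDMA CRC error count | 0x003e | 200 | 200 | 000 | Old age | Always | Never | 0 |
| 240 | Head flying hours | 0x0000 | 100 | 253 | 000 | Old age | Offline | Never | 73 (157 82 0) |
| 241 | Total lbas written | 0x0000 | 100 | 253 | 000 | Old age | Offline | Never | 7819716734 |
| 242 | Total lbas read | 0x0000 | 100 | 253 | 000 | Old age | Offline | Never | 68405468088 |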
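A note on the large raw values for attributes 1, 7 and 188: on Seagate drives these are commonly understood to be several counters packed into one number, not a single count. The sketch below unpacks them under that assumption. The layouts used here (three 16-bit fields for Command timeout; upper 16 bits as errors and lower 32 bits as operations for the error-rate attributes) are the commonly cited interpretation and are not confirmed anywhere in this thread. The function names are made up for illustration.

```python
from typing import Tuple


def split_words(raw: int, width: int = 16, count: int = 3) -> Tuple[int, ...]:
    """Split a packed raw value into `count` fields of `width` bits, highest first."""
    mask = (1 << width) - 1
    return tuple((raw >> (width * i)) & mask for i in reversed(range(count)))


def split_error_rate(raw: int) -> Tuple[int, int]:
    """Assumed Seagate layout: upper 16 bits = error count, lower 32 bits = operations."""
    return raw >> 32, raw & 0xFFFFFFFF


# Raw values copied from the attribute list above.
command_timeout = split_words(21475164165)        # attribute 188
read_errors, read_ops = split_error_rate(126051424)   # attribute 1
seek_errors, seek_ops = split_error_rate(24871967)    # attribute 7

print("188 fields (high to low):", command_timeout)
print("1   errors / operations: ", read_errors, "/", read_ops)
print("7   errors / operations: ", seek_errors, "/", seek_ops)
```

If the assumed layout holds, the raw read and seek error values contain zero actual errors, and the Command timeout value breaks down into three small counters instead of one huge one.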

Edited January 27, 2021 by TimV

JorgeB · January 27, 2021

Diagnostics are under Tools -> Diagnostics

trurl · January 27, 2021

28 minutes ago, TimV said:

I attached a screen print as the below text is poorly delimited.

Diagnostics would have given us everything you posted there plus everything else we asked for.

TimV · January 27, 2021

cygnus-diagnostics-20210127-0935.zip

This is what you're looking for, yes?

JorgeB · January 27, 2021

Looks more like a connection/power problem, but also make sure you update the LSI firmware to the latest version; you can do that using Unraid.
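For anyone following along, here is a rough sketch of one check that is consistent with this reading, using the raw SMART values posted earlier in the thread. It is not necessarily how the diagnostics were actually evaluated. It assumes that attributes 5, 187, 197 and 198 are the usual media-defect indicators, and that the low 16 bits of attribute 188 hold the command timeout count on this drive (an assumption, not something stated in the thread). The names `MEDIA_ATTRS` and `nonzero_media` are hypothetical.

```python
# Raw values copied from the SMART attribute list posted earlier in the thread.
raw_values = {5: 0, 187: 0, 188: 21475164165, 197: 0, 198: 0, 199: 0}

# Attributes that usually move when the platters or heads are actually failing.
MEDIA_ATTRS = {
    5: "Reallocated sector count",
    187: "Reported uncorrect",
    197: "Current pending sector",
    198: "Offline uncorrectable",
}


def nonzero_media(raw: dict) -> dict:
    """Return the media-defect attributes whose raw value is above zero."""
    return {name: raw[attr] for attr, name in MEDIA_ATTRS.items() if raw.get(attr, 0) > 0}


media = nonzero_media(raw_values)
# Assumption: the low 16 bits of attribute 188 are the timeout count.
timeouts = raw_values[188] & 0xFFFF

if not media and timeouts > 0:
    print("No media defects recorded, but", timeouts, "command timeouts: check cables/power/HBA")
else:
    print("Media-defect attributes above zero:", media)
```

Clean media counters alongside command timeouts do not prove the drive is healthy, but they make cabling, power or the controller worth ruling out before returning more drives.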

TimV · January 27, 2021

Thanks for the help. Can you explain a bit more about connection/power problem? I'll get that firmware updated.

Tim.

JorgeB · January 27, 2021

Check/replace both cables.

TimV · January 28, 2021

Got the firmware updated. I opened the case to add M.2 cache and it seemed like the SAS cable had a subtle click when I checked the connection. That might've been the problem.

Everything ran fine for about 16 hours, and then I got another set of 8 errors on the same drive as yesterday, as well as 1 error on a 2nd drive, which immediately kicked it out. Maybe it's the cables, maybe it's the LSI card. I'm kinda getting the idea it's the LSI HBA. I ordered new cables, and I'm about to switch to some unused LSI cables. Gotta wait for the pre-clear to finish on the new drives I added.

Tim.

Edited January 29, 2021 by TimV
