CRC errors on multiple disks, one of them is disabled (yes, tried swapping cables)


Go to solution Solved by johnny2678,


Hi, hoping some kind soul can provide a sanity check.  Been running Unraid 6.9 for >18 months, mostly just hands off.  For my dockers/VMs it just works... I have one 12TB parity with 8 assorted drives (all native WD Reds or shucked WDs) hooked to a SAS HBA card via 2 SAS Breakout cables from Cable Creations.

 

About a year ago, a couple of drives in one bank started reporting the occasional CRC error.  At first it happened infrequently, then the pace started to pick up.  After reading about it, most suggestions were to reseat the cables or try new ones.  I pulled the drives, checked the snugness of each one, and then turned the server back on.  No change, still getting CRC errors.  So I replaced the cable.  All drives reporting errors were on one breakout cable so I just replaced that one.  This time I went with a Cable Matters breakout cable.

 

That stopped the CRC errors.  I had a few nice quiet months until about Nov 2021 and then CRC errors started on 2 drives in the other bank.  I quickly ordered a replacement breakout cable, pulled the drives and reseated them all with the new cable.  That was a week ago.  Haven't had any CRC errors since on those drives.  Great right?!

 

But now, just a few days later, one 10TB drive in the first bank is reporting CRC errors in the thousands and the drive is going into a disabled state.  I tried reseating the cables, stopping the array, unassigning the disk, starting the array, stopping, reassigning, and then starting again, but the drive errored out with CRC errors and went back to a disabled state during the rebuild.  The drive is about one year old.

 

questions:

  • Is this drive really dead? Drive is a shucked WD 10TB in use for about 1 year
    • in 10 years running Synology/Unraid, I've never had to replace a drive because it went bad so I'm skeptical.  hoping it's just a config/hardware issue.
  • Is it my cables again?  Am I using the right ones?
  • Is it my SAS HBA card? Ordered from Art of the Server on eBay based on recommendations in this forum

 

Any suggestions on what to try to harden my setup would be welcome.  Thanks in advance
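For anyone who wants to check whether the CRC counter is still climbing after a cable or slot change, here is a minimal Python sketch. It assumes the text comes from `smartctl -A /dev/sdX` and that the drive reports SMART attribute 199 (UDMA_CRC_Error_Count) in the usual ten-column attribute table. The sample lines and the numbers in them are made up for illustration; they are not taken from the attached diagnostics. The raw counter is normally cumulative and does not drop back to zero once the cause is fixed, so the useful signal is the difference between two readings rather than the absolute value.

```python
from typing import Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Assumes the standard `smartctl -A` table layout, where the attribute ID
    is the first column and the raw value is the tenth.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_increase(before: str, after: str) -> Optional[int]:
    """Difference in the CRC counter between two smartctl snapshots."""
    old, new = crc_error_count(before), crc_error_count(after)
    if old is None or new is None:
        return None
    return new - old


# Hypothetical snapshots (illustrative values only).
SNAPSHOT_BEFORE = (
    "199 UDMA_CRC_Error_Count    0x000a   200   200   000    "
    "Old_age   Always       -       4312"
)
SNAPSHOT_AFTER = (
    "199 UDMA_CRC_Error_Count    0x000a   200   200   000    "
    "Old_age   Always       -       4312"
)

if __name__ == "__main__":
    delta = crc_increase(SNAPSHOT_BEFORE, SNAPSHOT_AFTER)
    print("CRC errors added since last snapshot:", delta)
```

A delta of zero over a few days of normal use would suggest the last change (cable, slot, power) helped; a growing delta points back at the link between the drive and the HBA.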

 

edit: grammar

 

15620-beast-diagnostics-20220105-0811.zip

Edited by johnny2678
Link to comment
Jan  4 17:59:48 15620-BEAST kernel: sd 1:0:6:0: Power-on or device reset occurred

These repeating errors suggest a power/connection problem. It could be the backplane, a backplane slot, or even the HBA, though that is less likely. Start by using a different slot if available, or swap slots and see if the issue stays with the slot.
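To see which device the resets cluster on, one option is to count the reset messages per SCSI address in the syslog. The following is a rough Python sketch, assuming the kernel message always has the form shown in the line quoted above; the function name `count_resets` and the second sample line are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "kernel: sd 1:0:6:0: Power-on or device reset occurred"
RESET_RE = re.compile(
    r"kernel: sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)


def count_resets(syslog_lines):
    """Count 'Power-on or device reset' messages per SCSI address."""
    counts = Counter()
    for line in syslog_lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE_LINES = [
    # Line quoted in this thread.
    "Jan  4 17:59:48 15620-BEAST kernel: sd 1:0:6:0: "
    "Power-on or device reset occurred",
    # Hypothetical unrelated line, included only to show it is ignored.
    "Jan  4 18:00:01 15620-BEAST kernel: some unrelated message",
]

if __name__ == "__main__":
    for address, hits in count_resets(SAMPLE_LINES).most_common():
        print(address, hits)
```

If the count follows the drive when it moves to another slot, the drive or its cable is the suspect; if it stays with the slot, the backplane or its power feed is more likely.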

Link to comment
43 minutes ago, JorgeB said:
Jan  4 17:59:48 15620-BEAST kernel: sd 1:0:6:0: Power-on or device reset occurred

These repeating errors suggest a power/connection problem. It could be the backplane, a backplane slot, or even the HBA, though that is less likely. Start by using a different slot if available, or swap slots and see if the issue stays with the slot.

Thanks so much for the suggestion @JorgeB.  I will try one at a time - first to swap slots.

 

If you don't mind me asking - how do you read the line you quoted?  Is it saying that a device (card) is having power issues?  or the entire rig has a power problem?  Running a corsair 850w with 1 cpu and no gfx (other than intel onboard).  SD 1:0:6:0: refers to what?

Link to comment
20 minutes ago, johnny2678 said:

how do you read the line you quoted?

In the syslog.

 

20 minutes ago, johnny2678 said:

Is it saying that a device (card) is having power issues? 

Not necessarily. As mentioned, it's usually a power or connection problem: it's as if the device disconnected and reconnected. Could be power, could be SATA.

 

21 minutes ago, johnny2678 said:

SD 1:0:6:0: refers to what?

To the device, in this case disk4
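For reference, the four numbers in a Linux `sd H:C:T:L` prefix are host, channel, target and LUN. The small Python sketch below splits the address and builds the sysfs directory where, on a typical Linux system, the matching `sdX` name can be found. The helper names are made up, and the step from `sdX` to "disk4" depends on Unraid's own disk assignments, which this sketch does not attempt to read.

```python
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    host: int
    channel: int
    target: int
    lun: int


def parse_scsi_address(text: str) -> ScsiAddress:
    """Parse an 'H:C:T:L' string such as '1:0:6:0' (trailing ':' allowed)."""
    parts = text.strip().rstrip(":").split(":")
    if len(parts) != 4:
        raise ValueError(f"expected H:C:T:L, got {text!r}")
    return ScsiAddress(*(int(p) for p in parts))


def sysfs_block_dir(addr: ScsiAddress) -> str:
    """Directory that usually contains the sdX entry for this address."""
    hctl = f"{addr.host}:{addr.channel}:{addr.target}:{addr.lun}"
    return f"/sys/class/scsi_disk/{hctl}/device/block"


if __name__ == "__main__":
    addr = parse_scsi_address("1:0:6:0:")
    print(addr)
    print(sysfs_block_dir(addr))
```

Listing that directory on the server shows the kernel device name, which can then be matched against the device column on the Unraid Main page.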

Link to comment
42 minutes ago, JorgeB said:

To the device, in this case disk4

 

Ahh, thank you.  hopefully this means the drive is ok.  I've got a few things to try.

 

I'm using Rosewill cages that take 2x 4-pin power connectors from the PSU to power each 4-drive backplane (I think I'm saying that right).  I wonder if that's the issue?

 

For now, will try and reseat the power cables in both the PSU and backplanes.  Thanks again for your suggestions.

Link to comment
