
Multiple Drives with Errors



This morning I found my second parity drive disabled due to write errors. I suspected a bad cable, so I swapped the drive to an unused connector on the same breakout cable, unassigned it, started the array, stopped it, reassigned the drive, and left it to sync/rebuild. This evening I came back to two other drives with errors. I paused the rebuild and left the server on with the array started. Docker containers were not started on this reboot.

 

In the past, I've had two drives with read errors, but neither was one of these drives. One was resolved by replacing the drive, which I had planned to upgrade anyway, and the old drive has since been used elsewhere without issue. The second was resolved by replacing the breakout cable. The cable may not have needed replacing, but I replaced it anyway. Neither drive was plugged directly into the motherboard.

 

Those past errors, plus multiple drives with errors at the same time, make me suspicious of the HBA card, but maybe I'm just this unlucky.

 

Any insight on my current issue and next steps would be appreciated.

Edited by SweetLogan
Link to comment
5 minutes ago, trurl said:

How are these disks powered? Any splitters? Too many on one cable?

There are splitters, with drives spread across two cables, so this is definitely a possibility. I think I was trying to avoid Molex-to-SATA adapters and was stupid in the process.

 

That's not a quick fix to get a new cable for my PSU, but it's easy enough to do.  Are there any steps I should take, before shutting down my server as is?  Just in case there is another issue or to prevent any data loss.

Link to comment
2 minutes ago, trurl said:

Those are actually preferred over SATA-SATA splitters since Molex can handle more current. 

Well. Crap. I read some comments where people identified them as a fire risk and thought I was smart to avoid them. Plus, the SATA splitters lacked the 3.3 V wire, which meant I didn't have to tape pins on the Easystores I shucked.

 

1 minute ago, trurl said:

Since you have dual parity and only one disabled disk, and all disks are mountable including the emulated disk, should be OK if you can complete rebuild.

Even with parity 2 mid-rebuild and two drives with errors? Is drive 7 not disabled because it only had read errors?

Link to comment
2 minutes ago, SweetLogan said:

I read some comments where people identified them as a fire risk

Any SATA power connectors with molded plastic around the wires instead of IDC or crimped can be a hidden hazard. The 4 pin end is not vulnerable to that issue, the wire separation is huge in comparison.

[attached image]

 

Link to comment
21 minutes ago, JonathanM said:

Any SATA power connectors with molded plastic around the wires instead of IDC or crimped can be a hidden hazard. The 4 pin end is not vulnerable to that issue, the wire separation is huge in comparison.

 

I've definitely been using some of the bad ones.  Looks like I need new splitters, when necessary, and to make sure I evenly distribute power across the three cables I have in use. (I was incorrect saying two above)

 

Edit to add: Thank you for the info/picture.

Edited by SweetLogan
Link to comment
1 hour ago, trurl said:

Since you have dual parity and only one disabled disk, and all disks are mountable including the emulated disk, should be OK if you can complete rebuild.

Sorry for the second question - Are you saying I resume the rebuild now or shut down and try to rearrange power to determine if that's the issue, so I don't risk another drive erroring out in the remaining rebuild time?

Link to comment

I'm ready to rebuild and would like confirmation that my next steps are appropriate.

 

My plan is to drop parity 2, rebuild on top of the existing disk for the disabled disk 3, and then add parity 2 back after that is successful.  Is that the best course of action?

 

Parity 2 is 18 TB and disabled disk 3 is 8 TB, so it seems like the fastest way to get back to a protected state.

Link to comment

Fastest way to get back to dual parity would be to go ahead and rebuild parity2 and disk3 at the same time. And once it has gotten to the end of disk3 rebuild you will again be protected by single parity, and it can continue with the rest of parity2 rebuild.
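To put rough numbers on that comparison, here is a minimal sketch of the two plans. It only uses the sizes mentioned in this thread (8 TB disk 3, 18 TB parity 2). The 150 MB/s average speed is a made-up placeholder, and the sketch assumes the array runs at the same speed whether it is rebuilding one disk or two, which may not hold on real hardware.

```python
TB = 10**12  # drive vendors quote decimal terabytes


def hours_for(size_tb: float, mb_per_s: float) -> float:
    """Hours needed to stream through size_tb at a constant average speed."""
    seconds = (size_tb * TB) / (mb_per_s * 10**6)
    return seconds / 3600


def compare_plans(disk_tb: float, parity_tb: float, mb_per_s: float) -> dict:
    """Estimate when single and dual parity protection return under each plan."""
    disk_h = hours_for(disk_tb, mb_per_s)
    parity_h = hours_for(parity_tb, mb_per_s)
    return {
        # One pass over the array: the data disk is done once the pass
        # reaches the end of that disk, parity 2 when the pass finishes.
        "simultaneous": {
            "single_parity_h": disk_h,
            "dual_parity_h": max(disk_h, parity_h),
        },
        # Two passes: rebuild the data disk first, then re-add parity 2
        # and sync it from the start.
        "sequential": {
            "single_parity_h": disk_h,
            "dual_parity_h": disk_h + parity_h,
        },
    }


# 150 MB/s is an assumed average, not a measured value from this server.
plans = compare_plans(disk_tb=8, parity_tb=18, mb_per_s=150.0)
for name, plan in plans.items():
    print(f"{name}: single parity after {plan['single_parity_h']:.1f} h, "
          f"dual parity after {plan['dual_parity_h']:.1f} h")
```

Under these assumptions both plans restore single parity protection at the same point, but the sequential plan needs a second full pass before dual parity is back.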

Link to comment
5 hours ago, trurl said:

Fastest way to get back to dual parity would be to go ahead and rebuild parity2 and disk3 at the same time. And once it has gotten to the end of disk3 rebuild you will again be protected by single parity, and it can continue with the rest of parity2 rebuild.

Awesome.  I wasn't sure if it would rebuild two at the same time or if that created problems/risks that wouldn't be there doing them one at a time.  Thank you, again.

Link to comment

Just to wrap up this issue, I'll outline the steps I took.

 

-Removed power splitters where possible. Though they were evenly balanced across 3 cables, only two were necessary (one used for the SSDs).

-Replaced the HBA card.  I believe this was the issue.  Both drives with current issues and the drives I've previously had issues with (it's good to keep notes) were connected to the same port on the HBA card.  I had already replaced the cable after the last issue, so I was confident the card was the problem.

-Replaced the relevant breakout cable from the HBA card.  I had a spare, no reason not to.

 

-Rebuilt parity 2 and disk 3 on top of themselves, simultaneously, in maintenance mode.  No errors/alerts.

-Disk 3 is no longer disabled, but it is unmountable.  Looks like my next step is XFS repair and then recover lost files/sort through the lost+found.  I may swap in a spare drive and rebuild disk 3, again, just in case it was a connection or drive issue in the rebuild.
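The note-keeping mentioned in the HBA step above can be turned into a quick check for a shared component. This is only a sketch: the incident records, field names, and values below are hypothetical placeholders, not the actual notes from this server.

```python
def common_factors(incidents: list[dict]) -> dict:
    """Return the components (other than the disk itself) shared by every incident."""
    if not incidents:
        return {}
    common = {}
    for key in incidents[0]:
        if key == "disk":
            continue  # the failing disk differs by definition
        values = {incident.get(key) for incident in incidents}
        if len(values) == 1:
            common[key] = values.pop()
    return common


# Hypothetical log entries: one dict per drive that reported errors.
incidents = [
    {"disk": "disk A", "hba_port": "port 1", "breakout_cable": "old", "psu_cable": "cable 1"},
    {"disk": "disk B", "hba_port": "port 1", "breakout_cable": "old", "psu_cable": "cable 2"},
    {"disk": "disk C", "hba_port": "port 1", "breakout_cable": "new", "psu_cable": "cable 3"},
]

print(common_factors(incidents))
```

A component that shows up in every incident is only a suspect, not proof, but it narrows down what to swap first.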

Link to comment
