red balled drive (s)

Swixxy · November 9, 2013

Hi All,

I'm running version: 5.0-beta14

I'm having a problem at the moment with drives erroring. I had a drive red-ball on me about a week ago, so I swapped it out for a brand new drive and rebuilt the array with no problems. Since then, however, other drives have started throwing errors.

I now have a drive showing a red ball, with this message:

md: disk3: ATA_OP e0 ioctl error: -5

Does anybody know the reason for this? All the drives that failed are in the same backplane. I've tried using direct SATA connectors instead of the breakout cable, but it made no difference.

What could be the issue here? Could a faulty backplane be causing this, or is it much more likely that the drives are actually all dying at the same time?

itimpi · November 9, 2013

In my experience, when several drives appear to fail simultaneously, it seems most frequently to be power related: either the cabling or the power supply itself. In such scenarios it is not unusual for one drive to take down all drives on the same adapter.

I have also seen such symptoms with a badly seated adapter board. I assume that in such a case vibration might cause a momentary interruption in signals to the adapter board.

Swixxy · November 9, 2013

Hi itimpi, thanks for the reply.

Is there any recommended way for me to test this? Or are you saying that whatever the problem is may well have 'killed' the other drives too, so I should just check all the cabling and replace all the drives?

I don't want to put another drive in only to kill that one too, without first fixing the underlying problem. I just can't see how I can single out the cause.

Edit:

Just caught the edit! I will go check all the connectors and boards to make sure they're all properly seated. Thank you

itimpi · November 9, 2013

It is likely that the drives themselves have not failed, just that they have dropped offline for an instant.

The moment unRAID gets a write failure it will redball the drive. Even if you subsequently fix the issue that caused the redball, the drive will remain redballed until you take some action to tell unRAID that the drive contents are OK.
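One way to tell a dying drive from a flaky connection is to look at its SMART attributes. The sketch below is only an illustration: it parses the attribute table that `smartctl -A` prints and separates counters that usually implicate the drive from the CRC counter that usually implicates cabling or the backplane. The grouping is a common rule of thumb rather than anything stated in this thread, and the sample rows are invented.

```python
import re

# Rule-of-thumb grouping (an assumption, not from the thread):
# these raw counters tend to rise when the drive itself is failing...
DRIVE_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}
# ...while this one tends to rise when the link to the drive is bad.
LINK_ATTRS = {"UDMA_CRC_Error_Count"}

# One row of the `smartctl -A` table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Invented sample rows, for illustration only.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""


def triage(smart_text):
    """Return (drive_suspects, link_suspects) as dicts of non-zero raw counters."""
    raw = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if match:
            raw[match.group(2)] = int(match.group(3))
    drive = {name: count for name, count in raw.items() if name in DRIVE_ATTRS and count > 0}
    link = {name: count for name, count in raw.items() if name in LINK_ATTRS and count > 0}
    return drive, link


if __name__ == "__main__":
    drive, link = triage(SAMPLE)
    print("drive-side counters:", drive)
    print("link-side counters:", link)
```

If only the link-side counter is non-zero across several drives on the same backplane, that would be consistent with a cabling, power or backplane fault rather than several drives dying together.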

tyrindor · November 9, 2013

How do you tell unRAID the drive is OK without doing a new config setup? It's something I've never been able to find.

itimpi · November 9, 2013

The only way I know of is to stop the array; unassign the drive; start the array without the drive assigned; stop the array again; assign the drive; and then restart the array to rebuild the drive from the other drives and parity.

If you are reasonably certain the drive is OK, then the new config approach is normally the fastest.

I often take a hybrid approach: I use the first approach to get the disk invalidated, but then use a spare disk for the rebuild instead of the one unRAID has just redballed. That means I have the original disk (which is probably OK) as a fallback for recovering data in the (unlikely) case where the rebuild fails. If the rebuild succeeds, I then put the removed disk through a preclear_disk.sh cycle to check it is really OK. While this is the safest approach, it does require you to have a spare drive available.

Joe L. · November 9, 2013

Remember though that the drive was taken off-line WHEN A WRITE TO IT FAILED

Therefore, it is guaranteed that its contents are not correct. (remember... a WRITE TO IT FAILED !!!!)

Best bet is to re-construct it. If you force it back online, you should perform a file-system check at the very least.... remember ... a WRITE TO IT FAILED and the file-system might be corrupt.
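As a rough sketch of what that file-system check might look like, the helper below only builds the command line; it does not run anything. It rests on assumptions not stated in this thread: that the data disks in unRAID 5 are formatted with ReiserFS, and that array disk N is exposed as `/dev/mdN` so that any repairs go through the parity-protected device. `fsck_command` is a hypothetical helper name.

```python
import shlex


def fsck_command(disk_number, filesystem="reiserfs"):
    """Build (but do not run) a read-only file-system check for an array disk.

    Assumptions: ReiserFS data disks, exposed as /dev/md<N>.
    """
    if disk_number < 1:
        raise ValueError("array data disks are numbered from 1")
    device = f"/dev/md{disk_number}"
    if filesystem == "reiserfs":
        # --check only reports problems; it does not attempt repairs.
        return ["reiserfsck", "--check", device]
    raise ValueError(f"no check command known for {filesystem!r}")


if __name__ == "__main__":
    # disk3 is the red-balled disk from the first post.
    print(shlex.join(fsck_command(3)))
```

The file system should not be mounted while the check runs, and any repair options should be used only if the check output recommends them.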

garycase · November 9, 2013

I don't think Joe emphasized enough that "A WRITE TO IT FAILED" !! 8)

In other words, it's virtually guaranteed that there's at least one bad sector of data on the drive.

You indicated that you've had other errors in the past week. Those were (I assume) just read errors, since you didn't have additional red-balls, which means unRAID wrote that data back okay. But it also indicates that you're definitely having some issues with your power, your cables, or (since they're all in the same backplane) that specific backplane.

There's no "magic bullet" method of isolating which of those is the issue. You've already tried using SATA cables instead of a breakout connector -- but was that to the backplane or directly to the drives (removed from the backplane)?

From what you've outlined, I'd try removing the drives from that backplane.

Also, post the details of your configuration.

... and while you're at it, upgrade to v5.0; you're running a very early Beta release.
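To see whether the errors really do cluster on the disks in one backplane, it can help to tally the `md:` error lines from the syslog per disk. The sketch below is a minimal example under stated assumptions: the regular expression is modelled on the single message quoted in the first post, and the timestamps, hostname and the disk4 line in the sample are invented. The `-5` in that message is a negated errno, and errno 5 is EIO (a generic I/O error). The `e0` opcode appears to be the ATA STANDBY IMMEDIATE command, though that reading is not confirmed anywhere in this thread.

```python
import errno
import re
from collections import Counter

# Pattern modelled on the one message quoted in the thread.
MD_ERROR = re.compile(r"md: disk(\d+): ATA_OP ([0-9a-fA-F]+) ioctl error: (-?\d+)")

# Invented sample; only the message format comes from the thread.
SAMPLE_LOG = """\
Nov  9 10:01:12 Tower kernel: md: disk3: ATA_OP e0 ioctl error: -5
Nov  9 10:01:13 Tower kernel: md: disk4: ATA_OP e0 ioctl error: -5
Nov  9 10:05:40 Tower kernel: md: disk3: ATA_OP e0 ioctl error: -5
"""


def parse_md_errors(log_text):
    """Return a list of (disk_number, ata_opcode, errno_name) tuples."""
    events = []
    for match in MD_ERROR.finditer(log_text):
        disk = int(match.group(1))
        opcode = int(match.group(2), 16)
        code = abs(int(match.group(3)))  # the kernel reports errno negated
        events.append((disk, opcode, errno.errorcode.get(code, "UNKNOWN")))
    return events


def errors_per_disk(events):
    """Count how many error events were logged against each disk."""
    return Counter(disk for disk, _, _ in events)


if __name__ == "__main__":
    events = parse_md_errors(SAMPLE_LOG)
    for disk, count in sorted(errors_per_disk(events).items()):
        print(f"disk{disk}: {count} error(s)")
```

Comparing the resulting disk numbers against which bays sit in the suspect backplane gives a quick sanity check before moving drives around.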

hackztor · November 11, 2013

Reconstructing the data from parity is the only way I know of to get a red-balled drive back.

Is there another way? The first way is still the safest, but is there a quick way to get it back up (force it back online, etc.)?
