Two Disks Failed



I have two disks that have failed in the past 24 hours. (Oddly, smartctl reports 'no errors')

One is a parity drive (I have dual parity); the other is a data drive.

 

My question is how should I go about solving this?

Which drive should I replace first?

Or, should I reconfigure my array with a single parity drive, then replace just the one data drive. Then add an additional parity?

I've got the replacement drives coming on Monday.


Parity device:  "Parity Device is Disabled"

Disk 9: "Device is disabled, Contents emulated"

 

Thanks.

 


The drives have been disabled because a write to them failed, not necessarily because the drives themselves have failed. Frequently the cause is not the drive but an external factor such as the SATA or power cabling to the drive. Posting your system diagnostics zip file (obtained via Tools -> Diagnostics) might allow for some informed feedback on this.
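As a rough illustration of telling a drive problem from a cabling problem, here is a minimal Python sketch that reads an attribute table in the layout printed by `smartctl -A` and separates counters usually associated with the disk surface from the CRC counter usually associated with the link. The sample table and its values are made up for illustration, the function names are hypothetical, and the disk/link grouping is a rule of thumb rather than an official classification.

```python
# Attributes often checked first. The split into "disk" and "link" is a
# rule of thumb (an assumption), not something smartctl itself reports.
DISK_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}
LINK_ATTRS = {"UDMA_CRC_Error_Count"}

# Made-up sample in the column layout of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column holds the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def classify(attrs):
    """Split non-zero counters into disk-related and link-related groups."""
    disk = {k: v for k, v in attrs.items() if k in DISK_ATTRS and v > 0}
    link = {k: v for k, v in attrs.items() if k in LINK_ATTRS and v > 0}
    return {"disk": disk, "link": link}


if __name__ == "__main__":
    print(classify(parse_smart_attributes(SAMPLE)))
```

Non-zero pending or reallocated sectors point toward the disk, while a CRC count that keeps rising points toward the cable or connection. A clean table does not prove the drive is healthy, which is why the diagnostics are still worth posting.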

On 7/6/2020 at 12:13 AM, TQ said:

Or, should I reconfigure my array with a single parity drive, then replace just the one data drive. Then add an additional parity?

You shouldn't change any configuration while a fault is present.

 

The exception: in this case, I would try rebuilding the parity disk first, with the disabled data disk unplugged and left untouched. Whether a rebuild onto the original disk succeeds or not won't change anything, but this needs a special procedure, and any fault could make the situation even worse.

 

On 7/6/2020 at 12:13 AM, TQ said:

Which drive should I replace first?

 

On 7/6/2020 at 12:13 AM, TQ said:

Parity device:  "Parity Device is Disabled"

Disk 9: "Device is disabled, Contents emulated"

The data disk should always be replaced first. Keep the disabled data disk as it is: if the recovery fails, you may still be able to get the data back from it.
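For anyone wondering what "Contents emulated" means in practice, here is a minimal Python sketch of how a missing data disk can be reconstructed from the surviving disks plus a single XOR parity. This is a simplified illustration with made-up three-byte "disks" and a hypothetical helper name; the second parity in a dual-parity array uses a different, more involved calculation that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up contents of three tiny data disks.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# Parity is the XOR of all data disks.
parity = xor_blocks(data)

# Pretend the middle disk is disabled: XOR the survivors with parity
# to get its contents back.
surviving = [data[0], data[2]]
emulated = xor_blocks(surviving + [parity])

assert emulated == data[1]
```

Every read of an emulated disk depends on all the other disks being readable, so a further failure during the rebuild can cost data. That is the reason for keeping the old data disk untouched as a fallback.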

 

Edited by Benson

Both disks getting disabled at the same time suggests a connection/controller issue. I would start by upgrading the firmware on both HBAs, especially the second one, which is very old and is where both disabled disks are connected. Both disabled disks likely also share a miniSAS cable, so you can also swap or replace that to rule it out. After that, since disk9 is mounting correctly, you can rebuild on top and re-sync parity; you can do both at the same time. If it happens again, it would be important to see the syslog.
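The reasoning above (look for what the disabled disks have in common) can be sketched in a few lines of Python. The layout below is hypothetical: the controller and cable labels are placeholders, not taken from the actual diagnostics, and `shared_paths` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical map of each disk to its (controller, cable) path.
DISK_PATHS = {
    "parity": ("HBA2", "cable-A"),
    "disk9": ("HBA2", "cable-A"),
    "disk1": ("HBA1", "cable-B"),
    "disk2": ("HBA2", "cable-C"),
}


def shared_paths(disk_paths, disabled):
    """Return the paths that more than one disabled disk hangs off."""
    groups = defaultdict(list)
    for disk in disabled:
        groups[disk_paths[disk]].append(disk)
    return {path: disks for path, disks in groups.items() if len(disks) > 1}


if __name__ == "__main__":
    print(shared_paths(DISK_PATHS, ["parity", "disk9"]))
```

If every disabled disk lands in the same group, the shared controller or cable is the first thing to rule out before blaming the drives.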

  • 5 months later...

5 month update for anyone following...

 

There were actually bad sectors on the data drive. A rebuild atop itself revealed that.

Here's what I did to fix the problems I saw, all while moving to a new city in the process:

 

  • Moved everything to a new case!
  • As suggested by JorgeB, I flashed firmware updates to both HBAs
  • Replaced all SAS cables from HBAs to disks
  • Fired it up and started rebuild on data drive.
    • Failed at 99%
    • Replaced that drive, restarted the rebuild
  • Replaced both parity drives with WD Red Pros

So after all of that, and multiple parity syncs/data rebuilds, I am back in business.

Thanks to you all, @JorgeB, @Vr2Io

