
Multiple failed disks


tucansam


Disk 15 showed tons of unrecoverable errors, so I replaced it.  It came up as "unmountable," so I checked the box to format it and clicked Format.  Disk 16 immediately went into an error state (red X).  I have two parity disks, so I began reconstruction of Disk 15 and ordered an overnight replacement for Disk 16 (I keep one spare on hand and used it to replace Disk 15).  During the rebuild, Disk 19 also threw a red X!  I now have one freshly formatted disk that was 0.000001% of the way through reconstruction, and two disks that have just thrown red X's.

 

My only priority at this point is data preservation, all other considerations are secondary.

 

This array has given me absolutely no end of trouble since Day 1, with months or even years of trouble free running, followed by weeks of multiple cascading disk failures, errors, etc.  They all come in spurts.  I've swapped cables, power supplies, etc over the years.  And years.  And years.  

 

I do have 20+ disks, which is far too many, and I'm trying to talk myself into dropping $$$$$$ on a brand new server with the same capacity but a third the number of disks.  Too many variables with this many drives, cables, controllers, etc.

 

In the meantime....

 

The disks that are all failing right now are on an HBA and a SAS expander in a separate chassis.  Not sure if that's a coincidence or not but I doubt it.

 

The red X's have started right after I pulled the tower to swap disk 15 -- while I was in there I reseated all of the cable and data connectors on the disks in that chassis (6 disks total, three have now apparently died).


The original Disk 15 was definitely dead as per SMART.  These other two disks are probably fine (99.9% of my red x's over the decade+ that I've used unraid have been false alarms).

 

Once again.... many disks down.... data preservation key.  How do I proceed?

 

Thank you.

Edited by tucansam
Link to comment

You say that you issued a format command on disk15 before trying to rebuild it?   You would have got a big warning NOT to do this unless you were prepared to lose the disk contents!    The format would have created an empty file system on the emulated disk15 and updated parity to reflect this, so a rebuild would just end up with an empty disk, as all a rebuild does is make a physical disk match the emulated one.  Do you still have the original disk15 untouched?  If it has not completely failed, it could be the best chance of recovering the contents.
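
To illustrate why formatting an emulated disk is destructive, here is a minimal sketch. It assumes a simplified single-parity model in which parity is the bytewise XOR of all data disks. The disk contents, sizes, and names are made up for illustration; a real array works on whole block devices, and a second parity disk uses a different calculation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical tiny "disks"; disk15 is the one that failed.
disk14 = b"AAAA"
disk15 = b"DATA"
disk16 = b"CCCC"
parity = xor_blocks([disk14, disk15, disk16])

# With disk15 missing, its contents are emulated from parity plus the other disks.
emulated = xor_blocks([parity, disk14, disk16])
assert emulated == disk15

# Formatting is a write to the emulated disk, so parity is updated to match
# the new contents (a stand-in here for empty-filesystem metadata).
formatted = b"\x00MT\x00"
parity = xor_blocks([disk14, formatted, disk16])

# A rebuild now reproduces the formatted contents, not the original data.
rebuilt = xor_blocks([parity, disk14, disk16])
print(rebuilt == formatted, rebuilt == disk15)
```

In this toy model, the original data survives only on the physical disk that was pulled, which is why keeping that disk untouched matters.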

Link to comment

Yes, I still have the original Disk 15.  I am more concerned about the other two disks that are now showing up as Red X's, although I don't believe them to be truly problematic, as I looked at all SMART data for them.

 

My old procedure for disk replacement was to pre-clear the disk on a separate unraid server (this was years ago).  Then, at some point, the new unraid version started either doing it automatically (I think I remember that being a thing) or I just stopped doing it.  I would pop in a new disk, start the array, unraid would rebuild it, and it's off to the races.

 

The last three times I've replaced a disk, this has happened.  The array starts, the disk shows up as "unmountable" and is offline, yet a parity sync starts.  I end up confused and format the disk.  In fact, the very last time this happened, I lost data as well, but I thought it was me mis-remembering something along the way and screwing it up.

 

Right now I have the server powered down with two red X disks and one that needs to be rebuilt.  Plus the old drive that I'm replacing.  

Link to comment

The correct handling of unmountable disks is covered here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.   This applies whether it is happening to an actual physical drive or to an emulated one (which is what you will have if the drive is shown as disabled with a red ‘x’).    A format is never the correct answer if you want to keep any data.
 

In the case of an emulated drive, we always recommend trying the check/repair before attempting a rebuild.   The reason is that a rebuild only makes a physical drive match the emulated one, so if the repair goes badly for any reason, you will at that point still have the physical disabled drive available, untouched, to use as an alternative for data recovery purposes.

Link to comment

Here is the current state of things.

 

Disk 15 is the new disk, formerly "unmountable," which got 30-60 seconds through a format before I aborted and shut down the array.  I have the original Disk 15, the one with SMART errors.

 

Disk 16 is physically present, but not showing as assigned.

 

Disk 19 is also red x'd.

 

I have two new-in-box 8TB disks waiting, arrived a few hours ago.

 

I have absolutely no idea how to proceed at this point (two parity disks if it matters).  At the very least I'd like to get 16 and 19 back in the game, and worry about 15 later.  Unless there is a better order to this.....

 

unraid.jpg

Link to comment

Disk 19 reported no errors on the extended test.

 

Disk 16 stopped showing up in the list of disks.  I re-seated all power and data cables (external disks are directly connected to a break-out cable coming from an HBA with external SAS connections) and rebooted everything.  Disk 16 showed up again (still showing as "new disk") and I have started another extended test (it reported "host disconnect" or something to that effect).  It is showing 97 "Report Uncorrected" from previous tests.

 

I'll advise when the extended test finishes -- thank you for your help.

Link to comment

Unless I'm missing it somewhere else, under Settings -> Disk Settings, I set "default spindown delay" to "Never," and the extended SMART on #16 isn't running for more than a few minutes without "Interrupted (host reset)" being the result.

 

I see there is a place to set individual disk spindown as well -- I have set it to "never" for #16 specifically and run extended SMART yet again.
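
For reference, the extended test can also be started and checked from a terminal with `smartctl`. The sketch below only builds the command lines and does not run them. The device path `/dev/sdX` is a placeholder, the helper name is made up, and drives behind some controllers may need an extra device-type option that is not shown here.

```python
import shlex

def smart_commands(device):
    """Return smartctl command lines for an extended self-test on `device`."""
    return {
        # Start the extended (long) self-test; it runs in the background on the drive.
        "start_extended_test": ["smartctl", "-t", "long", device],
        # Show the self-test log, where results such as an interrupted test are recorded.
        "show_selftest_log": ["smartctl", "-l", "selftest", device],
        # Show all SMART information, including the attribute table.
        "show_all": ["smartctl", "-a", device],
    }

cmds = smart_commands("/dev/sdX")  # placeholder device path
for name, argv in cmds.items():
    print(f"{name}: {shlex.join(argv)}")
```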

 

Link to comment
On 9/29/2022 at 7:38 AM, tucansam said:

My old procedure for disk replacement was to pre-clear the disk on a separate unraid server (this was years ago).  Then, at some point, the new unraid version started either doing it automatically (I think I remember that being a thing) or I just stopped doing it.  I would pop in a new disk, start the array, unraid would rebuild it, and it's off to the races.

FYI - A clear disk has never been required on any version to REPLACE a disk. Unraid only requires a clear disk when ADDING to a NEW slot in an array that already has valid parity. This is so parity will remain valid: a clear disk is all zeros, so it has no effect on parity. When ADDING a disk, Unraid will clear it if it hasn't been precleared.
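
As a rough illustration of the all-zeros point, here is a small sketch that again assumes a simplified model where parity is the bytewise XOR of the data disks. The byte values and names are invented for the example.

```python
def xor_parity(disks):
    """Bytewise XOR across equally sized byte strings."""
    out = bytearray(len(disks[0]))
    for disk in disks:
        for i, byte in enumerate(disk):
            out[i] ^= byte
    return bytes(out)

# Two hypothetical existing data disks with valid parity.
existing = [b"\x12\x34\x56", b"\xab\xcd\xef"]
parity_before = xor_parity(existing)

# Adding a cleared (all-zero) disk leaves parity unchanged.
cleared = bytes(3)
parity_after_clear = xor_parity(existing + [cleared])

# Adding a disk with leftover data would make the old parity wrong.
uncleared = b"\x01\x02\x03"
parity_after_uncleared = xor_parity(existing + [uncleared])

print(parity_before == parity_after_clear)
print(parity_before == parity_after_uncleared)
```

The first comparison holds because XOR with zero changes nothing; the second fails, which is the case clearing is meant to prevent.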

 

For REPLACING a disk, doesn't matter at all what was on the replacement disk since it is going to be completely overwritten.

 

Not formatting an unmountable disk has also been that way on all versions of Unraid, but old versions may not have had good warnings against it. Format is a write operation that updates parity (how could parity be valid otherwise?), so rebuild can only result in a formatted disk.

 

1 hour ago, tucansam said:

A format was begun, but aborted not even a minute into it.

 

I do have the original Disk 15.

Doesn't matter much how long format ran. Format doesn't take very long anyway. It just writes a small amount of metadata to represent an empty filesystem. Hang on to that original disk 15, you will need it to copy its data back to the array after you get the other disks taken care of.

Link to comment
