One drive failed; while rebuilding onto a new drive, a second drive failed



A couple of days ago I had a 2 TB drive fail. I received a new 8 TB drive today and installed it. The array was in the process of rebuilding when the rebuild was suddenly cancelled. A different drive (5 TB) is now reported to be in an error state. As soon as I saw this, I stopped the array.

 

Is there a way to recover from this situation? What should I do next?

3 hours ago, jpowell8672 said:

 

Thanks for the quick response. I had a read through those threads, and their situation is a little different. I also have some new information.

 

The two drives that failed for me are Disk 8 and Disk 10. As I mentioned above, after I replaced Disk 8 and while it was rebuilding, Disk 10 failed.

 

Currently, this is what Unraid is reporting:

 

Disk 8:

Orange Triangle

"Device contents emulated"

"Unmountable: No file system"

 

Disk 10:

Red X

"Device is disabled, contents disabled"

"Unmountable: No file system"

 

I could really use some help...


At this point what are my options?

 

I assume the worst-case scenario is that I will have to build the array from scratch. Will I be able to start with a very small array (parity and one data drive), mount a drive from the old array, and copy files from the mounted old drive to the new array? Then, once copied, add the old drive to the array and repeat with the next old drive? Am I correct? Is there a better option?

 

Is there any way to recover files from old Disk 8 and 10? It seems that there is no longer a file system on them. They were XFS. Can I perform some sort of recovery or undelete?

 

Is there a way to recover the old directory structure and file lists?


The data on all the ‘good’ disks will be safe and will not need any sort of recovery.   At the very least you will be able to create an array of those disks and keep their data intact.   The question is what to do about the disks which are in a suspect state to try and recover all their data.
 

There is a good chance the data on the disks that ‘failed’ is also recoverable, so keep those disks intact. Do you still have the 2TB disk that you replaced? If so, its data is probably intact and recoverable, provided the disk has not physically failed. Have you taken any other action that needs to be taken into account?
 

 


Your assumptions are spot on. When I replaced Disk 8 (2 TB) with the new drive (8 TB), I set it aside, so like you, I assume the data on that old 2 TB drive is still intact.

 

When it was partway through rebuilding Disk 8, it reported that Disk 10 had failed, and it cancelled the rebuild. I let it sit for a few minutes while I scratched my head. After a while, a bunch of other disks were reporting failures, so I shut down. Assuming it was a cabling or power supply issue, I cracked open the case to make sure all cables were seated properly. Then I started up the machine and noticed that Disk 8 now seemed to be mounted (even though the rebuild only got to around 3%) and that it now said it was rebuilding Disk 10. WHAT! I cancelled that rebuild (it only got to around 1-2%). That is the state I am now in.


To do that, follow the instructions below carefully:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Assign any missing disk(s)
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 8 29

-Back in the GUI, without refreshing the page, start the array. Do not check the "parity is already valid" box. The GUI will still warn that data on the parity disk(s) will be overwritten; this is normal, since it doesn't account for the invalid slot command, and parity won't be overwritten as long as the procedure was done correctly. Disk8 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it: wait for the rebuild to finish and then run a filesystem check.
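
For anyone wondering why this procedure hinges on disk10 being readable, here is a minimal sketch in Python. It assumes a single-parity array where parity is the bytewise XOR of all data disks. The tiny disk contents and the `xor_blocks` and `rebuild` helpers are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three data disks plus one parity disk (contents are invented).
data = {
    "disk8": b"\x12\x34\x56\x78",
    "disk9": b"\xaa\xbb\xcc\xdd",
    "disk10": b"\x0f\xf0\x55\xa5",
}
parity = xor_blocks(list(data.values()))

def rebuild(missing, disks, parity_block):
    """Reconstruct one missing disk from parity and every other data disk."""
    others = [blk for name, blk in disks.items() if name != missing]
    return xor_blocks(others + [parity_block])

# Case 1: disk10 is read from the real drive, so disk8 comes back intact.
rebuilt_disk8 = rebuild("disk8", data, parity)
print(rebuilt_disk8 == data["disk8"])

# Case 2: disk10 is also unavailable (modelled here as all zeros),
# so the reconstruction of disk8 no longer matches the original.
without_disk10 = dict(data, disk10=bytes(4))
bad_disk8 = rebuild("disk8", without_disk10, parity)
print(bad_disk8 == data["disk8"])
```

In this toy model, one missing disk can be rebuilt only if every other disk supplies its real contents, which is why disk10 has to be treated as valid again before disk8 is rebuilt.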

 

 

1 hour ago, johnnie.black said:

I meant the diags from when it failed, so we could see the syslog.

 

Assuming disk10 is OK you can re-enable it with the invalid slot command to rebuild disk8, disk8 can be rebuilt on top of the old one or to be safer using a spare.

I don't have the diags from when it failed.

 

I don't think I can assume Disk 10 is OK. It reports as having no filesystem. It seems that a rebuild of that drive started, as I mentioned above. Is there a way to check whether the drive is OK so that I can re-enable it as you mention?


OK. Wow. So it looks like I might be OK! I can't believe it.

 

You have been so unbelievably helpful, I can't thank you enough. I will update you when there is something else to report. The data rebuild is at 0.4%. I will let it run overnight and check in on it in the morning.

 

Again, thank you very much.


:)

Unraid Parity sync / Data rebuild: 2019-10-07 09:33 AM

Notice [UNDROBO] - Parity sync / Data rebuild finished (0 errors)
Duration: 1 day, 23 hours, 20 seconds. Average speed: 47.3 MB/s
Unraid Disk 8 message: 2019-10-07 09:33 AM

Notice [UNDROBO] - Disk 8 returned to normal operation
ST8000DM004-2CX188_ZCT0AQQK (sdc)
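
As a rough sanity check on the numbers in that notification, the reported duration and average speed can be compared. This is only a sketch: it assumes the rebuild covered a nominal 8 TB (8e12 bytes, matching the new 8 TB drive) and that "MB" means 10^6 bytes. Neither assumption is stated in the notification itself.

```python
# Values reported in the notification above.
duration_s = (1 * 24 + 23) * 3600 + 20  # 1 day, 23 hours, 20 seconds
reported_speed_mb_s = 47.3

# Assumptions (not from the notification): nominal 8 TB rebuilt, MB = 10**6 bytes.
assumed_size_bytes = 8 * 10**12

computed_speed_mb_s = assumed_size_bytes / duration_s / 10**6
print(f"{computed_speed_mb_s:.1f} MB/s")
```

Under those assumptions the computed figure rounds to the same 47.3 MB/s shown in the notification, which is consistent with the rebuild having covered the whole 8 TB disk.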

 
