Hard Drive Failed - Rebuild failed - 4 Drives showing no data



I'm absolutely gutted right now. I have a 9-disk array (8+1 parity drive) and one drive failed. Conveniently, the other drives on that controller card decided not to be available after I rebooted (I thought this 'failed drive' after 2 years might be a random Unraid issue that a reboot would fix). After the reboot the array was completely offline and 4 drives on my LSI card were no longer seen by UNRAID. I did some troubleshooting and decided to replace the card with another LSI in IT Mode (9811i, newly flashed successfully right before install). On boot the LSI card sees the 4 drives attached; I get into Unraid and it sees the 3 original drives plus the new drive that I popped in to replace the failed one. I started the array rebuild and went to bed.

 

It finally finished this morning and there were A LOT OF ERRORS (screen grab attached). I checked some folders I knew I had files in from the past few days, and those files are not there. I clicked into the folder icons next to each drive in the array and the disks attached to my motherboard SATA controller all show directories/etc, but when I clicked the folder icon next to any of the drives on the new card, they are empty. Additionally, the drives on the card aren't reporting temperatures. 

 

I had shut down all my docker containers during the array rebuild/check process and went to start them up after it finished. They start but none of the GUIs load in the web browser. I did notice my log file is at 100%, I assume due to the read errors during the parity check.

 

Anything you can think of? I have a new server coming that I was planning on migrating the drives, networking card, and Quadro card over to but now I am concerned that I may have just lost 28TB of data.

 

image.png.fe69a11d1321562a1e425f9b791099a2.png

Screenshot 2021-04-10 101037.png

zeus-diagnostics-20210410-1026.zip zeus-syslog-20210410-0128.zip

Link to comment
16 minutes ago, JasenHicks said:

I did some troubleshooting and decided to replace the card with another LSI in IT Mode

 

Why did you replace the card? The replacement doesn't seem to be working - masses of PCIe bus errors. Is there any reason not to use the original one, which might well have been working properly?

 

Link to comment

The card was replaced because it wasn't being seen by the system. I.e. none of the drives were visible on it and they were being reported as "missing". I couldn't find it in the device listings in UNRAID which led me to replacement. Based on the PCIe bus errors, it looks like something else may be afoot? 

 

 

Link to comment

Could it be an UNRAID-specific thing? I see the card on boot: it loads up and shows the drives before getting to the UNRAID boot screen. The system is in a rack and has been for about 2 years, and the CPU hasn't been touched.

 

I'll try a different slot to see what that yields, but I did move it around a few times last night to get it recognized.

Link to comment
5 hours ago, JasenHicks said:

It finally finished this morning and there were A LOT OF ERRORS (screen grab attached). I checked some folders I knew I had files in from the past few days, and those files are not there.

That many errors suggest the controller dropped, or all the disks on it dropped (the syslog doesn't show what happened). Rebooting or fixing the issue should bring all your data back.

Link to comment

@ChatNoir - I didn't. I will be once I recover.

 

@JorgeB - I think the PCI-E slot on my X399 motherboard failed or something. I spent the last 4 hours migrating my i7700K board, memory and CPU to my UNRAID system, managed to get all drives assigned (except my cache pool), checked "parity is valid" and started the array. It looks like everything is being seen and there is at least some data on each of the drives now. I am running the parity check now, and I did notice that it has already found quite a few errors (perhaps it's fixing the replacement drive that's blank).

Looking at the "MAIN" page, no errors yet. I think at this point last time there were a ton. We might be on the right track!

Once this is done, I suppose I have to troubleshoot the cache pool so that the dockers and such start running properly.

 

I appreciate all the help so far everyone. The community spirit is great :)

 

 

 

image.png

image.png

Link to comment

So, what do I do when the parity check finishes but I am still missing files? It seems pointless to have a parity drive that's supposed to save you from a single drive failure when it doesn't actually work. Or am I not finished?

 

For awareness, when I click the  folder icon for Drive 7 (highlighted in pink) it's empty, but it shows over 4TB used. I suspect, that's where my missing files are.

 

image.thumb.png.99728ec2b5cd74f45376918463c89285.png

 

tower-diagnostics-20210412-1713.zip

image.png.a0ad31d97709c62a7d0cf7ad3f46b2a7.png
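The "empty folder but over 4TB used" symptom can be checked from a terminal as well as from the GUI. Below is a minimal Python sketch, not an Unraid tool, that compares the space the filesystem reports as used with the total size of the files actually visible under the mount point. The function names are made up for illustration, and the mount point you pass in (for example `/mnt/disk7`) is an assumption about where the disk is mounted on your system.

```python
import os
import shutil


def visible_bytes(root):
    """Sum the sizes of all regular files reachable under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def usage_gap(mount_point):
    """Return (used, visible, gap) in bytes for the filesystem holding mount_point.

    used    = space the filesystem reports as used
    visible = total size of the files that can actually be listed
    gap     = used - visible
    """
    used = shutil.disk_usage(mount_point).used
    visible = visible_bytes(mount_point)
    return used, visible, used - visible
```

A small gap is normal because of filesystem metadata. A gap of several terabytes only tells you that space is allocated but the files are not reachable through the mounted filesystem; it does not say why.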

Link to comment
6 minutes ago, JorgeB said:

Is the drive that failed still intact? And are you sure it actually failed, i.e., is it dead or still detected?

 

Also keep in mind that Unraid (or any RAID) is not a backup, you should still have backups of any important data.

 

1. It did report as failed, however, I do have it and I may just shut down and put it back in to see if I can access it. Nothing to lose at this point.

 

2. Yes, I understand that. Nothing mission critical was lost; however, it's still not going to be fun recovering it from old backup drives, etc. if I can't make that work.

 

I guess ultimately, it's probably OK that this happened. It's made me re-evaluate some architecture in the system and gave me an excuse to splurge on another server.

 

- I assume I can just shut down the system, swap in the old HDD #7, and boot it back up re-assigning the old drive as #7?

 

Edited by JasenHicks
Link to comment
1 hour ago, JorgeB said:

First try to mount it with the UD plugin (with the array stopped).

 

 

 

Holy shit.... when I mounted it I was able to see files on it again! Looks like the SMART passed on it too... Now, I think I changed too much to figure out exactly what happened. That being said, I am not risking this long term and will still bring another server online to be safe and start a scratch UNRAID install on it. 

 

So, what's next? You're on this journey with me now @JorgeB and we're close to the end!

 

 

 tower-smart-20210412-0331.zip

Link to comment
8 minutes ago, JorgeB said:

Change that filesystem's UUID (Settings -> Unassigned devices); then you can start the array with that disk also mounted and compare the contents of both disks, for example with rsync.

 

Click this guy?  Should it "do" anything that I can see?

 

image.thumb.png.e5679a7093024a43e54979cd85532fda.png

 

Once I get that done, I want to make sure I do this right:

 

1. Start the Array

2. Mount the ZAD8EYTY drive

3. rsync between the two? Should I do that in the UNRAID terminal? 
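For step 3, it can help to see what differs before copying anything. The following is a minimal Python sketch of a read-only comparison, roughly the kind of listing an rsync dry run would give; it is not rsync and is only meant to illustrate the idea. The function names are hypothetical, and the two paths you pass in (for example `/mnt/disks/ZAD8EYTY` for the old drive mounted through Unassigned Devices and `/mnt/disk7` for the rebuilt array disk) are assumptions about where the disks are mounted.

```python
import os


def relative_files(root):
    """Return {relative_path: size_in_bytes} for every regular file under root."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                files[os.path.relpath(full, root)] = os.path.getsize(full)
    return files


def compare_trees(old_root, new_root):
    """Compare the old disk against the rebuilt disk without modifying either.

    Returns (missing, size_mismatch):
      missing       - relative paths on the old disk that are absent from the new one
      size_mismatch - relative paths present on both disks but with different sizes
    """
    old = relative_files(old_root)
    new = relative_files(new_root)
    missing = sorted(p for p in old if p not in new)
    size_mismatch = sorted(p for p in old if p in new and old[p] != new[p])
    return missing, size_mismatch
```

Calling `compare_trees(old_mount, array_mount)` with the two mount points returns the two lists, which you can review before deciding what to copy back. It compares names and sizes only, so two files of the same size with different content would not be flagged.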

 
