Large Disk Failure Help


tential

I suffered a large disk failure last year.  I kicked my server by accident while cleaning.  I thought I had suffered a MASSIVE data loss, and simply turned off my server, cried, and tried to forget it.

It's been a year, and I'm trying to figure out which drives are safe and which ones need to be trashed.

I opened my unraid server, and only 4 drives were shown as missing.  The rest all had glowing SMART reports.

 

What steps can I take to get myself running again?

 

I imagine I need to test the drives and some other things, but I wanted to do it all right.

 

Any suggestions on how to get myself running again?  I did some googling and saw that I need to do something with New Config to get myself running, but I imagine there is more I should do.

 

What are the necessary steps to thoroughly get myself running again?

 

I have a server rack in an enclosed safe cabinet this time around, so hopefully I won't make the same mistake!

 

Edit: There was no parity drive.  Not that one would have survived a 4-disk failure, and not that I shouldn't have had one (this time around I will use 2), but well, I didn't use a parity drive.

Edited by tential
Link to comment

First, unless you legit kicked it downstairs while it was running, you probably knocked connections loose instead of completely crashing the heads. I'd start by evaluating the drives individually in a recovery environment: connect only a single drive and boot a recovery CD that won't try to write to the disk automatically. If the disk comes up readable, I'd copy off any data you deem important, then run a long SMART test on the drive and get another SMART report.

 

If by chance all the drives are readable (except for the parity drive of course) and pass SMART, then it's just a matter of putting them back in place. As long as NOTHING is written to the drives, we can recover as many truly bad drives as you have good parity disks.

 

Once you have inventoried the drives one by one, then come back and we can help you set up a plan to either reuse or replace or whatever is needed to get your unraid back in shape.
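
For anyone who wants to script the per-drive check, here is a minimal Python sketch; it is an illustration added alongside the advice above, not a required step. It assumes the attribute table was saved from `smartctl -A /dev/sdX` (the device name is a placeholder) after a long test started with `smartctl -t long /dev/sdX`. The set of watched attributes and the "any nonzero raw value is suspicious" rule are assumptions for this sketch, not official thresholds, and the sample rows are made up.

```python
import re

# Attributes often read as early warnings of mechanical damage.
# The selection and the "nonzero raw value" rule are assumptions.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# One row of the table printed by `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def triage(report):
    """Return {attribute: raw_value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in WATCHED and int(match.group(3)) > 0:
            flagged[match.group(2)] = int(match.group(3))
    return flagged

# Made-up sample rows in smartctl's attribute-table layout.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(triage(SAMPLE))
```

An empty result only means none of the watched counters are nonzero; it does not replace listening to the drive or reading the full report.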

 

Link to comment
I kicked it pretty hard I imagine.  It only moved a couple of inches.

It kept running for a couple of days, but then things got wonky and I couldn't access my files well anymore.  Then I couldn't start the array because drives were missing.  Everything is connected, though; I checked the connections.

I don't have parity drives, so I'm out of luck on that front.

 

Is there a way to inventory the drives within UNRAID by connecting them one at a time?

If not, I'll use my Windows PC. I just don't have drive bays or anything (I stripped the PC out and tossed the drive bays when I moved last weekend, as I just didn't have space for clutter), so I'll have to attach one bare drive at a time.

Edited by tential
Link to comment
4 hours ago, tential said:

Is there a way to inventory the drives within UNRAID by connecting them one at a time?

Yes, but what I was trying to accomplish was getting the drives isolated out of the unraid environment, to eliminate any other problems besides the drives. Using your Windows PC is a good alternative: unplug the Windows boot drive, connect the drive you want to evaluate, and boot the recovery CD.

 

You haven't given any sort of description of your hardware, so I am left to guess at what would knock 4 drives offline simultaneously.

Link to comment

My server is a Rosewill chassis (15 bays) with 8 bays full. A 9th bay had an HDD but no SATA cable attached; I must have forgotten to set up that drive.

Z87 motherboard

4 GB RAM

500 W PSU

 

I kicked/tripped over my computer while it was running, with no parity drives. That's why the 4 drives got knocked offline, right?

Should that not have happened?

 

I hadn't realized you linked this site: "http://www.system-rescue-cd.org/"

For some reason I thought it was a forum ad.  I'll have to work through those drives one by one.  Thanks for the advice to unplug the boot drive too.  It would have been annoying to constantly have to specify boot from USB and hope I don't get into the Windows environment by accident.

 

Edited by tential
Link to comment
24 minutes ago, tential said:

I kicked/tripped over my computer while it was running, with no parity drives. That's why the 4 drives got knocked offline, right?

Should that not have happened?

Since it was running, yeah, it's possible you killed all 4 drives, but it's also possible something else is going on with connections or the motherboard.

 

It's best to deal with the drives one at a time in a different system so you know for sure what's good and what's bad. Also, with only a single drive spun up in an open case, any nasty mechanical clicking, screeching or scraping noises will be very obvious, instead of trying to differentiate what's happening with all the drives at once.

 

The beauty of unraid in this scenario is that you only lost the files on the drives that are completely bad, and even then if the files are priceless you can send the individual drive out for recovery at a MUCH lower cost than if you needed to recover a standard RAID array.
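
To make that difference concrete, here is a small Python sketch of a deliberately simplified model. It assumes each file sits wholly on one data disk in the unraid-style layout, and that in a striped array every file has blocks on every disk, so exceeding the array's redundancy loses everything. The function names and the 16-file, 8-disk layout are hypothetical.

```python
def files_lost_file_level(file_to_disk, failed):
    """Each file lives wholly on one disk: only files on failed disks are lost."""
    return {name for name, disk in file_to_disk.items() if disk in failed}

def files_lost_striped(file_names, failed, tolerated_failures=0):
    """Every file spans every disk: more failures than the redundancy loses all files."""
    return set(file_names) if len(failed) > tolerated_failures else set()

# Hypothetical layout: 16 files spread round-robin over 8 disks (numbered 0 to 7).
layout = {f"file{i}": i % 8 for i in range(16)}
failed_disks = {0, 1, 2, 3}

print(len(files_lost_file_level(layout, failed_disks)), "of", len(layout), "files lost (file-level layout)")
print(len(files_lost_striped(layout, failed_disks, tolerated_failures=1)), "of", len(layout), "files lost (striped, one failure tolerated)")
```

In this model the four surviving disks still hold their files intact and readable on their own, which is also why a single bad drive can be sent out for recovery by itself.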

Link to comment
Oh yes, that's why I like UNRAID.  Even running my setup like a complete idiot, I should still be ok.  The files weren't priceless, which is why I was running it so stupidly.

 

I don't have drive bays in my case, so each drive will have to sit bare on my desk.

 

I was dreading this process because I thought I had destroyed ALL of my hard drives.  I had 9 HDDs in there and was so upset that I bought 10 HDDs to completely restart my build.  It was only today, when I booted into unraid to start working on my server again after a long hiatus, that I was surprised to see I hadn't killed as many drives as I previously thought.  If I can get the data off those drives, I'll be REALLY lucky.  But yeah, I don't mind, I'm just happy to almost be up and running again.  My custom cabinet for my server rack is almost done, and the shelf for my server just came today. Once everything is put together this weekend, I doubt I'll ever have to open up my cabinet to look at my rack again.

 

I gotta get a USB tomorrow and start the HDD checks.  I'm going to have a lot of fun tomorrow, thanks for the help, I'll be updating this with my progress.

Link to comment
9 hours ago, pwm said:

Have you just tried to reseat memory, any addon boards, and all SATA/power connectors (and for the SATA connectors reseat both at the motherboard and at the drive)?

 

Giving the computer a real knock may have unseated cables or boards.

 

I'm pretty sure the drives are dead.  The server is usually being written to, and I was low on space, so it would have been writing to the drives, trying to spread out the little space I had left.  Most likely, the kick just unseated them, and I should have made sure they were all secure right afterward to prevent any rattling.

We'll see though, again, no big deal.

 

So when I'm checking these drives, how do I go about restarting my unraid build and adding my data back?  If the drive is good how do I add it back, vs if the drive is bad obviously nothing I can do about that.

Link to comment

It's possible to hurt the drives mechanically.

 

But it's seldom you also break the electronics in a way that you can't communicate with them.

 

If all cables/cards are reseated, then the drives will most likely be found during boot. In that case, you should be able to get SMART statistics from them, showing whether the drives detect something wrong or not.

Link to comment

Are you planning on using parity this time around?

 

You can do a "new config" in tools, and it will erase all memorized drive positions. Then as you add confirmed good drives to slots, they will be available on array start. As long as you don't have parity, you can continue to do this one drive at a time, setting a new config for each addition. Once you have valid parity in place, adding a drive to an unoccupied slot will zero the drive to maintain parity, unless you set a new config which will then recalculate parity using the data already on the drive.

 

I've always had parity maintained, so I'm unclear whether you can add drives to a non-protected array without doing a new config, but I wouldn't experiment with drives that have valuable data.
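
For readers wondering why a drive added to a parity-protected array gets zeroed, here is a minimal Python sketch. It assumes a single parity disk computed as the byte-wise XOR of the data disks; it does not model a second parity disk, and the tiny three-byte "disks" are made up for illustration.

```python
from functools import reduce

def parity(disks):
    """Byte-wise XOR across equally sized disks (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild(survivors, parity_disk):
    """Recover one missing disk by XOR-ing parity with every surviving disk."""
    return parity(survivors + [parity_disk])

disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
p = parity([disk1, disk2])

zeroed = bytes(3)                       # a freshly zeroed drive
with_data = bytes([0x01, 0x02, 0x03])   # a drive that already holds data

print(parity([disk1, disk2, zeroed]) == p)     # True: existing parity is still valid
print(parity([disk1, disk2, with_data]) == p)  # False: parity must be recalculated
print(rebuild([disk2], p) == disk1)            # True: one lost disk can be rebuilt
```

A zeroed drive contributes nothing to the XOR, so the existing parity stays correct; a drive with data on it changes the result, which is why parity has to be recalculated in that case.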

Link to comment

Yes, I'm planning on using 2 Parity drives.  I learned my lesson lol.

 

It was my recollection that you can't add a drive with data on it.  Am I wrong about that?  I thought I had to transfer the data over to a newly formatted drive.

If I can add a drive with data on it to an array.... I've been doing this wrong for so long omg -.-

 

I already have all the drives necessary to fill my server this time around so once this puppy is in place, it's not moving anywhere.

Edited by tential
Link to comment

I would first do a new config and add only the data drives. Then look at the SMART info and run at least the short SMART test on all disks; if you like, grab and post the diagnostics after this. If all looks OK, start the array. You can then copy off any irreplaceable data first, do a read check on all disks, or add the parity disks and begin a parity sync.

Link to comment
7 hours ago, tential said:

Ok, so I can start a new config, add 2 new parity drives to get it started,  then add my in new HDDs as I check them. 

If you set up parity, then the drives you add WILL be erased. You can only add drives without erasing them if you don't have parity.

 

Add the parity drives LAST, after all your data drives are confirmed and in place.

Link to comment

Attachments: Disk 4.zip, Disk 3.zip, Disk 2.zip, Disk 1.zip, Disk 10.zip, Disk 9.zip, Disk 8.zip, Disk 7.zip, Disk 6.zip, Disk 5.zip

 

So I connected each drive to SATA 1, and each one turned on.  So yes Jonathan... you're right, my noob self didn't completely destroy any drive.

 

I have attached the SMART reports. I'm not sure what's going on at all now.

 

Let me know if these are still ok. SMART says they are, but it would be insane if I got out of this with no data loss....

 

Link to comment
6 minutes ago, trurl said:

It would be a lot easier for us all if you would just go to Tools - Diagnostics and post the complete diagnostics zip. Then we will only have to download one file that will have all the SMART for each disk plus a lot more info.

 

Will that work when I only have one drive in at a time?

I'll do it right now.

tower-diagnostics-20180115-0353.zip

Edited by tential
Link to comment
1 hour ago, tential said:

Will that work when I only have one drive in at a time?

No.

 

Now that you know each drive spins up on its own, CAREFULLY trace and unplug and replug each and every connection, both power and SATA. Also unplug and replug any PCIe cards.

 

After you've pretty much reassembled your server from scratch, then start it up and see how many drives are missing, if any.

 

Get diagnostics after unraid boots. I'm on the fence about whether or not you should attempt to start the array before obtaining diagnostics, maybe @johnnie.black has an opinion on whether starting the array would be beneficial at this stage of diagnosis / recovery. I'm operating under the "first do no harm" principle, and don't want anything to write to the drives yet until we know more about their health and the overall health of the server.

Link to comment
