4 Drives Simultaneously Failed? - [UPDATE] All data recovered!


ekim

Just been upgrading my server.

 

Plugged everything back in and on booting 4 of my drives can't be found. I have tried them in different ports, different cables and checked the power is plugged and no matter what I try UnRaid can't seem to find the drives.

 

I don't have any way to test the drives outside of UnRaid at the moment. Does anyone have any ideas of anything else I can try? What happens if I can't get the drives to be found again? I have dual parity, but obviously that can't recover from a four-drive failure.

 

*edit* For clarity, I have 8 data drives and 2 parity drives in total. The 2 parity drives and the other 4 data drives are found fine.

 

*UPDATE* 26th December

 

Took me a while, but managed to get the array back up and running without losing any data. I got some advice on the HDDGuru forums that two of the drives might be salvageable if I removed one of the diodes and soldered over a resistor on the PCBs.

 

I got a local electronics shop to do the work for me, and having bought and precleared four new drives, I plugged in the repaired drives on Christmas Eve. They both showed up in Unraid right away. I was able to use them to rebuild the data on the other two failed drives. After that rebuild had completed I took them out and built the array on the final two new drives.

 

As far as I can tell I haven't lost any data at all! My server is now back up and running. It's a Christmas miracle!

Link to comment

Four disks not being recognised sounds suspiciously like one port of a SAS HBA not working or the breakout cable not seating properly, but since you say you've tried them connected to other ports, perhaps it's something else. Are they SAS disks, perhaps? See here: http://lime-technology.com/forum/index.php?topic=48508.msg500313#msg500313

 

Post your diagnostics so that people can better help you.

 

Link to comment

It looks as though there are three disks plus an SSD connected to your motherboard - all being detected. Four disks are being detected by your LSI card. There's no sign of the other four, which I assume are also connected to the LSI card. So either all four are dead - unlikely - or they have something in common, such as a bad power splitter or a bad connection to the controller.

 

I'd start by checking that they really do have power and by reseating the breakout cable. Sometimes the SAS end looks seated but isn't. If that doesn't work, I'd try connecting them one at a time to a motherboard port - keep track of them by their serial numbers.
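
To make the serial-number bookkeeping concrete, here is a minimal Python sketch. It assumes you have saved the JSON output of `lsblk -d -J -o NAME,SERIAL,SIZE` from the server and that you keep your own list of expected serials. The function name and the serial numbers below are made up for illustration; they are not from the diagnostics in this thread.

```python
import json


def missing_serials(lsblk_json: str, expected: set[str]) -> set[str]:
    """Return the expected serial numbers that lsblk did not report."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    seen = {d["serial"] for d in devices if d.get("serial")}
    return expected - seen


# Hypothetical sample in the shape produced by: lsblk -d -J -o NAME,SERIAL,SIZE
sample = """{"blockdevices": [
  {"name": "sdb", "serial": "WD-EXAMPLE0001", "size": "3.6T"},
  {"name": "sdc", "serial": "WD-EXAMPLE0002", "size": "3.6T"}
]}"""

# Serial numbers you wrote down from the drive labels (hypothetical values).
expected = {"WD-EXAMPLE0001", "WD-EXAMPLE0002", "WD-EXAMPLE0003"}

print(sorted(missing_serials(sample, expected)))
```

Running it after each cable or port change shows which labelled drives are still not being detected.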

 

Link to comment

Correct, I didn't mention them as I wasn't really concentrating on them last night.

 

Just tried putting one of the drives which can't be found into my old UnRaid build. It's not being found there either - which doesn't look good!

 

Assuming all four drives have failed - which I am beginning to think could be the case - what are my options for keeping hold of the data and starting up the rest of the array?

 

*edit* Just tried another two of the drives. They didn't work either. I had them sat outside of the case and it felt, to me, as though the drives weren't spinning at all. Seems like they aren't getting any power.

 

I have a HDD Dock arriving tomorrow so I can try them plugged in to my laptop. I am not holding out too much hope though.

Link to comment

I hope something like this hasn't happened to you: https://lime-technology.com/forum/index.php?topic=53341.0

 

If all four really are dead then you have lost data. See later posts in that thread for options for swapping hard drive logic boards. The up-side of unRAID is that the data on the remaining drives is intact.

 

Once I had determined that they truly are dead, what I would do is a New Config. Then assign the remaining working disks (be careful not to mistakenly assign a data disk as parity - check the serial numbers) and let parity re-build. That way you will have a working system and your remaining data will be protected. Then it's a case of buying new disks, pre-clearing them, adding them to the array and restoring your data from backups.
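
As a rough illustration of the "check the serial numbers" step, the sketch below compares a planned slot assignment against a list of serials you know belonged to data disks. The slot names, serial values and function name are hypothetical; this is a paper check you could run before touching New Config, not something unRAID itself provides.

```python
def parity_assignment_errors(plan: dict[str, str], known_data_serials: set[str]) -> list[str]:
    """List every parity slot in the plan that holds a known data disk."""
    errors = []
    for slot, serial in plan.items():
        if slot.lower().startswith("parity") and serial in known_data_serials:
            errors.append(f"{slot}: {serial} is a known data disk")
    return errors


# Hypothetical serials noted from the old array configuration.
known_data = {"DATA-0001", "DATA-0002", "DATA-0003", "DATA-0004"}

# Hypothetical plan with a deliberate mistake in the second parity slot.
plan = {
    "parity": "PAR-0001",
    "parity2": "DATA-0003",
    "disk1": "DATA-0001",
    "disk2": "DATA-0002",
}

for problem in parity_assignment_errors(plan, known_data):
    print(problem)
```

An empty result only means no known data disk sits in a parity slot; it is still worth checking the assignments by eye before starting the array.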

 

Link to comment

Everything is packed away at the moment. So will be taking another look later.

 

The cables from the PSU came with it so there shouldn't have been an issue there. I'll see if I can get anything out of the four disks tomorrow or get any ideas on why they may have stopped working.

Link to comment

Ok, had a look today as I had a USB dock arrive so I could test each of the drives in isolation.

 

I took out the cache drive, my spare drive and the four which are dead.

 

The cache drive and the spare drive both spin up when connected to my laptop (but I am unable to mount them - I tried mounting them in OSX but gave up). I can see the drives in disk utility and the size reports correctly.

 

The other four drives don't spin at all. Only one of the drives smells like burnt plastic. I took the PCB off of that and I can see that one of the chips there has melted a little. The other drives don't smell at all.

 

Given I have so many of the same type of drive I was able to swap a PCB on a working drive which matched with one of the dead drives. When I plugged it into the dock on my laptop the 'dead' drive began spinning. I couldn't read the data but it seems that mechanically the drive was fine.

 

Just need to work out my next steps now.

 

Would be interested in thoughts on the following.

 

1) Could one drive cause three others to fail?

2) Should I be looking for somewhere else that the issue might have occurred?

3) If I get replacement PCBs, should I be able to get the array up and running again?

 

Link to comment

Ok, just worked out the problem. I have a Lian Li D8000 and have installed some backplanes to make life easier for me to remove and install new drives. Each of the backplanes has both a molex and a sata power connection. On one of the backplanes I had managed to install the molex connector upside down! Don't know how I managed that.
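
For anyone wondering why a reversed molex is so destructive, here is a small Python sketch of what probably happened. It uses the standard 4-pin peripheral connector pinout (pin 1 +12V, pins 2 and 3 ground, pin 4 +5V) and assumes, without having checked this particular backplane, that the rails pass straight through to the drives.

```python
# Standard 4-pin Molex peripheral pinout (pin number -> rail).
MOLEX = {1: "+12V", 2: "GND", 3: "GND", 4: "+5V"}


def reversed_mating(pinout: dict[int, str]) -> dict[int, str]:
    """Rail each socket pin receives if the plug is rotated 180 degrees."""
    n = len(pinout)
    return {pin: pinout[n + 1 - pin] for pin in pinout}


for pin, rail in reversed_mating(MOLEX).items():
    print(f"socket pin {pin} expects {MOLEX[pin]:>4}, receives {rail}")
```

Under that assumption the grounds still line up but the 5V input sees 12V, which would fit with electronics on the drive PCBs burning out.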

 

Thinking over what happened again. Two of the drives weren't showing up after I had reinstalled everything. I shut down and as a quick test just swapped them over with two other drives. Obviously doing this killed them one after the other! Whoops.

 

At least I know what the problem was. Just need to see what I can do in terms of getting the data off and hopefully bringing the array back to life!

 

What would be the recommended next steps?

Link to comment

You could try eBay - search for "scrap drives for parts only". The vast majority of dead hard drives still have working PCBs. Or you could buy new drives for your server and move a working PCB around your collection of dead ones in order to copy off the data. It might be an opportunity to rationalise or consolidate your storage.

 

I'm glad you found the problem though. It had to be something along those lines.

 

Link to comment

Switching only the circuit boards may or may not work; it depends on the specific drive model. The folks at donor drives would be able to tell you whether you need any of the chips or info off of the original boards.

Link to comment

Already looked into it. I'm going to need to swap the ROM chip from the dead drive onto a donor PCB to get the drive back up and running.

 

I have managed to source the donor PCBs on eBay - but they are from Hong Kong, so I'm seeing if I can find anyone in the UK who can help.

 

Once the PCBs are changed, what level of trust should I have in these drives? Should I just be looking to get the data off and then replace the drives, or should I be OK to keep them in as is once they are back up and running?

 

Also, will this have caused any issues with the backplane I had the molex plugged into, or with the PSU? I think I have an old SATA drive which I can sacrifice to see if it dies.

 

Just thinking if I could bring just two of the drives back to life and get two additional replacement drives I'll be able to bring the entire array back online with my dual parity drives.
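
The arithmetic behind that idea, as a minimal Python sketch: an array can rebuild at most as many missing disks as it has parity disks, assuming the parity disks and every other data disk read back without errors. The function name is made up for illustration.

```python
def can_rebuild(missing_disks: int, parity_disks: int) -> bool:
    """True if the missing disks can be rebuilt from the parity disks."""
    return 0 <= missing_disks <= parity_disks


PARITY = 2  # dual parity, as in this array

print("all four dead:", can_rebuild(4, PARITY))
print("two repaired, two still dead:", can_rebuild(2, PARITY))
```

So with four dead drives the array cannot be rebuilt, but reviving any two of them brings the number of missing disks down to what dual parity can cover.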

Link to comment
