Multi HDD PCB Board failure; worst nightmare



I am dealing with possibly my worst fear: a complete loss of all of my Unraid data with no backup available. That is a potential loss of about 10TB of mostly movies and TV shows.

 

This is my own doing and not Unraid's fault, but I am still holding out some small hope of recovering my data.

 

Remember: Unraid is great, but when physical damage to HDDs occurs, you could be screwed without an actual separate backup, which I will plan for going forward.

 

So... to explain

 

In downgrading my server to a smaller case, what was supposed to be a hot-swappable 5-bay cage was not what it appeared to be. This is/was a new build with a new motherboard (MB) and CPU, and I reused the RAM from my older PC.

 

When I added one of the drives to the bay with the power on, there was a power surge that appears to have burned out the PCB boards on all 4 of my HDDs. None of these HDDs will power on. I have tried connecting the 4 HDDs directly to the MB with no luck, and I cannot feel any vibration from them when powering them. Other HDDs in the home do power up on the MB. Ironically, the MB and everything else seem OK. Unraid and USB devices all seem to work. I should have just turned the machine off first!

 

So, after researching the PCB, I have ordered matching PCBs for these 4 HDDs. I think I may need to find someone locally to swap the BIOS chips from the original PCBs onto the replacement ones. A lot of info seems to indicate that swapping the BIOS chip on the PCB is also necessary.

 

Assuming I can get these HDD's alive and the data is OK on them, I plan to then transition the data off 1 drive at a time to new HDDs.

 

The Unraid setup on the 4 HDDs was 1 parity and 3 data drives.

 

So what would be the best way to possibly move the data from each old HDD to a new one?

 

The easiest would be to try to get all 4 drives going again in Unraid, and then swap them out one at a time and let it rebuild. But I'm not sure if that will be possible.

But if I do get the drives going, what would be the best method to read and copy them to new HDDs?

 

As a note, the current dead HDDs are Seagate 4TBs, and I am planning to use WD drives going forward. Nothing against Seagate, but I am going to go with a different manufacturer for the new HDDs.

 

I am still about 7 to 10 days away from receiving the replacement PCB boards, and I can report back on how things work out. But if anyone has any tips or advice, it would be very helpful.

 

Thanks,

PT

Link to comment

Ok, so this is a hypothetical question as I read up more on PCB boards and my potential options.


I believe 1 of the PCB boards I have coming is an exact match and will work without the need for a BIOS chip swap. That PCB board should match 3 of my 4 drives: the parity drive and 2 data drives.

 

Would this work....

 

If I were to dd clone the Parity and the 2 data drives to new Western Digital Drives.  And then attempt to start the array, would it be able to rebuild the 4th drive?  The 3 drives would have new model numbers, but in theory they would be identical?

Link to comment
9 hours ago, ptmuldoon said:

If I were to dd clone the Parity and the 2 data drives to new Western Digital Drives.  And then attempt to start the array, would it be able to rebuild the 4th drive?  The 3 drives would have new model numbers, but in theory they would be identical?

If all 3 disks are exact copies and parity is valid, yes. The disks also need to be the same capacity; they can't be larger. You'll then need to do a new config and use the invalid slot command. If you get 3 drives working, let us know and I'll post the instructions.
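
The capacity requirement can be checked before any data is copied. Below is a minimal sketch, assuming a Linux shell with `blockdev` (util-linux) and GNU `stat`; `size_of` and `same_capacity` are hypothetical helper names, and the device names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a clone target has exactly the same byte size as its source.
# size_of and same_capacity are hypothetical helpers, not Unraid commands.

size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # block device size in bytes (usually needs root)
    else
        stat -c %s "$1"             # regular file (e.g. a disk image) size in bytes
    fi
}

same_capacity() {
    local src_size dst_size
    src_size=$(size_of "$1") || return 2
    dst_size=$(size_of "$2") || return 2
    echo "source: $src_size bytes, target: $dst_size bytes"
    [ "$src_size" -eq "$dst_size" ]   # exit status 0 only when the sizes are equal
}

# Usage (placeholder device names, adjust to your system):
#   same_capacity /dev/sdb /dev/sdc && echo "sizes match"
```

If the sizes differ, the function returns a non-zero status, so it can gate the copy step in a script.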

Link to comment

@johnnie.black

Thanks for getting back to me. I will let you know how it goes, hopefully in a few days when the PCB boards arrive.

 

As I look into the options for copying the data and watching the progress, are there any recommendations for using dd and the block size? I was also reading that pv is helpful for watching the status. Does Unraid have pv installed, and do either of the commands below seem OK?

 

I have 32GB of RAM in the MB to work with. The commands below assume sdb is the old drive and sdc is the new drive to copy to.

dd if=/dev/sdb of=/dev/sdc bs=100M status=progress

dd if=/dev/sdb bs=100M | pv | dd of=/dev/sdc
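
One detail in the piped variant: the writing `dd` has no `bs=`, so it falls back to dd's default 512-byte blocks. Below is a minimal sketch of a wrapper that sets the block size on the writing side, assuming GNU dd; `clone_with_progress` is a hypothetical helper name, the device names in the usage comment are placeholders, and whether pv is installed on a given Unraid system is not confirmed here, so the function falls back to `status=progress`.

```shell
#!/usr/bin/env bash
# Sketch: clone SRC to DST with a progress display.
# clone_with_progress is a hypothetical helper; assumes GNU dd.

clone_with_progress() {
    local src=$1 dst=$2 bs=${3:-1M}
    if command -v pv >/dev/null 2>&1; then
        # pv reads the source and reports throughput. The writing dd gets an
        # explicit block size; iflag=fullblock makes it collect full blocks
        # from the pipe before writing. Note: the pipeline's exit status is
        # dd's, so watch pv's own messages for read errors.
        pv "$src" | dd of="$dst" bs="$bs" iflag=fullblock conv=fsync
    else
        # No pv available: let dd print its own progress line.
        dd if="$src" of="$dst" bs="$bs" conv=fsync status=progress
    fi
}

# Usage (placeholder device names; double check which is source and target):
#   clone_with_progress /dev/sdb /dev/sdc 1M
```

`conv=fsync` makes dd flush the output to the target before it exits, so the command does not return while data is still sitting in the page cache.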
Link to comment

Before messing with live data, I strongly recommend replicating your copy scenario with other, uninvolved drives and experimenting with command line options on healthy drives. Take a known good drive with some data in whatever format (NTFS or anything else, it doesn't matter) and run your copy routine to a second drive. Make sure that things happen as you expect: the source drive is unchanged and the destination drive becomes a true clone.

 

That way you have an idea about how things should behave before you play with live data.
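
A rehearsal like this can be scripted. Below is a minimal sketch, assuming GNU coreutils (`sha256sum`, `dd`) and `cmp`; `verify_clone` is a hypothetical helper name, and the paths in the usage comment are placeholders for two test drives or image files of the same size.

```shell
#!/usr/bin/env bash
# Sketch: copy SRC to DST, then check that SRC is unchanged and DST is identical.
# verify_clone is a hypothetical helper; run it on test drives or image files first.

verify_clone() {
    local src=$1 dst=$2 before after
    before=$(sha256sum < "$src" | cut -d' ' -f1)   # fingerprint of the source before the copy
    dd if="$src" of="$dst" bs=1M conv=fsync status=none || return 1
    after=$(sha256sum < "$src" | cut -d' ' -f1)    # fingerprint of the source after the copy
    if [ "$before" != "$after" ]; then
        echo "source changed during copy" >&2
        return 1
    fi
    # Byte-for-byte comparison; assumes source and destination are the same size.
    if ! cmp "$src" "$dst"; then
        echo "destination differs from source" >&2
        return 1
    fi
    echo "clone verified"
}

# Usage (placeholder paths):
#   verify_clone /dev/sdX /dev/sdY
```

On multi-terabyte drives the two checksum passes and the compare each read the whole disk, so this is better suited to a small test drive than to the real recovery.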

Link to comment
  • 2 weeks later...

Just want to report back on this as I slowly work to recover my data.

 

I received a matching PCB board for my hard drive, with the same model number and firmware. A pure swap of the PCB board did get the HD to power up, but the drive could not be read.

 

I was then able to find someone locally to move the BIOS chip from the old PCB board onto the new one. SUCCESS: I can now read the drive, and I am currently copying the data off to a new HD with dd. It's a 4TB drive, and I am guessing the copy will take roughly 8 to 10 hours or more.

 

I need to do this 3 more times (at least 1 more data drive and the parity drive) for my other HDs, so I am now deciding whether to wait for the other PCB boards to arrive (possibly 30 days, as they shipped internationally) or to reuse this working PCB board and keep swapping the BIOS chip from the next HD onto it. I would prefer to wait, but I know the other PCB boards are not a 100% identical match either.

 

So baby step progress, and hoping to get everything recovered in a few more days or weeks.  Fortunately everything is mainly movies, tv shows and media, and no critical data that I need immediately.  

 

And I guess one possible question: I believe the parity drive is not actually a formatted drive, correct? So will copying the parity to a new drive with dd be the same process as copying a data drive?

Link to comment
  • 2 weeks later...
On 5/17/2020 at 3:56 AM, johnnie.black said:

If all 3 disks are exact copies and parity is valid, yes. The disks also need to be the same capacity; they can't be larger. You'll then need to do a new config and use the invalid slot command. If you get 3 drives working, let us know and I'll post the instructions.

So I'm getting closer here and could possibly use some instruction on getting back up and running. To summarize, I originally had a 1 parity, 3 data drive setup, all with 4TB drives. After my server crash, I have been able to recover 2 of the data drives and copy their data onto new 4TB drives. I expect to get my parity drive running later this week and then copied over to a new 4TB drive.

 

So I will be placing all new drives in the server, with clones of the parity and 2 data drives, and the plan is then to rebuild the 3rd data drive. Are there specific instructions needed for making a new config and using the invalid slot command?

Link to comment
11 hours ago, ptmuldoon said:

Are there specific instructions needed for making a new config and using the invalid slot command?

Since data disk order doesn't matter with single parity, these instructions assume the missing disk is disk3. You need to be on Unraid 6.2.4 or newer.

 

-Tools -> New Config -> Apply
-Assign parity plus the other 2 existing data disks, assign the new disk as disk3, double check all assignments.
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 3 29

-Back on the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but they won't be as long as the procedure was done correctly. Disk3 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.

Link to comment
  • 2 weeks later...

Thank you for the instructions.

 

I hope to report back sometime tomorrow or the next day with some success. But I have a question I'm hoping someone can confirm.

 

Can you actually mount the parity disk outside of the array?

 

For my first 2 data disks, which I was able to recover after a BIOS chip swap on the hard drive PCB boards, I confirmed I could see each drive in the motherboard BIOS and then successfully mounted it in Unraid (command line) to see its data. I then unmounted the drive and copied the data over to a new HD.

 

For the parity drive, the motherboard also sees the drive. But in Unraid, when I try to mount it, I get this error:

 

"mount (2) system call failed: Structure needs cleaning"

 

And I think/hope that is normal since the parity drive does not actually have any data on it?

 

I am going to start the dd process now to copy the parity to a new identical size drive.    And then hope to get the rebuild with above instructions completed tomorrow.

 

 

Link to comment

OK, Thanks

 

I'm hoping nothing is wrong with the parity drive, but I just checked on it and it's showing an Input/output error. It shows:

dd: error reading '/dev/sdb' : Input/output error 

5429+1 records in

5429+1 records out

 

It copied roughly 500GB of the 4TB drive. I don't think this is normal, and I should expect it to copy the full 4TB, right? The other 2 data disks went well, but I am unsure whether the parity copy failed or not.
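
The record count in the dd output can be turned into a byte figure. The sketch below assumes the parity copy used `bs=100M` (100 MiB per record), as in the commands proposed earlier in the thread; the block size actually used for this copy is not shown in the output, and `copied_bytes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: estimate how much dd copied from its "records out" count.
# Assumption: bs=100M was used, i.e. each full record is 100 MiB.

copied_bytes() {
    local records=$1 bs_mib=$2
    echo $(( records * bs_mib * 1024 * 1024 ))
}

bytes=$(copied_bytes 5429 100)
echo "full records: $bytes bytes"
echo "about $(( bytes / 1000000000 )) GB of a 4000 GB drive"
```

Under that assumption the 5429 full records come to 569,271,910,400 bytes, or about 569 GB, and the "+1" partial record adds less than 100 MiB on top. That fits the "roughly 500GB" figure: the copy stopped at the read error with most of the drive still uncopied.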

 

If the parity copy failed, I could possibly attempt to recover my 3rd data disk and then build parity from scratch. But I am trying to work with what I have available.

 

Link to comment
9 hours ago, ptmuldoon said:

dd: error reading '/dev/sdb' : Input/output error 

This means there was a read error. You can try ddrescue, as it will skip the bad areas while trying to get as much data as possible. Still, if there are errors there could be some corruption on the rebuilt disk, unless by luck the problem sector(s) correspond to empty space there.
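
A common two-pass ddrescue pattern is sketched below, assuming GNU ddrescue is available on the system (not confirmed in this thread). `ddrescue_plan` is a hypothetical helper that only prints the commands so they can be reviewed before anything is run; the device names and the map file path are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print a two-pass GNU ddrescue plan for review (nothing is executed).
# ddrescue_plan is a hypothetical helper; SRC, DST and MAP are placeholders.

ddrescue_plan() {
    local src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly and skip the scraping of bad
    # areas (-n); -f is required to overwrite an existing output device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas recorded in the map file, using direct
    # disc access (-d) and up to 3 retry passes (-r3).
    echo "ddrescue -f -d -r3 $src $dst $map"
}

# Usage (placeholder names; keep the map file on a disk that is not part of the copy):
#   ddrescue_plan /dev/sdb /dev/sdc /boot/parity.map
```

The map file records which areas were read and which failed, so an interrupted run can be resumed with the same arguments instead of starting over.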

Link to comment
