
Failed/Failing Drive... HELP!


mp54u


OK, I should know better than to let Murphy's law kick me in the @##$ when I need to migrate my data around. Nonetheless, let's review my idiocy and see if someone can help me out of the situation. So if you have a free moment, please take a seat and listen, and hopefully this will be an example of what not to do for others migrating to Unraid or any other solution for that matter. Any help would be greatly appreciated.

 

So I started out a couple of days ago. A sunny afternoon. I was tired of moving data all over the place to keep space on hand and never having peace of mind that my data was protected. I had 3 different NAS computers/devices running all over the place and decided I was done migrating data between them. Here comes Unraid to the rescue. I don't need like-size drives, it is parity protected, and it is cheap and easy to install. Sold!

 

So my journey began BUT.... (there's always one of them, right?) I didn't have enough free space nor enough money to make things work. So I decided I would start my data mover service (that's me, cp, rsync, right click copy, right click paste....) and move data off 3 of my 1TB disks to make a 3TB array in Unraid. This would not be parity protected, but that's fine, we're talking 24 hours without parity, how big of a risk could that be! (This is where I wish I could tell you I was drinking, or offer some kind of excuse!) Nonetheless, I did exactly that. I freed up enough space from MythTV recordings on my MythBackend server to copy the data from the 3 x 1TB disks to the 3TB drive on the MythTV server. Great. I then shut off my DNS323 and removed one 1TB drive from the MythTV server that was barely used. I now had 3TB of space to start my Unraid journey, and it was happening fast!

So I kicked off the next round of the "data mover" service and voilà, my data is on the Unraid array. I have disk1, disk2, and disk3 totaling 3TB. I was intrigued to watch the shares fill each drive to about 50% and move on to the next one. No more trying to cram things here and there across the many places I had to store my digital treasures before. I now have 1 treasure chest to store it all. Everything was going great. While the file copies were running, I had the fun of setting up the MythTV docker container and getting all of my recordings scheduled before the wife's shows started recording at 6PM. Done! Easy peasy. We are well on our way now. Data is copied over to the new disks, recordings are starting to happen from the docker container. I am loving life.

 

Now, last night I went to bed with a smile on my face. Today at lunch I had to head up to the local computer store to buy some more Molex to SATA power adapters so I could move the now empty 3TB drive from my done-for MythTV server over to the Unraid array and get the parity sync started. But wait...... What is this? Who is Murphy, and damn him and his laws! I started to see SMART errors on Disk 2. I just happened to have logged into the array from work to check some things out, and I noticed a weird read-only issue when trying to create a new share. So I look and there it is. An error. My now legendary journey of being the best "data mover" EVER was all for naught..... I stepped into action. I was only 1 hour away from home and from installing my shiny parity drive, but it might be too late. We are in the car now, doing 75. The reality of not knowing what data was on which disk was setting in, and I was wondering what data was getting lost. Why is my bit and byte karma kicking me here?

So I get back home, run down to the dungeon where I keep all my toys, and start looking into things. I was able to stop the array. That made me feel a little better. I thought, let's shut down the server, reseat the SATA cables, try different ports, anything. So I start the machine to the spin-clicks of a dying drive. I get into Unraid and it says the drive is missing. BAM!!!! There he is. Murphy and his LAW kicking me in the face. "But, but, I have the parity drive and power cable now!" Too late, my friend. So I proceed to power off the machine again. I set it on its side, it spins up, and now it sees the drive. So I quickly hot-add the parity drive and do a reboot rather than a shutdown, to make sure the drive doesn't stop spinning or start the click-spins again. I have now brought the machine back online. Disk 2 is still unmountable. Parity is syncing right now.
I am a little worried that parity will be screwed, and all the data that was on Disk 2 is no longer visible.

 

I think I set the default file system to btrfs on top of it all. I have used it quite a bit on Linux builds at home and have been pretty comfortable with it. But usually that is in a RAID setup.

 

I will later write up PART 2 of what my lessons learned were.    For now,  can someone help me here?  I am about ready to head for the beer but I know the Monster Energy session that is about to take place is probably my better bet. 

 

Can I get anything off this disk?  I have done nothing else but the parity sync is still running.  I am not even sure I should let it finish.  With the drive missing,  what will happen to this 10 hour parity sync?   

 

(Note: please don't hold my horrible storytelling against me! It sounded better in my head.)

 

 

 

nsdata2-syslog-20151023-2009.zip

Link to comment

I think you are out of luck, my friend, but I am no expert, so let the experts chime in. It's never a good idea to do projects like this when the budget is tight. 'If you can't do it properly the first time, why bother?' Right? Also, it's always better to have more than one copy of your data, just for situations like this. I hope you can get some of your data back. Good luck.

Link to comment

I have a full copy of all my photos, so when I saw that, I felt a lot better. Those were all on Disk 2 and are the most valuable to me. But I did lose some music, movies, and TV shows that I had on there. Also a bunch of OS ISOs and such that I used to build virtual machines and the like. But again, I can re-download most of that software, and in the grand scheme of things, the movies I've ripped over the years are not that big of a deal. I also have a copy of my documents/home directories for the family.

 

I think the panic comes when you lose a disk: since the /mnt/user filesystem sits in front of all of this and hides which disk holds what, it is hard to understand what data was lost. So a bit of panic sets in.
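
For anyone reading along who wants to avoid that "what was on which disk?" panic, here is a minimal sketch of keeping a per-disk file listing. It assumes the individual data disks are mounted as /mnt/disk1, /mnt/disk2, and so on next to /mnt/user; the function name `files_by_disk` and its parameters are made up for illustration and are not part of Unraid.

```python
from pathlib import Path


def files_by_disk(mount_root="/mnt", pattern="disk*"):
    """Map each data-disk mount point to the files stored on it.

    Assumption: every data disk appears as its own directory under
    mount_root (for example /mnt/disk1, /mnt/disk2). The merged
    /mnt/user view is skipped because it does not match the pattern.
    """
    manifest = {}
    for disk in sorted(Path(mount_root).glob(pattern)):
        if not disk.is_dir():
            continue
        # Store paths relative to the disk so they line up with share paths.
        manifest[disk.name] = sorted(
            p.relative_to(disk).as_posix()
            for p in disk.rglob("*")
            if p.is_file()
        )
    return manifest


if __name__ == "__main__":
    # Print one line per file, prefixed with the disk that holds it.
    for disk_name, paths in files_by_disk().items():
        for path in paths:
            print(f"{disk_name}\t{path}")
```

Saving that output somewhere off the array every so often would at least tell you what to restore or re-download if a single disk dies.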

 

I am going to move forward assuming my data is lost. The disk now has 20 of the 187 SMART errors and no filesystem is detected.

 

Like you said, I should never have done it the way I did. I still feel that having a disk fail in the 24-hour period of being unprotected is like the chance of getting struck by lightning, but leaving the data unprotected is exactly why people lose data, so I have to live with the decision I made. I am looking forward to having parity protection from here on, and that is the reason I chose Unraid. It is a really nice product when used right. It is perfect for my home use so I don't have to manage many arrays/devices. It lets me use a bunch of differently sized disks and makes them look like one namespace. Having that together with the parity protection is the key for me.

 

 

So, I am still looking for some advice on the parity sync that is running. I am unsure if I should let it run or if I should cancel it, then eject the bad disk somehow, and restart the parity sync.

 

 

 

 

Link to comment
So, I am still looking for some advice on the parity sync that is running. I am unsure if I should let it run or if I should cancel it, then eject the bad disk somehow, and restart the parity sync.
I'd let it run. Unraid parity has no concept of files, only the drive as an entire thing. So, the filesystem being unmountable has no bearing on whether or not an accurate image of what's on the failing drive is being written to parity. If you let parity generation complete, then you can rebuild whatever was able to be read from the failed drive onto a good drive, and then possibly get some of your data back with file system recovery tools. The unmountable drive doesn't mean your data is gone, just that the standard mount command failed for some reason. Knowing which file system type you are working on is critical, so before you try recovery tools, make sure you know what file system the drive was using.
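
To make the "no concept of files" point concrete, here is a toy sketch of single-disk XOR parity. It is not Unraid's actual code: the function names are invented for illustration, and it assumes all disks are the same length, which real arrays do not require.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_parity(data_disks):
    """Parity is the XOR of the raw bytes of every data disk."""
    return xor_blocks(list(data_disks))


def rebuild_missing(surviving_disks, parity):
    """XOR of the surviving disks and parity yields the missing disk."""
    return xor_blocks(list(surviving_disks) + [parity])


if __name__ == "__main__":
    # Hypothetical "disks": just raw bytes, no filesystem involved.
    disk1, disk2, disk3 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"
    parity = build_parity([disk1, disk2, disk3])
    print(rebuild_missing([disk1, disk3], parity) == disk2)
```

Nothing here knows about btrfs or any other file system, which is why whatever could still be read from the failing drive ends up reflected in parity whether or not the drive mounts.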
Link to comment

Thank you for the reply. I hope someone reads this as they are getting familiar with Unraid and decides not to cut corners like I did trying to get their data over to it. If I hadn't had my photos copied to my laptop, I would have been a whole lot more upset. In my case, the media I lost was not a big deal. It is either replaceable or not that important in the grand scheme of things. Now that I have been bitten, though, I will be extra diligent about doing parity checks and btrfs scrubs more often to ensure I don't get bitten again.

 

I'm not getting anything off the drive now. I've been a storage admin for over 15 years. I would never do something like this at work. I suppose at work I have a budget for the equipment I need to keep things going. At home, it is whatever I can scrape together.

 

 

Link to comment
I'm not getting anything off the drive now.
If the parity build is still running, then there is a good chance at least some of your data is recoverable. I'm not sure what you mean by "not getting anything off the drive". If the read errors column isn't incrementing and the parity build is continuing, then you are getting data from the drive. Like I said, unmountable doesn't mean the data is gone.

 

If you are truly at peace with losing everything on the drive, then stop the parity generation, click New Config in the Tools section, reassign the good drives to the appropriate slots, and start generating parity based on the good drives. Just be VERY sure you know which drive goes in the parity slot. It would suck to erase another data drive by accidentally writing parity to it.

Link to comment

Thank you for the help. I did in fact give up on it. The errors were climbing in line with the reads. That drive was just toast. I am perfectly OK now. Like I said before, this experience reminded me of what data is important. Pictures were the first thing I was worried about, and I am fortunate to have made a recent copy of all of them, so in the grand scheme of things I came out OK.

 

I look forward to learning more about things and trying to contribute to the community. 

 

 

Link to comment

I would suggest that you read up on the preclear command/plugin and use it to "test" any further disks you add before putting valuable data on them. It is useful for stressing the disks and checking that the whole disk has no known issues before you use it in your Unraid array.

 

If you decide not to use the preclear command I would recommend that you instead use your disk manufacturer's tools to run a "full" disk check before adding it to Unraid.

 

Also, I would not recommend btrfs for data drives in Unraid, simply because its data recovery tools appear to be less successful than those for the other supported file systems. It will therefore be of interest to know whether or not you successfully recover anything from your failed drive using the btrfs tools.

 

Unfortunately, while most file system recovery tools have some chance of recovering data from a corrupted file system, they generally struggle when the corruption comes from actual disk read errors. So I suspect the signs are not good, as your problem is due to a failing disk.

Link to comment

Thank you for the preclear script tip. I am running it now on the last 1TB drive I have left to add to the array. It is at 75% after 8 hours, but I see now why it is worth it. I have my parity synced now, and a cache disk. I have my photos back and about 75% of my music. I have all of the music up in Google Music anyway, so I still have access to all of it. I also had one of the earlier 1TB disks spit out 1 SMART error. I made a log of it and am monitoring that one closely. There's good and bad to using all the old hard drives you have lying around. But at the same time, spinning rust can fail. It happens. I manage about 6PB of storage at work. Enterprise grade, but that is a lot of spindles spinning. So the mistakes I made on this data migration I can only chalk up to overconfidence and a bit of laziness. The same as a plumber plumbing his own home. He might take a few shortcuts that he wouldn't take on the job. Never again for me. It cost me a weekend, and the wife appreciation factor went down dramatically. It messed up my MythTV docker as well, so shows didn't record. Maybe she will be more forgiving when I go get another hard drive.

 

This is a great community here.  Hopefully the rest of my use of this platform will be smooth sailing. 

 

Thanks again  everyone.

 

 

Link to comment

