
[Solved] Drive failure, need help



Posted

I've replaced failed drives before, but this is a first for me.

 

I was traveling, of course, when I got the emails saying drive 3 was disabled. My wife reported a very loud noise coming from the server. I keep a spare drive on hand for these things, so when I got home I logged onto the GUI and noted the serial number of the disabled drive. I could hear a drive grinding away in the server, so I knew it was good and dead. I pulled what I thought was the dead one, but the serial numbers didn't match, so I kept looking until I found drive 3 with the matching serial number.
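For anyone matching serial numbers the same way: on most Linux systems the names under /dev/disk/by-id end with the drive's serial number, so the GUI's serial can be mapped to a device node before pulling anything. This is only a minimal sketch under that assumption. The function name and the serial in the usage line are made up for illustration, and nothing here is specific to unRaid.

```python
import os


def serial_to_device(serial, by_id_dir="/dev/disk/by-id"):
    """Return the device node whose by-id name ends with `serial`, or None.

    Assumes by-id entries look like "ata-<model>_<serial>"; partition
    entries carry a "-partN" suffix, so they never end with the serial.
    """
    try:
        names = sorted(os.listdir(by_id_dir))
    except FileNotFoundError:
        # No by-id directory on this system.
        return None
    for name in names:
        if name.endswith(serial):
            # Each entry is a symlink to the real node, e.g. /dev/sdc.
            return os.path.realpath(os.path.join(by_id_dir, name))
    return None


if __name__ == "__main__":
    # "WD-EXAMPLE123" is a placeholder serial, not one from this thread.
    print(serial_to_device("WD-EXAMPLE123"))
```

This only tells you which device node carries that serial. Which physical bay it sits in still has to be checked against the label on the drive.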

 

I swapped the drive and booted up. The server was still grinding badly, so I shut it down and swapped out the truly failed drive instead. So now disk3, which unRaid flagged as disabled, is still in slot 3, and the drive with the obviously bad bearings (slot 2) has been swapped out. I started up unRaid; it booted without issue and the array started.

 

Now when I look at the array status, it still shows drive 3 as disabled, and the drive in slot 2 is listed as unassigned. In the GUI, parity and disks 1-6 all show as populated. Does that mean disk7 is the disk that failed? I looked for my backup copy of unRaid, but yes, I moved it to the tower drive so I can't get to it. It will now reside on my wuala account after I get past this screw-up.

 

So I'm confused as to what happened. It looks like the disk in slot 2 died, and unRaid flagged the disk in slot 3 and disabled it. Does that mean I had two drive failures and I'm screwed? The two drives in question, slot 2 and slot 3, are on the same IDE channel. I'm hoping that the failed drive confused unRaid and it thought the whole channel failed or something.

 

Is there any way for me to figure out where the drive in slot 2 belongs (the one not showing up as assigned at the moment; it is also the one I replaced, and it has been in the array before)? I do have my second unRaid flash drive. These drives have been in my array since I started it years ago. Maybe I could get something off the old flash drives? I've also posted log files to the forum; if I were to find a post with my syslog, would that give me some help?

 

At the moment I have unRaid powered on, but the array is stopped.

 

EDIT: I'm running 5.08d, and I did remember that I have a backup copy of my entire unRaid install on the flash drive, so if there are files there that can help, I have them.

 

Attached the notices that unRaid emailed me.

 

thanks for any and all help,

dave

Firt_failure_notice.txt

Latest_failure_notice.txt

Posted

OK after digging around a little more I figured some of this out.  It may not be as bad as I thought.

 

The drive that failed (bad bearings) was the cache drive. So it looks like the cache drive is gone, along with my crashplan setup. Figures that I copied it from the normal install point to the cache drive two days before the drive crashed. I have the original install, so hopefully reinstalling goes fine.

 

So could the failed cache drive cause the other drive on the same IDE channel to be disabled? Do I just need to go through and re-enable disk3 to get it all up and running?

 

thanks,

dave

Posted

It's very possible on an IDE controller.

 

Try re-assigning the old disk3 and starting a rebuild on it. You may have to first start the array with it unassigned. It will get flagged and red-balled again if the disk is really bad and the rebuild fails.

 

Posted

I did rebuild with the same drive and it is working fine.  I guess I was just confused because unRaid disabled a perfectly fine drive, but on that same IDE channel there was a failed drive (the cache drive).

 

dave

Posted

As Peter said, it can and does happen. I think I have seen 4 or 5 cases myself where certain types of drive failure cause the kernel to shut down the channel instead of just the failing drive, causing the other drive sharing the channel to disappear. That is obviously highly undesirable: not just one drive failing, but 2 drives appearing to fail in a RAID system. Thankfully, it is somewhat rare. Rebooting restores the drive, but the confusion is already in place!

Posted

As a last note on this failure, and a good one at that :)

 

I've been using Crashplan to back up my shares on the system. I recently moved my crashplan install to the cache drive, with links in /var/lib and /usr/lib. Unfortunately it was the cache drive that crashed, so I lost the entire install. Since I had a copy of my install tar'd up on the /boot drive, I was able to re-install and restart, and crashplan sync'd all my files and is backing up just fine.
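For what it's worth, keeping that kind of tar'd-up copy can be scripted. Below is a rough sketch using Python's standard tarfile module, not the exact command I ran. The function name and both paths in the usage line are assumptions for illustration, so substitute wherever your install actually lives and wherever on the flash drive you want the archive.

```python
import os
import tarfile


def archive_install(src_dir, dest_tar):
    """Write src_dir into a gzip-compressed tar and return the member names."""
    # Store paths relative to the directory's own name, not its full path.
    base = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(dest_tar, "w:gz") as tar:
        tar.add(src_dir, arcname=base)
    # Re-open the archive to confirm it is readable and list what it holds.
    with tarfile.open(dest_tar, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    # Both paths are hypothetical examples, not taken from my setup.
    for name in archive_install("/mnt/cache/crashplan", "/boot/crashplan_backup.tgz"):
        print(name)
```

Reading the archive back right after writing it is a cheap sanity check. The point of the copy is that it lives on a different device than the install itself.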

 

So you can recover from a crash, re-install crashplan and have it see your rebuilt shares and it all works.  I had tested it some, but didn't ever force a crash to make sure it "really" worked.  I guess next step is to crash two drives at the same time and see if I can recover everything that gets lost :)  Just kidding  *knocks on wood*. Probably just jinxed myself. . . .

Archived

This topic is now archived and is closed to further replies.
