ratmice Posted October 15, 2011

So here's my story, part of which is chronicled in the "Another disk is red-balled" thread. I recently had one drive drop out of service. I copied all the data off the drive, deassigned it, and started to pre-clear it to see whether the drive really was bad or needed replacing. I glanced at the server some hours later and noticed that the preclear had stopped because the remote host had dropped the connection. I tried to get back to the server and couldn't connect at all (web GUI dead, and telnet would say connected but not ask for a password and was unresponsive), so I rebooted it.

When it came up, the unassigned drive that was being precleared (disk3) showed as not installed, and another drive (disk2) was missing. Disk2 did not even appear in the dropdown list for drive assignment on the devices page. This drive hadn't given me any cause for concern previously. I shut down again, checked cables, controllers, power, etc., and rebooted. Still 2 drives missing.

To do a little troubleshooting I tried disk2 on a motherboard connector and on the other SAS controller as well. I was never able to get it to appear on the devices page for assignment to the array. It seems truly FUBAR. I did test another drive in the bay that disk2 had been in, and that one did show up on the device assignment page. So now all disks are back in their original locations, disk2 is listed as "missing", and disk3 is "not installed".

I am including 3 zipped syslogs. The one labelled "first" was taken immediately after the preclear had stopped, I had rebooted, and disk2 came up missing. The one labelled "last" is the most recent, and the one labelled "orig" is from before the second disk went belly up. Also, please note that a 750GB disk that was in the server, but not part of the array, has been removed as well. Anything for me to try here?
My assessment is that disk2 really did crater completely and I have 2 failed disks and nothing to do but pull them both out of service and try to replace the data on disk2 (luckily, the disk3 data was backed up today). If that's true, and I am confident that the data on the remaining disks is good, do I just deassign those disks (2 and 3) and run initconfig and start from there? Any help would be greatly appreciated as I am in way over my head. Currently running 4.7.

Attached: syslogs.zip
ratmice Posted October 15, 2011

Nothing? A little update: disk2 won't even show up in the BIOS no matter what connector I use (MoBo, either SAS card). The syslog shows "SRST failed (errno -16)", which seems to indicate a completely dead drive. The only recourse at this time seems to be to remove the drive and issue INITCONFIG to reinstate the array and rebuild parity with the remaining disks. Is this correct?
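For anyone wanting to pull this kind of message out of a long syslog, here is a minimal Python sketch. It assumes the kernel writes the line in the usual libata form (`ataN: SRST failed (errno=-16)`); the sample log lines, the host name `Tower`, and the helper name `find_srst_failures` are made up for illustration and are not taken from the attached syslogs. On Linux, errno 16 maps to `EBUSY`, meaning the device never reported ready after the soft reset, which fits a drive that no controller can see but does not by itself prove the drive is dead.

```python
import errno
import re

# Matches e.g. "ata7: SRST failed (errno=-16)"; also accepts "errno -16".
SRST_PATTERN = re.compile(r"(ata\d+(?:\.\d+)?): SRST failed \(errno[= ](-?\d+)\)")

def find_srst_failures(lines):
    """Return (port, errno_name) for every SRST failure line found."""
    results = []
    for line in lines:
        match = SRST_PATTERN.search(line)
        if match:
            code = abs(int(match.group(2)))
            # errno.errorcode is platform-specific; on Linux 16 is EBUSY.
            results.append((match.group(1), errno.errorcode.get(code, "UNKNOWN")))
    return results

# Hypothetical sample lines, not copied from the real syslogs.
sample = [
    "Oct 15 09:12:01 Tower kernel: ata7: SRST failed (errno=-16)",
    "Oct 15 09:12:02 Tower kernel: ata3: SATA link up 3.0 Gbps",
]
failures = find_srst_failures(sample)
for port, name in failures:
    print(f"{port}: soft reset failed with {name}")
```

The port name (`ata7` in the made-up sample) can then be matched against the boot-time lines in the same syslog to see which physical connector it corresponds to.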
Joe L. Posted October 15, 2011

Try a different power connector.
ratmice Posted October 15, 2011

I'm in a Norco 4220 case. 4 of the MoBo SATA connectors are attached to one backplane, and 2 backplanes are connected to each of the two SAS controllers. The power comes from the power supply to each of the backplanes. This particular disk is not seen no matter which backplane I attach it to (and thus using different power connectors). All other drives are powered up and working fine on all the other backplanes.
revco Posted October 15, 2011

I think your assessment here is pretty much spot on. You've conducted isolation tests on the perceived failed drive and the bays that it's connected to. You could try putting the failed drive in another machine, which might help you determine whether it's spinning up correctly and can be accessed at all; this step takes all your server hardware out of the equation. (I like external eSATA/USB drive bays like the Thermaltake BlacX for this purpose.) If disk 2 is accessible at all, you could try to do a manual backup, which could mitigate your data risk. You might also try fully powering off for a bit.

I'm guessing that you probably didn't rebuild parity with disk 3 out of the system, and thus parity is built on the original configuration with disk 3 AND disk 2, right? If so, you can't rebuild the failed drives. I don't think duct tape is going to solve this one...

Although I'm new to unRAID, I've been around RAID5/6 a fair amount and dual failed drives have hit me before. That's the big reason I went with unRAID: it doesn't stripe across volumes, and thus you don't lose everything with an issue like this. Failed conditions just freak me out because any time it happens, you're exposed to data loss. Having a spare ready-to-go drive is worth its weight in gold.

As for your question about rebuilding, I'm not experienced with unRAID and failed conditions yet, so I'll forgo providing advice.
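The point about parity can be illustrated with a minimal Python sketch. It assumes single parity is a plain bytewise XOR across the data disks, which is a simplification of what the array does; the three-byte "disks" and the helper name `xor_blocks` are toy values invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy "disks": three bytes each, made-up contents.
disk1 = bytes([0x11, 0x22, 0x33])
disk2 = bytes([0xA0, 0xB0, 0xC0])
disk3 = bytes([0x0F, 0x0E, 0x0D])

# Parity was computed while all three data disks were present.
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: XOR of parity with the survivors gives it back.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
print(rebuilt_disk2 == disk2)

# Two disks missing: parity plus the survivor only yields disk2 XOR disk3,
# which cannot be separated into the two original disks.
disk2_xor_disk3 = xor_blocks([parity, disk1])
print(disk2_xor_disk3 == disk2, disk2_xor_disk3 == disk3)
```

With one parity disk there is one equation per byte position, so only one unknown disk can be solved for at a time.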
ratmice Posted October 15, 2011

Well, since I've had no luck at all getting this drive recognized, and I am resigned to the data loss, I went ahead and initialized a new configuration. The array is up and running minus disks 2 and 3. Two new drives are preclearing now. The data loss is limited to disk2, as I had backed up everything on disk3 yesterday, before the meltdown.

I will try it (disk2) in my BlacX to see if it spins up at all. The problem there is the reiserfs, since I'm on a Mac. I haven't found any way to read this filesystem. I did download VirtualBox and a Fedora live CD in case it does spin up on the Mac, but I am not relishing trying to get that to work quickly. Maybe I can try XP, with Boot Camp, if the drive shows any signs of life. We'll see.

P.S. The drive was not noticed by OS X when attached via an external enclosure. Fried it is.
revco Posted October 16, 2011

Sorry to hear that. Hopefully you didn't lose anything that can't be easily recreated. Since the drive is dead, you might be interested in an anecdote about something that worked for me once on a failed RAID drive. I gave the drive a couple of good pats (OK, gentle slams) on a table top and it started working again for long enough for me to pull the data. It's a long shot and rarely works, but what's there to lose? I learned that trick way back in the day with Quantum Bigfoot drives, which came with a little too much glue that seeped onto the drive head.
liquidkaos Posted October 16, 2011

If the drive is spinning, do not slap it on the desk, or freeze it, etc.
johnm160 Posted October 16, 2011

Over the years I have rescued data off many drives by freezing them. Not ideal, but if faced with no other options it's worth a shot.
ratmice Posted October 16, 2011

The drive is spinning, and not making any really funny noises. Seems like a happy drive outwardly. I'm just not able to get it mounted on any machine I have.