[SOLVED] Parity disk died, then a few disks went offline


I recently replaced a disk that was having read I/O errors, and its data was rebuilt from parity. When that finished, I added a new precleared drive to the array. Everything looked good and was working fine. Then, 1-2 days later on June 1st, my monthly automatic parity check kicked off. The parity drive had some kind of problem and got disabled. The rest of the array looked OK. I rebooted the system, and the parity drive was detected but empty. I let it try to rebuild parity... it was going strong (if slowly) and got to at least 10% with no problems. When I woke up the next day, the parity drive was once again disabled. I assumed the parity drive itself was failing, so I ordered a replacement. When I shut down the array to power off the server, I noticed that 3 data drives were no longer detected.

 

Now I am a bit concerned. I pulled one of the "no device" drives and put it in my Windows desktop machine, and using DiskInternals Linux Reader I could see that the files on the drive were intact. I put the drive back in my unraid system... and it still shows up as "no device". If I hover over the "no device" menu, I see sde, sdf, and sdg in the drop-down, but I can't select them, so I am wondering what my next step should be. I don't want to do anything rash and trash the data on my drives, but I would like to get things restored to the point where I can bring the array online again. I also wish I were running the current unraid release, but I took an "if it isn't broke, don't mess with it" approach. I don't think I should upgrade until I get things working again.

 

I suspect the drive is fine and that the system itself has a hardware problem. I might try moving one of the drives to a different slot and seeing if I can then add it back to the array. If anyone has suggestions, I would love to hear them.


Thanks, I appreciate the input! I have a Corsair TX650 power supply. I did worry that it might not be able to handle the number of drives in the system. From some searching online, a common rule of thumb is to budget about 10 watts per drive; if that is close to accurate, even with 11-12 drives the 650 W supply should be able to handle it. Still, it seems suspicious that this all happened right after I added a new drive, and the extra power draw makes some kind of power issue seem like the most likely culprit. I did reseat the controllers, but they were seated tightly and screwed in. None of the cables were loose either; they were all well routed and firmly attached. The system isn't moved or jostled where it sits.
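To put numbers on the power question, here is a rough budget sketch. Only the 650 W rating and the ~10 W-per-drive rule of thumb come from this thread; the per-drive spin-up peak (`SPINUP_W`) and the base system load (`BASE_SYSTEM_W`) are hypothetical placeholders, so substitute figures from your own drive datasheets and hardware.

```python
# Rough PSU budget check for a multi-drive server.
# From the post: 650 W supply, ~10 W per drive rule of thumb.
# ASSUMPTIONS: SPINUP_W and BASE_SYSTEM_W are illustrative placeholders only.

PSU_W = 650          # Corsair TX650 nameplate rating
STEADY_W = 10        # rule-of-thumb steady-state draw per drive
SPINUP_W = 25        # hypothetical per-drive peak while all drives spin up
BASE_SYSTEM_W = 100  # hypothetical motherboard/CPU/controller load


def total_draw(num_drives: int, per_drive_w: float, base_w: float = BASE_SYSTEM_W) -> float:
    """Estimated total draw in watts: base system load plus every drive."""
    return base_w + num_drives * per_drive_w


def headroom(num_drives: int, per_drive_w: float) -> float:
    """Watts left on the PSU's nameplate rating for a given per-drive figure."""
    return PSU_W - total_draw(num_drives, per_drive_w)


if __name__ == "__main__":
    for n in (11, 12):
        print(f"{n} drives: steady {total_draw(n, STEADY_W):.0f} W, "
              f"spin-up {total_draw(n, SPINUP_W):.0f} W, "
              f"spin-up headroom {headroom(n, SPINUP_W):.0f} W")
```

Even with a pessimistic spin-up figure, 12 drives land well under the nameplate rating. A check like this can only rule out an undersized supply, not one that is failing.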

 

I think a hardware issue of some sort is a reasonable explanation, but I am at a bit of a loss on how to proceed. I still don't know what went wrong with the parity drive: it was rebuilding fine, and then it showed 'disk disabled' when I woke up. I suspect that drive really is malfunctioning. If it caused some kind of power spike or fluctuation, that might explain the other drives suddenly having problems too. What I can't account for is why I can access the data from my desktop: if the drives work and I can read them, why can't unraid see them? And if my SATA controllers are fried, why can they see all the other drives? Switching slots should also probably have made a difference.

 

My finances are a bit tight at the moment, so replacing all those drives or the SATA controllers would be challenging. That being the case, I am trying to think of ways to use my current hardware to narrow things down without trashing my data. I have three drives full of data that I can access on my desktop. If I put my newly acquired 3TB parity drive into my desktop instead, I could sort through the 5.5TB of data on the "dead" unraid drives and save what I can.

 

That still leaves me with an unraid system full of drives, but not enough of them to start the array, and no parity drive. I am not sure this is even possible, but I think if I go to the web UI's Utils tab and run the "New Config" script, I could re-form my array with the existing drives, essentially accepting the loss of the missing drives and the data on them. If it works that way, and it works without a parity drive installed (I don't have any empty drives left), I could restore access to most of my data. I will have to research what "New Config" actually does, though, as I don't have enough information yet to move forward.

 

The step after that would probably be to put one of the dead drives back in the unraid system and preclear it. That would exercise the drive and wipe the filesystem as well. If unraid can see the drive after a preclear, then perhaps the drive didn't die but just had something messed up that prevented unraid from "seeing" it. If the drive won't preclear, it is either physically toast or my SATA controllers are suspect. That doesn't identify the root cause, which would bother the heck out of me, but it would give me a way forward without replacing everything.


I am not sure whether unraid disabled the disks on purpose or is unable to load them for some reason. I think that if unraid had disabled the disks, I would see a blue ball next to the disk in the web UI rather than a red one. That might not be the case with v5-rc5, though, so the drives could still just be disabled. I am not sure how to go about re-enabling them. There was a whole "trust your array" section of the wiki, but its prerequisites are that the drives are not missing and that you have valid parity. I don't even have a parity disk, so I don't think that would work. I have a 3TB drive I can use as the new parity, but I am still not sure how to get to the point where I can tell unraid "my disks are good, go ahead and use them". Once I get them detected in the unraid system, I can run SMART tests and decide whether to replace them.

 

I wanted to run SMART tests on the drives anyway, but the drives are not in the /dev/sdX list, so the SMART test commands won't work until I figure out why unraid doesn't see the drives at all. The drives work and show their data on my Windows desktop, so I figure there should be a way to get unraid to see them too; I am just not sure how. It could be that my SATA controller is hosed, but with the other drives working and a different SATA port making no difference, that seems less likely.
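A quick way to confirm which disks the Linux kernel currently exposes is to look at the entries under `/sys/block`; until a drive appears there as an `sdX` device, tools such as `smartctl` from smartmontools have no `/dev/sdX` node to query. The sketch below is a hypothetical helper, not an unraid tool: the `expected` list you compare against is something you would fill in yourself.

```python
import os
import re

# Whole-disk SCSI/SATA names such as "sda" or "sdaa" (partitions like "sda1" are excluded).
SD_NAME = re.compile(r"^sd[a-z]+$")


def sd_devices(names):
    """Filter a list of block-device names down to sdX disks (plain alphabetical sort)."""
    return sorted(n for n in names if SD_NAME.match(n))


def detected_disks(sys_block="/sys/block"):
    """List sdX disks the kernel currently exposes; empty list if sys_block doesn't exist."""
    if not os.path.isdir(sys_block):
        return []
    return sd_devices(os.listdir(sys_block))


def missing(expected, names):
    """Disks you expected to see (hypothetical list you supply) that are absent from names."""
    return sorted(set(expected) - set(sd_devices(names)))


if __name__ == "__main__":
    print("detected:", detected_disks())
```

For example, `missing(["sde", "sdf", "sdg"], detected_disks())` would list whichever of those three devices the kernel is not currently presenting.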


Trurl, thanks so much for your help! I was able to get all my drives detected, and my array is available again in degraded mode, since my parity drive is not installed. I am pretty sure one of my SATA controllers is malfunctioning.

 

I run my system headless, so it had not occurred to me to hook up a monitor and get into the BIOS. I did just that, and everything was detected in the BIOS and looked good. As it continued to boot, it displayed some text for each SATA controller, listing the bus and device number along with all the drives detected on that controller. All my drives are 2TB except for two of the drives that had problems, which are a 3TB and a 1.5TB drive. I noticed that the output for the first 2-3 controllers looked similar, while the last one happened to have 3 drives, two of which were definitely my "bad" drives. I also noticed that when that finished and the unraid OS started to boot, some text that looked like errors flashed by, saying it was retrying something on some of the disks... and the activity lights on the 3 bad drives were flashing when this happened.

 

I moved the drives around in my system and, after some trial and error, managed to get all of them onto a different controller... and then everything worked. It looks like my SATA controller is toast. I am not sure whether the card died and killed my parity disk or the parity disk died and killed the controller card; either way, I can access my data and figure out what to do next without the threat of data extinction hanging over my head. :) Again, thanks for all the help, I appreciate it!
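The diagnosis above boils down to noticing that the misbehaving drives clustered on one controller. As a hypothetical sketch of that reasoning, you could write down which controller each drive sits on (as listed in the boot-time controller output) and rank controllers by how many of their drives are failing. The drive and controller names below are invented for illustration, not taken from this system.

```python
from collections import Counter


def suspect_controllers(drive_to_ctrl, bad_drives):
    """Rank controllers by how many attached drives are misbehaving.

    Returns (controller, bad_count, total_count) tuples, worst first;
    ties are broken by controller name.
    """
    totals = Counter(drive_to_ctrl.values())
    bad = Counter(drive_to_ctrl[d] for d in bad_drives if d in drive_to_ctrl)
    ranked = sorted(bad, key=lambda c: (-bad[c], c))
    return [(c, bad[c], totals[c]) for c in ranked]


# Hypothetical layout: which controller each drive was listed under at boot.
layout = {
    "disk1": "ctrl0", "disk2": "ctrl0",
    "disk3": "ctrl1", "disk4": "ctrl1",
    "disk5": "ctrl2", "disk11": "ctrl2",
    "disk9": "ctrl3", "disk10": "ctrl3", "parity": "ctrl3",
}
problem_drives = {"disk9", "disk10", "disk11"}

if __name__ == "__main__":
    for ctrl, n_bad, n_total in suspect_controllers(layout, problem_drives):
        print(f"{ctrl}: {n_bad} of {n_total} drives misbehaving")
```

A cluster like this is only a correlation; physically moving the drives to another controller, as done here, is what actually confirms the controller as the culprit.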



 

Almost always, it's the controller that dies, almost never the drives. But once a controller dies, one or all of the attached drives *look* like they died, even though they are usually fine.
