Tell it to me straight... How bad did I "F" my array?


I replaced a failed drive in my system with one that I did not preclear or format.  I have been fighting parity errors for the past week, and every night when my cache drive tries to write to the array the system basically craps out.  Just when I thought I had figured out which drive was causing the issues, I now have another drive coming up as failed.  I disabled the newly failed drive on the array, and now the one that I replaced pops up saying that it's an unformatted disk and asking me if I want to format it.  I am currently in the process of preclearing a drive to replace the one that just came up with a red ball.

 

So my question is: in what order should I do things to give myself the best chance of keeping my data?  Do I swap the bad drive with the one I am preclearing first, then preclear the one that is asking me to format it and put it back in the array? Or do I leave the dead one down and use this precleared drive for the one that is asking to be formatted?

 

Sorry if my question is a bit jumbled, but my head is pretty jumbled right now too.

 

unRAID version 5.0 beta 13

 

Thanks

 

Drabert

Link to comment

Our heads are jumbled too.

 

Post a screen shot AND a syslog.  You are 1 mistake from losing data.  (I hope you have backups of anything important, because if you do not, now is a good time to make them, BEFORE you do anything that involves moving disks around.)  A replacement disk NEVER has to be formatted, so the fact that one is asking to be formatted is a problem in itself.  DO NOT FORMAT ANY DISK!!!!!  Not unless it is an ADDITIONAL drive being added to a working existing array.  (In other words, as an example, you had 7 drives, and are now adding an 8th to a working array with no problems.)

 

DO NOT SET A NEW DISK CONFIGURATION EITHER...  That will immediately invalidate parity. (And right now, I think that is about the only thing keeping you from losing some data)
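
To make the "1 mistake from losing data" warning concrete, here is a toy Python sketch of single XOR parity, which is the general idea behind a single parity disk. It is not unRAID's code: the function names and the four-byte "disks" are invented for illustration. It shows that one missing disk can be recomputed from parity plus the surviving disks, and that with two disks missing the parity only yields the two missing disks XORed together.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recompute the one missing data block from parity plus the survivors."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical data disks, four bytes each (made-up contents).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: parity plus the other two gives it back exactly.
assert rebuild_missing([disk1, disk3], parity) == disk2

# Two disks missing: all that can be recovered is disk2 XOR disk3,
# which is not enough information to separate the two.
leftover = rebuild_missing([disk1], parity)
assert leftover == xor_blocks([disk2, disk3])
```

The rebuild only works while parity still matches the data disks, which is why anything that throws the existing parity away while a data disk is missing or unreadable is so dangerous.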

Link to comment

I am currently on the last step of the formatting and it seems like it will finish sometime later tonight.  Should I stop my array and use the new precleared disk for Disk 9 on my system, since it is coming up missing? After that I can start working on why the other disk is coming up as "not formatted".

 

Drabert

Link to comment

The preclear script DOES NOT format the disk.  It writes zeroes to it and then puts a small signature in the MBR so that unRAID recognizes the disk has been zeroed.

 

Your approach is as good as any.  Just do not format the disk that is coming up as "not formatted" unless you no longer want any of the data that was stored on it.

 

Joe L.

Link to comment

OK, the preclear finished, I have replaced the bad drive with the precleared one, and it is now rebuilding.  Once this drive has finished rebuilding, what should my next course of action be? Perform a parity check on the system? Fail the drive that was showing as unformatted and have the system rebuild it onto a fresh drive?

 

Over the past two years of using unRAID, this is the first time I have had any issues like this.  I have been fighting through a bad chassis for the longest time, but now the WAF is very low since our entire library does not seem to want to stay available.  I am not sure if it is the mover script trying to write to the bad drive that causes everything to get horked, but at this point I would rather lose a TB of data (it's all TV/Movies) than have bad WAF :)

 

Added current screenshot.

 

Drabert

ScreenShot_1_22_2013.JPG

Link to comment

OK, my rebuild finished, but now I'm getting write errors on the drive that was showing that it needed to be formatted and on my parity drive.  I am also not able to access the share that has my media files on it.  It is showing all the folders as empty.  Is it safe to replace the drive that looks like it is failing, in the hope that the parity drive will rebuild it?

 

New syslog attached

 

Thanks for the help guys.

 

Drabert

syslog-20130122-201523.zip

Link to comment

The system locked up again, so I forced a reboot, and now another drive has failed.  I started to rebuild it onto yet another drive, but I'm not sure when this is going to stop.  The drive I am replacing has not been written to yet, so I was not worried about letting it get rebuilt.

 

After this drive has finished rebuilding, should I stop the array and do a reboot before the system locks up again?  It shows the errors are from disk md4, but I am not sure which one that is in my array.

 

Drabert

Link to comment

md4 = disk4
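
As a rough illustration of that naming rule, the Python sketch below translates mdN device names to diskN slot names and tallies how often each one is mentioned in a list of text lines. The helper names are made up, and the sample lines are invented placeholders rather than real unRAID syslog output; the only thing taken from this thread is the mdN = diskN correspondence.

```python
import re
from collections import Counter

# Matches tokens such as "md4" or "md12" as whole words.
MD_DEVICE = re.compile(r"\bmd(\d+)\b")


def md_to_disk(md_name):
    """Translate an md device name such as 'md4' to the slot name 'disk4'."""
    match = MD_DEVICE.fullmatch(md_name)
    if match is None:
        raise ValueError(f"not an md device name: {md_name!r}")
    return f"disk{match.group(1)}"


def count_md_mentions(lines):
    """Count how many times each array slot is mentioned via its md device."""
    counts = Counter()
    for line in lines:
        for match in MD_DEVICE.finditer(line):
            counts[f"disk{match.group(1)}"] += 1
    return counts


# Invented placeholder lines, NOT real log output.
sample_lines = [
    "placeholder message about md4",
    "another placeholder about md4 and md9",
    "a line that mentions no device",
]

for disk, hits in count_md_mentions(sample_lines).most_common():
    print(disk, hits)
```

Pointing something like `count_md_mentions` at the lines of a real syslog would give a quick tally of which slots are mentioned most often, which can help decide where to look first.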
Link to comment

Well, after the disk rebuilt I restarted the system, and another disk is now showing as a red ball.  This is the first one that has come up bad with data on it.  Should I just keep replacing the drives that are coming up red, or is there anything else I can do, since this drive was fine?  I tried another restart of the system but it is still coming up red.  I will try powering the system down for a few minutes to see if that clears something up.  These are all server-grade Hitachi drives that keep failing.

 

So far this is the fourth drive that has come up with a red ball after a reboot.

 

As always, thank you for your help.

 

Drabert

Link to comment

Is there anything common among all these drives? Controller card or cables? It seems very fishy that they're all failing like this. Any cooling failures recently?

 

I'd agree with that. Something is very wrong with the hardware for drives to just keep dropping out like you're seeing.

 

Sorry, I missed that question from BLKMGK.

 

They are plugged into an LSI SAS9211-4i card with one SAS cable to a Dell 2U chassis that has a backplane, so the drives are swappable.  It looks like a Dell R720xd.  I have not had any thermal issues since this summer on the original box.

Link to comment

Right now I'm just worried about data and not hardware (as long as it eventually stops)... I can swap out this drive with another spare and just RMA this one.  As of right now, I have not had a disk come back as "bad", just new ones every time.

 

So I will swap this one out too and see which one fails next.  It does seem, though, like it is the Hitachi drives that are failing on me now, not the Dell POS ones.

Link to comment

I have 2 Dell 750W redundant power supplies in the system.  Both are showing a green light as well.

A better question, then, is...  What is the capacity in amps of the 12 volt rail powering your disks?  With all the various disk errors, on different disks, a common cause is a power supply unable to keep up with the current demands.

 

Joe L.
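
For anyone wanting to run the numbers on that question, here is a minimal Python sketch of a 12 V current budget. Everything numeric in it is a placeholder assumption: the rail capacity has to come from the power supply's label, the per-drive current from the drive's label or datasheet (spin-up draw is usually the worst case), and the drive count from the actual array. The function name is made up for this example.

```python
def rail_headroom_amps(rail_capacity_amps, drive_count, amps_per_drive,
                       other_load_amps=0.0):
    """Return spare 12 V capacity in amps; a negative result means overload."""
    demand = drive_count * amps_per_drive + other_load_amps
    return rail_capacity_amps - demand


# Placeholder numbers only, NOT measurements from the system in this thread.
assumed_drive_count = 12      # hypothetical number of drives on the rail
assumed_amps_per_drive = 2.0  # hypothetical 12 V draw per drive at spin-up

for assumed_rail_amps in (30.0, 18.0):
    headroom = rail_headroom_amps(
        assumed_rail_amps, assumed_drive_count, assumed_amps_per_drive
    )
    status = "OK" if headroom >= 0 else "OVERLOADED"
    print(f"{assumed_rail_amps:.0f} A rail: headroom {headroom:+.1f} A ({status})")
```

A rail that only just covers the drives at spin-up leaves nothing for fans, the backplane or the controller, so anything close to zero headroom is worth treating as suspect.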

Link to comment
