Double Drive Failure - Need Advice


Pauven

Scary subject, I know.  My worst nightmare come true.

 

Recently my Original Parity drive began acting up.  It would work okay for a while, then go offline.  Typically a reboot would take care of things for days/weeks.  I believe the data on the Original Parity drive is good, but can't guarantee it will stay online long enough to make it through a Parity Check/Rebuild.  Could be just a flaky slot on my 24 bay server.

 

So I did the smart thing, and assigned a New Parity.  Here's the problem.  During the new Parity Build, or shortly thereafter, my Data Drive 3 went kaput.  unRAID 6.0.1 indicates Data Drive 3 is Unmountable, and offers to format it.

 

I did something stupid and rebooted to see if Data Drive 3 would come back online.  I even tried a power cycle.  Unfortunately, it's still Unmountable.

 

To make stupid stupiderer, I didn't review or save my log before rebooting, so now I'm not sure if my New Parity drive finished the original build.  The New Parity drive has a green orb next to it, making it seem as if Parity was successfully built, but how do I know?  It was green before I rebooted.  Would this have still changed to green if Data Drive 3 went belly up in the middle of a Parity Build job?

 

Luckily, I haven't added/deleted/changed any data on my server since assigning the New Parity drive, so in theory the Original Parity Drive still has good data.

 

It seems like I have two options:  either Rebuild Data Drive 3 with the New Parity Drive (which is potentially incomplete), or Rebuild with the Original Parity Drive, which might take a nap halfway through the Rebuild job.

 

Oh, all of my drives are 3TB WD Reds in case you're wondering.  The Data Drive 3 was 90%+ full.

 

Any suggestions on best path forward?  Also, if I Rebuild one way and don't like the results, can I rebuild the other way?  Or is this a one shot deal?

 

Thanks for any help!

 

-Paul

Link to comment

I gather you don't have backups ...

 

... but the good news is you can do EITHER - or BOTH - of the rebuild techniques you've listed.

 

Assuming you're CERTAIN that you haven't written to the server since starting the parity rebuild, you're correct that you can use your old parity drive.

 

It sounds like your new parity drive completed the sync, but if you have any doubts I agree it's better to use the original drive ... and as long as it holds up through the rebuild you'll be fine.    If it doesn't, you can then repeat the process with the new parity drive and hope it was complete before the failure.

 

In either case, what you need to do is a New Config, assigning ALL of the data drives (including the bad #3) to the array along with the parity drive you're going to use; and checking the "Parity is already valid" box.    When you Start the array it will immediately start a parity check -- CANCEL it.    Then Stop the array, remove disk #3, and Start the array so it shows as missing.    Then you can Stop the array again; assign a replacement disk; and Start it back up for the rebuild.

 

Link to comment

Thanks Gary, I appreciate the advice.  I got a couple drives on order, I'll let you know how it goes in a few days.

 

You're right, no backup.  In all honesty, it's data I can replace or live without, but I'm realizing now that I have no record of what was on that particular drive, so it makes replacing it pretty challenging.

 

-Paul

Link to comment

Update on my [lack of] progress.

 

My new drives were delivered, so I began the recovery process.  I decided to try the Data drive rebuild using my new Parity drive, even though I was pretty sure it was incomplete, since it was already installed and ready to be used.

 

I can't tell if it finished successfully, as I'm having logging issues with 6.0.1 (my log was completely empty!!!), but the new drive is still flagged as Unmountable, and also I can't browse the contents of the drive, which leads me to believe my new Parity drive is incomplete.

 

I'm ready to try with the original Parity drive, and I'm at the step where I assign a New Config.  Tom has provided this verbiage with the tool:

This is a utility to reset the array disk configuration so that all disks appear as "New" disks, as if it were a fresh new server.

 

This is useful when you have added or removed multiple drives and wish to rebuild parity based on the new configuration.

 

DO NOT USE THIS UTILITY THINKING IT WILL REBUILD A FAILED DRIVE - it will have the opposite effect of making it impossible to rebuild an existing failed drive - you have been warned!

 

So now I'm second guessing myself...  thoughts?

 

Also, both the original failed Data drive, and the new Data drive, are flagged as 'Unmountable'.  I don't think I can proceed with the New Config process with Unmountable drives.  I'm guessing I can pre-clear a drive, mount it as part of the new config, remove it, then rebuild it on another drive.  Does that sound right?

 

-Paul

Link to comment

... I'm guessing I can pre-clear a drive, mount it as part of the new config, remove it, then rebuild it on another drive.  Does that sound right?

 

No!  Once you've pre-cleared it, it's all zeroes.

 

What you need to do is ...

 

(a)  Do a "New Config", assigning all of the ORIGINAL drives you had in the system to the same slots they were in.    i.e. the old parity drive; and all of the data drives -- including the failed one (data drive 3).    Check the "Parity is already valid" box; and then Start the array.  IMMEDIATELY stop the parity check that it will begin doing.

 

(b)  Stop the array and unassign drive #3.  Then Start it so it shows drive #3 is missing.

 

(c)  Stop the array and assign your NEW (replacement) drive in the #3 position.    You should now be able to Start the array and the rebuild should commence.

 

Link to comment

Actually, after I re-read your post and then considered what complications might prevent the process I just outlined from working, I realized what you were trying to do -- i.e. prevent drive #3 from being unmountable.

 

That might, in fact, work.    Do what I just suggested, but in place of the original drive #3 include a pre-cleared drive [Do NOT allow UnRAID to format it -- that will invalidate your parity drive].    Then follow the rest of the steps I noted: stop; unassign; start so it's seen as "missing"; stop; assign a new one; and start for the rebuild.    It should in fact rebuild the contents of your old, failed drive  :)

Link to comment

Okay, thanks.

 

My biggest concern is that an autostarted correcting parity check could destroy a few bytes of the parity drive before I get a chance to stop it.  The risk is greater if I used a pre-cleared drive as a substitute for my unmountable drive, since the zeroes on a pre-cleared drive would invalidate my parity results.
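
To make that concern concrete, here is a toy Python sketch of a single-parity rebuild. It assumes parity is a plain bytewise XOR across the data drives, which is how single-parity schemes are generally described; the drive contents and the `xor_blocks` and `correcting_check` helpers are hypothetical illustrations, not unRAID code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy "drives": four bytes each (made-up contents).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0x1E, 0x2D, 0x3C])  # the drive that later fails

# Parity as it was built while all three data drives were healthy.
parity = xor_blocks([disk1, disk2, disk3])

# A rebuild only needs parity plus the surviving drives.
rebuilt = xor_blocks([parity, disk1, disk2])

# A pre-cleared (all-zero) placeholder does not agree with that parity.
placeholder = bytes(len(disk3))
parity_matches_placeholder = xor_blocks([disk1, disk2, placeholder]) == parity

def correcting_check(old_parity, data_disks, bytes_checked):
    """Simulate a correcting check that rewrote the first `bytes_checked`
    bytes of parity to match the data disks before being cancelled."""
    fresh = xor_blocks(data_disks)
    return fresh[:bytes_checked] + old_parity[bytes_checked:]

# If the check "corrects" two bytes before it is stopped, those two bytes
# of the later rebuild come back as zeroes instead of the original data.
damaged_parity = correcting_check(parity, [disk1, disk2, placeholder], 2)
rebuilt_after_damage = xor_blocks([damaged_parity, disk1, disk2])
```

In this toy model `rebuilt` equals `disk3`, while `rebuilt_after_damage` has zeroes wherever the check rewrote parity. That is why the amount of parity touched before the check is cancelled matters.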

 

Anything I can do to prevent that parity check from starting, or from being a correcting version?

 

I'm actually thinking of trying to do a New Config without a pre-cleared drive first.  Since the bad drive is unmountable, it might allow me to go straight to the rebuild process without the additional steps of removing and re-adding the drive.

Link to comment

Unfortunately I don't know any way to preclude the parity check from starting (IMHO that's a flaw in UnRAID ... I think it should Suggest/Ask running one when you've used the "Parity is already valid" option -- but not automatically start it, as that can in fact cause a few changes that will result in a not completely accurate rebuild).

 

The best thing to do is use the original drive #3, since IF it can read the contents correctly on the first few sectors, you should be able to stop the parity check before anything's changed.

 

Link to comment

Good News!

 

I performed the 'New Config', with the original Parity drive, and the bad Data drive, and everything came up online, including the bad Data drive.

 

I can now see what was (erh, uhm, still is) on that drive. 

 

I still don't trust the drive, so I'll take a stab at rebuilding it.  But first I'll catalog what's on it... just in case.
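
For anyone wanting to do the same cataloging, a minimal Python sketch follows. It assumes a Python 3 interpreter is available (stock unRAID may not ship one, so this might need to run from another machine against the share), and the `catalog_disk` name and manifest format are my own invention.

```python
import os
from pathlib import Path

def catalog_disk(root, manifest_path):
    """Write one 'size<TAB>relative/path' line per file under root.

    Returns the number of files recorded.
    """
    root = Path(root)
    count = 0
    # surrogateescape lets odd, non-UTF-8 file names round-trip unchanged.
    with open(manifest_path, "w", encoding="utf-8",
              errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                full = Path(dirpath) / name
                try:
                    size = full.stat().st_size
                except OSError:
                    size = -1  # unreadable entry; still record its name
                out.write(f"{size}\t{full.relative_to(root)}\n")
                count += 1
    return count
```

A call might look like `catalog_disk("/mnt/disk3", "/boot/disk3-manifest.txt")`, where both paths are assumptions to adjust for your own system. Keeping the manifest somewhere other than the array means it survives if the drive does not.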

 

Thanks for all the advice!

Link to comment

Good.  Now you can easily do a rebuild by just doing the Stop; unassign; Start; Stop; assign new drive; Start process  :)

 

I believe the start/stop in the middle isn't really needed. At least it worked for me recently to do  stop / unassign / assign / start (unRAID will ask you to do the rebuild as it detects a different drive)

 

Link to comment

Correct.  I actually unplug/plug the drives in my hotswap case, so the system only sees the drives I'm working with - primarily this prevents me from making mistakes. 

 

So: Stop / Unplug & Plug / Assign / Start

Link to comment

Hey Gary, did you ever unplug your WD Reds and measure your wattage?  My other thread is still waiting on the results...  ;)

 

I'd forgotten all about it ... and actually the server hasn't been shut down, opened, or otherwise modified in well over a year.    One of these days ...  :)    [I just stuck a post-it note on the case to remind me to do that the next time I shut it down.]

 

 

Link to comment

I haven't studied your situation or diagnostics, but the one thing that struck me is that I did not see a single bit of evidence that *anything* was wrong with either Parity drive.  Yes, you had fears it might be bad, but then you said it had a green ball!  And you never indicated it did not have a green ball anywhere, or that Parity was marked as Invalid.  From my standpoint, it looks like you only have one drive that *appeared* to fail, red-balled for reasons unknown, so it too may be good.

 

Sometimes, you just have to trust the data, not your feelings.  Maybe if I have time, or someone else does, we should re-evaluate the situation.  I can't just now.

Link to comment

Hi Rob,

 

Sorry, didn't realize that my Parity drive issues were in question or needed proof.  I've attached an image, hopefully that will suffice.  This Parity drive would sometimes make it through a check and/or daily activity, and sometimes just go offline.  I could reboot the server and get the drive back online, and it might be good for several weeks.  This past week I decided enough was enough, and proceeded to replace the parity drive. 

 

During the new parity build, a data drive apparently got disabled and marked unmountable, but my log file was inaccessible in the GUI so I couldn't tell for sure.  I then tried to rebuild the data drive (on a new drive) using the new parity drive.  While the operation completed, the new drive was still in a disabled state, indicating that the new parity was no good.  One of my points of contention is that the new parity drive was green after the data drive was disabled.  I think it should have been a big red exclamation mark in my face telling me that the parity build job failed.

 

I just tried a rebuild using my original, flaky parity drive.  My rebuild ultimately failed, as my original parity drive once again hiccuped during the rebuild, and is now offline.  Convenient only in that it allowed me to grab this screenshot. 

 

Again, my system log is corrupted (blank in the GUI).  This time I browsed out to /var/log and found a syslog.1 file that is 132MB in size, so it does seem that the log file is filling up extremely fast, and it attempts to roll over to a new file, but since file space is exhausted, it stops there.
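
A quick way to confirm that kind of diagnosis is to list the largest files in the log directory and check how much space is left on its filesystem. Here is a small Python sketch, assuming Python 3 is available; the `largest_files` and `free_fraction` helper names are hypothetical.

```python
import os
import shutil

def largest_files(directory, top=5):
    """Return up to `top` (size_bytes, path) tuples for regular files
    directly inside `directory`, largest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((size, entry.path))
    return sorted(entries, reverse=True)[:top]

def free_fraction(directory):
    """Fraction (0.0 to 1.0) of free space on the filesystem holding `directory`."""
    usage = shutil.disk_usage(directory)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on odd pseudo-filesystems
    return usage.free / usage.total
```

Running `largest_files("/var/log")` and `free_fraction("/var/log")` would show whether a single rolled-over log has eaten the available space.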

 

Actually, looking at my log file it looks like errors cropped up around 10:45pm, and I remember thinking the rebuild would finish around 11:00pm. So close.

 

I think I'll re-install the bad data drive (which maybe isn't so bad) and do another New Config, then try to build a new parity from that.  Basically back to the step where everything went downhill.  I don't think there's any harm in trying.

 

Oh, and I feel I have been trusting both the data and my feelings.  What I'm not trusting is the unRAID GUI.  I don't think it has been designed to correctly handle multiple drive failures in all scenarios - at least not the scenario I've endured.

[Attached image: unRAID-Parity-Drive-Failure-Proof.jpg]

Link to comment

Archived

This topic is now archived and is closed to further replies.
