"Parity is already valid" checkbox?



Hi!

 

I just got my new unraid server (v. 6.1.9) up and running with shares, and I copied a bunch of data onto it.  I have a hot-swappable rack server with 4 8TB drives, so I thought I should test to make sure unraid does what it says it does, so I popped out the drive that had the most data on it.  Everything runs fine!  Woo!

 

I put the drive back in and now I want to bring it back online using the "Trust my Array" procedure described here: http://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

It appears that I just need to stop the array and select the "parity is already valid" checkbox next to the start button and start it up. 

 

I see no checkbox there (or anywhere on the web gui).  The disk is still marked with a red X (device is disabled). 

 

What do I need to do to bring my array fully back online? 

 

I've stopped and started the array a few times, restarted the server.  I don't see this option anywhere.

 

Link to comment

In order to get the trust parity checkbox, you must New Config and reassign all drives, being sure to not assign a data drive to the parity slot.

 

I don't recommend you do this though. The correct way to handle a disabled disk is to rebuild it. Your description is not sufficiently detailed to assure me that your parity is actually in sync with the missing disk.

 

And since you have rebooted, there won't be anything in your diagnostics that tells us what happened.

 

While the disk has been missing, it is possible that unRAID wrote data to the emulated disk. This data will be part of the parity array, but is actually missing from the disk.

 

I recommend you rebuild the data disk.
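
To make the concern about writes to the emulated disk concrete, here is a minimal Python sketch. It assumes single parity is a byte-wise XOR across the data disks, which is a toy model and not unRAID's actual code; the helper `xor_parity`, the `disks` list and the sample bytes are made up for illustration. A write that lands while the disk is pulled can only be recorded in parity, so the re-inserted disk no longer agrees with it.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three toy data disks, one 4-byte block each, plus parity computed from them.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(disks)

# Disk 0 is pulled. A write to the *emulated* disk 0 can only be recorded
# by updating parity so that it matches the other disks plus the new data.
new_data = b"\xff\xff\xff\xff"
parity_after_write = xor_parity([new_data, disks[1], disks[2]])

# The physical disk 0 still holds the old bytes when it is pushed back in,
# so parity recomputed from the physical disks differs from the stored parity.
in_sync = xor_parity(disks) == parity_after_write
print("physical disk matches parity:", in_sync)
```

In this model, trusting parity after such a write would leave the array believing two things that contradict each other, which is what a later parity check would report.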

Link to comment

I suppose another approach would be New Config, Trust Parity, then do a non-correcting parity check. If there are no parity errors, then you are good. If there are parity errors, then (maybe? probably?) the reason is that the data disk is not in sync with parity, and you could then rebuild the disk. If you want to take that approach and you discover that you do have parity errors, you can repeat your "experiment" and rebuild the disk.

Link to comment

Hi Trurl.  Thanks for your answer.

 

Well that sucks :(  A day of data copying lost with a simple 10 second test. 

 

Searching for the word 'rebuild' in the V6 manual delivers no usable results.

 

There is a section "Replace a Failed Disk", but that makes it sound as if just rebooting the machine and restarting the array with a new disk will automatically trigger a rebuild.  That was not the case with my existing disk.  How do I trigger this rebuild?  Is there documentation somewhere for this?

Link to comment

Hi trurl, thanks again for the response.

 

I chose to do a new config and the parity drive began rebuilding right away.

 

When it was finished I did the same test again.  This time when I did the new config, the "parity is already valid" checkbox showed up for me and I was able to assign the drives and restart right away.  I'm guessing there were uncommitted writes in the queue yesterday, so unraid didn't give me the option?  I waited 20 minutes after the writes to do the test, but apparently that wasn't enough time. 

 

Now I just need to re-copy my last job from yesterday, since some data was likely lost.

 

Thanks for your help!

Link to comment


The time you waited after the writes doesn't matter.  What you described in your initial post is NOT the same as doing a New Config -- i.e. you said "... It appears that I just need to stop the array and select the "parity is already valid" checkbox next to the start button and start it up.    I see no checkbox there (or anywhere on the web gui).  The disk is still marked with a red X (device is disabled)."

 

That's because you simply replaced the disk and then looked for a "parity is already valid" checkbox => but did NOT do a New Config.    The "parity is already valid" choice ONLY applies to new configurations ... and should only be used if you're CERTAIN that's the case.    In your post you noted that with the disk out, "... Everything runs fine!  Woo!", and that you had rebooted a few times, etc.    Based on those comments trurl correctly noted that "... Your description is not sufficiently detailed to assure me that your parity is actually in sync ...".    If you had definitely NOT done any writes to the array while the disk was out then you could in fact have simply done a New Config and checked the "already valid" box and all would have been fine.

 

You basically repeated the test today ... but this time you in fact did a New Config -- so you got the "already valid" choice.

How long you waited after the writes is completely irrelevant ... you could have done it immediately after you finished.

 

 

 

Link to comment

In order to get the trust parity checkbox, you must New Config and reassign all drives

 

Howdy trurl and Gary,

 

Somehow I thought there was a way to get the "Parity is already valid" checkbox *without* having to do a complete New Config and reenter all disks.  Am I wrong?  I recently rewrote the 'Trust My Array' wiki page to update it for all versions, and I added a shorter procedure based on that idea.  I know I recently devised a better method to convert all disks to XFS, and never had to use New Config, which may be what misled me.  If I'm wrong, I'll fix the page ASAP, don't want to mislead anyone else.

 

There is a section "Replace a Failed Disk", but that makes it sound as if just rebooting the machine and restarting the array with a new disk will automatically trigger a rebuild.  That was not the case with my existing disk.

 

It sounds like my recent wiki page rewrite may have misled you, causing additional work, and if so, I deeply apologize.  Can you give me a link to the "Replace a Failed Disk" info you found?  I want to check it too.  Feel free to make any suggestions for improving the docs.

Link to comment

RobJ,

 

The Parity is already valid checkbox will only appear when a new configuration is selected, in other words the user has initiated New Config from Tools.

 

Sorry, I am not trurl or Gary  :D

 

Ps. True for both version 6.1 and 6.2

 

I believe this didn't work on 6.1 but on 6.2 there's another way:

 

- Unassign the parity disk(s)

- Start the array

- Stop the array

- Add the parity disk(s) and the checkbox will appear (in this step it's also possible to re-arrange the disks if using single parity, but not with dual parity, or Q parity will not stay valid)
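
A rough illustration of why re-arranging disks is harmless for single parity but not for Q parity. This is a sketch under assumptions: P is modelled as a plain XOR, and Q follows the RAID-6 convention used by Linux md (a sum of 2^i * D_i in GF(2^8) with polynomial 0x11d). Whether unRAID computes its second parity exactly this way is an assumption, and the function names and sample bytes are made up.

```python
from functools import reduce

def gf_mul2(x):
    """Multiply by 2 in GF(2^8) with polynomial 0x11d (RAID-6 convention)."""
    x <<= 1
    return (x ^ 0x11d) & 0xff if x & 0x100 else x

def p_parity(values):
    """P parity: plain XOR, so the order of the disks does not matter."""
    return reduce(lambda a, b: a ^ b, values)

def q_parity(values):
    """Q parity: XOR of 2**i * D_i in GF(2^8), evaluated with Horner's rule."""
    q = 0
    for d in reversed(values):
        q = gf_mul2(q) ^ d
    return q

original = [1, 2, 3]      # one byte from each of three toy data disks
rearranged = [2, 1, 3]    # same disks, slots 0 and 1 swapped

print("P unchanged:", p_parity(original) == p_parity(rearranged))
print("Q unchanged:", q_parity(original) == q_parity(rearranged))
```

Because each disk's contribution to Q is weighted by its slot number, moving a disk to a different slot changes Q even though the data is identical.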

 

Link to comment


Have you done a parity check since your "experiment"? If not, you should, just to make sure parity and that drive were actually in sync. As I said:

I suppose another approach would be New Config, Trust Parity, then do a non-correcting parity check. If there are no parity errors then you are good. If there are parity errors then (maybe? probably?) the reason is that data disk is not in sync with parity. Then you could rebuild the disk. If you want to take that approach then if you discover that you do have parity errors you can repeat your "experiment" and rebuild the disk.

 

One other thing about all this, though. You seem to think that just copying the files again is all that would be required to make everything all right. Possibly, but that may not always be the case. There is a lot more that may be involved when a disk is disabled. It might be that some files were incomplete. It might even be the case that updates to the disk's filesystem were missed, in which case you could have filesystem corruption. This is why, in general, the correct way to deal with a disabled disk is to rebuild the disk, not to rebuild parity, and not to tell the system that parity is to be trusted.

 

Link to comment

Hi garycase,

 

Thanks for the clarification, and thanks to RobJ for updating the documentation!  I was just confused by the checkbox not showing up according to what I'd read.

 

Now that my int test was successful, I feel comfortable about unraid functioning the way I'd expected.

 

@trurl: I will do a rebuild of the drive.  Like I said though, the use case of rebuilding an existing drive is not covered in the V6 manual, so my only way of knowing how to do it was from this thread.

 

Thanks everyone for the help.  I'm used to watching tumbleweeds roll by when I post somewhere.  This community rocks!

Link to comment


So, as a newer user of Unraid without a lot of background in software, I would like to ask a question. Maybe I just don't understand the method he described for removing the HDD.

 

"Hi!

 

I just got my new unraid server (v. 6.1.9) up and running with shares, and I copied a bunch of data onto it.  I have a hot-swappable rack server with 4 8TB drives, so I thought I should test to make sure unraid does what it says it does, so I popped out the drive that had the most data on it.  Everything runs fine!  Woo!"

 

He has a hot-swappable rack server, and he states he just pulls a data drive.

He does this without taking the system offline or shutting it down?

Is this the correct method of pulling a drive?

How can the system be accessed without the data drive?

Link to comment


Pulling a drive out is effectively the same as if the drive had failed.    With a hot-swap setup it's fine to do that.

 

Re: "... How can the system be accessed without the data drive ? " ==>  That's the whole idea of fault tolerance.    UnRAID sees that the drive is missing, so it reads all of the other drives plus the parity drive to reconstruct the data that is on the "missing" drive.    You can still read from the missing drive, write to it, etc. -- the operations are just a bit more complex and slower.    Obviously you don't want to run like this for any length of time.    When you have no further fault tolerance it's referred to as "running at risk" => i.e. any subsequent failure will result in data loss.    Had the drive actually failed, you'd want to replace it as soon as possible and let UnRAID rebuild the failed drive onto the new one.
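
As a small sketch of that reconstruction, assuming single parity is a byte-wise XOR (a toy model, not UnRAID's real implementation; `xor_blocks` and the sample data are invented for the example): XOR-ing the surviving disks with parity gives back the contents of the missing one.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three toy data disks holding 4 bytes each, and their parity.
disks = [b"unRA", b"ID i", b"s up"]
parity = xor_blocks(disks)

# Pretend disk 1 was pulled: read every other disk plus parity instead.
missing = 1
survivors = [d for i, d in enumerate(disks) if i != missing]
emulated = xor_blocks(survivors + [parity])

print("emulated disk equals original:", emulated == disks[missing])
```

Every read of the missing disk has to touch all the other disks, which is why the array is slower while it runs this way.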

 

Clearly, if you have another failure before the drive is rebuilt, you'll lose data.    That's the idea of a dual parity system (e.g. v6.2) => you're then protected against two failures, so if a drive fails during a rebuild the rebuild will still complete successfully -- and then you can replace the 2nd failed drive and rebuild it as well.

 

Link to comment


Don't do anything without first answering my question:

Have you done a parity check since your "experiment"? If not you should just to make sure parity and that drive were actually in sync.

Make it a non-correcting parity check.

 

If you have already done a parity check, and it was a correcting parity check, then it's too late to rebuild the drive.
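
A toy sketch of why a correcting check closes that door, assuming single parity is a plain XOR and using made-up one-byte "disks" (none of this is unRAID's actual code): a rebuild recovers whatever parity encodes, while a correcting check rewrites parity to match whatever the disks hold now.

```python
from functools import reduce

def xor_all(values):
    """XOR a list of integers together."""
    return reduce(lambda a, b: a ^ b, values)

# Disk 0 was pulled while still holding 0x11; a later write to the
# emulated disk changed its logical value to 0x99, recorded only in parity.
stale_disk0, other_disks = 0x11, [0x22, 0x33]
parity = xor_all([0x99] + other_disks)

# Rebuild: the disk content is recomputed from parity and the other disks.
rebuilt = xor_all(other_disks + [parity])

# Correcting check: parity is recomputed from the disks as they are now,
# so the only record of the 0x99 write is overwritten.
corrected_parity = xor_all([stale_disk0] + other_disks)
after_correction = xor_all(other_disks + [corrected_parity])

print(hex(rebuilt), hex(after_correction))
```

In this model the rebuild returns the newer value, while anything reconstructed after the correcting check is just the stale disk again.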

Link to comment

Hi trurl,

 

I did a non-correcting parity check after your post and it did turn up two parity errors:

 

kernel: md: parity incorrect, sector=0

kernel: md: parity incorrect, sector=8590376744
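
For anyone who wants to pull these out of a longer syslog, here is a small Python sketch that extracts the sector numbers from lines of exactly this form. The 512-byte sector size used for the byte offset is an assumption on my part, and `parity_error_sectors` is just an illustrative helper name.

```python
import re

# Matches the kernel message format shown in the two log lines above.
PARITY_ERROR = re.compile(r"md: parity incorrect, sector=(\d+)")

def parity_error_sectors(log_text):
    """Return the sector numbers of all parity errors found in log_text."""
    return [int(m.group(1)) for m in PARITY_ERROR.finditer(log_text)]

log_text = """kernel: md: parity incorrect, sector=0
kernel: md: parity incorrect, sector=8590376744
"""

sectors = parity_error_sectors(log_text)

SECTOR_BYTES = 512  # assumption: sectors are counted in 512-byte units
for s in sectors:
    print(f"sector {s} -> byte offset {s * SECTOR_BYTES}")
```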

 

I put off dealing with that because I wanted to swap out the high decibel fans that came with my XCase build.  After endless reboots and tweaks the fans are now working fine. 

 

I did another non-correcting parity check with the same results.  I then ran the extended SMART tests on all drives, but saw no problems there.

 

So, from what you said I guess my understanding of how the parity works is incomplete.  My assumption was that when the automatic parity rebuild happened, it fully rebuilt the parity, thus removing any previous parity information.

 

My question is which should I rebuild, parity or data, and why?  Any enlightenment would be appreciated.

Link to comment
