[FIXED in 6.1.5] Change in Automatic Parity Check with "Trust Parity" option


If you do a New Config and check the "Parity is already valid" box  [i.e. what's often called the "Trust Parity" option], the system immediately starts a parity check when the array is started.

 

When the New Config is being done to recover from a user error, i.e. a previous configuration is being recreated so that a drive can be rebuilt, that parity check can make "corrections" that invalidate the parity disk and prevent a proper reconstruction of the drive.    The check can be quickly Canceled => but ANY "corrections" already done will cause errors in the reconstructed drive.

 

I'd like to see one of the following changes ...

 

(1)  Make this specific check a non-correcting check, so nothing is changed on the parity disk

 

OR

 

(2)  Instead of automatically starting one, post a note advising it ... e.g. "A Parity Check should be run to confirm the parity disk is valid".    You could post this note every time the array starts until a check is run.

 

OR

 

(3)  When you check the "Parity is already Valid" box, display a 2nd box that says "Automatically check parity when the array starts"  => and only start the automated check if that box is checked.

 

Any of those would eliminate the potential changes to parity when the array starts that are NOT desirable.
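
To make option (1) concrete, here is a minimal sketch, assuming single parity computed as a plain XOR across the data disks. The names `xor_parity` and `parity_check` and the tiny in-memory "disks" are made up for illustration and are not unRAID code. The only point is that a non-correcting check can report mismatches without touching the parity disk.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[int]) -> int:
    """Single parity for one stripe: XOR of the data blocks (assumed model)."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


def parity_check(data_disks: List[List[int]], parity: List[int], correct: bool) -> int:
    """Compare stored parity with computed parity, stripe by stripe.

    Returns the number of mismatches. Parity is rewritten only when
    correct=True; with correct=False the parity list is left untouched.
    """
    mismatches = 0
    for stripe in range(len(parity)):
        expected = xor_parity([disk[stripe] for disk in data_disks])
        if parity[stripe] != expected:
            mismatches += 1
            if correct:
                parity[stripe] = expected
    return mismatches


# Hypothetical two-disk array with valid parity.
disk1 = [0x11, 0x22, 0x33]
disk2 = [0x44, 0x55, 0x66]
parity = [a ^ b for a, b in zip(disk1, disk2)]

disk1[1] = 0x00              # simulate a corrupted sector on disk1
parity_before = list(parity)

found = parity_check([disk1, disk2], parity, correct=False)
print(found, parity == parity_before)
```

With `correct=False` the mismatch is counted but parity stays exactly as it was, so a later rebuild can still use it.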

 

Link to comment

The option to default to a non-correcting check would indeed resolve this (note it was one of the 3 alternatives I mentioned).

 

The simple fact is that the "Trust Parity" option (i.e. checking the "Parity is already valid" box) allows reconfiguring an array when you KNOW that parity is already good.    But if the reason you re-created the array is to rebuild another drive that has been corrupted, automatically starting a parity check will wipe out that good parity if ANY of the corrupted sectors are encountered before you can CANCEL the check [since parity will be "corrected" to match the corrupted data].
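
A toy model of that failure, assuming XOR single parity. Everything here (function names, block values, the `max_stripes` stand-in for "cancelled after a few stripes") is hypothetical and only illustrates the arithmetic; it is not how unRAID is implemented.

```python
from functools import reduce
from typing import List


def xor_all(blocks: List[int]) -> int:
    return reduce(lambda a, b: a ^ b, blocks, 0)


def rebuild(other_disks: List[List[int]], parity: List[int]) -> List[int]:
    """Reconstruct a missing disk by XORing parity with the surviving disks."""
    return [xor_all([d[i] for d in other_disks]) ^ parity[i]
            for i in range(len(parity))]


def correcting_check(data_disks: List[List[int]], parity: List[int], max_stripes: int) -> None:
    """Rewrite parity to match the data for the first max_stripes stripes,
    i.e. a correcting check that was cancelled part-way through."""
    for i in range(min(max_stripes, len(parity))):
        parity[i] = xor_all([d[i] for d in data_disks])


original_disk1 = [0x11, 0x22, 0x33, 0x44]
disk2 = [0xA0, 0xB0, 0xC0, 0xD0]
parity = [a ^ b for a, b in zip(original_disk1, disk2)]

# disk1 had its first sectors overwritten (e.g. it was briefly assigned as parity).
corrupted_disk1 = [0x00, 0x00, 0x33, 0x44]

# Parity really trusted: rebuilding disk1 from parity + disk2 recovers the original.
good_rebuild = rebuild([disk2], list(parity))

# Automatic correcting check touched one stripe before it was cancelled.
damaged_parity = list(parity)
correcting_check([corrupted_disk1, disk2], damaged_parity, max_stripes=1)
bad_rebuild = rebuild([disk2], damaged_parity)

print(good_rebuild == original_disk1)   # True
print(bad_rebuild)                      # [0, 34, 51, 68]
```

Every stripe the check "corrected" now rebuilds to the corrupted contents instead of the original data, which is why cancelling quickly only limits the damage rather than avoiding it.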

 

I think the best option is to NOT do an automatic check in this circumstance => the system's been told to "Trust Parity" ... so it should TRUST it !!

 

Link to comment
if the reason you re-created the array is to rebuild another drive that has been corrupted

 

In this case you would not use the 'trust parity' box.  Instead, restore the device configuration you had before, then type this command at the console or in a telnet/ssh session:

 

mdcmd set invalidslot <N>

 

where <N> is the disk number that you want disabled.

 

After typing this command, you click 'Start' back on the webGui (without doing an intervening browser refresh).

 

The array will come up with that disk disabled, and if a device has been assigned, it will kick off the reconstruct.

 

[The 'trust parity' box is implemented as follows: if it is checked, emhttp executes 'mdcmd set invalidslot 99' just before starting the driver.]
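
For readers trying to keep the cases straight, the behaviour described in this post can be summarised as a small decision function. This is only a restatement of the post in Python, not unRAID code: `describe_start` and `TRUST_PARITY_SLOT` are hypothetical names, and reading 99 as "no real disk is marked invalid" is an assumption based on the bracketed note above.

```python
TRUST_PARITY_SLOT = 99  # value the post says emhttp sets for the 'trust parity' box


def describe_start(invalidslot: int, device_assigned: bool = True) -> str:
    """Describe what the post says happens on Start for a given invalidslot."""
    if invalidslot == TRUST_PARITY_SLOT:
        # Assumption: 99 matches no real disk, so nothing is treated as invalid.
        return "no disk marked invalid; parity is trusted"
    if device_assigned:
        return f"disk {invalidslot} comes up disabled; reconstruct starts onto the assigned device"
    return f"disk {invalidslot} comes up disabled; nothing to reconstruct onto"


print(describe_start(1))
print(describe_start(1, device_assigned=False))
print(describe_start(99))
```

What happens when `invalidslot` is never set after a New Config is not covered in this post, so the sketch does not model it.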

Link to comment

Wish invalidslot was better documented. All I have ever seen are old posts for old versions. It is one of those things that seems more like unreliable folklore than recommended procedures.

 

I think adding it to the GUI would just be asking for trouble, but something in the wiki so that more "motivated" users could know exactly how to use it and what to expect would be great.

 

Maybe somebody with a test server could work through testing a detailed procedure to follow and document it.

 

Would certainly be better than the "procedure" we have come up with that prompted this feature request.

Link to comment

I’m trying to test this procedure but I think I’m doing something wrong.

 

This is what I did:

 

  • Created a parity + 2 disk array, XFS formatted, and copied some data to disk 1
  • Stopped the array
  • Did a New Config
  • Selected the same parity and disk 2
  • Selected a different disk as disk 1
  • Without starting the array, typed on the console: mdcmd set invalidslot 1
  • Started the array

 

Instead of rebuilding disk 1, unRAID shows it as unmountable and starts a parity sync.

I also tried invalidslot 2, since I was not sure whether slot 1 is parity or disk 1, with the same result.

 

Can anyone see what I’m doing wrong?

 

Link to comment

In this case you would not use the 'trust parity' box.  Instead you restore the device configuration you had before ...

Not sure what you're saying here.    The cases where this was needed were instances where somebody had already done a New Config, but had made some significant error -- e.g. incorrectly assigning a data drive as parity -- and when they started the system it corrupted a drive.  The users almost certainly do NOT have a copy of their previous config from the flash drive; so to "get back" to the configuration they HAD requires a New Config.

 

As I understand it, if they do this, UnRAID will NOT recognize that the parity drive is valid unless the "Parity is already valid" box is checked.    Is that not correct?    The problem is that if you check the box UnRAID immediately starts a parity check when you start the array.    If there's a "known bad" disk in the system at that point, then that parity check will almost certainly do a bunch of "corrections" to the parity drive -- which kills the opportunity to rebuild that bad disk.

 

I guess my question is WHY doesn't the "Trust Parity" option result in parity actually being trusted instead of starting that parity check?

 

Link to comment

OK, I misunderstood: restoring the device configuration means restoring from a flash backup, not doing a New Config.

This is what happened to me once, and why I’d like this feature request:

 

Server was all green.

I started to upgrade my parity drive.

I forgot to make a flash backup of the old config. Yes, I know I should have.

During the parity sync one of the data disks redballed with read errors.

So I had to put the old parity back, do a New Config and start the array. Thankfully the problem disk's errors were not at the beginning, so I was able to stop the check without invalidating my parity, and then rebuilt the problem disk.

If the disk had been completely dead, I believe I could not have recovered from this without the change requested here.

 

Link to comment

...I started to upgrade my parity drive...During the parity sync one of the data disks redballed with read errors...

Slightly OT, but I thought that a redball only results from a write error.

 

Here is what I thought happened with a read error, and that might produce a redball:

 

If the data cannot be read, then unRAID will "reconstruct" the data from the other disks + parity, and then attempt to write that data back to the disk. If that write fails then you get a redball.

 

But if parity is being built to a new disk how can it reconstruct the data that can't be read?

 

Is this a special case for redballing, or have I got it all wrong?
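
The read-error handling as this post understands it can be sketched as below, assuming XOR single parity. This models the poster's description only; `handle_read_error` and its return strings are hypothetical, and what unRAID really does when parity is not yet valid is exactly the open question, so the sketch just reports that reconstruction is impossible in that case.

```python
from functools import reduce
from typing import Callable, List, Optional


def handle_read_error(
    other_blocks: List[int],
    parity_block: Optional[int],
    write_back: Callable[[int], bool],
) -> str:
    """Model one unreadable block on a data disk.

    other_blocks: the same stripe read from the other data disks.
    parity_block: the stripe's parity, or None if parity is not valid yet
                  (e.g. it is still being built on a new parity disk).
    write_back:   tries to write the reconstructed block; True on success.
    """
    if parity_block is None:
        return "unrecoverable"          # nothing to reconstruct from
    rebuilt = reduce(lambda a, b: a ^ b, other_blocks, parity_block)
    return "recovered" if write_back(rebuilt) else "redball"


lost = 0x5A                             # the block that could not be read
others = [0x0F, 0xC3]
parity = lost ^ others[0] ^ others[1]

written: List[int] = []


def good_write(block: int) -> bool:
    written.append(block)
    return True


print(handle_read_error(others, parity, good_write))        # recovered
print(handle_read_error(others, parity, lambda b: False))   # redball
print(handle_read_error(others, None, good_write))          # unrecoverable
```

In this model a redball only ever follows a failed write, and a read error during a parity build has no reconstruction path at all.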

Link to comment

I think you’re right. It was some time ago, and I think what I got were several read errors.

Link to comment

+1...

Link to comment
  • 2 weeks later...

This is REALLY a necessary change => when you need to "Trust Parity", the system should in fact trust it ... and NOT start an automatic check that COULD, depending on the reason for reconstituting the array, actually destroy the ability to do a rebuild.

 

Tom => You once said that it was NOT your intent for this to happen (way back in v5 days).    Somewhere along the way it has started doing it again !!

 

Yes I've noticed that a "trust parity" operation fires up a parity-check.  This is not the intent.  I intended that if you Start the array with "Parity is already valid" box checked, that no parity check is automatically started.  This has been fixed in 5.0.

 

Link to comment

Tom => There's been yet-another casualty because of the automatic parity check.    That's now the 4th case I can recall in the past few months where a drive could have easily been recovered after it was accidentally assigned as parity (and thus corrupted) IF the "Trust Parity" option actually TRUSTED it ... but the automatic parity check destroyed the ability to do a rebuild.

 

Here's the latest thread:  http://lime-technology.com/forum/index.php?topic=44022.msg420282#msg420282

Link to comment

Related to this would be an option in the New Config to leave all current assignments in place (as though one had done a New Config and then assigned all drives as they were before the New Config) so that one can now make any desired changes.  As well as being a convenience, this would also dramatically reduce the chance of anyone accidentally assigning the drives incorrectly when doing the New Config.  I even think it should be the default behaviour with a checkbox added labelled something like "Clear all current assignments"

Link to comment

Something I grumble about every time I hit that "New Config" button.

Link to comment

Me too! +1 to that as well...

But I agree that changing the default behavior so that it doesn't run a correcting parity check is the first priority.

Link to comment
  • 2 weeks later...

Great news => This is fixed in v6.1.5 !!

 

"Trust Parity" now means exactly that -- the array will Start and NOT initiate a parity check.

 

Not helpful for those who have already lost what would have been recoverable data (e.g. archivist) ... but nice to know that it's no longer an issue.

 

Link to comment
