UnRaid 6.2 Dual Parity - error correction options?


cybrnook

Now that 6.2 (official) is released, I have a question regarding dual parity (which is the first thing I enabled :) )

 

With dual parity, we can better triangulate where potential data corruption exists during an array scan. However, in the GUI we still have only the checkbox for "write corrections to parity", or not.

 

Since we have dual parity now, we can better pinpoint where the issue lies: (parity 1 and data good, but parity 2 bad), or (parity 1 good, parity 2 good, data bad), etc.

Would it not be more advantageous to add a few more options on how the array should handle a found fault? Because, of course, if my array finds an error, I want to fix the error where it lies, not "assume" we need to correct it in parity, if parity is not at fault OR if one of the two parities is at fault. I don't want the fault to traverse BOTH my parities if the data is at fault.

 

I hope this makes sense?? I confused myself while trying to explain it  8)

Link to comment

What I mean above, is would it be possible to potentially add an option that says:

 

(if) DATA on DISK* matches PARITY1 (but) does not match PARITY2, (then) correct PARITY2

(if) DATA on DISK* matches PARITY2 (but) does not match PARITY1, (then) correct PARITY1

(if) DATA on DISK* does not match PARITY1 (or) PARITY2, (but) PARITY1 matches PARITY2, (then) correct DATA* from PARITY*
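
For illustration only, here is a minimal Python sketch of how the three rules above could look under RAID-6-style P+Q parity. It assumes PARITY1 is a plain XOR (P) and PARITY2 is a Reed-Solomon-style sum over GF(2^8) (Q) with generator 2 and polynomial 0x11d. Nothing in this thread confirms how unRAID's check actually behaves, and the names `compute_pq` and `diagnose` are made up for the example. One column of the stripe is modelled as one byte per data disk.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def compute_pq(data):
    """Return (P, Q) for one byte column: P = XOR of d_i, Q = sum of 2^i * d_i."""
    p = 0
    q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def diagnose(data, p_stored, q_stored):
    """Classify a mismatch, assuming at most ONE disk is wrong in this column."""
    p_calc, q_calc = compute_pq(data)
    p_syn = p_calc ^ p_stored
    q_syn = q_calc ^ q_stored
    if p_syn == 0 and q_syn == 0:
        return ("ok", None)
    if q_syn == 0:
        return ("parity1", None)   # data agrees with Q, so P is suspect
    if p_syn == 0:
        return ("parity2", None)   # data agrees with P, so Q is suspect
    # Both disagree: an error e on data disk z gives p_syn = e, q_syn = 2^z * e.
    for z in range(len(data)):
        if gf_mul(gf_pow(2, z), p_syn) == q_syn:
            return ("data", z)     # fix would be data[z] ^= p_syn
    return ("unknown", None)       # more than one disk must be wrong


data = [0x11, 0x22, 0x33, 0x44]
p, q = compute_pq(data)
corrupted = list(data)
corrupted[2] ^= 0x5A
print(diagnose(data, p ^ 0x01, q))   # ('parity1', None)
print(diagnose(data, p, q ^ 0x01))   # ('parity2', None)
print(diagnose(corrupted, p, q))     # ('data', 2)
```

Under those assumptions a single bad data disk can be located, because its error shows up in Q scaled by a factor unique to that disk. The sketch only holds if exactly one disk is wrong in that column; with two or more errors it can point at the wrong disk.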

 

Is this even possible?

 

Or is it just Fix DATA on DISK from PARITY, or adjust (BOTH) PARITY DISKS from DATA?

 

I would think with dual parity, we can now triangulate where the issue is? Can we act on that too?

 

Link to comment

I think I read somewhere that the logic applied is already something like that.
Link to comment

I guess that confuses me a bit on having dual parity.

 

Tom's post was clear, but it leaves me wondering about the effectiveness of implementing dual parity if we are still treating errors the same way we would with a single parity disk.

 

Initially I was super excited to roll out dual parity, and have done so. But my only correction options are still to correct data or correct parity, and, as Tom suggests, we default to writing the corrections to parity. So if I have a data corruption on disk, then EVEN THOUGH I should have the technical ability to know it's the data that's corrupted and not parity, under these default rules I am going to adjust my correct parity to account for the data corruption. Thus, when/if I ever need to rebuild that disk, I will be recovering a corruption. Instead, when we had the chance, we could have corrected just the bad parity (assuming 1 of the 2 parities was bad).

 

I guess... what's dual parity really doing for me if I can't influence how it acts and still have to treat it as single parity as far as my monthly scans are concerned?

Link to comment

The big thing dual parity gives you is the ability to recover two failed disks.  The most likely time to hit such a scenario is when the second drive fails while recovering the first failed drive.
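
As a rough illustration of the two-disk case, the sketch below rebuilds two missing data bytes from the surviving data plus both parities. It assumes RAID-6-style P+Q parity over GF(2^8) (generator 2, polynomial 0x11d), which this thread does not confirm for unRAID, and `recover_two` is a hypothetical helper name. One byte per disk stands in for a whole stripe.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse of a non-zero byte: a^254, since a^255 == 1."""
    return gf_pow(a, 254)


def compute_pq(data):
    """Return (P, Q): P = XOR of d_i, Q = sum of 2^i * d_i in GF(2^8)."""
    p = 0
    q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild the bytes of failed data disks x and y (entries x, y are ignored)."""
    p_xy, q_xy = p, q
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        p_xy ^= d                          # strip surviving disks out of P
        q_xy ^= gf_mul(gf_pow(2, i), d)    # and out of Q
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Now d_x ^ d_y = p_xy and gx*d_x ^ gy*d_y = q_xy: two equations, two unknowns.
    d_x = gf_mul(q_xy ^ gf_mul(gy, p_xy), gf_inv(gx ^ gy))
    d_y = p_xy ^ d_x
    return d_x, d_y


original = [7, 200, 13, 99, 254]
p, q = compute_pq(original)
damaged = [7, None, 13, None, 254]         # disks 1 and 3 have failed
print(recover_two(damaged, 1, 3, p, q))    # (200, 99)
```

With single parity only the first equation exists, so one unknown is the limit; the second, independent equation is what allows a second failed disk to be rebuilt.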

 

Although there might be a theoretical way to identify which drive has the error, I think it is a non-trivial computation.  I am sure that if it was easy it would commonly be done.

Link to comment

What I mean above, is would it be possible to potentially add an option that says:

 

(if) DATA on DISK* matches PARITY1 (but) does not match PARITY2, (then) correct PARITY2

(if) DATA on DISK* matches PARITY2 (but) does not match PARITY1, (then) correct PARITY1

(if) DATA on DISK* does not match PARITY1 (or) PARITY2, (but) PARITY1 matches PARITY2, (then) correct DATA* from PARITY*

 

That is almost exactly what was discussed earlier (don't remember where) concerning the development of dual parity, and what we hoped should be possible, except your third one.  If the third condition is true, that would indicate the error is in the data, but it still could not be corrected, since you don't know *which* data drive has the error.  We hoped that the first 2 conditions would automatically cause a correction of the parity disk in error.

 

But it's all too new yet, and parity errors are so rare, I don't think anyone has seen one yet, to know if that is what happens, and what it actually looks like.

 

I personally do still feel that bit errors that occur are much more likely to be on the parity drives, and that even if one occurs on a data drive, there's practically nothing you can do about it, so correcting the parity is still the right action.

Link to comment

If you read Tom's explanation, you'll see that the concern is that if there is more than one error in a parity stripe, attempting a correction can result in adding additional errors ... so the parity check errs on the side of caution and simply corrects the parity bit every time, for both P & Q.

 

It is possible mathematically to correct the actual error IF there is only a single bit in error in the parity stripe.  This could be determined by iteratively toggling the data bit from each disk; then seeing if that (a) fixes the parity issue with parity #1; and (b) results in parity #2 being good for EVERY corresponding bit on every other disk in that specific parity stripe for parity #1.  But this would require an involved set of computations every time a parity error was found ... and if there was no bit found that resulted in (b) showing all good checks, then the only alternative would be to simply correct the parity bit(s) (as is done now).
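
The trial-and-check idea above can be sketched roughly as follows. This is an assumption-laden toy: it works on one byte per disk rather than single bits, it assumes parity #1 is an XOR (P) and parity #2 is a RAID-6-style Q over GF(2^8), and `locate_by_trial` is a hypothetical name, not anything unRAID exposes.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def compute_pq(data):
    """Return (P, Q): P = XOR of d_i, Q = sum of 2^i * d_i in GF(2^8)."""
    p = 0
    q = 0
    coeff = 1                      # 2^0, doubled in the field for each disk
    for d in data:
        p ^= d
        q ^= gf_mul(coeff, d)
        coeff = gf_mul(coeff, 2)
    return p, q


def locate_by_trial(data, p_stored, q_stored):
    """Try the P-fixing correction on each disk; keep it only if Q agrees too.

    Returns the index of the single disk whose correction satisfies both
    parities, or None, in which case the fallback is to rewrite parity.
    """
    p_calc, _ = compute_pq(data)
    delta = p_calc ^ p_stored      # the change that would make P consistent
    if delta == 0:
        return None                # P already agrees; no data fix to try
    matches = []
    for disk in range(len(data)):
        trial = list(data)
        trial[disk] ^= delta       # (a) fixes parity #1 by construction
        if compute_pq(trial) == (p_stored, q_stored):   # (b) parity #2 agrees
            matches.append(disk)
    return matches[0] if len(matches) == 1 else None


data = [5, 9, 250, 31]
p, q = compute_pq(data)
bad = list(data)
bad[1] ^= 0x80
print(locate_by_trial(bad, p, q))        # 1
print(locate_by_trial(data, p ^ 1, q))   # None
```

The loop costs one extra parity computation per data disk for every mismatch found, which gives a feel for the "involved set of computations" mentioned above.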

 

Parity errors are, generally speaking, very rare ... so what I'd suggest is to simply do a complete check of your data anytime you've had a parity error.    Time consuming ... but it's almost entirely "computer time" and not "your time."  [Just kick off a checksum validation and then see what the results are.]    This will identify any file(s) that may have been corrupted ... and you can then simply replace them from your backups.

 

 

Link to comment

@garycase,

 

Are you generating checksums against the data being stored on your array? If so, what are you doing to achieve this, and are you keeping the checksum values somewhere on the array as well (in a folder, assuming something like an .md5)? Then, in the event of a checksum mismatch after a failed parity check, are you just flagging the problem child (after generating "new" checksums against the stored data) by comparing the new checksum values to the old ones to identify where the error is?

 

@all, thanks for all the info we are adding to this thread. I feel I am getting answers to my questions, even though we may not be where I would hope us to be (if feasible at all).

Link to comment

I use Corz to generate the checksums (via a Windows client), but some folks prefer the Dynamix File Integrity plugin, which is a bit more automated.

 

My checksums are stored in the same folder as the files they're protecting.  In the event a checksum file can't be read, then clearly that folder is corrupted, so there's nothing to check.    If I have unreadable checksums; or a file with a bad checksum; I simply replace the data from my backups.
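
For anyone who wants to see the idea without either tool, here is a minimal Python sketch of "checksums stored next to the files, verified later". It is not how Corz or the Dynamix plugin work internally; the manifest name `checksums.sha256` and the helper names are assumptions made for the example.

```python
import hashlib
from pathlib import Path

MANIFEST = "checksums.sha256"   # hypothetical manifest name, one per folder


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files need little memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(folder):
    """Record a digest for every regular file directly inside the folder."""
    folder = Path(folder)
    lines = []
    for p in sorted(folder.iterdir()):
        if p.is_file() and p.name != MANIFEST:
            lines.append(f"{file_digest(p)}  {p.name}")
    (folder / MANIFEST).write_text("\n".join(lines) + "\n")


def verify_manifest(folder):
    """Return the names of files that are missing or no longer match."""
    folder = Path(folder)
    failed = []
    for line in (folder / MANIFEST).read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or file_digest(target) != digest:
            failed.append(name)
    return failed


# Typical use (the path is only an example):
#   write_manifest("/mnt/user/Media/SomeFolder")          # once, when data is known good
#   print(verify_manifest("/mnt/user/Media/SomeFolder"))  # after a parity error
```

Any file name returned by `verify_manifest` is a candidate for restoring from backup. Passing `algo="blake2b"` to `file_digest` switches the hash, since `hashlib.new` accepts both names.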

 

Link to comment

Thanks for the hint, Gary. I installed the Dynamix plugin last night and generated my first run. Tonight when I get home, I will work on exporting the checksums.

 

I find this a good middle ground, thanks for the suggestion.

 

P.S. Don't you find generating your checksums from a Windows host to be a rather large overhead? Wouldn't the checksum be running on your Windows client (using local hardware), while scanning a file/folder on your unRAID box, making everything happen over your network? Unless of course you are on 10GbE at home, and not 1GbE like most of us :-

 

Looking at the metrics of my checksum run last night, using BLAKE2 (but it seems I may be switching to SHA2 for consistency), I was peaking at about 140 MB/s, which would have saturated my LAN if I had run it from a local Windows client.
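
As a quick sanity check on that figure, the arithmetic can be written out. This only compares the observed peak against the raw line rate (bits divided by 8, decimal megabytes); real-world throughput is lower once protocol overhead is counted, and the function name is just for the example.

```python
def link_capacity_mb_per_s(gigabits_per_s):
    """Raw line rate in MB/s: 1 Gb/s = 1000 Mb/s = 125 MB/s."""
    return gigabits_per_s * 1000 / 8


observed_mb_per_s = 140   # peak hashing rate reported above

for name, gbps in (("1GbE", 1), ("10GbE", 10)):
    capacity = link_capacity_mb_per_s(gbps)
    saturated = observed_mb_per_s >= capacity
    print(f"{name}: {capacity:.0f} MB/s raw, saturated at 140 MB/s: {saturated}")
```

So a 140 MB/s local hashing rate is already above what a 1GbE link can carry, while 10GbE would leave plenty of headroom.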

Link to comment

In most cases, I generate checksums BEFORE I copy content to UnRAID [i.e. for any new media files].  For content where that's not the case, yes, when I have Corz generate checksums for any new data on the server, it's limited by the Gb network.  I simply don't find that an issue -- (a) it's still quite fast; and (b) it's not taking "my time" ... just "computer time" => so other than a few seconds for me to start it, it's no big deal.

 

Doing a full validation does indeed take a long time ... but again, it's a 10-second process for me [Highlight the share I want to check; right-click; and select "Verify checksums"] ... then Corz runs until it's finished.  The fact it's running on my main Windows box isn't an issue ... it's on 24/7 anyway.

 

 

Link to comment
  • 4 months later...

It's practically a 'parody'! :)

Link to comment
