cybrnook Posted September 17, 2016

Now that 6.2 (official) is released, I have a question regarding dual parity (which is the first thing I enabled :) :) :) )

With dual parity, we can better triangulate where a potential data corruption exists during an array scan. However, in the GUI we still have only the check box for "write corrections to parity", or not. Since we have dual parity now and can better pinpoint where the issue lies (parity 1 and data good, but parity 2 bad; or parity 1 good, parity 2 good, data bad; etc.), would it not be more advantageous to add a few more options for how the array should handle a found fault? Of course, if my array finds an error, I want to fix the error where it lies, not "assume" we need to correct it in parity when parity is not at fault, or when only one of the two parities is at fault. I don't want the fault to traverse BOTH my parities if the data is at fault.

I hope this makes sense?? I confused myself while trying to explain it
cybrnook (Author) Posted September 21, 2016

What I mean above is: would it be possible to add an option that says:

(if) DATA on DISK* matches PARITY1 (but) does not match PARITY2, (then) correct PARITY2
(if) DATA on DISK* matches PARITY2 (but) does not match PARITY1, (then) correct PARITY1
(if) DATA on DISK* does not match PARITY1 (or) PARITY2, (but) PARITY1 matches PARITY2, (then) correct DATA* from PARITY*

Is this even possible? Or is it just "fix DATA on DISK from PARITY" or "adjust BOTH PARITY disks from DATA"? I would think that with dual parity we can now triangulate where the issue is. Can we act on that too?
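[Editor's note: the "triangulation" the post asks about can be sketched with the standard RAID-6 math, where P is plain XOR parity and Q is Reed-Solomon parity over GF(2^8). This is an illustration that single-error location is mathematically possible, not unRAID's actual code (later replies confirm unRAID does not do this); all names here are hypothetical.]

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the 0x11d polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

# Discrete log/antilog tables for the generator g = 2.
EXP, LOG = [0] * 255, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul(x, 2)

def compute_pq(data):
    """P is plain XOR of the data bytes; Q weights disk i by g^i."""
    P = Q = 0
    for i, d in enumerate(data):
        P ^= d
        Q ^= gf_mul(d, EXP[i])
    return P, Q

def diagnose(data, P, Q):
    """Return which element is bad, assuming at most ONE error in the stripe."""
    p_syn, q_syn = P, Q
    for i, d in enumerate(data):
        p_syn ^= d
        q_syn ^= gf_mul(d, EXP[i])
    if p_syn == 0 and q_syn == 0:
        return ("clean", None)
    if p_syn != 0 and q_syn == 0:
        return ("P_bad", None)           # data matches Q but not P -> fix P
    if p_syn == 0 and q_syn != 0:
        return ("Q_bad", None)           # data matches P but not Q -> fix Q
    z = (LOG[q_syn] - LOG[p_syn]) % 255  # q_syn = g^z * p_syn locates disk z
    if z < len(data):
        return ("data_bad", z)           # fix data[z] by XORing in p_syn
    return ("multiple_errors", None)
```

The three branches correspond to the three (if)/(then) rules above, with one improvement over the third rule: because Q weights each disk differently, a single data error can be located to a specific disk, not just detected.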
ijuarez Posted September 21, 2016

I hope you get an answer to this, because that is some serious parity what-iffing (LOL)
itimpi Posted September 21, 2016

(quoting cybrnook's post of September 21, 2016)

I think I read somewhere that the logic applied is already something like that.
ljm42 Posted September 21, 2016

unRAID always assumes the parity bits are bad. It took me a while to find it, but here is Tom's explanation: https://lime-technology.com/forum/index.php?topic=47875.msg460754#msg460754
cybrnook (Author) Posted September 22, 2016

That confuses me a bit about having dual parity. Tom's post was clear, but it leaves me wondering about the effectiveness of implementing dual parity if we still treat errors the same way we would with a single parity disk.

Initially I was super excited to roll out dual parity, and I have done so. But my only correction options are still "correct data" or "correct parity", and as Tom explains, the default is to write the corrections to parity. So if I have a data corruption on disk, EVEN THOUGH I should have the technical ability to know it's the data that's corrupted and not parity, under these default rules I will adjust my correct parity to account for a data corruption. Thus, when/if I ever need to rebuild that disk, I will be recovering the corruption, when instead, while we had the chance, we could have corrected the bad parity (assuming one of the two parities was bad).

I guess... what is dual parity really doing for me if I can't influence how it acts, and it is still treated as single parity as far as my monthly scans are concerned?
itimpi Posted September 22, 2016

The big thing dual parity gives you is the ability to recover two failed disks. The most likely time to hit such a scenario is when a second drive fails while you are recovering the first failed drive. Although there might be a theoretical way to identify which drive has the error, I think it is a non-trivial computation. I am sure that if it were easy it would commonly be done.
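[Editor's note: the two-failed-disk recovery itimpi describes is standard RAID-6 algebra over GF(2^8); a minimal sketch follows. Disk numbering and helper names are illustrative, not unRAID's internals.]

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the 0x11d polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

# Log/antilog tables for the generator g = 2.
EXP, LOG = [0] * 255, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = gf_mul(x, 2)

def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def rebuild_two(data, x, y, P, Q):
    """Recover data[x] and data[y] from P, Q and the surviving disks.

    Solves the pair of equations
        dx ^ dy = A        (from P)
        g^x*dx ^ g^y*dy = B  (from Q)
    which has a unique solution because g^x != g^y.
    """
    A, B = P, Q
    for i, d in enumerate(data):
        if i in (x, y):
            continue                  # skip the two lost disks
        A ^= d
        B ^= gf_mul(d, EXP[i])
    gx, gy = EXP[x], EXP[y]
    dy = gf_div(B ^ gf_mul(gx, A), gx ^ gy)
    dx = A ^ dy
    return dx, dy
```

The key point: one parity (P) can only solve one unknown per stripe, but the second, differently-weighted parity (Q) provides an independent equation, so two unknowns become solvable.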
RobJ Posted September 22, 2016

(quoting cybrnook's three proposed rules)

That is almost exactly what was discussed earlier (I don't remember where) concerning the development of dual parity, and what we hoped should be possible, except your third one. If the third condition were true, that would indicate the error is in the data, but it still could not be corrected, since you don't know *which* data drive has the error. We hoped that the first two conditions would automatically cause a correction of the parity disk in error. But it's all too new yet, and parity errors are so rare, that I don't think anyone has seen one yet, to know whether that is what happens and what it actually looks like.

I personally still feel that bit errors are much more likely to occur on the parity drives, and that even if one occurs on a data drive, there's practically nothing you can do about it, so correcting the parity is still the right action.
garycase Posted September 22, 2016

If you read Tom's explanation, you'll see that the concern is that if there is more than one error in a parity stripe, attempting a correction can result in adding additional errors ... so the parity check errs on the side of caution and simply corrects the parity bit every time, for both P & Q.

It is possible mathematically to correct the actual error IF there is only a single bit in error in the parity stripe. This could be determined by iteratively toggling the data bit from each disk in turn, then seeing if that (a) fixes the parity issue with parity #1; and (b) leaves parity #2 consistent with every other disk in that specific stripe. But this would require an involved set of computations every time a parity error was found ... and if no toggled bit resulted in (b) showing all good checks, then the only alternative would be to simply correct the parity bit(s) (as is done now).

Parity errors are, generally speaking, very rare ... so what I'd suggest is simply to do a complete check of your data any time you've had a parity error. Time consuming ... but it's almost entirely "computer time" and not "your time." [Just kick off a checksum validation and then see what the results are.] This will identify any file(s) that may have been corrupted ... and you can then simply replace them from your backups.
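[Editor's note: garycase's toggle-and-check idea, sketched at byte level rather than bit level: XOR the P-parity discrepancy into each data disk in turn and accept the one hypothesis that also makes the second parity check out. Purely illustrative, with the usual GF(2^8) Q parity assumed rather than taken from unRAID's source.]

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the 0x11d polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

def g_pow(i):
    """g = 2 raised to the disk index i."""
    v = 1
    for _ in range(i):
        v = gf_mul(v, 2)
    return v

def find_single_error(data, P, Q):
    """Try each disk: does flipping in the P discrepancy also satisfy Q?"""
    e = P
    for d in data:
        e ^= d                       # e = the parity #1 discrepancy
    if e == 0:
        return None                  # parity #1 already consistent
    for j in range(len(data)):
        trial = list(data)
        trial[j] ^= e                # hypothesis: disk j holds the error
        q = 0
        for i, d in enumerate(trial):
            q ^= gf_mul(d, g_pow(i))
        if q == Q:                   # (a) P fixed AND (b) Q consistent
            return j
    return None                      # no single-disk fix exists -> give up
```

As garycase notes, this is real per-error work (a pass over the whole stripe for every candidate disk), which is one plausible reason the check just corrects parity instead.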
cybrnook (Author) Posted September 22, 2016

@garycase, are you generating checksums against the data stored on your array? If so, what are you doing to achieve this, and are you keeping the checksum values somewhere on the array as well (in a folder, as something like an .md5 file)? Then, in the event a checksum mismatch shows up after a failed parity check, are you just flagging the problem child by generating "new" checksums against the stored data and comparing them to the old values to identify where the error is?

@all, thanks for all the info we are adding to this thread. I feel I am getting answers to my questions, even though we may not be where I would hope us to be (if that is feasible at all).
garycase Posted September 22, 2016

I use Corz to generate the checksums (via a Windows client), but some folks prefer the Dynamix File Integrity plugin, which is a bit more automated. My checksums are stored in the same folder as the files they're protecting. In the event a checksum file can't be read, then clearly that folder is corrupted, so there's nothing to check. If I have unreadable checksums, or a file with a bad checksum, I simply replace the data from my backups.
cybrnook (Author) Posted September 23, 2016

Thanks for the hint, Gary. I installed the Dynamix plugin last night and generated my first run. Tonight when I get home, I will work on exporting the checksums. I find this a good middle ground; thanks for the suggestion.

P.S. Don't you find generating your checksums from a Windows host to be rather large overhead? Wouldn't the checksums be computed on your Windows client (using local hardware) while reading the files from your unRAID box, making everything happen over your network? Unless, of course, you are on 10GbE at home, and not 1GbE like most of us :-) Looking at the metrics of my checksum run last night using BLAKE2 (though it seems I may switch to SHA2 for consistency), I was peaking at about 140 MB/s, which would have saturated my LAN if I had run it from a local Windows client.
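[Editor's note: running the hashing on the server itself, as this post suggests, sidesteps the LAN bottleneck entirely. A minimal sketch using Python's hashlib with BLAKE2 and a Corz-style sidecar file; the `checksums.blake2` filename and layout are assumptions for illustration, not what either tool actually writes (Dynamix, I believe, stores its hashes in extended attributes).]

```python
import hashlib
from pathlib import Path

def hash_file(path: Path, algo=hashlib.blake2b) -> str:
    """Digest a file in 1 MiB chunks so large media files don't fill RAM."""
    h = algo()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_checksums(folder: Path, out_name: str = "checksums.blake2") -> Path:
    """Store digests next to the files they protect, one 'digest  name' per line."""
    out = folder / out_name
    with open(out, "w") as f:
        for p in sorted(folder.iterdir()):
            if p.is_file() and p.name != out_name:
                f.write(f"{hash_file(p)}  {p.name}\n")
    return out

def verify_checksums(folder: Path, out_name: str = "checksums.blake2"):
    """Return the names of files whose current digest no longer matches."""
    bad = []
    for line in (folder / out_name).read_text().splitlines():
        digest, name = line.split("  ", 1)
        if hash_file(folder / name) != digest:
            bad.append(name)
    return bad
```

After a parity check reports errors, re-running the verify step locally identifies exactly which files were corrupted, which is the workflow garycase describes.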
garycase Posted September 23, 2016

(replying to the P.S. above)

In most cases, I generate checksums BEFORE I copy content to unRAID [i.e. for any new media files]. For content where that's not the case, yes, when I have Corz generate checksums for any new data on the server, it's limited by the Gb network. I simply don't find that an issue: (a) it's still quite fast; and (b) it's not taking "my time", just "computer time" ... so other than a few seconds for me to start it, it's no big deal.

Doing a full validation does indeed take a long time ... but again, it's a 10-second process for me [highlight the share I want to check; right-click; select "Verify checksums"] ... then Corz runs until it's finished. The fact it's running on my main Windows box isn't an issue ... it's on 24/7 anyway.
daquint Posted January 28, 2017

It's practically a 'parody'!

(quoting cybrnook's post of September 21, 2016)