[SOLVED] Parity Check should not write parity when data disk has a URE

August 14, 2015

The more I think about this, the more I realise this is actually a very serious bug.

During any "Write Corrections" parity check (excluding the initial parity build), if a data disk produces a URE, the parity bit at that position should not be altered. The reasons are simple: (a) you can't calculate parity anyway, and (b) parity is probably already correct.

If you overwrite it, there is a 50% chance of getting it wrong. This is compounded with multiple UREs, and good parity is inevitably destroyed, along with the chance of recovering the disk with the UREs.
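To make the 50% figure concrete, here is a minimal Python sketch. It assumes a hypothetical stripe of three data disks reduced to single bits, and it assumes the unreadable bit is replaced by a fixed guess when parity is recomputed. Neither detail comes from the actual unRAID driver; the function names are made up for illustration.

```python
from itertools import product

def xor_bits(bits):
    """XOR a sequence of 0/1 values together."""
    result = 0
    for b in bits:
        result ^= b
    return result

def count_wrong_parity(num_disks=3, failed=1, assumed=0):
    """Count stripes where parity recomputed with a guessed bit is wrong.

    Hypothetical model: `failed` is the index of the disk with the URE and
    `assumed` is the value substituted for its unreadable bit.
    """
    wrong = 0
    total = 0
    for data in product((0, 1), repeat=num_disks):
        true_parity = xor_bits(data)
        guessed = list(data)
        guessed[failed] = assumed      # the real bit is unknown, so guess
        if xor_bits(guessed) != true_parity:
            wrong += 1
        total += 1
    return wrong, total

wrong, total = count_wrong_parity()
print(f"{wrong} of {total} stripes end up with bad parity")
```

Under this model the guess is wrong exactly when the real bit differs from the assumed one, which is half of all possible stripes.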

August 14, 2015

I think ... you are confusing a correcting Parity Check, which assumes data is correct and parity is wrong

with

a URE triggered red-ball followed by a Data Rebuild, which assumes parity is correct and data is wrong.

August 14, 2015

Author

No, I mean a correcting parity check.

It is not possible to calculate parity with a data disk URE during a correcting parity check. Therefore, changing the bit on the parity drive corresponding to that position is not a valid thing to do. It will destroy parity. As soon as you hit a URE on a data drive you can no longer assume data is correct and parity is wrong.

August 14, 2015

I don't think your statement of what happens is correct. I believe that Tom has stated that what happens is the block from parity and all other drives is used to calculate the correct value for the drive that cannot be read, and the calculated block is then written to the block in error, to overwrite it and attempt to correct it.

August 14, 2015

A 'parity check' reads all devices in the array. If there are no UREs (unrecoverable read errors), then Parity and all Data are xor'ed together. If the result is not "all zeros", then parity is re-calculated by xor'ing the Data again, and written to the parity disk (provided the "Write corrections to parity disk" checkbox is checked).

If any single disk reports a URE then of course we don't have valid data for that disk. In this case we reconstruct the missing data by xor'ing parity with all 'other' data disks; the result is then written to the disk which got the URE (regardless of whether "Write corrections to parity disk" is checked or not).

If any write fails this will disable the device.

This will be the same behavior with P+Q parity, only now there is both P and Q to check, and there can be one or two devices written.
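The per-stripe logic described above can be sketched roughly as follows. This is an illustration only, not the unRAID driver: it covers single parity (not P+Q), and `ReadError`, `check_stripe`, and the read/write callbacks are hypothetical names invented for the example.

```python
from functools import reduce

class ReadError(Exception):
    """Stands in for an unrecoverable read error (URE)."""

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def check_stripe(read_fns, write_fns, write_corrections=True):
    """Check one stripe. Index 0 is parity, the rest are data disks.

    read_fns[i]() returns a block or raises ReadError;
    write_fns[i](block) writes a block. Both are hypothetical callbacks.
    """
    blocks, failed = [], []
    for i, read in enumerate(read_fns):
        try:
            blocks.append(read())
        except ReadError:
            blocks.append(None)
            failed.append(i)

    if not failed:
        # Parity XOR all data should be all zeros when the stripe is in sync.
        if any(xor_blocks(blocks)):
            if write_corrections:
                write_fns[0](xor_blocks(blocks[1:]))  # recompute from data
                return "parity corrected"
            return "sync error (not corrected)"
        return "ok"

    if len(failed) == 1:
        # Rebuild the unreadable block from every other disk and write it
        # back, regardless of the write_corrections setting.
        bad = failed[0]
        others = [b for i, b in enumerate(blocks) if i != bad]
        write_fns[bad](xor_blocks(others))
        return f"disk {bad} rewritten"

    return "too many read errors for single parity"
```

Note that in the URE branch parity is only read, never overwritten, which is the behaviour the original post was asking for. A failed write would disable the device; that path is left out of the sketch.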

August 14, 2015

I don't think your statement of what happens is correct. I believe that Tom has stated that what happens is the block from parity and all other drives is used to calculate the correct value for the drive that cannot be read, and the calculated block is then written to the block in error, to overwrite it and attempt to correct it.

Yes that is exactly right.

August 14, 2015

Author

Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?

August 14, 2015

Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?

I have seen some (garycase) advocate a non-correcting parity check immediately after a data disk rebuild since the point in that case is to test the rebuild rather than testing parity.

August 14, 2015

Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?

The only time it would make a difference is when you check parity right after rebuilding a disk. I'm a control freak, I guess; I want to control when things are written. So even though I end up running a correcting parity check immediately after a non-correcting one, I still run ONLY non-correcting checks unless an error happens.

August 14, 2015

I'm a control freak, I guess; I want to control when things are written

This. I would like another maintenance mode option where the entire array, including parity, is mounted read-only, for those times when I want to recover data without the possibility of accidentally overwriting something.

August 14, 2015

I'm a control freak, I guess; I want to control when things are written

This. I would like another maintenance mode option where the entire array, including parity, is mounted read-only, for those times when I want to recover data without the possibility of accidentally overwriting something.

I'd like this myself.

August 14, 2015

Author

Ever since I got thousands of UREs on a disk during a correcting parity check, I have always run non-correcting parity checks.

The problem was a faulty cable. I do find it odd that a faulty cable would produce thousands of UREs but no write errors, since presumably a correction was attempted for each URE, and I would have expected to see the disk red-balled and the parity check cease.

With the behaviour confirmed by limetech, I am struggling to see any valid logical reason to choose a non-correcting parity check (other than satisfying control freak tendencies).

August 14, 2015

Ever since I got thousands of UREs on a disk during a correcting parity check, I have always run non-correcting parity checks.

The problem was a faulty cable. I do find it odd that a faulty cable would produce thousands of UREs but no write errors, since presumably a correction was attempted for each URE, and I would have expected to see the disk red-balled and the parity check cease.

With the behaviour confirmed by limetech, I am struggling to see any valid logical reason to choose a non-correcting parity check (other than satisfying control freak tendencies).

I very much wish a non-correcting check would log all of the sync errors to a special file, and that those sync errors could be reverified and optionally corrected through a new process that only checks the sectors in that file. In this way, a correcting check doesn't have to reprocess the entire array, but only the sectors that mismatched in the non-correcting check. With such a feature, all parity checks could become non-correcting and notify the user if there are mismatches to decide what to do about them. Obviously, if the owner noticed odd log messages pointing to a problematic drive, they may want to reseat cables, etc. and rerun the non-correcting check, rather than apply the corrections to parity.
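A rough Python sketch of that two-pass idea, just to show the shape of it. Everything here is hypothetical: stripes are modelled as `(parity, [data...])` tuples of small integers, the log is a JSON file of stripe numbers, and none of the function names exist in unRAID.

```python
import json
from functools import reduce

def stripe_in_sync(parity, data_blocks):
    """True when parity XOR all data blocks is zero."""
    return reduce(lambda a, b: a ^ b, data_blocks, parity) == 0

def non_correcting_check(stripes, log_path):
    """Scan every stripe and log mismatching stripe numbers; write nothing to the array."""
    mismatches = [n for n, (parity, data) in enumerate(stripes)
                  if not stripe_in_sync(parity, data)]
    with open(log_path, "w") as f:
        json.dump(mismatches, f)
    return mismatches

def recheck_logged(stripes, log_path, correct=False):
    """Re-verify only the logged stripes; optionally rewrite their parity."""
    with open(log_path) as f:
        logged = json.load(f)
    still_bad = []
    for n in logged:
        parity, data = stripes[n]
        if stripe_in_sync(parity, data):
            continue          # mismatch went away, e.g. after reseating a cable
        still_bad.append(n)
        if correct:
            # Recompute parity from data, as a correcting check would.
            stripes[n] = (reduce(lambda a, b: a ^ b, data, 0), data)
    return still_bad
```

The second pass touches only the logged stripes, so it would be quick compared with a full check, and the user decides whether to pass `correct=True`.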

August 14, 2015

I very much wish a non-correcting check would log all of the sync errors to a special file, and that those sync errors could be reverified and optionally corrected through a new process that only checks the sectors in that file. In this way, a correcting check doesn't have to reprocess the entire array, but only the sectors that mismatched in the non-correcting check. With such a feature, all parity checks could become non-correcting and notify the user if there are mismatches to decide what to do about them. Obviously, if the owner noticed odd log messages pointing to a problematic drive, they may want to reseat cables, etc. and rerun the non-correcting check, rather than apply the corrections to parity.

Great idea, but in order to fully implement it as written, you would need a non-array drive the size of the parity disk, in order to account for the possibility of every bit being wrong. Or you could specify a max number of errors to log before giving up. I suppose if more than a relatively small number of errors were found, you have more to worry about and probably shouldn't complete the check anyway.
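The "max number of errors" variant could look something like this minimal Python sketch. The names `scan_with_cap` and `TooManySyncErrors` and the default cap of 100 are assumptions made up for illustration, and the per-stripe mismatch test is reduced to a list of booleans.

```python
class TooManySyncErrors(Exception):
    """Raised when the mismatch count exceeds the configured cap."""

def scan_with_cap(mismatch_flags, max_errors=100):
    """Collect mismatching stripe numbers, giving up once the cap is exceeded.

    mismatch_flags is any iterable of booleans, one per stripe (a hypothetical
    stand-in for the real per-stripe XOR test).
    """
    logged = []
    for stripe, mismatch in enumerate(mismatch_flags):
        if not mismatch:
            continue
        if len(logged) >= max_errors:
            # More errors than the log is allowed to hold: stop the check.
            raise TooManySyncErrors(
                f"more than {max_errors} sync errors; stopped at stripe {stripe}")
        logged.append(stripe)
    return logged
```

With a cap, the log stays small no matter how large the parity disk is, and hitting the cap is itself a signal that something bigger is wrong.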

August 14, 2015

Great idea, but in order to fully implement it as written, you would need a non-array drive the size of the parity disk, in order to account for the possibility of every bit being wrong. Or you could specify a max number of errors to log before giving up. I suppose if more than a relatively small number of errors were found, you have more to worry about and probably shouldn't complete the check anyway.

Agreed. But I was not trying to lay out the entire design.

A lot of details would have to be worked out.
