August 14, 2015
The more I think about this, the more I realise this is actually a very serious bug. During any "Write Corrections" parity check (excluding the initial parity build), if a data disk produces a URE, the parity bit corresponding to that position should not be altered. The reasons are very simple: (a) you can't calculate parity anyway, and (b) parity is probably already correct. If you overwrite it, there is a 50% chance of getting it wrong. This is compounded with multiple UREs, and good parity is inevitably destroyed, along with the chance of recovering the disk that has the UREs.
August 14, 2015
I think ... you are confusing a correcting parity check, which assumes data is correct and parity is wrong, with a URE-triggered red-ball followed by a data rebuild, which assumes parity is correct and data is wrong.
August 14, 2015 Author
No, I mean a correcting parity check. It is not possible to calculate parity with a data disk URE during a correcting parity check. Therefore, changing the bit on the parity drive corresponding to that position is not a valid thing to do. It will destroy parity. As soon as you hit a URE on a data drive, you can no longer assume data is correct and parity is wrong.
August 14, 2015
I don't think your statement of what happens is correct. I believe Tom has stated that the blocks from parity and all other drives are used to calculate the correct value for the drive that cannot be read, and the calculated block is then written over the block in error to attempt to correct it.
August 14, 2015
A 'parity check' reads all devices in the array. If there are no UREs (unrecoverable read errors), then parity and all data are XORed together. If the result is not all zeros, then parity is recalculated by XORing the data again and written to the parity disk (provided the "Write corrections to parity disk" checkbox is checked).
If any single disk reports a URE, then of course we don't have valid data for that disk. In this case we reconstruct the missing data by XORing parity with all 'other' data disks; the result is then written to the disk which got the URE (regardless of whether "Write corrections to parity disk" is checked or not). If any write fails, this will disable the device.
This will be the same behavior with P+Q parity, only now there is both P and Q to check, and there can be one or two devices written.
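The per-stripe logic described in the post above can be sketched in a few lines of Python. This is only an illustration of single-parity XOR under stated assumptions, not unRAID's actual driver code: the names `xor_blocks` and `check_stripe`, the action strings, and the use of `None` to represent a URE are all hypothetical, and the parity block itself is assumed to be readable.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_stripe(
    parity: bytes,
    data: List[Optional[bytes]],
    write_corrections: bool,
) -> Tuple[str, Optional[int], Optional[bytes]]:
    """Decide what to do for one stripe (hypothetical helper).

    `data` holds one block per data disk; None stands for a URE on that disk.
    Returns (action, data_disk_index, block_to_write).
    """
    failed = [i for i, block in enumerate(data) if block is None]

    if len(failed) > 1:
        # Single parity cannot rebuild two unreadable blocks.
        return ("unrecoverable", None, None)

    if len(failed) == 1:
        # Rebuild the unreadable block from parity and the other data disks.
        # Per the post, this write happens whether or not
        # "Write corrections to parity disk" is checked.
        survivors = [block for block in data if block is not None]
        rebuilt = xor_blocks([parity] + survivors)
        return ("rewrite_data", failed[0], rebuilt)

    # Everything was readable: parity XOR all data must be all zeros.
    if any(xor_blocks([parity] + data)):
        new_parity = xor_blocks(data)
        if write_corrections:
            return ("rewrite_parity", None, new_parity)
        return ("report_only", None, new_parity)

    return ("ok", None, None)
```

Note that in this sketch a URE on a data disk never leads to a parity write, which is the point the thread was trying to settle: parity is only rewritten when every disk in the stripe was read successfully.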
August 14, 2015
Quote: "I don't think your statement of what happens is correct. I believe that Tom has stated that what happens is the block from parity and all other drives is used to calculate the correct value for the drive that cannot be read, and the calculated block is then written to the block in error, to overwrite it and attempt to correct it."
Yes, that is exactly right.
August 14, 2015 Author
Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?
August 14, 2015
Quote: "Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?"
I have seen some (garycase) advocate a non-correcting parity check immediately after a data disk rebuild, since the point in that case is to test the rebuild rather than to test parity.
August 14, 2015
Quote: "Excellent, so is there any reason to choose a non-correcting parity check over a correcting parity check?"
Checking parity right after you rebuild a disk would be the only time it makes a difference. I'm a control freak, I guess: I want to control when things are written. So I still do ONLY non-correcting checks unless an error happens, even though I then end up doing a correcting parity check immediately after the non-correcting one.
August 14, 2015
Quote: "I'm a control freak I guess I want to control when things are written"
This. I would like another maintenance mode option where the entire array, including parity, is mounted read-only, for those times when I want to recover data without the possibility of accidentally overwriting something.
August 14, 2015
Quote: "I would like another maintenance mode option where the entire array including parity was mounted read only for those times when I want to recover data without the possibility of accidentally overwriting something."
I would like this myself.
August 14, 2015 Author
Since I got thousands of UREs on a disk during a correcting parity check, I have always run non-correcting parity checks. The problem was a faulty cable. I do find it odd that a faulty cable would give me thousands of UREs but no write errors: presumably a correction was attempted for each URE, so I would have expected to see the disk red-balled and the parity check cease. With the behaviour confirmed by limetech, I am struggling to see any valid logical reason to choose a non-correcting parity check (other than satisfying control freak tendencies).
August 14, 2015
I very much wish a non-correcting check would log all of the sync errors to a special file, and that those sync errors could be reverified and optionally corrected through a new process that only checks the sectors in that file. That way, a correcting check wouldn't have to reprocess the entire array, only the sectors that mismatched in the non-correcting check. With such a feature, all parity checks could become non-correcting and notify the user of any mismatches, leaving the user to decide what to do about them. Obviously, if the owner noticed odd log messages pointing to a problematic drive, they may want to reseat cables, etc. and rerun the non-correcting check rather than apply the corrections to parity.
August 14, 2015
Great idea, but in order to fully implement it as written, you would need a non-array drive the size of the parity disk to account for the possibility of every bit being wrong. Or you could specify a maximum number of errors to log before giving up. I suppose if more than a relatively small number of errors were found, you have more to worry about and probably shouldn't complete the check anyway.
August 14, 2015
Agreed. But I was not trying to lay out the entire design. A lot of details would have to be worked out.
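The idea discussed in the last few posts (log mismatches during a non-correcting check, stop after a maximum number of entries, then re-verify only the logged sectors) could look roughly like the sketch below. Nothing like this exists in unRAID as far as the thread says; `MismatchLog`, `non_correcting_check`, `reverify` and the `stripe_matches` callback are hypothetical names, and the sketch assumes the log stores only sector numbers, not the data itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MismatchLog:
    """Hypothetical bounded log of sectors whose parity did not match."""
    max_entries: int
    sectors: List[int] = field(default_factory=list)
    overflowed: bool = False

    def record(self, sector: int) -> bool:
        """Store a mismatched sector; return False once the cap is reached."""
        if len(self.sectors) >= self.max_entries:
            self.overflowed = True
            return False
        self.sectors.append(sector)
        return True


def non_correcting_check(
    total_sectors: int,
    stripe_matches: Callable[[int], bool],
    log: MismatchLog,
) -> MismatchLog:
    """Scan every sector, logging mismatches and giving up at the cap."""
    for sector in range(total_sectors):
        if not stripe_matches(sector) and not log.record(sector):
            break  # too many errors: stop and let the user investigate
    return log


def reverify(log: MismatchLog, stripe_matches: Callable[[int], bool]) -> List[int]:
    """Re-read only the logged sectors; return those that still mismatch."""
    return [s for s in log.sectors if not stripe_matches(s)]
```

A user could then reseat cables, call `reverify`, and apply corrections only to the sectors that still mismatch, instead of running a full correcting check over the whole array.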