Non-correcting parity check that records mismatches so they can be applied in a second step



In the past, bad cables or connectors created a lot of garbage during correcting parity checks, and I learned a lot from that. For years now I have started with a non-correcting parity check and then looked at the results. If there are just a handful of sync errors, I start a correcting parity check as a second step. If there are lots of sync errors, I check my hardware first and then start over with another non-correcting parity check.

This results in two lengthy parity checks whenever sync errors are detected.

My idea: let the user start a non-correcting parity check. During that check, the blocks that produce sync errors are logged. At the end, the user can choose whether to apply those blocks. Applying the logged blocks would start a correcting parity check for those blocks only.
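A minimal sketch of the proposed first step, assuming single XOR parity and modelling each drive as an in-memory list of equal-sized blocks. `xor_blocks` and `check_parity_nocorrect` are hypothetical names, not anything Unraid provides. The check compares computed parity with stored parity block by block and only records the positions that disagree; nothing is written.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks: the single-parity value."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def check_parity_nocorrect(data_drives, parity_drive):
    """Non-correcting check: return the block positions whose parity mismatches.

    data_drives: one list of bytes blocks per data drive (same count and size).
    parity_drive: list of bytes blocks holding the stored parity.
    Nothing is written; applying corrections is left to a later, optional step.
    """
    mismatches = []
    for position, stored in enumerate(parity_drive):
        expected = xor_blocks([drive[position] for drive in data_drives])
        if expected != stored:
            mismatches.append(position)
    return mismatches


# Usage: two tiny data drives; parity block 1 is stale.
d1 = [b"\x01\x02", b"\x0f\x00"]
d2 = [b"\x10\x20", b"\x00\xff"]
parity = [b"\x11\x22", b"\x0f\x00"]
print(check_parity_nocorrect([d1, d2], parity))  # [1]
```

The returned list is all the second step would need; the parity blocks themselves are left untouched until the user decides to apply.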


Thanks for listening.

Great idea in theory. However, where do you keep the list of blocks that need to be corrected? It can't be on the array, for obvious reasons. Once you overcome that hurdle, the next obstacle is the possibly dangerous assumption that the list of blocks is all-inclusive up to the address you saved. Do you skip checking everything except what's on the list? In many cases, rogue parity errors aren't consistent because they aren't real; they are just a faulty stick of RAM, a CPU, or an HBA causing havoc.
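One way to address the concern about inconsistent errors, offered here only as an assumption, is to re-read and re-check each logged position immediately before writing, so a mismatch that does not reproduce is reported instead of committed to parity. A sketch using the same in-memory, single-XOR-parity model; `apply_logged_corrections` is a hypothetical name.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks: the single-parity value."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def apply_logged_corrections(data_drives, parity_drive, logged_positions):
    """Rewrite parity only at logged positions that still mismatch on re-read.

    Returns (corrected, not_reproduced). A position that matches on the second
    read is left untouched and reported, since an inconsistent mismatch hints at
    a transient fault (RAM, CPU, HBA, cabling) rather than stale parity.
    """
    corrected, not_reproduced = [], []
    for position in logged_positions:
        expected = xor_blocks([drive[position] for drive in data_drives])
        if parity_drive[position] != expected:
            parity_drive[position] = expected
            corrected.append(position)
        else:
            not_reproduced.append(position)
    return corrected, not_reproduced


# Usage: position 0 was logged but does not reproduce; position 1 is really stale.
d1 = [b"\x01\x02", b"\x0f\x00"]
d2 = [b"\x10\x20", b"\x00\xff"]
parity = [b"\x11\x22", b"\x0f\x00"]
print(apply_logged_corrections([d1, d2], parity, [0, 1]))  # ([1], [0])
```

This still relies on the logged list being complete, which is exactly the assumption questioned above: positions that were never logged are not rechecked.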


If you had perfect confidence that the parity errors are real and consistent, and you had a spare disk the same size as parity to record the misses, then you could implement it.

It's best to run non-correcting checks, and only run a correcting check when you are sure you want to commit the writes to parity.


I would store the position (not the content) of the blocks that had sync errors in memory or flash, up to a limit of, say, 3. This enhancement is not meant for thousands of sync errors; thousands of sync errors need further investigation.
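A sketch of such a bounded log. The limit of 3 comes from the post; the `MismatchLog` class name and the rule that an overflowed log blocks the targeted correction are hypothetical. Storing only positions keeps it tiny: a few integers, easily held in memory or written to flash.

```python
class MismatchLog:
    """Bounded log of block positions that failed a non-correcting check.

    Only positions are stored, never block contents. Once more than `limit`
    mismatches are seen, the log is marked as overflowed and no targeted
    correction is offered: that many sync errors points at hardware first.
    """

    def __init__(self, limit=3):
        self.limit = limit
        self.positions = []
        self.overflowed = False

    def record(self, position):
        """Remember a mismatching block position, or flag overflow past the limit."""
        if len(self.positions) < self.limit:
            self.positions.append(position)
        else:
            self.overflowed = True

    def positions_to_apply(self):
        """Return the positions to correct, or None if the limit was exceeded."""
        return None if self.overflowed else list(self.positions)


# Usage: two sync errors stay within the limit; five do not.
log = MismatchLog(limit=3)
for position in (1024, 77_000_123):
    log.record(position)
print(log.positions_to_apply())  # [1024, 77000123]

noisy = MismatchLog(limit=3)
for position in range(5):
    noisy.record(position)
print(noisy.positions_to_apply())  # None
```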


After a non-correcting parity check with, say, two sync errors, all that then needs to be done is to recompute parity for just those two blocks from all drives. That would save the 18 hours a full correcting parity check takes (in my case).
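A sketch of that second step, assuming single XOR parity, a fixed block size, and drives exposed as files or block devices already opened with `os.open`. `rebuild_parity_at` and the 4 KiB block size are hypothetical. It uses `os.pread`/`os.pwrite` (available on Unix) so only the logged offsets are read and written, instead of every block on every drive.

```python
import os
from functools import reduce

BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical)


def rebuild_parity_at(data_fds, parity_fd, positions, block_size=BLOCK_SIZE):
    """Recompute single XOR parity for the given block positions only.

    data_fds: file descriptors of the data drives, opened for reading.
    parity_fd: file descriptor of the parity drive, opened read/write.
    positions: block indices logged by an earlier non-correcting check.
    """
    for position in positions:
        offset = position * block_size
        # Positional reads: no seeking, and no other blocks are touched.
        blocks = [os.pread(fd, block_size, offset) for fd in data_fds]
        if any(len(block) != block_size for block in blocks):
            raise OSError(f"short read at block {position}")
        parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
        written = os.pwrite(parity_fd, parity, offset)
        if written != block_size:
            raise OSError(f"short write at block {position}")
```

With two logged positions and N data drives, this reads 2 × N blocks and writes 2, whereas a full correcting check reads every block of every drive.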

