hawihoney, Posted August 23, 2018

In the past, bad cables or connectors wrote a lot of garbage to parity during correcting parity checks. I learned a lot from that. For years now I have started with a non-correcting parity check and looked at the results afterwards. If there are just a handful of sync errors, I run a correcting parity check as a second step. If there are lots of sync errors, I recheck my hardware first and then run a second non-correcting parity check. This means two lengthy parity checks whenever sync errors are detected.

My idea: let the user start a non-correcting parity check. During that pass, the blocks that produced sync errors are logged. At the end, the user can choose whether or not to apply those blocks. Applying the logged blocks starts a correcting parity check for those blocks only.

Thanks for listening.
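To make the first half of the proposal concrete, here is a minimal sketch of a non-correcting pass that only records which blocks mismatch. It assumes plain single XOR parity and uses in-memory byte strings as stand-ins for disks; `BLOCK_SIZE`, `xor_blocks`, and `non_correcting_check` are hypothetical names for illustration, not part of any real parity implementation.

```python
from functools import reduce

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def non_correcting_check(data_disks, parity_disk, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose stored parity does not match.

    Nothing is written: the pass only records the position of each
    sync error, not the block content.
    """
    mismatches = []
    for offset in range(0, len(parity_disk), block_size):
        stripe = [disk[offset:offset + block_size] for disk in data_disks]
        expected = xor_blocks(stripe)
        if expected != bytes(parity_disk[offset:offset + block_size]):
            mismatches.append(offset // block_size)
    return mismatches
```

The returned list is what the user would review at the end of the check before deciding whether to apply anything.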
JonathanM, Posted August 23, 2018

Great idea in theory. However, where do you keep the list of blocks that need to be corrected? It can't be on the array, for obvious reasons. Once you overcome that hurdle, the next obstacle is the possibly dangerous assumption that the list of blocks is all-inclusive up to the address you saved. Do you skip checking everything except what's on the list? In many cases, rogue parity errors aren't consistent, because they aren't real: a faulty stick of RAM, CPU, or HBA is just causing havoc. If you had perfect confidence that the parity errors are real and consistent, and you had a spare disk the same size as parity to record the misses, then you could implement it. It's best to do non-correcting checks, and only do a correcting check when you are sure you want to commit the writes to parity.
hawihoney (Author), Posted August 23, 2018

I would store the position (not the content) of the blocks that had sync errors in memory or on flash, up to a limit, say 3. This enhancement is not meant for thousands of sync errors. Thousands of sync errors need further investigation. After a non-correcting parity check with, say, two sync errors, all that remains is to rebuild parity for those two blocks across all drives. That would save the 18 hours a correcting parity check takes (in my case).
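A rough sketch of the capped log and the targeted correction described in this reply, again assuming single XOR parity and byte arrays in place of real disks. `MAX_LOGGED`, `stripe_parity`, `check_with_capped_log`, and `correct_logged_blocks` are hypothetical names. The sketch trusts the logged positions, so it does not address the concern raised above that rogue errors may not be consistent between passes.

```python
from functools import reduce

MAX_LOGGED = 3  # the limit suggested in the post; the value is arbitrary


def stripe_parity(data_disks, start, end):
    """Compute XOR parity over bytes [start, end) of every data disk."""
    stripe = [disk[start:end] for disk in data_disks]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripe))


def check_with_capped_log(data_disks, parity_disk, block_size,
                          max_logged=MAX_LOGGED):
    """Non-correcting pass. Returns (logged_blocks, total_errors).

    Only block positions are stored, and only up to max_logged of them;
    every error is still counted so an overflow can be detected later.
    """
    logged, total = [], 0
    for start in range(0, len(parity_disk), block_size):
        end = start + block_size
        if stripe_parity(data_disks, start, end) != bytes(parity_disk[start:end]):
            total += 1
            if len(logged) < max_logged:
                logged.append(start // block_size)
    return logged, total


def correct_logged_blocks(data_disks, parity_disk, block_size, logged, total):
    """Rewrite parity for the logged blocks only.

    Refuses to run when more errors were seen than logged, since that
    case calls for a hardware check rather than a quick fix.
    """
    if total > len(logged):
        raise ValueError("more sync errors than logged; investigate hardware first")
    for block in logged:
        start = block * block_size
        end = start + block_size
        parity_disk[start:end] = stripe_parity(data_disks, start, end)
    return len(logged)
```

With two logged errors, `correct_logged_blocks` touches two blocks instead of walking the whole array a second time, which is where the time saving would come from.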