Shrinking the Array: Any drawbacks/risks of doing it via parity check/correct instead of a full parity rebuild?



I'm thinking of removing a slow and small drive from my significantly larger Array.

 

The official documentation lists two methods for shrinking the Array:

https://docs.unraid.net/unraid-os/manual/storage-management/

 

The officially supported method:

"New Config" -> remove drive -> rebuild parity in full

 

The second method, documented but not officially supported:

Zero the to-be-removed drive via an unsupported userscript that is no longer maintained and no longer compatible/functional (alternatively: manually zero the drive with dd and/or manually fix the broken script) -> "New Config" -> remove the drive -> claim parity is still valid

The first method leaves the Array fully unprotected for the duration of the parity rebuild (multiple days, depending on HDD size and speed). The unsupported method relies on one's skill and self-confidence in zeroing the correct drive via dd, and/or on trust in an old, broken userscript and one's ability to fix it correctly.
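For context, the core of that zeroing step (whether via the old script or by hand) is a single sequential zero-write through Unraid's parity-maintaining md device, roughly dd bs=1M if=/dev/zero of=/dev/mdX status=progress. Below is a minimal Python stand-in for that loop - the device path is an assumption you would have to verify for your slot and Unraid version, and everything here is destructive:

```python
# Sketch of the zeroing step - NOT a drop-in tool. The device path is an
# ASSUMPTION: older Unraid exposed array slots as /dev/mdX, newer releases
# as /dev/mdXp1 (one reason the old userscript broke). Writing through the
# md device lets Unraid update parity as it zeroes; writing to the raw
# /dev/sdX instead would desync parity.
import os

DEV = "/dev/md3p1"   # hypothetical: array slot 3 - triple-check before running
CHUNK = 1 << 20      # 1 MiB per write, equivalent to dd bs=1M

with open(DEV, "r+b") as dev:
    size = dev.seek(0, os.SEEK_END)  # block devices report their size via seek
    dev.seek(0)
    zeros = bytes(CHUNK)
    done = 0
    while done + CHUNK <= size:
        dev.write(zeros)
        done += CHUNK
    dev.write(bytes(size - done))    # zero the tail smaller than one chunk
```

The whole point of going through the md device is that parity is updated for every zero written, so it stays valid for the entire run, which is what makes the subsequent New Config safe.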

 

But wouldn't there be a possible third method, a middle ground that's theoretically safer than the official one?

 

Proposal: "New Config" -> remove drive -> tick "parity already valid" (even though it partially isn't) -> run a correcting parity check.

 

This would log and correct sync errors until the size of the removed disk has been reached, but beyond that point parity should be valid and safe, right?

That means if I remove e.g. a 4TB drive from an Array with 12TB of parity, the parity would only be in an unsafe/invalid state for the first 4TB of the check-and-correct run - which means parity protection kicks back in after a couple of hours instead of a couple of days.
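To see why the damage is bounded, here's a toy model (my own illustration, not Unraid code) of single parity as a plain XOR across the data disks, with 12 and 4 bytes standing in for the 12TB of parity and the 4TB disk:

```python
# Toy single-parity model: the parity byte at each offset is the XOR of the
# bytes at that offset on every data disk; a disk contributes nothing beyond
# its own end.
from functools import reduce

def parity(disks, size):
    return [reduce(lambda a, b: a ^ b,
                   (d[i] for d in disks if i < len(d)), 0)
            for i in range(size)]

big   = [7, 1, 0, 9, 4, 2, 8, 3, 6, 5, 1, 2]  # 12-"TB" disk that stays
small = [3, 0, 5, 8]                          # 4-"TB" disk being removed

before = parity([big, small], 12)  # parity as it sits on disk right now
after  = parity([big], 12)         # what parity must become without 'small'

print([i for i in range(12) if before[i] != after[i]])  # -> [0, 2, 3]
```

Mismatches can only occur inside the removed disk's first 4 positions (and only where it held non-zero data); every offset beyond its end never depended on it, so parity there is already correct.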

Does anyone see an obvious flaw in this method or my understanding of Unraid?


The problem is that finding and correcting parity errors tends to be significantly slower than rebuilding parity, so this ‘third’ method could well take longer overall. And the way the check works, you have to run it to completion anyway.


The total time doesn't really matter, though, does it?

The only thing that matters is the time it takes to rebuild, or rather "correct", the 4TB of parity that's actually mismatched, versus the full parity rebuild. If the check runs on beyond those 4TB, that's fine and desirable, but it shouldn't find any more errors to correct at that point. Once the 4TB of parity data have been corrected, the Array is properly redundant again.

 

As to the performance difference: correcting would have to be more than three times slower than rebuilding for method one to be preferable, since method three only needs to cover the first 4TB (versus the full 12TB) before protection is restored.

In my previous experience, the difference in time between a check and a rebuild has been minimal - but, admittedly, a check never had to correct more than a couple of sync errors caused by a dirty shutdown, so it's certainly possible that correcting that much data would be much slower than rebuilding it.
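For rough numbers (the throughput figure is an assumption; real drives slow down toward the inner tracks):

```python
# Back-of-the-envelope exposure windows at an assumed ~150 MB/s average:
# method one is unprotected until all 12TB of parity are rebuilt, method
# three only until the first 4TB have been corrected.
SPEED = 150e6  # bytes per second (assumed)

for label, tb in (("full 12TB rebuild", 12), ("first 4TB corrected", 4)):
    hours = tb * 1e12 / SPEED / 3600
    print(f"{label}: ~{hours:.1f} h")  # ~22.2 h vs ~7.4 h
```

On those assumptions, correcting would have to average below roughly 50 MB/s over the first 4TB before the full rebuild wins on exposure time - consistent with the "three times slower" threshold above.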


Building the 4TB of parity will always be faster if you rsync it rather than apply lots (millions) of corrections. Your comments about no changes happening to the parity drive after the 4TB point apply to both methods one and three. Since Unraid will not consider parity valid and mark the array as protected until it has done the whole parity drive in both cases, you might as well get the first 4TB done as fast as possible.

 


Sorry, I'm struggling to follow that comment.

 

Quote

Building the 4TB of parity will always be faster if you rsync it rather than apply lots (millions) of corrections.

What should I rsync here? This is about the removal of an empty (but not zeroed) drive - can you explain how I can use rsync to restore parity? Is there any documentation you could point me towards?

 

Quote

Since Unraid will not consider parity valid and mark the array as protected until it has done the whole parity drive in both cases

But ... that's simply not true, is it?

That's the whole point of method three: With a full rebuild, parity will be marked as invalid until all 12TB of parity have been rewritten; with check and correct, it will always remain marked valid and will actually be valid once the first 4TB have been corrected. Or is there some upper sync-error limit that invalidates parity that I'm not aware of?

 

What I understand from the documentation so far:

Sync errors explicitly do not invalidate parity.

If the "write corrections" checkbox is checked, sync errors are corrected immediately (even before being logged).

There's an upper limit for logging sync errors (100), so syslog flooding shouldn't be an issue (and there's a decent chance that performance picks up once that cap is hit) - modeled in the toy check below.

Parity shouldn't be marked invalid at any point.

Technically, I could cancel the check/correct run after the first 4TB and parity would actually be valid (and Unraid would happily treat it as such), but it's sensible to finish the entire check.
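As a toy illustration of those semantics (reusing the XOR model from the earlier sketch; the 100-entry cap mirrors the documented log limit, the rest is my simplification):

```python
# Toy correcting check: recompute the expected parity byte at each offset,
# fix mismatches in place immediately, and log only the first 100 of them.
LOG_CAP = 100  # documented cap on logged sync errors

def correcting_check(parity_disk, data_disks):
    corrected = 0
    for i in range(len(parity_disk)):
        expected = 0
        for d in data_disks:
            if i < len(d):
                expected ^= d[i]
        if parity_disk[i] != expected:
            if corrected < LOG_CAP:
                print(f"sync error at offset {i} - corrected")
            parity_disk[i] = expected  # fixed on the spot, parity stays usable
            corrected += 1
    return corrected
```

Fed the "before" parity and the remaining disk from the earlier sketch, it corrects offsets 0, 2 and 3 and then runs error-free to the end - cancelling after the removed disk's size would leave parity just as valid, exactly as the last point above says.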

 

What am I not seeing here?

