2nd Parity Procedure

grandprix · January 27, 2017

I've searched the forum, Google (with "unraid" as a keyword), and the wiki, and I still cannot find a procedure for adding a second parity drive. I assume it is just a matter of stopping the array, assigning the drive, and voilà. But should a parity check be performed first? If so, a correcting one? (Oddly enough, while searching I found a post from Tom that implied, if not explicitly stated, that we should be doing correcting parity checks. So I may have been doing things wrong all these years, but that's another topic, I think.)

Also, what can we expect when a 2nd parity is added? Will it be like adding an initial parity drive/data drive that is simply built from the ground up or?

John_M · January 27, 2017

Yes. It's as simple as assigning a disk to the Parity 2 slot and starting the array then letting it build parity.

You don't need to do a parity check first but it does no harm. Personally, I would let Parity 2 build and then run a parity check - which would check both at the same time.

Again speaking personally, I schedule monthly non-correcting checks so that on the rare occasions errors are found I can investigate the possible cause before correcting them. In practice, even with dual parity, you have to assume that any error is in the parity - which is the point that Tom was making.

Adding Parity 2 is just like adding Parity to an array that never had it. The whole of every data disk is read and the result of the computation written to the Parity 2 disk - it's just a different, more complex, calculation.
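To make "a different, more complex calculation" a bit more concrete, here is a minimal sketch of the two kinds of parity. It assumes Parity 2 works like the Q syndrome in RAID-6 style schemes (a weighted sum in GF(2^8) with the polynomial 0x11d); that is an assumption for illustration, not a statement about unRAID's actual code. The function names are made up.

```python
from functools import reduce


def gf_mul2(x: int) -> int:
    """Multiply one byte by 2 in GF(2^8), reducing with the polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def p_parity(blocks):
    """Plain XOR of the same byte position on every data disk (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def q_parity(blocks):
    """Weighted sum: the byte from disk i is multiplied by 2**i in GF(2^8).

    Evaluated with Horner's rule, starting from the last disk.
    """
    out = []
    for column in zip(*blocks):
        q = 0
        for byte in reversed(column):
            q = gf_mul2(q) ^ byte
        out.append(q)
    return bytes(out)


# Three tiny "data disks", two bytes each (illustrative values only).
disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = p_parity(disks)
q = q_parity(disks)
```

Either way, building the parity means reading every data disk from start to finish. One visible difference in this sketch is that `p_parity` gives the same result whatever order the disks are in, while `q_parity` depends on which slot each disk occupies.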

grandprix · January 27, 2017

Good stuff John, thank you very much. I too do a monthly parity check and it is non-correcting. It's good to hear that I haven't been doing it wrong all these years. I've only once (knock on wood) had errors on the parity and it was a breakout cable gone awry.

So I'll plan to do my 1st-of-the-month parity check and then immediately assign the 2nd parity drive to the array. Just to be extra safe/anal. Appreciate the response.

SSD · January 27, 2017

There is no difference between a non-correcting and a correcting parity check if no parity errors are found. So if you have a solid server and good disks, it is quite likely that you can run parity checks and NEVER have a sync error. That is what you want.

If you do a non-correcting check and there IS a parity error, what are you going to do? If the answer is "I would run it again in correcting mode", then there is no need to ever run a non-correcting check. You're just wasting time. If the answer is "I just replaced a drive and rebuilt it, I may have jiggled a cable loose, and if it found a parity error I'd like to replace a cable and try again to see if it was real", then in that case I'd run a non-correcting check.

A non-correcting check is also useful if you are having issues burning in your server and every parity check is finding sync errors. Running in non-correcting mode can help isolate the problem until you are getting consistent results. It is also helpful if you are in a state where you are not sure whether parity is accurate, and if it is not accurate you'd like to try something else. Or after migrating an array from one server to another (I'd have run a last parity check on the old server to make sure it was good), I would run a non-correcting check on the new server, knowing that parity is 100% accurate. If the new server found sync errors I would know that it was a problem on the new server that I had to figure out - and I wouldn't have polluted the parity.

If you run a correcting check and get sync errors, then fix a problem and run another correcting check, it will detect exactly the same sync errors - down to the parity block number. But the first one messed them up, and the second one put them right again. People don't seem to understand that and say "shucks, I thought I fixed it but it's doing the same thing". Running non-correcting checks helps avoid that confusion.
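The "same sync errors twice" effect can be reproduced with a toy model. This is only a sketch using single XOR parity and a hypothetical `parity_check` function, not how unRAID implements its checks; the "flaky" disk stands in for a bad cable that returns wrong data at one position.

```python
def parity_check(disks, parity, correct):
    """Compare the XOR of the data disks with parity; return mismatching positions.

    When correct is True, parity is overwritten with whatever was just read,
    whether or not the data that was read is trustworthy.
    """
    errors = []
    for pos in range(len(parity)):
        computed = 0
        for disk in disks:
            computed ^= disk[pos]
        if computed != parity[pos]:
            errors.append(pos)
            if correct:
                parity[pos] = computed
    return errors


good = [bytearray(b"\x01\x02\x03\x04"), bytearray(b"\x10\x20\x30\x40")]
parity = bytearray(a ^ b for a, b in zip(*good))   # known-good parity
original = bytes(parity)
flaky = [good[0], bytearray(b"\x10\x20\xff\x40")]  # second disk misreads position 2

# Correcting check while the fault is present: parity gets polluted.
first = parity_check(flaky, parity, correct=True)
polluted = bytes(parity)
# Fault fixed, correcting check again: same position reported, parity restored.
second = parity_check(good, parity, correct=True)

# Non-correcting instead: parity is never touched, and the error disappears
# once the fault is fixed.
nc_first = parity_check(flaky, parity, correct=False)
nc_second = parity_check(good, parity, correct=False)
```

In this model `first` and `second` both report position 2, which looks like the fix did nothing, while `nc_first` reports position 2 and `nc_second` reports nothing.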

But for my monthly checks in my steady-state array, I go with the correcting variety. If it found a parity error I'd want to know, and to track where on the disk the errors were, but either a non-correcting or a correcting check will tell me that.

I would never take a parity error lightly. Even if you have only one, and can't attribute it to a dirty shutdown, you've got a problem on your hands.

JorgeB · January 27, 2017

There is no difference between a non-correcting and a correcting parity check if no parity errors are found. So if you have a solid server and good disks, it is quite likely that you can run parity checks and NEVER have a sync error. That is what you want.

I agree with you in principle, but more than once on this forum I've seen parity get corrupted by being incorrectly updated after a disk read error during a correcting check. I did some testing a while ago and I believe I proved it can happen. This is why all my scheduled checks are non-correcting.

I also agree that in normal circumstances there should never be a single sync error. I don't remember the last time I got an unexpected sync error, but if there is one, a non-correcting check gives me the option to decide how to deal with it, based on whether it looks like a disk problem or not.

SSD · January 27, 2017

Good point. Thanks for linking that post! You are absolutely right!! If a disk starts sputtering during the parity check, it can and will corrupt parity. May need to rethink my correcting check approach! Now that you mention it, I think that was one of my own use cases in advocating for that feature back in the day (it was my request and a bunch of community support that triggered that feature). Guess I got amnesia or was lulled into a sense of security by my well-behaved array.

BTW, there is another gotcha I don't think people know about while a disk is sputtering. I'll call it the "Disk Kicked Copy Bug". If you are copying a file or set of files to or within an array, and in the middle the target data disk gets kicked from the array, the copy (or move) that is in progress will appear to work properly. However, on the data block that literally triggered the failed write and the kicking of the disk from the array, PARITY IS NOT UPDATED CORRECTLY. unRAID will kick the disk and, since the disk is gone, will continue to update only parity, correctly, on all subsequent blocks. But on that one block where the kick occurred, parity is wrong.

As a user doing the copy, you will have no knowledge that the copy was not successful unless you happen to notice or get informed that a disk dropped from the array. If you were copying a bunch of files, you'll have no way to know which file it happened on without comparing. If you were moving data, you are SOL or going to a backup to compare, because you'd have no way to know which file was messed up. The only way around this is to use something like Teracopy with checksums, or rsync with the verification settings, and I recommend that when moving or copying data for this reason. If you are copying without this safety net, and notice a sudden delay in the copy and then it resumes, maybe faster than before, you might check to see if the target disk got kicked. Or just check the unRAID console after the copy completes before deleting the source files.
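For anyone who wants the idea of a checksum-verified move spelled out, here is a minimal sketch. It is not what Teracopy or rsync do internally; `sha256_of` and `verified_move` are hypothetical helper names, and the paths are whatever you pass in. The source file is only deleted once the destination hashes to the same value.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_move(src: Path, dst: Path) -> bool:
    """Copy src to dst, then delete src only if both checksums match.

    Returns True when the move was verified, False when the copy differs
    (in which case the source is left untouched).
    """
    expected = sha256_of(src)
    shutil.copy2(src, dst)
    if sha256_of(dst) != expected:
        return False
    src.unlink()
    return True
```

One caveat with this sketch: reading the destination back immediately may be served from the operating system's cache rather than from the disk, so it is probably still worth checking that no disk was dropped from the array before trusting the result.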

I was having a stubborn problem with a disk that would not stay in a test array, and documented this a long while back. I asked and was told that there was no way around this issue. Users should understand that a dropped disk is not completely transparent, and that the idea you can rebuild a perfect copy is not quite true. You lose one block.

(I know dual parity may have resulted in some fundamental changes to parity tracking, and that this may have been fixed. But I believe this is one layer beneath those changes, so doubt this problem was fixed.)

JorgeB · January 27, 2017

BTW, there is another gotcha I don't think people know about while a disk is sputtering. I'll call it the "Disk Kicked Copy Bug".

Funny you should mention that. It happened on one of my servers during the conversion from Reiser to XFS, and I posted about it on the 2nd page of the conversion thread. One of the disks redballed during the move (I was using --remove-source-files). I remembered reading about that bug in a post of yours, and thankfully I already had checksums for all my files, so it was easy to find the affected one.

I believe that about a year after that, Tom posted about some changes to improve handling of those situations, and I did some testing by disconnecting a disk during a copy operation. I checked the checksums for the files I was copying on the emulated disk and they were all OK, so maybe this was improved/fixed, but since you can't prove a negative it's difficult to say for sure.

IMO these types of situations just reinforce the importance of having checksums for all files.
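As a rough illustration of keeping checksums for all files, the sketch below builds a manifest for a directory tree and later reports which files no longer match. The helper names `build_manifest` and `changed_files` are made up for this example, and each file is read whole for brevity, so a real tool would hash in chunks.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(before: dict, after: dict) -> list:
    """List files from the old manifest that are missing or differ in the new one."""
    return sorted(name for name in before if after.get(name) != before[name])
```

Building the manifest while the array is known to be healthy, and comparing against it after an incident such as a dropped disk, narrows the damage down to specific files.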
