
Performed Swap-Disable, now Parity Sync Errors


abnersnell


I performed the following steps:

 

1.  Performed parity check with zero errors.

2.  Removed the 1.5TB data disk and installed a new 3TB disk.

3.  Assigned the new 3TB disk as parity.

4.  Assigned the old 2TB parity drive to the missing slot left by the 1.5TB drive removed in step 2.

5.  Used the Copy option to copy parity to the new 3TB parity drive.

6.  Data rebuild onto the 2TB drive.

7.  Ran a correcting parity check; it slowed at the 2TB mark and reported millions of sync errors. Zero errors before the 2TB point.

 

SMART reports show no errors on any drive.  Syslog attached.

 

I did not preclear the new parity disk.

 

Any thoughts as to why I now see sync errors would be greatly appreciated.

 

Unraid 6.0.1

unraid-syslog-20150818-1857.zip

Link to comment

Was a drive listed as missing or disabled at any point? I thought I remembered that you had to force a drive to show as disabled by starting the array with it missing before you could start the swap-disable process. Maybe that requirement has been changed in 6.0.1?

 

Other than the parity sync errors, is there anything else noticeably wrong? Did all the data on the 1.5TB drive get rebuilt successfully on the 2TB?

Link to comment

Thanks for the quick response.

 

I forced the copy option by not assigning anything to the slot of the 1.5TB drive I removed and starting the array.  I then stopped the array, assigned the old parity drive to the open slot, and was presented with the Copy option.

 

Everything looks great on the 2TB data disk (the former parity drive).  The new parity drive looks fine, and the parity check was humming along until the 2TB mark.

Link to comment

Sounds like you may have uncovered a bug. I think someone at limetech should at least look this over to see if there is something that should be handled differently.

 

My speculation is that the new parity drive didn't get the remainder filled with zeroes after the copy process moved the 2TB of parity information to the new drive.
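
 

To put some rough numbers behind that speculation: unraid parity is just the bit-wise XOR of all the data disks at each position, so any part of the parity disk that lies beyond the end of every data disk has to be zero. Here is a tiny illustrative sketch (plain Python, not anything from unraid) of what the array would look like if the copy left the last 1TB of the new drive untouched:

# Minimal sketch (plain Python, not unraid code): parity is the byte-wise
# XOR of all data disks, so any parity position beyond the end of every
# data disk must be zero.
OLD_SIZE = 4   # stands in for the old 2TB parity size, in "blocks"
NEW_SIZE = 6   # stands in for the new 3TB parity size

data_disks = [
    [1, 2, 3, 0],   # 1.5TB-class data disk, padded to the old parity size
    [7, 7, 7, 7],   # 2TB former parity drive, now a data disk
]

def expected_parity(block):
    """XOR of all data disks at this block; a disk that ends earlier contributes 0."""
    p = 0
    for disk in data_disks:
        p ^= disk[block] if block < len(disk) else 0
    return p

# After the parity-copy step: the old parity is copied verbatim, but the
# remaining "1TB" keeps whatever bytes were already on the un-precleared drive.
leftover = 9
new_parity = [expected_parity(b) for b in range(OLD_SIZE)] + [leftover] * (NEW_SIZE - OLD_SIZE)

print([new_parity[b] == expected_parity(b) for b in range(NEW_SIZE)])
# -> [True, True, True, True, False, False]: everything past the 2TB mark is wrong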

 

Did the correcting parity check start automatically, or did you initiate it?

Link to comment

After the parity check is complete and before you restart the server for any reason, pull another diagnostic and post it.

 

Also, please email limetech and give a brief synopsis of what happened, and include a link to this thread. I'm not comfortable saying everything is ok at this point. It sounds like your data is fine right now, but it doesn't sound like unraid handled something properly with the parity upgrade.

Link to comment

Did you preclear the new disk?
I did not preclear the new parity disk.
Besides, preclear is not an official limetech tool; it shouldn't be necessary to use it to perform a swap-disable parity upgrade that IS part of the official limetech release.

And there is never a requirement to clear a parity disk anyway, since building parity will overwrite any clearing that may have been done.

 

But in this case, it is necessary for those portions of the new parity disk that are beyond the size of the old parity disk to be written with the correct parity data. I guess the only question is whether that should have taken place as part of the parity-copy portion of the parity swap, or whether it should happen after the replacement data drive is installed.

 

And then there is the question of whether the replacement data drive was precleared.

 

However this is implemented, parity is going to have to be written. I guess it's just a matter of whether that happens before the procedure has completed, or happens on the next parity check.
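
 

To make those two possibilities concrete, here is a rough sketch of what each path would have to do; the function names are made up for illustration and this is not limetech's code. Zero-filling during the copy writes the tail quietly, while deferring it to a correcting check writes the same blocks but reports every mismatch as a sync error, which would line up with the millions of errors seen past the 2TB mark.

# Illustrative only; function names are invented, not unraid internals.

def zero_fill_tail(parity, old_size):
    """Option A: zero the region beyond the old parity size as part of the copy.
    Correct as long as no data disk extends past old_size."""
    for b in range(old_size, len(parity)):
        parity[b] = 0

def correcting_check(parity, expected):
    """Option B: leave it to the next correcting parity check, which rewrites
    each wrong block and reports it as a sync error."""
    sync_errors = 0
    for b in range(len(parity)):
        if parity[b] != expected[b]:
            parity[b] = expected[b]
            sync_errors += 1
    return sync_errors

parity = [1, 2, 3, 4, 9, 9]       # copied parity plus a stale, never-zeroed tail
expected = [1, 2, 3, 4, 0, 0]     # what parity should be past the old 2TB point
print(correcting_check(parity, expected))  # -> 2: one sync error per stale block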

 

I don't know what the answer is, but if it needs to happen on the next parity check, then that should be documented.

Link to comment

If the OP had not done a correcting parity check, and had subsequently added a 3TB drive to the array, unraid would have cleared the new data drive. Until a correcting check was done, a drive rebuild of the 3TB data drive would show successful completion, but the data beyond the 2TB original parity point would be corrupt because parity would still be wrong even though unraid thought it was correct.
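
 

For anyone wondering why the rebuild would still look successful: a missing data disk is reconstructed as parity XOR'd with every surviving data disk, so wherever parity itself is wrong, the rebuilt block is silently wrong too. A minimal illustration (again not unraid code):

# Illustrative only: rebuilding a missing data disk from parity.
def rebuild_block(parity_block, other_data_blocks):
    """The missing disk's block is parity XOR every other data disk's block."""
    value = parity_block
    for b in other_data_blocks:
        value ^= b
    return value

# With correct parity, the original byte comes back:
assert rebuild_block(0x0F ^ 0xA0, [0xA0]) == 0x0F
# With stale parity past the 2TB mark, the rebuild still "succeeds",
# it just reconstructs garbage:
assert rebuild_block(0x99, [0xA0]) != 0x0F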

 

If everything is truly as it seems here, there is a bug with the swap-disable process that needs to be squashed. Documenting the need for a correcting parity check after a swap disable is not the correct way of handling this. I suppose a quick and dirty fix by limetech would be to automatically kick off a correcting check after the swap disable is done, but that feels wrong to me. I think the correct fix is that any free space on the new parity drive after the original parity data is copied needs to be zeroed as part of the copy process. I feel sure that limetech's code normally does this, or we would have seen more reports of problems by now, but some circumstance caused this not to happen in this particular case, which is why I think limetech should at least look at it.

Link to comment

