Weird Behavior with Parity Swap


lishpy


I initiated a parity swap yesterday following this procedure: https://docs.unraid.net/legacy/FAQ/parity-swap-procedure/

 

I checked on the progress this morning and the copy was at 96%, so I expected it to finish shortly thereafter and to then start the disk rebuild.

 

I just checked progress in the array again, and the array operation page is now back where I started, offering to initiate the copy again. According to the documentation, that COPY button should now be a START button that starts the array and begins the data rebuild.

 

What's even more confusing: in my server notifications, at about the time I expected the copy to complete, there is a notification saying the following:

 

"Disk 4, is being reconstructed and is available for normal operation"

 

Disk 4 is the old parity drive that is being turned into a data disk.

 

I can't see any indication that disk 4 is actually being reconstructed, though. I have no ability to start the array, and the array devices page shows the new parity drive and the new disk as new devices.

 

Did this somehow fail, meaning I need to run the copy again? I don't see any indication of a failure, and this morning's notification makes me think it completed successfully, but now I don't know what state the array is in. I don't see anything out of the ordinary in the logs right now; the disks are spun down.

 

The log says the copy completed successfully, but further down it reports "import_slot: 0 wrong".

 

Sep 24 12:29:29 Tower emhttpd: copy: disk4 to disk0 completed
Sep 24 12:29:55 Tower kernel: md: unRAID driver removed
Sep 24 12:29:55 Tower emhttpd: shcmd (39876): /sbin/modprobe md-mod super=/boot/config/super.dat
Sep 24 12:29:55 Tower kernel: md: unRAID driver 2.9.27 installed

 

Sep 24 12:29:55 Tower kernel: mdcmd (1): import 0 sdf 64 13672382412 0 WDC_WUH721414ALE6L1_Y6G3NGSC
Sep 24 12:29:55 Tower kernel: md: import disk0: (sdf) WDC_WUH721414ALE6L1_Y6G3NGSC size: 13672382412
Sep 24 12:29:55 Tower kernel: md: import_slot: 0 wrong
Sep 24 12:29:55 Tower kernel: mdcmd (2): import 1 sdd 64 7814026532 0 WDC_WD80EDAZ-11TA3A0_VGGYLR9G
Sep 24 12:29:55 Tower kernel: md: import disk1: (sdd) WDC_WD80EDAZ-11TA3A0_VGGYLR9G size: 7814026532
Sep 24 12:29:55 Tower kernel: mdcmd (3): import 2 sdc 64 7814026532 0 WDC_WD80EFAX-68LHPN0_7SGLHX6C
Sep 24 12:29:55 Tower kernel: md: import disk2: (sdc) WDC_WD80EFAX-68LHPN0_7SGLHX6C size: 7814026532
Sep 24 12:29:55 Tower kernel: mdcmd (4): import 3 sdh 64 3907018532 0 ST4000DM000-1F2168_Z304H4Z2
Sep 24 12:29:55 Tower kernel: md: import disk3: (sdh) ST4000DM000-1F2168_Z304H4Z2 size: 3907018532
Sep 24 12:30:05 Tower kernel: mdcmd (5): import 4 sdg 64 7814026532 0 WDC_WD80EFAX-68LHPN0_7SGLWD9C
Sep 24 12:30:05 Tower kernel: md: import disk4: (sdg) WDC_WD80EFAX-68LHPN0_7SGLWD9C size: 7814026532
Sep 24 12:30:05 Tower kernel: md: import_slot: 4 replaced
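For anyone comparing their own logs against this one, here is a minimal Python sketch that pulls the per-slot "import disk" and "import_slot" lines out of a syslog excerpt so flags such as "wrong" or "replaced" stand out. The helper name summarize_md_imports is hypothetical (not part of Unraid), the regexes assume the exact message format shown in the log above, and what those flags mean inside the md driver isn't explained anywhere in this thread.

```python
import re
from typing import Dict, Iterable

# Patterns modeled on the syslog lines quoted above; other Unraid
# versions may format these messages differently (assumption).
DISK_RE = re.compile(r"md: import disk(\d+): \((\w+)\) (\S+) size: (\d+)")
SLOT_RE = re.compile(r"md: import_slot: (\d+) (\w+)")


def summarize_md_imports(lines: Iterable[str]) -> Dict[int, dict]:
    """Collect device, disk ID, size, and any import_slot flag per slot."""
    slots: Dict[int, dict] = {}
    for line in lines:
        m = DISK_RE.search(line)
        if m:
            entry = slots.setdefault(int(m.group(1)), {"flag": None})
            entry.update(device=m.group(2), disk_id=m.group(3),
                         size=int(m.group(4)))
            continue
        m = SLOT_RE.search(line)
        if m:
            # e.g. "wrong" or "replaced"; recorded verbatim, not interpreted
            slots.setdefault(int(m.group(1)), {"flag": None})["flag"] = m.group(2)
    return slots


if __name__ == "__main__":
    sample = """\
Sep 24 12:29:55 Tower kernel: md: import disk0: (sdf) WDC_WUH721414ALE6L1_Y6G3NGSC size: 13672382412
Sep 24 12:29:55 Tower kernel: md: import_slot: 0 wrong
Sep 24 12:29:55 Tower kernel: md: import disk1: (sdd) WDC_WD80EDAZ-11TA3A0_VGGYLR9G size: 7814026532
Sep 24 12:30:05 Tower kernel: md: import disk4: (sdg) WDC_WD80EFAX-68LHPN0_7SGLWD9C size: 7814026532
Sep 24 12:30:05 Tower kernel: md: import_slot: 4 replaced
"""
    for slot, info in sorted(summarize_md_imports(sample.splitlines()).items()):
        print(slot, info.get("device", "?"), info["flag"] or "-")
```

Run against the full excerpt above, this should list slot 0 as "wrong" and slot 4 as "replaced", with the other slots unflagged.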

 

Also, probably unrelated: if you hover over the orange triangle on the array operation page, it says "Started, array unprotected", but on the dashboard I see all disks as offline.

 

Looking for next steps to help troubleshoot.

 

Array operation: https://imgur.com/id9jLUc

Array disks: https://imgur.com/hukGTbI

1 hour ago, dboonthego said:

Do another copy and it will probably resolve itself. I couldn't reproduce the issue for someone else. Did you power down in step 5? I don't think it matters, but I'm wondering if you did.


I just kicked off a new copy, so we'll see. I'll report back.
 

I did power down at step 5, because I pulled the data drive I'm replacing and put the new parity drive in its drive bay.

11 hours ago, itimpi said:

That could explain your issue. The parity swap procedure is meant to run to completion without the system being shut down or rebooted. Maybe this needs to be clarified in the instructions?

I ran through a test yesterday and also shut down, simply because I followed the steps. The difference for me was that I already had an unassigned disk larger than parity in the system and didn't physically alter the hardware. I had no issue performing the parity swap/data rebuild.


The copy completed, and this time I was given the option to start the data rebuild.
 

If the reboot in step 5 is truly the cause, the SOP should be updated to say so. Since that reboot happens before the copy is initiated, though, it's still not clear that it is the issue.

 

Otherwise there's a bug here, considering that in two threads the SOP was followed to a T, failed in the same way, and then succeeded on the redo.

2 hours ago, lishpy said:

If the reboot in step 5 is truly the cause, the SOP should be updated to say so. Since that reboot happens before the copy is initiated, though, it's still not clear that it is the issue.

I followed the same steps and did not experience the behavior you and the other guy did. When I shut down in step 5, I simply powered back up. I didn't physically change any disks, as I already had them connected.

 

Most people don't have a warm spare ready, which is probably why the procedure is written to shut down. Not sure what caused this, but I highly doubt it's related to step 5.

Edited by dboonthego
