[6.8.0] Extremely slow Parity-Sync/Data-Rebuild


jedimstr


After upgrading to 6.8.0, I replaced my parity drives and some of my data drives with 16TB Exos drives, up from the previous 12TB and 10TB Exos and IronWolf drives.  The first parity replacement went fine, with the full rebuild completing normally (a little over 1 day).  For the second parity, I saw that I could replace it and one of the data drives at the same time, so I went ahead and did that with the pre-cleared 16TB drives.  The parity-sync/data-rebuild started off normally, at the expected 150+ MB/s most of the time, until it hit around 36.5%, where the speed dropped dramatically to between 27 KB/s and 44 KB/s.  It's been running at that speed for over 2 days now.
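To put those speeds in perspective, here is a rough back-of-the-envelope sketch (not anything Unraid itself provides) that estimates how long one full pass over a 16TB drive would take. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate, which a real rebuild never has; the function name `rebuild_hours` is made up for illustration.

```python
def rebuild_hours(capacity_bytes: float, speed_bytes_per_s: float) -> float:
    """Hours needed to process capacity_bytes at a constant speed."""
    if speed_bytes_per_s <= 0:
        raise ValueError("speed must be positive")
    return capacity_bytes / speed_bytes_per_s / 3600


# Decimal units, as drive vendors use them (assumption).
TB = 10**12
MB = 10**6
KB = 10**3

normal = rebuild_hours(16 * TB, 150 * MB)  # speed seen before the slowdown
crawl = rebuild_hours(16 * TB, 44 * KB)    # best speed seen after the slowdown

print(f"at 150 MB/s: {normal:.1f} hours")
print(f"at 44 KB/s: {crawl / 24 / 365:.1f} years")
```

At 150 MB/s this works out to roughly 30 hours, which matches the "a little over 1 day" of the first rebuild. At 44 KB/s the same pass would take more than 11 years, so a rebuild at that rate is effectively stalled.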

 

At first I thought this was related to the 6.8.0 known issues/errata note about slow parity syncs on wide arrays with 20+ drives (I have 23 data drives and 2 parity), but my speeds are slower than those in the bug report for that issue by an order of magnitude.

 

Here's what I'm seeing now, with my diagnostics attached.

slow parity.png

holocron-diagnostics-20191215-0604.zip

Link to comment

I'm not seeing anything that explains that, and the rebuild looks to be completely stalled now. There are some DMA read issues, but they seem unrelated.

 

I would try a reboot. If it stalls again, take note of the time it happens and post new diags.

 

You can also run the diskspeed docker to check all disks are performing normally.
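As a very rough stand-in for a per-disk speed check, a sequential read can be timed by hand. The sketch below is not how the diskspeed docker works; it is only a minimal illustration, and the function name, chunk size, and byte limit are arbitrary choices made for this example.

```python
import time


def read_throughput_mb_s(path: str,
                         chunk_size: int = 1024 * 1024,
                         max_bytes: int = 256 * 1024 * 1024) -> float:
    """Read up to max_bytes sequentially from path and return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_size, max_bytes - total))
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return 0.0
    return total / elapsed / 1e6
```

Pointing `path` at a large file that lives on the disk in question gives a ballpark figure. Repeated reads of the same file may be served from the page cache and look unrealistically fast, and running this during a rebuild competes with the rebuild for I/O.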

Link to comment

 

4 hours ago, johnnie.black said:

I'm not seeing anything that explains that, and the rebuild looks to be completely stalled now. There are some DMA read issues, but they seem unrelated.

 

I would try a reboot. If it stalls again, take note of the time it happens and post new diags.

 

You can also run the diskspeed docker to check all disks are performing normally.

 

Thanks, I rebooted and the parity sync is running at a better speed now.  It started from scratch and is still slower than usual, but at least it's in the three-digit MB/s range.

slow parity 2.png

 

I also had an Ubuntu VM running that often accesses a share isolated to one of the drives being rebuilt, so in case that has anything to do with it, I shut down that VM.

 

 

I'm not the only one seeing this slow-to-a-crawl issue, though.  Another user on Reddit posted this:

 

Link to comment

To update, I was eventually able to complete the rebuild after a reboot.

 

I have more disk replacements to do, though, so I'm now on my second data drive replacement on 6.8.0, and it slowed to a crawl again after a day.  I rebooted the server again, which of course restarted the rebuild from scratch, but this time it slowed down again to the double-digit KB/s range.  This time I just left it running and eventually it went back up to around 45 MB/s, and a day later up to 96.3 MB/s... still crazy slow, but better than the KB/s range.  I hope the general slow parity/rebuild issue gets resolved.

Link to comment
1 minute ago, jedimstr said:

I saw slowdowns again down to the double-digit KB/s range.

This almost invariably turns out to be the disk continually resetting for some reason.   It is often a cabling issue, but since it recovered when left alone there may just be a dodgy area on the drive.   It might be worth running an extended SMART test on the drive to see what that reports.
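For anyone who prefers the command line, an extended SMART test can be started with `smartctl` from smartmontools. The sketch below only builds the two commands (`-t long` starts the extended self-test, `-l selftest` shows the self-test log). The helper names and the `/dev/sdX` device are placeholders made up for this example, so substitute the real device before running anything.

```python
import subprocess
from typing import List


def extended_test_cmd(device: str) -> List[str]:
    """Command that starts an extended (long) SMART self-test."""
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device: str) -> List[str]:
    """Command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout (needs smartctl and root)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    device = "/dev/sdX"  # placeholder, not a real device name
    # Only print the commands here; call run(...) once the device is correct.
    print(" ".join(extended_test_cmd(device)))
    print(" ".join(selftest_log_cmd(device)))
```

The extended test runs in the background on the drive itself and can take many hours on a 16TB disk, so the log is worth checking only after it has had time to finish.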

Link to comment
  • 4 weeks later...
On 12/21/2019 at 9:59 AM, itimpi said:

This almost invariably turns out to be the disk continually resetting for some reason.   It is often a cabling issue, but since it recovered when left alone there may just be a dodgy area on the drive.   It might be worth running an extended SMART test on the drive to see what that reports.

That particular drive ended up having multiple read errors and just dying even on a new pre-clear pre-read. Ended up RMA'ing it.

After taking that drive out of the equation, I still get relatively slow parity syncs/rebuilds, but never as slow as with the RMA'd drive.  The slowest now is in the double-digit MB/s range (but it goes back up to the high 80s or 90s again).

Link to comment
  • 1 year later...
12 minutes ago, bfeist said:

I'm experiencing this right now, attempting to rebuild to an older drive that I successfully precleared. @jedimstr, are you saying that the drive being rebuilt to might be bad and is causing this?

It could actually be any of your drives that is dying, not just the one you’re rebuilding.  Parity operations use all your array drives and are limited by your slowest drive.  So if any of your drives are dying or have other issues, it’ll slow any parity operation like a rebuild.
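The "limited by your slowest drive" point can be sketched as a simple minimum over per-drive speeds. This is only an illustration of the reasoning, not a model of Unraid's internals; the drive names and speeds below are invented sample values.

```python
from typing import Dict, Tuple


def slowest_drive(speeds_mb_s: Dict[str, float]) -> Tuple[str, float]:
    """Return the name and speed of the drive that bounds a parity operation."""
    if not speeds_mb_s:
        raise ValueError("no drives given")
    name = min(speeds_mb_s, key=speeds_mb_s.get)
    return name, speeds_mb_s[name]


# Invented sample numbers: three healthy drives and one misbehaving one.
speeds = {"parity": 180.0, "disk1": 175.0, "disk2": 0.04, "disk3": 190.0}

name, speed = slowest_drive(speeds)
print(f"{name} limits the operation to about {speed} MB/s")
```

With these sample values the healthy drives make no difference: the whole operation runs at the pace of `disk2`, even if that is not the drive being rebuilt.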

Link to comment
  • 5 months later...

Update on this for the general information of anyone reading this thread:

It turned out that one of my drives, which wasn't showing as failing at all, was getting itself into a very, very slow (bytes per second) state when files were written to it. This would appear whenever my mover script decided to write to that drive. No SMART errors appeared and Unraid didn't handle the situation at all; everything just slowed to a crawl. I could eventually stop the array and reboot. This would put everything back to normal for possibly weeks, until the mover decided to write to that one drive again. I finally decided to just replace it to see what would happen. Since replacing it I have done several data operations across the whole array (upgraded my dual parity drives to 18TB drives) with no issues.
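Since SMART reported nothing in this case, a crude write probe is one way such a drive might be spotted. The sketch below writes a temporary file into a directory, forces it to disk with `fsync`, and reports the rate. The function name and default size are made up for this example, and the directory to test (for instance a per-disk mount point) is an assumption the reader has to fill in.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 64) -> float:
    """Write size_mb of zeros to a temp file in directory and return MB/s."""
    block = b"\0" * (1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reaches the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the probe file
    return size_mb * 1024 * 1024 / elapsed / 1e6
```

Running this against each disk in turn and comparing the numbers could show one drive far below the rest. A problem that only appears intermittently, as described above, may not show up on any single run.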

No clue what's wrong with that one drive. I ran a preclear on it just for fun and it worked with no problems and threw no errors and reported no reallocated sectors. No idea why any of this happened but hey.

Link to comment
