jedimstr Posted December 15, 2019

After upgrading to 6.8.0, I replaced my parity drives and some of my data drives with 16TB Exos drives, retiring the previous 12TB and 10TB Exos and IronWolf drives. The first parity replacement went fine, with the full rebuild completing in normal fashion (a little over a day). For the second parity drive, I saw that I could replace it and one of the data drives at the same time, so I went ahead and did that with the pre-cleared 16TB drives. The parity-sync/data-rebuild started off normally, holding expected speeds of 150+ MB/s most of the time, until it hit around 36.5%, where the rebuild dramatically dropped to between 27 KB/s and 44 KB/s. It has been running at that speed for over 2 days now.

At first I thought this was somehow related to the 6.8.0 known issues/errata notes, which mention slow parity syncs on wide arrays of 20+ drives (I have 23 data drives and 2 parity), but my speeds are slower than those reported in that bug report by an order of magnitude. Here's what I'm seeing now; my diagnostics are attached.

holocron-diagnostics-20191215-0604.zip
JorgeB Posted December 15, 2019

I'm not seeing anything that explains it, and the rebuild looks to be completely stalled now. There are some DMA read issues, but they seem unrelated. I would try a reboot; if it stalls again, try to note the time it happens and post new diagnostics. You can also run the diskspeed docker to check that all disks are performing normally.
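If the diskspeed docker isn't handy, a rough equivalent can be scripted with dd. This is only a sketch, not Unraid tooling: the read_speed helper is invented for the example, and the device names are placeholders; substitute the array members shown on the Unraid Main page.

```shell
#!/bin/sh
# Hypothetical helper: sequentially read 1 GiB from a device (or file)
# and print dd's throughput summary line.
read_speed() {
    dd if="$1" of=/dev/null bs=1M count=1024 2>&1 | tail -n 1
}

# /dev/sdb etc. are placeholder names; list your actual array members.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    if [ -e "$dev" ]; then
        echo "== $dev =="
        read_speed "$dev"
    fi
done
```

One disk reading an order of magnitude slower than its peers is the prime suspect for a crawling rebuild.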
jedimstr Posted December 15, 2019

4 hours ago, johnnie.black said: "I would try a reboot, if it stalls again try to take notice of the time it happens and post new diags. You can also run the diskspeed docker to check all disks are performing normally."

Thanks. I rebooted and parity now runs at a better speed. It started from scratch and is still slower than usual, but at least it's in the triple-digit MB/s range. There was also an Ubuntu VM running that often accesses a share isolated to one of the drives being rebuilt, so just in case that had anything to do with it, I shut that VM down. I'm not the only one seeing this slow-to-a-crawl issue, though; another user on Reddit posted about the same problem.
jedimstr Posted December 21, 2019

To update: I was eventually able to complete the rebuild after a reboot. But I have more disk replacements to do, so I'm now on my second data drive replacement on 6.8.0, and it slowed to a crawl again after a day. I rebooted the server again, which of course restarted the rebuild from scratch, but this time I saw slowdowns again, down to the double-digit KB/s range. This time I just left it running, and eventually it bumped back up to around 45 MB/s, and a day later up to 96.3 MB/s... still crazy slow, but better than the KB/s range. I hope the general slow parity/rebuild issue gets resolved.
itimpi Posted December 21, 2019

1 minute ago, jedimstr said: "I saw slowdowns again down to the double-digit KB/s range."

This almost invariably turns out to be a disk continually resetting for some reason. It is often a cabling issue, but since it recovered when left alone, there may just be a dodgy area on the drive. It might be worth running an extended SMART test on the drive to see what that reports.
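An extended test can be started from the Unraid GUI (click the drive, then Self-Test) or from the console with smartctl. A minimal sketch follows; /dev/sdX is a placeholder for the suspect drive's device name, and the availability check is just defensive scripting for the example:

```shell
#!/bin/sh
# /dev/sdX is a placeholder; use the suspect drive's actual device name.
DEV=/dev/sdX

if command -v smartctl >/dev/null 2>&1 && [ -b "$DEV" ]; then
    smartctl -t long "$DEV"      # queue the extended self-test (runs in drive firmware)
    smartctl -l selftest "$DEV"  # poll this later to read the result log
    MSG="extended self-test queued on $DEV"
else
    MSG="smartctl or $DEV not available; nothing queued"
fi
echo "$MSG"
```

The test runs inside the drive itself and can take many hours on a large disk; a log entry like "Completed: read failure" pinpoints a bad area that a short test would miss.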
jedimstr Posted January 14, 2020

On 12/21/2019 at 9:59 AM, itimpi said: "It is often a cabling issue but since it recovered when left alone there may just be a dodgy area on the drive. It might be worth running an extended SMART test on the drive to see what that reports."

That particular drive ended up having multiple read errors and just died, even on the pre-read of a fresh pre-clear. I ended up RMA'ing it. With that drive out of the equation, I still get relatively slow parity syncs/rebuilds, but never as slow as with the RMA'd drive. The slowest is now in the double-digit MB/s range (and it goes back up to the high 80s or 90s again).
bfeist Posted December 29, 2021

I'm experiencing this right now, attempting to rebuild onto an older drive that I successfully pre-cleared. @jedimstr, are you saying that the drive being rebuilt onto might be bad and causing this?
jedimstr Posted December 29, 2021

12 minutes ago, bfeist said: "are you saying that the drive being rebuilt to might be bad and is causing this?"

It could actually be any of your drives that's dying, not just the one you're rebuilding onto. Parity operations read all your array drives in lockstep and are limited by your slowest drive, so if any drive is dying or has other issues, it will slow any parity operation, including a rebuild.
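To make the "slowest drive wins" point concrete, here is a back-of-the-envelope sketch; every number in it is invented for illustration and does not come from this thread's diagnostics:

```python
# A parity rebuild streams every array member in lockstep, so effective
# throughput is pinned to the slowest disk. All figures below are made up.
speeds_mb_s = {"disk1": 180.0, "disk2": 165.0, "ailing_disk": 0.04}
effective = min(speeds_mb_s.values())

# Rough time to rebuild a 16 TB drive at that effective speed:
size_mb = 16e6                      # 16 TB in (decimal) MB
days = size_mb / effective / 86400  # 86400 seconds per day
print(f"effective speed ~ {effective} MB/s, rebuild ~ {days:.0f} days")
```

At healthy speeds the same rebuild finishes in about a day, which matches the "a little over a day" figure earlier in the thread; at tens of KB/s it would effectively never finish.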
bfeist Posted December 29, 2021

That's what I was worried about. This is very unusual: I did a parity verification a week or so ago and it ran at full speed with no errors.
bfeist Posted May 31, 2022

An update for the general information of anyone reading this thread: it turned out that one of my drives, which wasn't showing as failing at all, was getting itself into a very slow (bytes-per-second) state when writing files to it. This would appear whenever my mover script decided to write to that drive. No SMART errors appeared, and Unraid didn't handle the situation at all; everything just slowed to a crawl. I could eventually stop the array and reboot, which would put everything back to normal for possibly weeks, until the mover decided to write to that one drive again.

I finally decided to just replace it to see what would happen. Since replacing it, I have done several data operations across the whole array (upgraded my dual parity drives to 18TB drives) with no issues. I have no clue what's wrong with that one drive. I ran a preclear on it just for fun, and it completed with no problems, threw no errors, and reported no reallocated sectors. No idea why any of this happened, but hey.
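A write-side probe can catch this kind of silent slowdown before the mover trips over it, since SMART stays clean the whole time. The following is a hedged sketch, not an Unraid feature: it assumes the stock /mnt/diskN mount points, and the write_speed_mb_s helper is invented for the example.

```python
import os
import time

def write_speed_mb_s(directory, size_mb=64):
    """Write size_mb of data into `directory`, fsync it, and return MB/s.
    A result well under 1 MB/s flags a drive in the state described above."""
    path = os.path.join(directory, ".speedtest.tmp")
    chunk = b"\0" * (1024 * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data down to the physical disk
    elapsed = max(time.monotonic() - start, 1e-9)
    os.remove(path)
    return size_mb / elapsed

if __name__ == "__main__":
    # Unraid mounts individual array disks at /mnt/disk1, /mnt/disk2, ...
    for n in range(1, 25):
        mnt = f"/mnt/disk{n}"
        if os.path.isdir(mnt):
            print(f"{mnt}: {write_speed_mb_s(mnt):.1f} MB/s")
```

Run occasionally (for example from the User Scripts plugin) and compare the disks against each other; a lone outlier is the drive to swap before a rebuild stalls on it.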