Al313 Posted October 1, 2020
A few days ago, I had a power outage. The system shut down normally (automatically, via the UPS). Yesterday I powered up and disk 10 was missing. It appeared that I had a hard drive failure, so I installed a new drive and started a data rebuild. I noticed that the system was running slowly, at 1-5 MB/sec. The new drive I had just installed ran for about 15 minutes, then failed, and Unraid halted. I replaced that drive with another new drive and restarted the data rebuild. The system seemed to be running fine, but the rebuild speed was still very slow, around 1-2 MB/sec. I let it run overnight; it has now been running for more than 20 hours, still at 1-2 MB/sec, with only 2% progress. At this rate, the system is telling me it will take 2 months to finish the data rebuild.

This is a simple system, just a bunch of disks. Nothing fancy. It is running version 6.8.2 and has a total of 22 drives (including 2 parity drives); the largest disks are 6 TB. This system has been running well for several years. As I recall, the last time I swapped a 2 TB drive for a 6 TB drive, the data rebuild took 26-28 hours or so.

I just ran diagnostics (while the data rebuild is active) and they are attached to this post. I looked at the dashboard and everything seems normal, i.e. CPU usage and memory. The only possible issue I could see is that disk 5 has a "thumbs down" for SMART status. Could disk 5 be the cause of the slowdown? If so, would it be better to stop the current data rebuild of disk 10, restart Unraid with disk 10's contents emulated, then replace and rebuild disk 5 before going back to fix the disk 10 issue?

I would appreciate suggestions on how best to proceed. Thanks.

mserver1-diagnostics-20200930-1939.zip
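As a rough sanity check on the figures above, the reported rates and the "2 months" estimate are consistent with each other. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate across the whole 6 TB disk, which a real rebuild will not hold. The function names are made up for illustration.

```python
def rebuild_hours(disk_tb: float, rate_mb_s: float) -> float:
    """Hours needed to rebuild disk_tb terabytes at a constant rate_mb_s MB/s."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    seconds = disk_tb * 1e12 / (rate_mb_s * 1e6)
    return seconds / 3600


def implied_rate_mb_s(disk_tb: float, hours: float) -> float:
    """Average MB/s implied by rebuilding disk_tb terabytes in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return disk_tb * 1e12 / 1e6 / (hours * 3600)


# Rates reported during the slow rebuild of a 6 TB disk.
for rate in (1, 2):
    hours = rebuild_hours(6, rate)
    print(f"{rate} MB/s: {hours:.0f} h, about {hours / 24:.0f} days")

# Average rate implied by the earlier 26-28 hour rebuild.
for hours in (26, 28):
    print(f"{hours} h: about {implied_rate_mb_s(6, hours):.0f} MB/s")
```

At 1 MB/sec a 6 TB rebuild takes roughly 69 days, which lines up with the two-month estimate, while the earlier 26-28 hour rebuild implies an average of about 60-64 MB/sec.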
Vr2Io Posted October 1, 2020
I haven't checked the diagnostics. If you have dual parity, you can rebuild two disks at the same time. You can also run a disk speed test to check whether disk 5 (or any other disk) is abnormally slow.
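For anyone without a dedicated speed-test tool, a sequential read test can be approximated in a few lines. This is a minimal sketch, not the tool referred to above: it times an unbuffered sequential read of a file or block device and reports MB/s. The device path in the usage note is a placeholder, reading a raw device normally needs root, and the result can be inflated by the OS page cache if the same data was read recently.

```python
import time


def read_throughput_mb_s(path: str, max_bytes: int = 1024**3,
                         chunk: int = 1024**2) -> float:
    """Sequentially read up to max_bytes from path and return MB/s (10^6 bytes/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            data = f.read(min(chunk, max_bytes - total))
            if not data:  # end of file or device
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

A call such as `read_throughput_mb_s("/dev/sdX")` (with `sdX` replaced by the real device) reads the first 1 GiB. Running it during a rebuild adds load to the array, so the numbers are only a relative comparison between disks.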
JorgeB Posted October 1, 2020
There are constant ATA errors on disk 10. That looks more like a cable problem; check or replace the cables and try again.
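One way to spot this kind of pattern yourself is to count kernel ATA error messages per port in the syslog. The sketch below is an assumption about how that could be done, not the method used above: it looks for a few message fragments that the Linux libata driver commonly logs on link trouble. The sample lines are invented for illustration and are not taken from the attached diagnostics, and mapping a port such as `ata7` back to a specific disk needs a separate lookup elsewhere in the log.

```python
import re
from collections import Counter

# Matches "ata7: ..." and "ata7.00: ..." and captures the port name and message.
ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")

# Fragments that commonly appear in libata error messages (not an exhaustive list).
ERROR_MARKERS = ("exception Emask", "failed command", "hard resetting link", "SError")


def count_ata_errors(lines):
    """Return a Counter of error-looking kernel messages per ATA port."""
    counts = Counter()
    for line in lines:
        match = ATA_PORT.search(line)
        if match and any(marker in match.group(2) for marker in ERROR_MARKERS):
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "Sep 30 19:01:02 host kernel: ata7.00: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen",
    "Sep 30 19:01:02 host kernel: ata7.00: failed command: READ DMA EXT",
    "Sep 30 19:01:03 host kernel: ata7: hard resetting link",
    "Sep 30 19:01:04 host kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
]

for port, n in count_ata_errors(sample).most_common():
    print(port, n)
```

A port that keeps accumulating these messages while the others stay quiet points at that port's cable, backplane slot, or controller connection rather than at the array as a whole.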
Al313 Posted October 2, 2020
The problem is at the server's drive backplane. I did a workaround that solved the connection problem, and my data rebuild is now running at 57-58 MB/sec. Thanks to JorgeB for pointing me in the right direction.

I thought I would post the steps I took to narrow down the cause of my problem in case they are helpful to someone else. Per JorgeB's suggestion, I first swapped the cable for a new one. Problem persisted. Next I removed and reseated the controller and rechecked the connections. Problem persisted. I have three 8-port controllers and only 22 drives populated (it is a 24-drive server with a hot-swap backplane), so next I routed the drive to a different controller that had two open ports. Problem persisted. At that point, I figured the problem could be at the backplane connection. So I moved the drive to one of the open slots, but first reconnected it to the original controller and original port. Problem solved!

I haven't looked for the specific cause of the intermittent fault at the backplane connection yet. I'll wait until my data rebuild is complete before I do that. Anyway, that's how I got to the bottom of this problem, and I hope this helps someone else.
bfeist Posted December 29, 2021
How did you determine which drive was the problem, or are you talking about the drive that was being rebuilt?