[SOLVED] Data Rebuild at ~1.5 MBps



A couple of days ago I noticed that one of my disks redballed. I figured it was likely bad because of errors that I'd seen, so I decided to replace it. The old drive was a Hitachi 2TB; I replaced it with an HGST 4TB Coolspin and started the rebuild. I'm only seeing between 1 and 2 MB per second rebuild speed. At this rate, it will take about 25 - 30 days to rebuild the drive. That doesn't seem normal.
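As a rough sanity check on that estimate, here is a minimal Python sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a constant rebuild speed; `rebuild_days` is just an illustrative name, not anything unRAID provides.

```python
def rebuild_days(size_tb: float, speed_mb_s: float) -> float:
    """Days needed to rebuild size_tb (decimal TB) at speed_mb_s (decimal MB/s)."""
    seconds = size_tb * 1_000_000 / speed_mb_s  # assumption: 1 TB = 1,000,000 MB
    return seconds / 86_400  # 86,400 seconds per day


# Speeds reported in the post: between 1 and 2 MB/s on a 4 TB drive.
for speed in (1.0, 1.5, 2.0):
    print(f"{speed} MB/s -> {rebuild_days(4, speed):.1f} days")
```

For a 4 TB drive this works out to roughly 23 days at 2 MB/s and 46 days at 1 MB/s, so a figure of 25 - 30 days is in the right range for the upper half of the observed speeds.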

 

Here is my system info:

ASUS M4A785-M Motherboard
AMD Sempron 145 CPU
1GB DDR2 RAM
Corsair CX500 PSU
UnRAID v5.0.6
Parity: Seagate ST4000DM000 4TB
Cache: None
Array: 10 2TB WD Green, 1 2TB Hitachi, 2 4TB HGST Coolspin (including the new one that's being rebuilt)

 

The new drive being rebuilt is connected to the motherboard. I checked the data and power connections to the drives when I replaced the drive and they seemed OK.

 

I'm seeing this repeatedly in the log:

/usr/bin/tail -f /var/log/syslog
Mar 2 08:57:22 Tower kernel: ata2.00: configured for UDMA/33
Mar 2 08:57:22 Tower kernel: ata2: EH complete
Mar 2 08:57:22 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x90a00 action 0xe frozen
Mar 2 08:57:22 Tower kernel: ata2.00: irq_stat 0x01400000, PHY RDY changed
Mar 2 08:57:22 Tower kernel: ata2: SError: { Persist HostInt PHYRdyChg 10B8B }
Mar 2 08:57:22 Tower kernel: ata2.00: failed command: READ DMA EXT
Mar 2 08:57:22 Tower kernel: ata2.00: cmd 25/00:00:60:10:d8/00:04:1b:00:00/e0 tag 0 dma 524288 in
Mar 2 08:57:22 Tower kernel: res 50/00:00:5f:10:d8/00:00:1b:00:00/e0 Emask 0x50 (ATA bus error)
Mar 2 08:57:22 Tower kernel: ata2.00: status: { DRDY }
Mar 2 08:57:22 Tower kernel: ata2: hard resetting link
Mar 2 08:57:29 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 2 08:57:29 Tower kernel: ata2.00: configured for UDMA/33
Mar 2 08:57:29 Tower kernel: ata2: EH complete
Mar 2 08:57:30 Tower kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x90a00 action 0xe frozen
Mar 2 08:57:30 Tower kernel: ata2.00: irq_stat 0x01400000, PHY RDY changed
Mar 2 08:57:30 Tower kernel: ata2: SError: { Persist HostInt PHYRdyChg 10B8B }
Mar 2 08:57:30 Tower kernel: ata2.00: failed command: READ DMA EXT
Mar 2 08:57:30 Tower kernel: ata2.00: cmd 25/00:00:60:50:d8/00:04:1b:00:00/e0 tag 0 dma 524288 in
Mar 2 08:57:30 Tower kernel: res 50/00:00:5f:50:d8/00:00:1b:00:00/e0 Emask 0x50 (ATA bus error)
Mar 2 08:57:30 Tower kernel: ata2.00: status: { DRDY }
Mar 2 08:57:30 Tower kernel: ata2: hard resetting link

 

Perhaps an issue with the data connection? Any ideas on what to check or how to fix this?

syslog-2017-03-02.txt

Edited by bw1
solved
Link to comment
9 minutes ago, bw1 said:

Perhaps an issue with the data connection?

Probably. Your syslog has rotated. The one you posted is just full of those errors. Without more I can't even tell which disk it is referring to. Can you get us the older logs? They are in /var/log.
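If it helps when going through the rotated logs, a small Python sketch like the one below can tally which ATA port is resetting. It only assumes the line format shown in the posted excerpt (`kernel: ataN: hard resetting link`); the function name `count_link_resets` is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: ata2: hard resetting link"
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def count_link_resets(lines):
    """Return a Counter mapping port name (e.g. 'ata2') to its hard-reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the syslog excerpt in the first post.
sample = [
    "Mar 2 08:57:22 Tower kernel: ata2: hard resetting link",
    "Mar 2 08:57:29 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
    "Mar 2 08:57:30 Tower kernel: ata2: hard resetting link",
]
print(count_link_resets(sample))
```

To run it against a real file, pass an open file object instead of the sample list, for example `count_link_resets(open("/var/log/syslog.1", errors="replace"))`. This only tells you the port; matching the port to an array disk still needs the boot-time part of the log.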

 

Once you get this straight you really should consider upgrading. It is difficult to support this old version since most haven't used it in years.

Link to comment
1 minute ago, trurl said:

Probably. Your syslog has rotated. The one you posted is just full of those errors. Without more I can't even tell which disk it is referring to. Can you get us the older logs? They are in /var/log.

 

Once you get this straight you really should consider upgrading. It is difficult to support this old version since most haven't used it in years.

 

I attached the one that I created when I first started the rebuild. Hope that helps.

 

I haven't checked here in a while. The 6.x version that I downloaded and have actually used on a test server was still beta. I'll definitely be looking into upgrading.

syslog-2017-02-28.txt

Link to comment
8 minutes ago, trurl said:

According to that older syslog, ata2 is disk2, but I don't think we can trust that since I assume you rebooted since that log was taken. Have you looked in /var/log for other logs?

 

Did you test the new disk with preclear or anything?

 

Yes, the new disk was precleared 3 times, and it has been in storage since then (about 3 years). I assume the disk won't go bad in storage.

 

 

syslog.1

syslog.2

Link to comment
30 minutes ago, bw1 said:

Well those attached files don't look very readable! And I didn't zip them. :(

I could use them OK. Looks like ata2 is disk2, but the disk you are rebuilding is disk3. Stop, shutdown and recheck the connections. The disk3 rebuild isn't going to be good if disk2 can't be read reliably.

Link to comment
43 minutes ago, trurl said:

I could use them OK. Looks like ata2 is disk2, but the disk you are rebuilding is disk3. Stop, shutdown and recheck the connections. The disk3 rebuild isn't going to be good if disk2 can't be read reliably.

 

OK, I cancelled the rebuild, stopped, and shut down. The connections looked fine, but I pulled the first 3 data connectors from the SS-500 5-in-3 enclosure and reconnected them anyway.

 

Now I'm only getting 200-300 KB/s, so I think I made it worse.

syslog-2017-03-02-2.zip

Link to comment

Now disk2 and disk4 are resetting connections, so it is worse. Make sure both SATA and power connections are good at both ends. SATA connections should be square on the connector. If you have bundled your cables you may be putting stress on the connection.

Link to comment
56 minutes ago, trurl said:

Now disk2 and disk4 are resetting connections, so it is worse. Make sure both SATA and power connections are good at both ends. SATA connections should be square on the connector. If you have bundled your cables you may be putting stress on the connection.

 

Thanks for your help.

 

I'll have to check the connections again, but I'll have to do that later. I do have another power supply that I can try, and I also have a spare motherboard if I need to swap that out. I'll have to check and see if I have more SATA cables.

 

BTW, when I went to shut down, I still had Windows File Explorer connected to the flash share and I was getting errors unmounting the drives. I had to shut down the desktop computer that was connected, restart it, and then reconnect the browser to the Tower. That's when I noticed the Parity drive was missing. So I definitely have some kind of connection problem.

Link to comment

Those errors are symptomatic of a loose connection, drive disappearing then reappearing, with line corruption.  That could be loose connectors or bad power, and bad power is my best guess.  It's possible your power supply is failing, or there are too many drives on this power rail.
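For anyone wanting to see how the log supports that reading, the `SErr 0x90a00` value in the posted lines can be decoded bit by bit. The sketch below is a minimal Python example; the bit-to-name table is written from memory of the names the Linux libata driver prints, so treat it as an assumption, although the result does match the `SError: { ... }` line in the log.

```python
# Bit position -> flag name, as printed by libata in "SError: { ... }" lines.
# Assumption: table reproduced from memory of the kernel's SError definitions.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list:
    """Return the names of the flags set in an SError register value, low bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


print(decode_serror(0x90A00))  # ['Persist', 'HostInt', 'PHYRdyChg', '10B8B']
```

`PHYRdyChg` means the link's ready state changed (the drive dropped off and came back), and `10B8B` is a decode error on the wire, which fits a drive disappearing and reappearing with line corruption.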

Link to comment
2 hours ago, RobJ said:

Those errors are symptomatic of a loose connection, drive disappearing then reappearing, with line corruption.  That could be loose connectors or bad power, and bad power is my best guess.  It's possible your power supply is failing, or there are too many drives on this power rail.

 

I thought the CX500 was good for 15+ drives. I only have 14 and they're low power drives. But thanks that will be one thing I will check since I have another PSU available that has a higher output.

Link to comment
11 minutes ago, bw1 said:

 

I thought the CX500 was good for 15+ drives. I only have 14 and they're low power drives. But thanks that will be one thing I will check since I have another PSU available that has a higher output.

 

If that is a Corsair CX500, the Corsair CX series of power supply do NOT have a good reputation!  The advice has generally been to buy any Corsair power supply but the CX series.  The higher end Corsairs are decent.  I'm almost shocked that you have been able to run very long with 14 drives.  Bad power supplies fail quicker, and often fail to maintain correct voltages under load.  You might try a PSU tester, they're fairly inexpensive.

Link to comment
On Thursday, March 02, 2017 at 2:29 PM, RobJ said:

 

If that is a Corsair CX500, the Corsair CX series of power supply do NOT have a good reputation!  The advice has generally been to buy any Corsair power supply but the CX series.  The higher end Corsairs are decent.  I'm almost shocked that you have been able to run very long with 14 drives.  Bad power supplies fail quicker, and often fail to maintain correct voltages under load.  You might try a PSU tester, they're fairly inexpensive.

 

Yes, it is the Corsair CX500. Like I said, when I selected it, it was one of the power supplies recommended here for up to 15 drives. But maybe it has gone bad.

 

I swapped the PSU out for a Seasonic X-650 and that seems to have fixed it:

 

Data-Rebuild in progress.
Total size:     4     TB
Current position:     326.82     GB (8%)
Estimated speed:     100.22     MB/sec
Estimated finish:     611     minutes
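The reported finish time is consistent with the other numbers in that status output. A minimal Python check, assuming decimal units (4 TB = 4000 GB, 1 GB = 1000 MB) and a constant speed; `remaining_minutes` is an illustrative name only:

```python
def remaining_minutes(total_gb: float, position_gb: float, speed_mb_s: float) -> float:
    """Minutes left to process the rest of the disk at a constant speed."""
    remaining_mb = (total_gb - position_gb) * 1000  # assumption: 1 GB = 1000 MB
    return remaining_mb / speed_mb_s / 60


# Values from the status output above: 4 TB total, 326.82 GB done, 100.22 MB/sec.
print(round(remaining_minutes(4000, 326.82, 100.22)))  # 611
```

That gives about 611 minutes, or a little over 10 hours, compared with weeks at the earlier 1 to 2 MB/s.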

Link to comment
