Unraid Unleash Your Hardware

Trouble rebuilding data

August 18, 2010

I'm currently trying to rebuild data onto a new hard drive. Twice now it has started out fine, but eventually resulted in "kernel panic" messages saying it was out of memory with no killable processes. Both times I couldn't access the syslog. I started the rebuild again and have been paying closer attention to it. The rebuild started automatically and was originally transferring at around 17,000 KB/sec, but has now slowed to 1,700 KB/sec. At this rate it will take about ten days to finish. I opened the syslog and see that there is definitely an issue:

Aug 4 21:35:38 Tower emhttp: shcmd (11): killall -HUP smbd

Aug 4 21:35:38 Tower emhttp: shcmd (12): /etc/rc.d/rc.nfsd restart >/dev/null

Aug 4 21:48:00 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:00 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:00 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:01 Tower kernel: ide0: reset: success

Aug 4 21:48:29 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:29 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:29 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:48:57 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:57 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:57 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:49:23 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:49:23 Tower kernel: hda: DMA timeout retry

Aug 4 21:49:23 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:50:00 Tower kernel: hda: lost interrupt

Aug 4 21:51:00 Tower last message repeated 2 times

Aug 4 21:54:41 Tower last message repeated 2 times

Aug 4 22:06:57 Tower in.telnetd[1350]: connect from 192.168.1.2 (192.168.1.2)

Aug 4 22:07:02 Tower login[1351]: ROOT LOGIN on `pts/0' from `192.168.1.2'

Any ideas? Full syslog attached. Thanks in advance.
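For reference, the ten-day figure can be checked with a rough calculation. This is only a sketch: it assumes the rebuild has to cover a 1.5 TB disk (2930277168 sectors of 512 bytes, the size the syslog reports for the SAMSUNG HD154UI drives) and that 1 KB means 1024 bytes. The function name rebuild_days is made up for this illustration.

```python
# Rough rebuild-time estimate under the assumptions stated above.
SECTOR_BYTES = 512          # assumed sector size
SECTORS = 2930277168        # sector count the syslog reports for the 1.5 TB drives


def rebuild_days(rate_kb_per_sec: float, sectors: int = SECTORS) -> float:
    """Return the days needed to process `sectors` at a steady rate."""
    total_bytes = sectors * SECTOR_BYTES
    seconds = total_bytes / (rate_kb_per_sec * 1024)
    return seconds / 86400  # 86400 seconds per day


for rate in (17_000, 1_700):
    print(f"{rate:>6} KB/sec -> {rebuild_days(rate):.1f} days")
```

Under those assumptions, the original rate works out to about one day and the slowed rate to about ten days, which matches the estimate above.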

August 18, 2010

You best back up a bit and fill us in on what you are doing, and why.

From your syslog it appears as if you are trying to replace disk2. (/dev/sdc)

From your syslog, you have hardware problems where the OS keeps resetting the disk controllers in an attempt to talk to the disks.

Those messages mostly seem to involve /dev/hda (disk4)

Aug 4 21:48:00 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:00 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:00 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }

Aug 4 21:48:00 Tower kernel: hda: task_in_intr: error=0x04 { DriveStatusError }

Aug 4 21:48:00 Tower kernel: ide: failed opcode was: unknown

Aug 4 21:48:01 Tower kernel: ide0: reset: success

Aug 4 21:48:29 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:29 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:29 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:48:57 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:48:57 Tower kernel: hda: DMA timeout retry

Aug 4 21:48:57 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:49:23 Tower kernel: hda: dma_timer_expiry: dma status == 0x20

Aug 4 21:49:23 Tower kernel: hda: DMA timeout retry

Aug 4 21:49:23 Tower kernel: hda: timeout waiting for DMA

Aug 4 21:50:00 Tower kernel: hda: lost interrupt

When enough of these messages are written to the syslog, you'll use up all available RAM and crash, exactly as you have experienced.

Basically, disk 4 is not working. It might be bad, it might be a bad cable, it could be a bad power connection or splitter, or even a marginal power supply.
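To see at a glance which device is producing most of the noise, the syslog can be tallied per device. The following is a minimal sketch, not an Unraid tool: the function name count_trouble and the list of "trouble" substrings are assumptions chosen to match the messages quoted above, and the sample lines are copied from the syslog excerpts in this thread.

```python
import re
from collections import Counter

# Kernel lines that name an IDE/SATA device, e.g. "kernel: hda: lost interrupt".
DEVICE_LINE = re.compile(r"kernel: (hd[a-z]|sd[a-z]|ata\d+(?:\.\d+)?): (.*)")

# Substrings treated as trouble indicators (an illustrative list, not exhaustive).
TROUBLE = ("dma_timer_expiry", "timeout", "Error", "lost interrupt", "softreset failed")


def count_trouble(lines):
    """Count suspicious kernel messages per device name."""
    counts = Counter()
    for line in lines:
        match = DEVICE_LINE.search(line)
        if match and any(word in match.group(2) for word in TROUBLE):
            counts[match.group(1)] += 1
    return counts


sample = [
    "Aug 4 21:48:00 Tower kernel: hda: dma_timer_expiry: dma status == 0x20",
    "Aug 4 21:48:00 Tower kernel: hda: DMA timeout retry",
    "Aug 4 21:48:00 Tower kernel: hda: timeout waiting for DMA",
    "Aug 4 21:48:00 Tower kernel: hda: task_in_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Aug 4 21:50:00 Tower kernel: hda: lost interrupt",
    "Aug 4 21:35:28 Tower kernel: ata1: softreset failed (device not ready)",
    "Aug 4 21:35:28 Tower kernel: ata1.00: configured for UDMA/133",
]
print(count_trouble(sample))
```

The same function accepts an open file, so a saved copy of the syslog can be passed in line by line. A device whose count keeps climbing between runs is the one to check for cabling, power, or drive problems.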

When initially booting, almost all your disks report hardware problems:

Aug 4 21:35:28 Tower kernel: ata1: softreset failed (device not ready)

Aug 4 21:35:28 Tower kernel: ata1: failed due to HW bug, retry pmp=0

Aug 4 21:35:28 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug 4 21:35:28 Tower kernel: ata1.00: HPA detected: current 2930277168, native 18446744072344861488

Aug 4 21:35:28 Tower kernel: ata1.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7

Aug 4 21:35:28 Tower kernel: ata1.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)

Aug 4 21:35:28 Tower kernel: ata1.00: configured for UDMA/133

Aug 4 21:35:28 Tower kernel: ata2: softreset failed (device not ready)

Aug 4 21:35:28 Tower kernel: ata2: failed due to HW bug, retry pmp=0

Aug 4 21:35:28 Tower kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug 4 21:35:28 Tower kernel: ata2.00: ATA-8: WDC WD10EADS-65L5B1, 01.01A01, max UDMA/133

Aug 4 21:35:28 Tower kernel: ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)

Aug 4 21:35:28 Tower kernel: ata2.00: configured for UDMA/133

Aug 4 21:35:28 Tower kernel: ata3: softreset failed (device not ready)

Aug 4 21:35:28 Tower kernel: ata3: failed due to HW bug, retry pmp=0

Aug 4 21:35:28 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug 4 21:35:28 Tower kernel: ata3.00: HPA detected: current 2930277168, native 18446744072344861488

Aug 4 21:35:28 Tower kernel: ata3.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7

Aug 4 21:35:28 Tower kernel: ata3.00: 2930277168 sectors, multi 16: LBA48 NCQ (depth 31/32)

Aug 4 21:35:28 Tower kernel: ata3.00: configured for UDMA/133

Aug 4 21:35:28 Tower kernel: ata4: softreset failed (device not ready)

Aug 4 21:35:28 Tower kernel: ata4: failed due to HW bug, retry pmp=0

Aug 4 21:35:28 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Aug 4 21:35:28 Tower kernel: ata4.00: ATA-8: WDC WD10EADS-00L5B1, 01.01A01, max UDMA/133

Aug 4 21:35:28 Tower kernel: ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)

Aug 4 21:35:28 Tower kernel: ata4.00: configured for UDMA/133

Now, did disk2 fail? Or are you just trying to upgrade it? Your big problem seems to be disk4.

Can you get a SMART report on disk4?

smartctl -d ata -a /dev/hda
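When reading the report, the attribute table is usually the first place to look. Below is a small sketch that scans that table for a few attributes commonly associated with trouble and lists any with a non-zero raw value. The watched attribute names are standard smartctl labels, but the choice of which ones to watch, the function name nonzero_watched, and the sample table (made-up values, not taken from this system) are assumptions for illustration.

```python
# Attributes often checked first: the sector counts tend to point at the
# media, while CRC errors more often point at cabling or connections.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def nonzero_watched(report: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


# Hypothetical excerpt in the layout smartctl uses for its attribute table.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""
print(nonzero_watched(SAMPLE_REPORT))
```

An empty result does not prove the drive is healthy; it only means none of the watched counters has moved.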

August 19, 2010

Author

I was replacing disk2 since it looked like it had failed, but now I'm wondering if the problem was actually from something else. The web console displayed a red ball next to disk2, so I removed it from the tower and placed it in an external device I had. I tried to check the contents through Windows using some software I had downloaded (I think it was called DiskInternals), and it couldn't locate the drive, so I figured it had died. Maybe I was too hasty in diagnosing the problem on my own. I'm definitely in over my head at this point.

One thing that has me suspicious is that my power supply had very few SATA connectors, and I ended up buying a bunch of cheap IDE to SATA adaptors. I had an extra one lying around so I switched the one that connected to disk4. I figure if I'm lucky, that will solve the problem.

I ran the smartctl command as you suggested, but I'm not sure what I should be looking for. I'm uploading that and a new syslog. So far things appear to be going well, and data is transferring at about 58,000 KB/sec, which is much faster than before.

syslog-2010-08-18.txt

August 20, 2010

Author

It appears that the IDE to SATA adaptor may have been the problem. When I woke up this morning the terminal showed "id c1 respawning too fast", and the same thing for IDs c2, c3, and c4. I couldn't log in, so I had to do a hard power down. When I restarted, I found that the server had completed the rebuild overnight. It began a parity check this morning, which is now completed. It says parity is valid and all drives are operational, although the parity check found 26988143 errors. Is that something I should be concerned about?

August 20, 2010

You should do another parity check. A few million errors indicates a huge issue. Expect NO parity errors on this next "check". If you do have errors, you have a hardware issue of some kind.

August 20, 2010

Author

I ran another parity check and this one resulted in 0 errors. I'm not convinced I'm totally out of the woods, and plan to keep a close eye on the system. Thank you very much for your help.

3 weeks later...

September 5, 2010

Author

I'm having trouble again, and it appears disk4 has failed. I've tried a number of different IDE to SATA adaptors and it constantly displays a red ball. Before going out and buying a replacement drive, I figured I would upload the syslog and SMART capture. Any thoughts?

syslog-2010-09-05.txt

September 5, 2010

The SMART report looks quite normal. I don't think the drive has died at all.

It will stay as "red" until the array thinks it is replaced. (It will not automatically go back online if you fixed a bad connection or replaced a bad SATA-IDE adapter.)

To get it back online and have it reconstruct its contents onto itself, perform these steps:

Stop the array

Un-assign disk4.

Start the array with disk4 un-assigned (This will cause the array to forget the model/serial number of the "failed" disk4 drive)

Stop the array once more

Re-assign disk4

Start the array once more. This time it will begin the process of re-constructing disk4. (remember, it was taken off-line when a write to it failed, so it needs to be re-constructed)

Do NOT press the button labeled "restore", as it is actually an "Initialize Configuration and Immediately Invalidate Parity based on any Prior Configuration" button. It would cause you to lose the data on disk4 if it was really defective.

Let's hope it was just the SATA/IDE adapter.

Joe L.

Archived

This topic is now archived and is closed to further replies.
