
Any idea how to handle this?  Performance grinds to a halt and I see this in the logs:

 

hdk: dma_timer_expiry: dma status == 0x60
hdk: timeout waiting for DMA
PDC202XX: Secondary channel reset.
hdk: timeout waiting for DMA
hdk: (__ide_dma_test_irq) called while not waiting
hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdk: drive not ready for command
hdk: status timeout: status=0xd1 { Busy }

hdl: DMA disabled
PDC202XX: Secondary channel reset.
hdk: drive not ready for command
ide5: reset: success
blk: queue c0326c44, I/O limit 4095Mb (mask 0xffffffff)
hdk: dma_timer_expiry: dma status == 0x20
hdk: timeout waiting for DMA
PDC202XX: Secondary channel reset.
hdk: timeout waiting for DMA
hdk: (__ide_dma_test_irq) called while not waiting
hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdk: drive not ready for command
hdk: status timeout: status=0xd1 { Busy }

PDC202XX: Secondary channel reset.
hdk: drive not ready for command
ide5: reset: success
hdk: dma_timer_expiry: dma status == 0x20
hdk: timeout waiting for DMA
PDC202XX: Secondary channel reset.
hdk: timeout waiting for DMA
hdk: (__ide_dma_test_irq) called while not waiting
hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdk: drive not ready for command
hdk: status timeout: status=0xd1 { Busy }

PDC202XX: Secondary channel reset.
hdk: drive not ready for command
ide5: reset: success
hdk: dma_timer_expiry: dma status == 0x20
hdk: timeout waiting for DMA
PDC202XX: Secondary channel reset.
hdk: timeout waiting for DMA
hdk: (__ide_dma_test_irq) called while not waiting
hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }

hdk: drive not ready for command
hdk: status timeout: status=0xd1 { Busy }

PDC202XX: Secondary channel reset.
hdk: drive not ready for command
ide5: reset: success

 

I'm using Promise 133 controllers in this, could that be the problem?  I have four Unraid boxes running and this is the first one I've seen this on.

 

Link to comment

smeehrrr,

 

Make sure that your Promise 133 TX2 PATA PCI cards have the latest firmware installed.

 

You can check for and get the latest firmware off of the Promise website.

 

Yup, they're current.  I have a floppy I use when building these machines that updates the BIOS and then updates the IDE firmware.

 

Link to comment

This is the DMA issue that I and many others have suffered with; have you read the threads about it?  Basically, the UnRaid OS in its current state does not handle DMA errors well at all.  The grind to a halt you describe looked to many of us like a total lockup.  You have one or more drives that UnRaid does not like at all, and until Tom provides a fix your only option is to pull the suspect drive(s) out of the system.  Looking at the log, it appears hdk is the culprit, which is disk6, or the 7th drive counting the parity drive as #1.  For reference:

 

    hda = parity

    hdb = disk1

    hdc = disk2

    hdd = disk3

    hdi = disk4

    hdj = disk5

    hdk = disk6

    hdl = disk7

    hde = disk8

    hdf = disk9

    hdg = disk10

    hdh = disk11

 

If I were you, I would pull that drive out, reset the array, and see if the problem goes away.
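
If it helps, here's a rough sketch (purely illustrative, and it assumes the syslog lives at /var/log/syslog, which may not match your box) that scans the log for the DMA timeouts shown above and reports which devices are affected; map each hit to its slot with the table:

for dev in hda hdb hdc hdd hde hdf hdg hdh hdi hdj hdk hdl; do
    # flag any device that has logged a DMA timeout
    if grep -q "${dev}: timeout waiting for DMA" /var/log/syslog; then
        echo "DMA timeouts on /dev/${dev} -- check the table above for its slot"
    fi
done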

Link to comment

DeathtoToasters,

 

The version of the firmware on the Promise card is usually on a label attached to, or stenciled onto, the LSI controller chip.

 

As smeehrrr said in a previous post, you will need a machine that has a floppy drive, as updating the Promise controller firmware requires booting into DOS.

 

Regards,

TCIII

Link to comment

I had some success moving the drive with DMA errors from the Promise controller to the motherboard controller.  If you happen to have a slot open on the motherboard controller, you could try moving it.

 

You are supposed to be able to transparently swap drives around the UnRaid system, and have it recognize them and continue, but I don't think this works currently.

 

 

Link to comment

...

 

You are supposed to be able to transparently swap drives around the UnRaid system, and have it recognize them and continue, but I don't think this works currently.

 

 

This works in the current s/w as long as the data disks occupy the same set of slots.  For example, suppose you have parity, disk1, disk2, and disk3.  You can swap the drives around among the disk1, disk2, and disk3 slots, but you cannot put, say, disk1 in the disk4 slot (leaving the disk1 slot empty).  This will be fixed in a future release.

Link to comment

Tom,

 

Not sure if these links will give you any clues, but I did a bit of searching with google and perhaps one of them might help.

 

The following message from a Linux kernel mailing list seems to describe the issue some are having with lockups.  It specifically mentions the 20265 and 20267 chipsets on the Promise controller, but the file the patch applies to is named pdc202xx, so it may also apply to the 20268 chipset on the current Promise cards.

 

http://marc.theaimsgroup.com/?l=linux-kernel&m=104250818527780&w=2

 

The following link is to a very long thread of messages describing a similar patch for the 20265 through 20270 chipsets.

http://kerneltrap.org/node/3040  It contains links to various patches, including proposed patches to the 2.4 and 2.6 kernels.

 

Another thread describes how the Promise card ends up in an infinite loop, waiting for an interrupt that never occurs because interrupts have been disabled.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0102.0/1334.html

 

It suggests replacing a "while" loop with a fixed delay call.

This

unsigned long timeout = jiffies + ((HZ + 19)/20) + 1;  /* roughly 50 ms from "now" */
while (0 < (signed long)(timeout - jiffies));          /* spins until the timer interrupt advances jiffies */

 

gets replaced with this

mdelay(50);  /* busy-waits 50 ms via a calibrated loop; no interrupt needed to finish */

Notice that nothing inside the "while" loop changes the exit condition.  The jiffies counter is advanced by the timer interrupt handler, so if that interrupt never occurs, the loop runs forever.  (These new 3GHz machines run infinite loops reallllllly fast, but they still seem to take forever to finish  :( :()

 

It mentions that if you have the NMI Watchdog enabled, you will get nice "oopses."

 

The NMI (non-maskable interrupt) watchdog can be enabled as shown at this link: http://slacksite.com/slackware/nmi.html

by adding "nmi_watchdog=1" to the boot parameters in the grub menu.
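
For example, a grub (menu.lst) boot entry might look something like this; the kernel path and root device are illustrative placeholders, not taken from the link:

title Linux
        kernel /boot/vmlinuz root=/dev/hda1 ro nmi_watchdog=1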

 

This does not fix the Promise controller bug, but it will apparently reboot the system if it gets hung.  Might this be something an unRaid user could try?  Do we have support for the NMI watchdog in our kernels?
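
A rough way to check whether the watchdog is actually running after boot (just a guess at a sanity check, not from the links above): watch the NMI count in /proc/interrupts, which should climb steadily while the watchdog is firing:

grep NMI /proc/interrupts
sleep 10
grep NMI /proc/interrupts    # the count should have increased if the watchdog is active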

 

Joe L.

Link to comment

Thanks Joe.  I've seen those threads and am investigating.  Some of the patches don't apply to our kernel, some are slightly different, and some have already been applied  ???

 

Using the NMI watchdog to at least reboot is an interesting idea.  It's currently not enabled, but I'll look into this.

Link to comment
  • 4 weeks later...

I *may* have stumbled across a potential "fix" for the UDMA errors!!! Mind you, I'm not seeing them myself, but while reading up on another Linux-based NAS I stumbled across a thread where these guys had the same sorts of issues. They figured out a way to stop it by reducing the DMA access level.

 

http://sourceforge.net/forum/forum.php?thread_id=1458967&forum_id=507589

 

Apparently there's a way to reduce the DMA access level to solve this using utilities. The drive manufacturers have utilities, and there's a utility in BSD too, so maybe something exists in Linux as well. The manufacturer tools are better, though, since they set the mode once and it sticks.

 

Hope that helps some!

Link to comment

One of the issues we've had in nailing down this problem is that we don't have any disks which consistently exhibit it.  However, we have an identical set of 12 Hitachi HDS722525VLAT80's, of which one occasionally fails with a DMA error.  After several weeks of testing, here are our conclusions:

 

1. The suspect disk never fails if connected to either connector of the on-board IDE controllers.

2. The disk will only fail if on the Secondary connector of a Promise controller.

3. Doesn't matter if using a "short" (18") or "long" (24") cable.

4. When it does fail, it hangs the system completely.

 

It does appear, however, that "slowing the disk down" solves the problem (or at least works around it).  The disk normally wants to operate in "Ultra DMA mode 5", but if forced to run in "Ultra DMA mode 4" it no longer fails with a DMA error, and the system is only very slightly slowed down.

 

To set a drive to a specific Ultra DMA mode, you need to edit the "go" file on the Flash.  You will see a series of "hdparm" commands used to set up each disk.  For example, for disk9, there's:

 

hdparm -c1d1a0m8A1W1u1  /dev/hdf

 

to force this disk to use Ultra DMA mode 4, change the line to this:

 

hdparm -c1d1a0m8A1W1u1 -Xudma4  /dev/hdf
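
To double-check that the mode actually changed, "hdparm -i" lists the drive's UDMA modes and marks the active one with an asterisk; the output should look roughly like this (exact format varies by drive and hdparm version):

hdparm -i /dev/hdf | grep -i udma
 UDMA modes: udma0 udma1 udma2 udma3 *udma4 udma5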

 

We haven't given up on solving this problem "correctly" however.

 

 

Link to comment

Tom,

 

Didn't you say that the new kernel was less DMA error prone than the last one, and that the 324 upgrade would fix the DMA issue...?  My 2 drives that were killing me have now been removed, so I can't tell whether running 324 (which I am) has solved the problem or not.  I still have one of the 2 drives; maybe I should stick it back in and see for myself.

Link to comment
