smeehrrr Posted March 6, 2006

Any idea how to handle this? Performance grinds to a halt and I see this in the logs:

    hdk: dma_timer_expiry: dma status == 0x60
    hdk: timeout waiting for DMA
    PDC202XX: Secondary channel reset.
    hdk: timeout waiting for DMA
    hdk: (__ide_dma_test_irq) called while not waiting
    hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    hdk: drive not ready for command
    hdk: status timeout: status=0xd1 { Busy }
    hdl: DMA disabled
    PDC202XX: Secondary channel reset.
    hdk: drive not ready for command
    ide5: reset: success
    blk: queue c0326c44, I/O limit 4095Mb (mask 0xffffffff)
    hdk: dma_timer_expiry: dma status == 0x20
    hdk: timeout waiting for DMA
    PDC202XX: Secondary channel reset.
    hdk: timeout waiting for DMA
    hdk: (__ide_dma_test_irq) called while not waiting
    hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    hdk: drive not ready for command
    hdk: status timeout: status=0xd1 { Busy }
    PDC202XX: Secondary channel reset.
    hdk: drive not ready for command
    ide5: reset: success
    hdk: dma_timer_expiry: dma status == 0x20
    hdk: timeout waiting for DMA
    PDC202XX: Secondary channel reset.
    hdk: timeout waiting for DMA
    hdk: (__ide_dma_test_irq) called while not waiting
    hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    hdk: drive not ready for command
    hdk: status timeout: status=0xd1 { Busy }
    PDC202XX: Secondary channel reset.
    hdk: drive not ready for command
    ide5: reset: success
    hdk: dma_timer_expiry: dma status == 0x20
    hdk: timeout waiting for DMA
    PDC202XX: Secondary channel reset.
    hdk: timeout waiting for DMA
    hdk: (__ide_dma_test_irq) called while not waiting
    hdk: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    hdk: drive not ready for command
    hdk: status timeout: status=0xd1 { Busy }
    PDC202XX: Secondary channel reset.
    hdk: drive not ready for command
    ide5: reset: success

I'm using Promise 133 controllers in this box; could that be the problem? I have four Unraid boxes running and this is the first one I've seen this on.

TCIII Posted March 6, 2006

smeehrrr,

Make sure that your Promise 133 TX2 PATA PCI cards have the latest firmware installed. You can check for and get the latest firmware off of the Promise website. Just a thought.

Regards,
TCIII

DeathtoToasters Posted March 6, 2006

Tom,

Got a few questions for you:

A) How do I check the current firmware version on the cards? (Install them in a Windows machine?)
B) I assume I will need to update them in another machine, as my Cooler Master Stacker has no floppy or CD drives installed. Am I correct on this?

Thanks,
DeathtoToasters

smeehrrr (Author) Posted March 6, 2006

TCIII wrote:
    smeehrrr, Make sure that your Promise 133 TX2 PATA PCI cards have the latest firmware installed. You can check for and get the latest firmware off of the Promise website.

Yup, they're current. I have a floppy I use when building these machines that updates the BIOS and then updates the IDE firmware.

rharvey Posted March 6, 2006

This is the DMA issue that I and many others have suffered with. Have you read the threads about this? Basically, the UnRaid OS in its current state does not handle DMA errors well at all. The grind to a halt you describe looked like a total lockup to many of us. You have one or more drives that UnRaid does not like at all; until Tom provides a fix for this, your only option is to pull the suspect drive(s) out of the system.

Looking at the log, hdk appears to be the culprit, which is disk6, or the 7th drive counting the parity drive as #1. For reference:

    hda = parity
    hdb = disk1
    hdc = disk2
    hdd = disk3
    hdi = disk4
    hdj = disk5
    hdk = disk6
    hdl = disk7
    hde = disk8
    hdf = disk9
    hdg = disk10
    hdh = disk11

If I were you, I would pull that drive out, reset the array, and see if the problem goes away.

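The device-to-slot lookup above can be sketched as a small shell helper. This is only an illustration: the function names slot_for_device and slots_in_log are hypothetical, and the table is copied from rharvey's post, so it is assumed to describe that layout and may not match other systems.

```shell
# Device-to-slot table taken from the post above (an assumption for any
# system other than the one described there).
slot_for_device() {
    case "$1" in
        hda) echo parity ;;
        hdb) echo disk1 ;;
        hdc) echo disk2 ;;
        hdd) echo disk3 ;;
        hdi) echo disk4 ;;
        hdj) echo disk5 ;;
        hdk) echo disk6 ;;
        hdl) echo disk7 ;;
        hde) echo disk8 ;;
        hdf) echo disk9 ;;
        hdg) echo disk10 ;;
        hdh) echo disk11 ;;
        *)   echo unknown; return 1 ;;
    esac
}

# Read kernel log lines on stdin and print the slot for every distinct
# hdX device name that starts a line.
slots_in_log() {
    grep -oE '^hd[a-z]+' | sort -u | while read -r dev; do
        printf '%s = %s\n' "$dev" "$(slot_for_device "$dev")"
    done
}
```

Piping the log excerpt from the first post through slots_in_log would list hdk and hdl with their slots, which makes it quicker to see which physical drive to pull.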
TCIII Posted March 6, 2006

DeathtoToasters,

The version of the firmware on the Promise card is usually on a label attached to, or stenciled onto, the LSI controller chip. As smeehrrr said in a previous post, you will need a machine that has a floppy drive, as updating the Promise controller firmware requires booting into DOS.

Regards,
TCIII

lovingHDTV Posted March 6, 2006

I had some success moving the drive with DMA errors from the Promise controller to the motherboard controller. If you happen to have a slot open on the motherboard controller, you could try moving it.

You are supposed to be able to transparently swap drives around the UnRaid system, and have it recognize them and continue, but I don't think this works currently.

limetech Posted March 7, 2006

lovingHDTV wrote:
    ... You are supposed to be able to transparently swap drives around the UnRaid system, and have it recognize them and continue, but I don't think this works currently.

This works in the current s/w as long as the data disks occupy the same set of slots. For example, suppose you have parity, disk1, disk2, and disk3. You can swap the drives around among the disk1, disk2, and disk3 slots. But you cannot put, say, disk1 in the disk4 slot (leaving the disk1 slot empty). This will be fixed in a future release.

limetech Posted March 7, 2006

The DMA error problem is replied to here:
http://lime-technology.com/forum/index.php?topic=13.msg33#msg33

Short answer: "We're working on it."

smeehrrr (Author) Posted March 8, 2006

Out of curiosity, on what drives are people hitting this? I'm only seeing it on Seagate 400GB drives, 7200.8 series.

Joe L. Posted March 8, 2006

Tom,

Not sure if these links will give you any clues, but I did a bit of searching with Google and perhaps one of them might help.

The following message from a Linux kernel mailing list seems to describe the issue some are having with lockups. It specifically mentions the 20265 and 20267 chipsets on the Promise controller, but the file it is being applied to is named pdc202XX, so it may apply to the 20268 chipset on the current Promise cards.
http://marc.theaimsgroup.com/?l=linux-kernel&m=104250818527780&w=2

The following link is to a very long thread of messages that describe a similar patch for the 20265 through 20270 chipsets:
http://kerneltrap.org/node/3040
It contains links to various patches, including proposed patches to the 2.4 and 2.6 kernels.

Another thread describes how the Promise card ends up in an infinite loop, waiting for an interrupt that never occurs because interrupts have been disabled.
http://www.uwsg.iu.edu/hypermail/linux/kernel/0102.0/1334.html
It suggests replacing a "while" loop with a call to a timer. This

    unsigned long timeout = jiffies + ((HZ + 19)/20) + 1;
    while (0 < (signed long)(timeout - jiffies));

gets replaced with this

    mdelay(50);

Notice that nothing in the "while" loop itself changes the counter. Apparently, the counter (jiffies) is updated in an interrupt-driven routine. If the interrupt never occurs, the loop runs forever. (These new 3GHz machines run infinite loops reallllllly fast, but they still seem to take forever to run :( )

It mentions that if you have the NMI watchdog enabled, you will get nice "oopses." The NMI (non-maskable interrupt) watchdog can be enabled as shown at this link
http://slacksite.com/slackware/nmi.html
by adding "nmi_watchdog=1" to the boot parameters in the grub menu. This does not fix the Promise controller bug, but it will apparently reboot the system if it gets hung.

Might this be something an unRaid user could try? Do we have support for the NMI watchdog in our kernels?

Joe L.

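As a rough way to check whether the "nmi_watchdog=1" boot parameter mentioned above actually reached the kernel, one could inspect the kernel command line. The sketch below assumes a Linux system that exposes /proc/cmdline; the function name nmi_watchdog_setting is hypothetical. It only shows whether the parameter was passed at boot, not whether the kernel was built with NMI watchdog support.

```shell
# Read a kernel command line on stdin and print the value given to
# nmi_watchdog= (the last one wins if it appears more than once).
# Prints nothing if the parameter is absent.
nmi_watchdog_setting() {
    tr ' ' '\n' | sed -n 's/^nmi_watchdog=//p' | tail -n 1
}
```

On a running system this would be used as `nmi_watchdog_setting < /proc/cmdline`; an empty result means the parameter was not supplied by the boot loader.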
limetech Posted March 8, 2006

Thanks Joe. I've seen those threads and am investigating. Some of the patches don't apply to our kernel, some are slightly different, and some have already been applied.

Using the NMI watchdog to at least reboot is an interesting idea. It's currently not enabled, but I'll look into this.

BLKMGK Posted April 4, 2006

I *may* have stumbled across a potential "fix" for the UDMA errors!!! Mind you, I've never seen them myself, but while reading up on another Linux-based NAS I stumbled across a thread where these guys had the same sorts of issues. They figured out a way to stop it by reducing the DMA access level.
http://sourceforge.net/forum/forum.php?thread_id=1458967&forum_id=507589

Apparently there's a way to reduce the DMA access level using utilities. The drive manufacturers have utilities, and there's a utility in BSD too, so maybe something exists in Linux as well; however, the manufacturer stuff is better since it sets the mode and it sticks. Hope that helps some!

limetech Posted April 6, 2006

One of the issues we've had in nailing down this problem is that we don't have any disks which consistently exhibit it. However, we have an identical set of 12 Hitachi HDS722525VLAT80's, of which one occasionally fails with a DMA error. After several weeks of testing, here are our conclusions:

1. The suspect disk never fails if connected to either connector of the on-board IDE controllers.
2. The disk will only fail if on the Secondary connector of a Promise controller.
3. It doesn't matter if using a "short" (18") or "long" (24") cable.
4. When it does fail, it does hang the system completely.

It does appear, however, that "slowing the disk down" solves the problem (or at least works around it). The disk normally wants to operate in "Ultra DMA mode 5", but if forced to run in "Ultra DMA mode 4" it no longer fails with a DMA error and the system is only very slightly slowed down.

To set a drive to a specific Ultra DMA mode, you need to edit the "go" file on the Flash. You will see a series of "hdparm" commands used to set up each disk. For example, for disk9, there's:

    hdparm -c1d1a0m8A1W1u1 /dev/hdf

To force this disk to use Ultra DMA mode 4, change the line to this:

    hdparm -c1d1a0m8A1W1u1 -Xudma4 /dev/hdf

We haven't given up on solving this problem "correctly", however.

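The edit to the "go" file described above could be scripted roughly as follows. This is only a sketch: the function name add_udma_mode and the file names in the usage note are hypothetical, and it assumes each hdparm line ends with the device path, as in the example from the post. It works as a filter, so it never touches a disk or runs hdparm itself.

```shell
# add_udma_mode MODE DEVICE
# Copy stdin to stdout, inserting -X<MODE> before DEVICE on the hdparm
# line for that device. Lines that already carry an -X option, and lines
# for other devices, pass through unchanged.
add_udma_mode() {
    mode=$1
    dev=$2
    sed -e "/^hdparm .* -X/!s|^\(hdparm .*\) \(${dev}\)\$|\1 -X${mode} \2|"
}
```

For example, `add_udma_mode udma4 /dev/hdf < go > go.new` would write a modified copy that can be compared against the original before replacing it. Keeping a backup of the original "go" file seems prudent, since a wrong transfer mode setting can affect whether the disk comes up correctly.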
rharvey Posted April 11, 2006

Tom,

Didn't you say that the new kernel was less DMA error prone than the last one, and that the 324 upgrade would fix the DMA issue...? My 2 drives that were killing me have now been removed, so I can't tell if running 324 (which I am) has solved the problem or not. I still have one of the 2 drives; maybe I should stick it back in and see for myself.