lovingHDTV Posted April 15, 2006

I was saving a file to the tower and it hung. Windows gave an error about not being able to close the file. When I checked the tower it seemed to be OK, but I could not cd into the offending directory, and syslog said:

Apr 15 20:09:58 Tower kernel: hdi: lost interrupt

I also could not remove the directory that held the broken file. I tried to shut down and it hung; through another window I tried to reboot, and that hung too. I then power cycled the machine, and now I get the following errors upon rebooting:

Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2308504
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2308512
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2308520
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2308528
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2308928
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2314960
Apr 15 20:07:58 Tower kernel: md0: parity incorrect: 2314968

Ideas?
lovingHDTV Posted April 15, 2006

Well, it did come back up and started a parity check. It then hung during the parity check. I've now removed the hard drive and am running unprotected. The drive has never had any issues until now; it is about 4 months old.

I took the drive out, rebooted, deleted the file I had the original trouble with, shut down, put the drive back in, and rebooted. The server is now rebuilding and will be done in 1682 minutes! Wow, that is a long time! Hopefully all will work out in the end.
lovingHDTV Posted April 16, 2006

While monitoring the rebuild process I've seen this error a couple of times:

Apr 15 19:06:14 Tower kernel: hdi: lost interrupt

hdi is the drive being rebuilt.
lovingHDTV Posted April 17, 2006

Well, 1600+ minutes later the tower did finally finish the rebuild. Is this a typical time frame to rebuild a 320GB drive?

On the bright side, there were no DMA errors during that entire rebuild time. That is amazing considering the issues I had prior to upgrading to the new version. It appears that the new version has made significant improvements for me.
TCIII Posted April 17, 2006

lovingHDTV,

1600+ minutes is more than four times longer than it takes my three-terabyte unRaid array to do a parity check. Typically, it takes my array around 360 minutes or less now with the new OS. I would say that you definitely have some kind of problem.

Regards,
TCIII
lovingHDTV Posted April 17, 2006

My parity checks also take ~3 hours to complete. That is why I was surprised at the rebuild time. I could see a rebuild taking 2x as long as a parity check, but what I saw was much longer.

Thanks,
limetech Posted April 17, 2006

lovingHDTV wrote: "Well 1600+ minutes later the tower did finally finish the rebuild. Is this a typical time frame to rebuild a 320GB drive?"

No, that is definitely not typical... unless perhaps you have an old, slow disk in there.

lovingHDTV wrote: "On the bright side there were no DMA errors during that entire rebuild time. That is amazing considering the issues I had prior to upgrading to the new version. Appears that the new version has made significant improvements for me."

Glad to hear that, but perhaps what's happening is that DMA errors are still occurring, just not hanging the system.

I am out of town until Thursday, Apr. 20. When I get back, I'll be able to give you some things to try to test the health of your system.
lovingHDTV Posted April 18, 2006

Digging through my /var/log/messages file I noticed this line:

Apr 15 18:33:55 Tower kernel: hdi: DMA disabled

Does that mean my hdi is running in PIO mode?
Joe L. Posted April 19, 2006

That might explain your very long parity rebuild time...
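As a rough check on the numbers quoted in this thread, the average transfer rates they imply can be worked out directly. This is only a back-of-the-envelope sketch: it assumes the full 320GB (taken here as 320,000 MB) was written during the 1600-minute rebuild and read during the ~3 hour parity check, and the function name `throughput_mb_per_s` is made up for illustration.

```python
def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average transfer rate in MB/s for size_mb moved in the given minutes."""
    return size_mb / (minutes * 60.0)

# Assumption: a "320GB" drive is treated as 320,000 MB (decimal units).
DRIVE_MB = 320_000

# Figures quoted in the thread: 1600-minute rebuild, ~3 hour parity check.
rebuild = throughput_mb_per_s(DRIVE_MB, 1600)
parity = throughput_mb_per_s(DRIVE_MB, 180)

print(f"rebuild:      {rebuild:.1f} MB/s")
print(f"parity check: {parity:.1f} MB/s")
print(f"parity check is {parity / rebuild:.1f}x faster")
```

Under these assumptions the rebuild averaged only a few MB/s, several times slower than the parity check on the same array. That gap is consistent with the drive having fallen out of DMA mode, although the arithmetic alone does not prove it.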
lovingHDTV Posted April 19, 2006

I stopped the array and rebooted. Now I see this in the /var/log/messages file:

Apr 19 00:26:00 Tower kernel: ide4: BM-DMA at 0xbc00-0xbc07, BIOS settings: hdi:pio, hdj:pio
Apr 19 00:26:00 Tower kernel: ide5: BM-DMA at 0xbc08-0xbc0f, BIOS settings: hdk:pio, hdl:pio
Apr 19 00:26:00 Tower kernel: hda: 625142448 sectors (320073 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)
Apr 19 00:26:00 Tower kernel: hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=14593/255/63, UDMA(100)
Apr 19 00:26:00 Tower kernel: hdi: 625142448 sectors (320073 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)

Does this mean that the BIOS is trying to set this drive to PIO mode, but the kernel overrode it this time to UDMA 100?

I ran my bitanalyzer (a program my brother wrote for me that reads/writes a file and reports the throughput) and I am getting better performance than before. Before, I was getting ~10Mb/s for writes; now I am getting 14Mb/s. Reads were ~80Mb/s. So I think this means that I'm truly running UDMA again.
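Checking every drive by eye gets tedious, so here is a minimal sketch of how the log lines quoted in this thread could be scanned for each drive's reported UDMA mode and for the "DMA disabled" and "lost interrupt" warnings. The regular expressions are based only on the lines shown above and may not match what other kernels or drivers print; `summarize` and `SAMPLE_LOG` are hypothetical names.

```python
import re

# Sample lines copied from the posts in this thread.
SAMPLE_LOG = """\
Apr 15 18:33:55 Tower kernel: hdi: DMA disabled
Apr 15 19:06:14 Tower kernel: hdi: lost interrupt
Apr 19 00:26:00 Tower kernel: ide4: BM-DMA at 0xbc00-0xbc07, BIOS settings: hdi:pio, hdj:pio
Apr 19 00:26:00 Tower kernel: hda: 625142448 sectors (320073 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)
Apr 19 00:26:00 Tower kernel: hdi: 625142448 sectors (320073 MB) w/8192KiB Cache, CHS=38913/255/63, UDMA(100)
"""

# Drive geometry line ending in UDMA(nnn), as seen in the thread.
UDMA_RE = re.compile(r"kernel: (hd[a-z]): \d+ sectors .*, UDMA\((\d+)\)\s*$")
# Warning lines seen in the thread.
WARN_RE = re.compile(r"kernel: (hd[a-z]): (DMA disabled|lost interrupt)\s*$")

def summarize(log_text):
    """Return ({drive: udma_mode}, [(drive, warning), ...]) for the given log text."""
    udma = {}
    warnings = []
    for line in log_text.splitlines():
        m = UDMA_RE.search(line)
        if m:
            udma[m.group(1)] = int(m.group(2))
            continue
        m = WARN_RE.search(line)
        if m:
            warnings.append((m.group(1), m.group(2)))
    return udma, warnings

udma, warnings = summarize(SAMPLE_LOG)
print("UDMA modes:", udma)
print("warnings:", warnings)
```

In practice the text would come from /var/log/messages rather than a string. A drive that shows up in the warnings list but later reports a UDMA mode, as hdi does here, matches the sequence described in this post.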
limetech Posted April 21, 2006

The on-board IDE controller BIOS does set up DMA; the Promise controller BIOS comes up in PIO mode. Regardless, Linux will enable the highest DMA mode supported by the controller/drive upon boot up.