September 12, 200619 yr Author here is the next log. What happened here, I had the array without the parity drive installed. and copied a 175gb file to all the drives. Sep 11 20:16:36 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:16:36 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:16:36 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 11 20:16:36 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:16:36 Tower kernel: Sep 11 20:16:36 Tower kernel: hdl: drive not ready for command Sep 11 20:16:36 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 11 20:16:36 Tower kernel: Sep 11 20:16:36 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:16:36 Tower kernel: hdl: drive not ready for command Sep 11 20:16:37 Tower kernel: ide5: reset: success Sep 11 20:16:37 Tower kernel: blk: queue c033cca0, I/O limit 4095Mb (mask 0xffff ffff) Sep 11 20:16:37 Tower kernel: hdl: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:16:37 Tower kernel: hdl: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:16:37 Tower kernel: hdl: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:16:37 Tower kernel: hdl: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:16:37 Tower kernel: hdl: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:16:37 Tower kernel: hdl: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:16:57 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 11 20:16:57 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:16:57 Tower kernel: PDC202XX: Secondary channel reset. 
Sep 11 20:16:57 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:16:57 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 11 20:16:57 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:16:57 Tower kernel: Sep 11 20:16:57 Tower kernel: hdl: drive not ready for command Sep 11 20:16:57 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 11 20:16:57 Tower kernel: Sep 11 20:16:57 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:16:57 Tower kernel: hdl: drive not ready for command Sep 11 20:16:57 Tower kernel: ide5: reset: success Sep 11 20:17:18 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 11 20:17:18 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:17:18 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:17:18 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:17:18 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 11 20:17:18 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:17:18 Tower kernel: Sep 11 20:17:18 Tower kernel: hdl: drive not ready for command Sep 11 20:17:18 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 11 20:17:18 Tower kernel: Sep 11 20:17:18 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:17:18 Tower kernel: hdl: drive not ready for command Sep 11 20:17:18 Tower kernel: ide5: reset: success Sep 11 20:17:18 Tower kernel: hdl: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:17:18 Tower kernel: hdl: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:17:38 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 11 20:17:38 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:17:38 Tower kernel: PDC202XX: Secondary channel reset. 
Sep 11 20:17:38 Tower kernel: hdl: timeout waiting for DMA Sep 11 20:17:38 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 11 20:17:38 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:17:38 Tower kernel: Sep 11 20:17:38 Tower kernel: hdl: drive not ready for command Sep 11 20:17:38 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 11 20:17:38 Tower kernel: Sep 11 20:17:38 Tower kernel: PDC202XX: Secondary channel reset. Sep 11 20:17:38 Tower kernel: hdl: drive not ready for command Sep 11 20:17:39 Tower kernel: ide5: reset: success Sep 11 20:19:34 Tower kernel: hdf: dma_timer_expiry: dma status == 0x60 Sep 11 20:19:34 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:19:34 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:19:34 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:19:34 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting Sep 11 20:19:34 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:19:34 Tower kernel: Sep 11 20:19:34 Tower kernel: hdf: drive not ready for command Sep 11 20:19:34 Tower kernel: hdf: status timeout: status=0xd1 { Busy } Sep 11 20:19:34 Tower kernel: Sep 11 20:19:34 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:19:34 Tower kernel: hdf: drive not ready for command Sep 11 20:19:35 Tower kernel: ide2: reset: success Sep 11 20:19:35 Tower kernel: blk: queue c033bfa4, I/O limit 4095Mb (mask 0xffff ffff) Sep 11 20:19:55 Tower kernel: get_token: status Sep 11 20:19:55 Tower kernel: hdk: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:19:55 Tower kernel: hdk: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:19:55 Tower kernel: hdf: dma_timer_expiry: dma status == 0x40 Sep 11 20:19:55 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:19:55 Tower kernel: PDC202XX: Primary channel reset. 
Sep 11 20:19:55 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:19:55 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting Sep 11 20:19:55 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:19:55 Tower kernel: Sep 11 20:19:55 Tower kernel: hdf: drive not ready for command Sep 11 20:19:55 Tower kernel: hde: status timeout: status=0xd1 { Busy } Sep 11 20:19:55 Tower kernel: Sep 11 20:19:55 Tower kernel: hde: drive not ready for command Sep 11 20:19:56 Tower kernel: hdf: status timeout: status=0xd0 { Busy } Sep 11 20:19:56 Tower kernel: Sep 11 20:19:56 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:19:56 Tower kernel: hdf: no DRQ after issuing WRITE Sep 11 20:19:56 Tower kernel: ide2: reset: success Sep 11 20:19:59 Tower kernel: get_token: status Sep 11 20:19:59 Tower kernel: hdk: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:19:59 Tower kernel: hdk: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:20:16 Tower kernel: hdf: dma_timer_expiry: dma status == 0x40 Sep 11 20:20:16 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:20:16 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:20:16 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:20:16 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting Sep 11 20:20:17 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:20:17 Tower kernel: Sep 11 20:20:17 Tower kernel: hdf: drive not ready for command Sep 11 20:20:17 Tower kernel: hde: status timeout: status=0xd1 { Busy } Sep 11 20:20:17 Tower kernel: Sep 11 20:20:17 Tower kernel: hde: drive not ready for command Sep 11 20:20:17 Tower kernel: hdf: status timeout: status=0xd0 { Busy } Sep 11 20:20:17 Tower kernel: Sep 11 20:20:17 Tower kernel: PDC202XX: Primary channel reset. 
Sep 11 20:20:17 Tower kernel: hdf: no DRQ after issuing WRITE Sep 11 20:20:17 Tower kernel: ide2: reset: success Sep 11 20:20:17 Tower kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:20:17 Tower kernel: hdf: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:20:17 Tower kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:20:17 Tower kernel: hdf: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:20:18 Tower kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComple te Error } Sep 11 20:20:18 Tower kernel: hdf: dma_intr: error=0x84 { DriveStatusError BadCR C } Sep 11 20:20:38 Tower kernel: hdf: dma_timer_expiry: dma status == 0x40 Sep 11 20:20:38 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:20:38 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:20:38 Tower kernel: hdf: timeout waiting for DMA Sep 11 20:20:38 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting Sep 11 20:20:38 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 11 20:20:38 Tower kernel: Sep 11 20:20:38 Tower kernel: hdf: drive not ready for command Sep 11 20:20:38 Tower kernel: hdf: status timeout: status=0xd1 { Busy } Sep 11 20:20:38 Tower kernel: Sep 11 20:20:38 Tower kernel: PDC202XX: Primary channel reset. Sep 11 20:20:38 Tower kernel: hdf: drive not ready for command Sep 11 20:20:39 Tower kernel: ide2: reset: success Sep 11 20:21:06 Tower kernel: get_token: status Sep 11 20:21:07 Tower kernel: hdk: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:07 Tower kernel: hdk: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:08 Tower kernel: hde: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:09 Tower kernel: hde: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:17 Tower nmbd[1001]: [2006/09/11 20:21:17, 0] nmbd/nmbd.c:terminate (56) Sep 11 20:21:17 Tower nmbd[1001]: Got SIGTERM: going down... 
Sep 11 20:21:21 Tower kernel: get_token: stop Sep 11 20:21:21 Tower kernel: md1: stopping Sep 11 20:21:21 Tower kernel: md2: stopping Sep 11 20:21:21 Tower kernel: md3: stopping Sep 11 20:21:21 Tower kernel: md4: stopping Sep 11 20:21:21 Tower kernel: md5: stopping Sep 11 20:21:21 Tower kernel: md6: stopping Sep 11 20:21:21 Tower kernel: md7: stopping Sep 11 20:21:21 Tower kernel: md8: stopping Sep 11 20:21:21 Tower kernel: md9: stopping Sep 11 20:21:21 Tower kernel: md: writing superblock to device sda2 Sep 11 20:21:21 Tower kernel: get_token: status Sep 11 20:21:21 Tower kernel: md: reading superblock from device sda2 Sep 11 20:21:21 Tower kernel: md: superblock events: 3 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: md0: removed Sep 11 20:21:21 Tower kernel: md: import hdb WDC WD2500BB-00DWA0 WD-WMAEH1243568 offset: 63 size: 244198552 Sep 11 20:21:21 Tower kernel: md: import hdc Maxtor 4G120J6 G607S54E offset: 63 size: 120060832 Sep 11 20:21:21 Tower kernel: md: import hdd WDC WD2000JB-22GVA0 WD-WCAL81127566 offset: 63 size: 195360952 Sep 11 20:21:21 Tower kernel: md: import hdi ST3400620A 3QG053Y7 offset: 63 size : 390711352 Sep 11 20:21:21 Tower kernel: md: import hdj Maxtor 5A300J0 A825S3TE offset: 63 size: 292970128 Sep 11 20:21:21 Tower kernel: md: import hdk HDS724040KLAT80 KRFA01RAG1V4WA offs et: 63 size: 390711352 Sep 11 20:21:21 Tower kernel: md: import hdl ST3200826A 5ND274LX offset: 63 size : 195360952 Sep 11 20:21:21 Tower kernel: md: import hde HDS722525VLAT80 VN693ECFV03RAD offs et: 63 size: 244198552 Sep 11 20:21:21 Tower kernel: md: import hdf ST3300631A 5NF1K71E offset: 63 size : 293036152 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: hdk: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:21 Tower kernel: hdk: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:21 Tower kernel: hde: 
drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:21 Tower kernel: hde: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:21 Tower kernel: get_token: status Sep 11 20:21:21 Tower kernel: md: reading superblock from device sda2 Sep 11 20:21:21 Tower kernel: md: superblock events: 3 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: md0: removed Sep 11 20:21:21 Tower kernel: md: import hdb WDC WD2500BB-00DWA0 WD-WMAEH1243568 offset: 63 size: 244198552 Sep 11 20:21:21 Tower kernel: md: import hdc Maxtor 4G120J6 G607S54E offset: 63 size: 120060832 Sep 11 20:21:21 Tower kernel: md: import hdd WDC WD2000JB-22GVA0 WD-WCAL81127566 offset: 63 size: 195360952 Sep 11 20:21:21 Tower kernel: md: import hdi ST3400620A 3QG053Y7 offset: 63 size : 390711352 Sep 11 20:21:21 Tower kernel: md: import hdj Maxtor 5A300J0 A825S3TE offset: 63 size: 292970128 Sep 11 20:21:21 Tower kernel: md: import hdk HDS724040KLAT80 KRFA01RAG1V4WA offs et: 63 size: 390711352 Sep 11 20:21:21 Tower kernel: md: import hdl ST3200826A 5ND274LX offset: 63 size : 195360952 Sep 11 20:21:21 Tower kernel: md: import hde HDS722525VLAT80 VN693ECFV03RAD offs et: 63 size: 244198552 Sep 11 20:21:21 Tower kernel: md: import hdf ST3300631A 5NF1K71E offset: 63 size : 293036152 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: md: blkdev_get error: -6 Sep 11 20:21:21 Tower kernel: hde: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:21 Tower kernel: hde: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:21 Tower kernel: hdk: drive_cmd: status=0x51 { DriveReady SeekCompl ete Error } Sep 11 20:21:21 Tower kernel: hdk: drive_cmd: error=0x04 { DriveStatusError } Sep 11 20:21:22 Tower smbd[1108]: [2006/09/11 20:21:22, 0] smbd/service.c:make_c
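For anyone sorting through a paste like the one above, a short script can tally which drives are throwing DMA timeouts and BadCRC errors. This is only a sketch under assumptions: the function name `tally_dma_problems` is made up for illustration, the input is assumed to be a saved syslog with one message per line, and the `hd` plus one letter device pattern is inferred from the log above.

```python
import re
from collections import Counter

# Matches kernel lines such as "Tower kernel: hdl: timeout waiting for DMA".
# Assumption: devices are named "hd" plus one letter, as in the log above.
DEVICE_LINE = re.compile(r"kernel: (hd[a-z]): (.+)")


def tally_dma_problems(log_text):
    """Count DMA timeouts and BadCRC errors per IDE device."""
    timeouts = Counter()
    crc_errors = Counter()
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if not match:
            continue
        device, message = match.groups()
        if "timeout waiting for DMA" in message:
            timeouts[device] += 1
        # The pasted log wraps "BadCRC" as "BadCR C", so match the prefix.
        if "BadCR" in message:
            crc_errors[device] += 1
    return timeouts, crc_errors


# A few lines copied from the log above, one message per line.
sample = """\
Sep 11 20:16:36 Tower kernel: hdl: timeout waiting for DMA
Sep 11 20:16:37 Tower kernel: hdl: dma_intr: error=0x84 { DriveStatusError BadCR C }
Sep 11 20:19:34 Tower kernel: hdf: timeout waiting for DMA
Sep 11 20:19:34 Tower kernel: hdf: timeout waiting for DMA
"""

timeouts, crc_errors = tally_dma_problems(sample)
print(dict(timeouts))
print(dict(crc_errors))
```

A per-drive count makes it easier to see whether the errors cluster on one channel, one controller, or one power supply.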
September 12, 2006 Author

so... /disk6,7,8,9 are all slow boats. Interesting... are those 4 disks all on the same controller card? (Not sure how you did your cabling.) Or are they all the same brand/model? I have one old 40 GB drive in my array that is veeeeerrrrrrryyyyyyy slow. As in your array, it takes forever to check parity until it gets past the first 40 GB... then it flies. I have not bothered to do anything with it, as it is empty and is only there from my experiments when we tracked down the bug in replacing/upgrading drives. Joe L.

Almost missed your post among my large log posts. Disks 6, 7, 8, and 9 are not all on the same controller; my array is set up the way it should be. Disks 6 and 7 are on ide2 of controller 1, and disks 8 and 9 are on ide1 of controller 2. I don't know, but this sucks... this weekend I spent about 8 hours ripping my backed-up DVD collection to my PC here and really wanted to push it all over. All of these hard drives are newer high-capacity drives. I could run that command you gave me so you can see all the info. I don't think I can troubleshoot any more tonight... I will have to wait and see what Tom says. Also, as an FYI, disk7, 8, and 9 don't have any data on them, because when I try to copy even a little data, the estimated time just keeps increasing.
September 12, 2006 Author

teamhood- What power supply are you using?

2x Sparkle 350s, the ones that were recommended.
September 12, 2006 Author Also, here is how the P/S are distributed: PS1 (top one) is running the mobo, the first 4 hard drives, and the top fan. PS2 is running the other 8 hard drives.
September 12, 2006

Also, the way the P/S are distributed PS1 (top one) is running the MOBO and the first 4 hard drives, and the top fan PS2 is running the other 8 hard drives.

In looking over the thread, am I correct that the problems may start cropping up once you've reached a certain NUMBER of hard drives installed? If so, I'd like to suggest you change the power distribution, not necessarily the supplies themselves (yet). For example, when we build the MD1200, we put the first 6 drives + mobo + fans on one supply and the other 6 drives on the other supply. Would it be possible for you to wire it like that?
September 12, 2006 Author Tom, it is possible. I will give it a try when I get home tonight and let you know what happens! What makes you think it is a power supply issue?
September 12, 2006 Because with marginal power, oftentimes the first thing to fail is the hard drive (because of seeking).
September 12, 2006 I can tell you that I had these types of errors for months in my UnRaid. I was chasing HDDs around, moving them, and it was very painful and disheartening (you can see my thread). I changed my power supplies around and have not had an issue since. My UnRaid uptime is now over 100 days without a problem. The frustrating thing for me is that I only have a total of 5 drives in the system, so a 350W power supply should have been enough. All I can figure is that it is marginal (even though it was a brand new Sparkle). I just post to let you know it is not a far-fetched theory.
September 12, 2006 Author

I can tell you that for me I had these type of errors for months in my UnRaid. I was chasing HDD around, moving them and it was very painful and disharenting (you can see my thread). I changed my power supplies around and have not had an issue since. My UnRaid uptime is now over 100 days without issue. The frustrating thing for me is that I only have a total of 5 drives in the system so a 350W power supply should have been enough, all I can figure is that it is marginal (even though it was a brand new Sparkle). I just post to let you know it is not a far fetched theory

Thank you for the FYI. If this solves the problem... well, I will be very happy. My server was rocking for a month, but then when I added disk8 and disk9 it really seemed to crap the bed.
September 13, 2006 Author

Okay, parity and disks 1-5 are now on PS1. I copied a file over to disk1, 2, and 3 with no problems. Disk 4 is having DMA issues:

Sep 12 17:12:43 Tower kernel: hdi: dma_timer_expiry: dma status == 0x60
Sep 12 17:12:43 Tower kernel: hdi: timeout waiting for DMA
Sep 12 17:12:43 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 17:12:43 Tower kernel: hdi: timeout waiting for DMA
Sep 12 17:12:43 Tower kernel: hdi: (__ide_dma_test_irq) called while not waiting
Sep 12 17:12:43 Tower kernel: hdi: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 17:12:43 Tower kernel:
Sep 12 17:12:43 Tower kernel: hdi: drive not ready for command
Sep 12 17:12:43 Tower kernel: hdi: status timeout: status=0xd0 { Busy }
Sep 12 17:12:43 Tower kernel:
Sep 12 17:12:43 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 17:12:43 Tower kernel: hdi: drive not ready for command
Sep 12 17:12:43 Tower kernel: ide4: reset: success
Sep 12 17:12:43 Tower kernel: blk: queue c033c710, I/O limit 4095Mb (mask 0xffffffff)
Sep 12 17:13:08 Tower kernel: hdi: dma_timer_expiry: dma status == 0x20
Sep 12 17:13:08 Tower kernel: hdi: timeout waiting for DMA
Sep 12 17:13:08 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 17:13:08 Tower kernel: hdi: timeout waiting for DMA
Sep 12 17:13:08 Tower kernel: hdi: (__ide_dma_test_irq) called while not waiting
Sep 12 17:13:08 Tower kernel: hdi: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 17:13:08 Tower kernel:
Sep 12 17:13:08 Tower kernel: hdi: drive not ready for command
Sep 12 17:13:08 Tower kernel: hdi: status timeout: status=0xd0 { Busy }
Sep 12 17:13:08 Tower kernel:
Sep 12 17:13:08 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 17:13:08 Tower kernel: hdi: drive not ready for command
Sep 12 17:13:09 Tower kernel: ide4: reset: success

Copying over to disk 5 now... bah, it's hanging the server and My Networks.

So it doesn't seem that the P/S was causing my issues, but we know that something is wrong with disk 4.
September 13, 2006 Author

Here is /disk7's error:

Sep 12 18:28:18 Tower kernel: hdl: dma_timer_expiry: dma status == 0x60
Sep 12 18:28:18 Tower kernel: hdl: timeout waiting for DMA
Sep 12 18:28:18 Tower kernel: PDC202XX: Secondary channel reset.
Sep 12 18:28:18 Tower kernel: hdl: timeout waiting for DMA
Sep 12 18:28:18 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting
Sep 12 18:28:18 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 18:28:18 Tower kernel:
Sep 12 18:28:18 Tower kernel: hdl: drive not ready for command
Sep 12 18:28:18 Tower kernel: hdl: status timeout: status=0xd1 { Busy }
Sep 12 18:28:18 Tower kernel:
Sep 12 18:28:18 Tower kernel: PDC202XX: Secondary channel reset.
Sep 12 18:28:18 Tower kernel: hdl: drive not ready for command
Sep 12 18:28:18 Tower kernel: ide5: reset: success
Sep 12 18:28:18 Tower kernel: blk: queue c033cca0, I/O limit 4095Mb (mask 0xffffffff)
Sep 12 18:28:38 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40
Sep 12 18:28:38 Tower kernel: hdl: timeout waiting for DMA
Sep 12 18:28:38 Tower kernel: PDC202XX: Secondary channel reset.
Sep 12 18:28:38 Tower kernel: hdl: timeout waiting for DMA
Sep 12 18:28:38 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting
Sep 12 18:28:38 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 18:28:38 Tower kernel:
Sep 12 18:28:38 Tower kernel: hdl: drive not ready for command
Sep 12 18:28:38 Tower kernel: hdl: status timeout: status=0xd1 { Busy }
Sep 12 18:28:38 Tower kernel:
Sep 12 18:28:38 Tower kernel: PDC202XX: Secondary channel reset.
Sep 12 18:28:38 Tower kernel: hdl: drive not ready for command
Sep 12 18:28:39 Tower kernel: ide5: reset: success
September 13, 2006 Author

/disk8 seemed to copy over without an error, and here is /disk9's error log:

Sep 12 18:32:36 Tower kernel: hdf: dma_timer_expiry: dma status == 0x60
Sep 12 18:32:36 Tower kernel: hdf: timeout waiting for DMA
Sep 12 18:32:36 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 18:32:36 Tower kernel: hdf: timeout waiting for DMA
Sep 12 18:32:36 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting
Sep 12 18:32:36 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 18:32:36 Tower kernel:
Sep 12 18:32:36 Tower kernel: hdf: drive not ready for command
Sep 12 18:32:36 Tower kernel: hdf: status timeout: status=0xd1 { Busy }
Sep 12 18:32:36 Tower kernel:
Sep 12 18:32:36 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 18:32:36 Tower kernel: hdf: drive not ready for command
Sep 12 18:32:36 Tower kernel: ide2: reset: success
Sep 12 18:32:36 Tower kernel: blk: queue c033bfa4, I/O limit 4095Mb (mask 0xffffffff)
Sep 12 18:32:56 Tower kernel: hdf: dma_timer_expiry: dma status == 0x40
Sep 12 18:32:56 Tower kernel: hdf: timeout waiting for DMA
Sep 12 18:32:56 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 18:32:56 Tower kernel: hdf: timeout waiting for DMA
Sep 12 18:32:56 Tower kernel: hdf: (__ide_dma_test_irq) called while not waiting
Sep 12 18:32:56 Tower kernel: hdf: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 12 18:32:56 Tower kernel:
Sep 12 18:32:56 Tower kernel: hdf: drive not ready for command
Sep 12 18:32:56 Tower kernel: hdf: status timeout: status=0xd1 { Busy }
Sep 12 18:32:56 Tower kernel:
Sep 12 18:32:56 Tower kernel: PDC202XX: Primary channel reset.
Sep 12 18:32:56 Tower kernel: hdf: drive not ready for command
Sep 12 18:32:56 Tower kernel: ide2: reset: success
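The hex values in these logs are bit fields, and the names in braces are just the set bits spelled out. The sketch below decodes them the way the log lines read. The names that appear in this thread (Busy, DriveReady, SeekComplete, DataRequest, Error, DriveStatusError, BadCRC) come from the logs themselves; the remaining status names and the bit positions follow the standard ATA status register layout and should be treated as assumptions here. The function names are hypothetical.

```python
# ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),      # not seen in this thread's logs
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),   # not seen in this thread's logs
    (0x02, "Index"),            # not seen in this thread's logs
    (0x01, "Error"),
]

# Only the two error-register bits that show up in this thread are mapped.
# 0x80 is read as BadCRC because it appears together with 0x04 in the logs.
ERROR_BITS = [
    (0x04, "DriveStatusError"),
    (0x80, "BadCRC"),
]


def decode_status(value):
    """Return the flag names for a status byte such as 0x58."""
    # The logs print only "Busy" when the busy bit is set (0xd0, 0xd1),
    # since the other bits are not meaningful while the drive is busy.
    if value & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Return the flag names for an error byte such as 0x84."""
    return [name for mask, name in ERROR_BITS if value & mask]


print(decode_status(0x58))
print(decode_status(0x51))
print(decode_status(0xD1))
print(decode_error(0x84))
```

BadCRC is reported on the data transfer between controller and drive, which is why cables, racks, and connectors are worth checking alongside the drive itself.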
September 13, 2006 Author

And it looks like parity is gone... check what happened when I tried to run parity:

Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5176
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5184
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5192
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5200
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5208
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5216
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5224
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5232
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5240
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5248
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5256
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5264
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5272
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5280
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5288
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5296
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5304
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5312
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5320
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5328
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5336
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5344
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5352
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5360
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5368

This went on and on and on.
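When the parity messages scroll on like this, a summary is easier to read than the raw lines. The sketch below pulls the numbers out of `parity incorrect` messages and reports how many there were, the first and last value, and the spacing between consecutive values. The function name `summarize_parity_errors` is hypothetical, and the script does not assume what unit the numbers are in.

```python
import re

# Matches lines such as "Tower kernel: md0: parity incorrect: 5176".
PARITY_LINE = re.compile(r"md0: parity incorrect: (\d+)")


def summarize_parity_errors(log_text):
    """Summarize the positions reported in 'parity incorrect' messages."""
    positions = [int(number) for number in PARITY_LINE.findall(log_text)]
    if not positions:
        return None
    # Differences between consecutive reported positions.
    strides = {later - earlier for earlier, later in zip(positions, positions[1:])}
    return {
        "count": len(positions),
        "first": positions[0],
        "last": positions[-1],
        "strides": sorted(strides),
    }


# The first three messages from the log above.
sample = """\
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5176
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5184
Sep 12 18:35:08 Tower kernel: md0: parity incorrect: 5192
"""

print(summarize_parity_errors(sample))
```

A single constant stride with no gaps would suggest that every checked position is mismatching, not just isolated spots.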
September 13, 2006

We need to try to isolate the h/w problems and see whether they follow the drives, follow the controllers, or follow something else. So I suggest you do this:

1. Let's assume that parity, disk1, disk2, and disk3 are all good and that the m/b controllers are good.
2. Set up your system so that ONLY disk4, disk5, disk6, and disk7 are installed. Init the array config and Start the array. It will see there's no parity, so it should immediately mount and export those 4 drives, and no background parity build (which might confuse things) will run.
3. Transfer files to these disks individually and see if you get errors.
4. If NO errors, then install disk1, disk2, and disk3 (but not parity) and repeat the testing.
5. If NO errors, then install disk8, disk9, disk10, and disk11 (still no parity) and repeat the testing.

If you get this far without error, let me know, or let me know where you start getting errors. Sorry if this seems tedious, but it's actually less tedious than what I would do (which is add disks one at a time). I still think it might be a power issue.
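Step 3 above (transfer files to each disk individually) can be scripted so that every disk gets the same file and a comparable speed figure. The following is a rough sketch, not anything unRAID provides: `copy_test` is a hypothetical helper, and the demo uses temporary directories as stand-ins for wherever disk4 through disk7 are actually mounted. It only times the copy, so the syslog still needs to be watched for DMA errors while it runs.

```python
import os
import shutil
import tempfile
import time


def copy_test(source_file, disk_dirs):
    """Copy source_file to each directory in turn; return MB/s per directory.

    A value of None means the copy raised an OSError for that directory.
    """
    size_mb = os.path.getsize(source_file) / 1e6
    results = {}
    for disk in disk_dirs:
        target = os.path.join(disk, os.path.basename(source_file))
        start = time.monotonic()
        try:
            with open(source_file, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
                dst.flush()
                os.fsync(dst.fileno())  # force the data out to the drive
        except OSError:
            results[disk] = None
            continue
        elapsed = max(time.monotonic() - start, 1e-9)
        results[disk] = size_mb / elapsed
    return results


# Demo: temporary directories stand in for the real disk mount points.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "testfile.bin")
    with open(source, "wb") as handle:
        handle.write(os.urandom(1_000_000))
    disks = []
    for name in ("disk4", "disk5", "disk6", "disk7"):
        path = os.path.join(workdir, name)
        os.mkdir(path)
        disks.append(path)
    for disk, rate in copy_test(source, disks).items():
        label = os.path.basename(disk)
        print(label, "FAILED" if rate is None else f"{rate:.1f} MB/s")
```

Copying to one disk at a time, as in the loop, keeps the results attributable to a single drive, cable, and rack.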
September 13, 2006 Author Tom, I will try all of this when I get home from work this evening. I can pull a 550-watt P/S out of my other computer and use that. Which P/S do you think is overloaded? I bought the same P/S and everything else that you used in your complete 12-disk system...
September 13, 2006 FWIW, I bought his recommended Sparkle P/S too. I had eight data drives and a parity drive in there before finally buying a second cheapie P/S this weekend. I never once had the sorts of issues you are running into. Perhaps the Sparkle P/S have QC issues?
September 13, 2006 Author

FWIW, I bought his reccomended Sparkle P/S too. I had eight data drives and a parity drive in there before finally buying a second cheapie P/S this weekend. I never once had the sorts of issues you are running into, perhaps the Sparkle P/S have QC issues?

I originally purchased 2 of the 350 Sparkles from NewEgg, and the bottom power supply was DOA. They did an RMA, and when the new one came it seemed to work. You know, thinking about everything that has been happening with the server, I really noticed the problems start once I installed /disk7. Disk7, 8, and 9 are all empty because I was never able to get good speed copying to them. I tried different cables, removed the rack units, and everything. I kept thinking it was a controller, but I am going to put it through the tests that Tom suggested, and I am also going to pull the power supply out of my other PC and try that out. Hopefully I will get to the bottom of this soon!!!
September 14, 200619 yr Author Alright, all disks removed except for disk4,5,6,7 Copied over a 200meg file to disk 7 Error: same as before Sep 13 17:06:02 Tower kernel: hdl: dma_timer_expiry: dma status == 0x60 Sep 13 17:06:02 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:02 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:06:02 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:02 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 13 17:06:02 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 13 17:06:02 Tower kernel: Sep 13 17:06:02 Tower kernel: hdl: drive not ready for command Sep 13 17:06:02 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 13 17:06:02 Tower kernel: Sep 13 17:06:02 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:06:02 Tower kernel: hdl: drive not ready for command Sep 13 17:06:02 Tower kernel: ide5: reset: success Sep 13 17:06:22 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 13 17:06:22 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:22 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:06:22 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:22 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 13 17:06:22 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 13 17:06:22 Tower kernel: Sep 13 17:06:22 Tower kernel: hdl: drive not ready for command Sep 13 17:06:22 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 13 17:06:22 Tower kernel: Sep 13 17:06:22 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:06:22 Tower kernel: hdl: drive not ready for command Sep 13 17:06:22 Tower kernel: ide5: reset: success Sep 13 17:06:43 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 13 17:06:43 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:43 Tower kernel: PDC202XX: Secondary channel reset. 
Sep 13 17:06:43 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:06:43 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 13 17:06:43 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 13 17:06:43 Tower kernel: Sep 13 17:06:43 Tower kernel: hdl: drive not ready for command Sep 13 17:06:43 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 13 17:06:43 Tower kernel: Sep 13 17:06:43 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:06:43 Tower kernel: hdl: drive not ready for command Sep 13 17:06:43 Tower kernel: ide5: reset: success Sep 13 17:07:03 Tower kernel: hdl: dma_timer_expiry: dma status == 0x40 Sep 13 17:07:03 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:07:03 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:07:03 Tower kernel: hdl: timeout waiting for DMA Sep 13 17:07:03 Tower kernel: hdl: (__ide_dma_test_irq) called while not waiting Sep 13 17:07:03 Tower kernel: hdl: status error: status=0x58 { DriveReady SeekCo mplete DataRequest } Sep 13 17:07:03 Tower kernel: Sep 13 17:07:03 Tower kernel: hdl: drive not ready for command Sep 13 17:07:03 Tower kernel: hdl: status timeout: status=0xd1 { Busy } Sep 13 17:07:03 Tower kernel: Sep 13 17:07:03 Tower kernel: PDC202XX: Secondary channel reset. Sep 13 17:07:03 Tower kernel: hdl: drive not ready for command Sep 13 17:07:03 Tower kernel: ide5: reset: success
September 14, 2006 Author Disk 6: I don't see any errors in the tail of the log, but it took a while to copy over...
September 14, 2006 Author Disk 5: it copied but was slow as hell. I also see that disk6's and disk7's LEDs are lit up??! Why the hell is that? Disk 4: exactly the same thing. Why is it that the LEDs for disk4 and disk5 don't light up at all, but the ones for disk6 and disk7 light up while I am copying data over to disk4 or 5? These are currently on the bottom PS; I am going to put them on the first PS.
September 14, 2006 Author Plugged 4, 5, 6, and 7 into the top power supply. I started copying data to disk4 first, then 5 and 6. NO PROBLEM. Disk7 is a problem, it seems. I am going to take it out of the rack unit and plug the cable directly into the hard drive. I will let you know.
September 14, 2006 Author Heh... well... disk7 is attached directly to the Promise card, and it copied 6GB in about 10 minutes... son of a... Looks like a bad mobile rack, which would be the second one in my system.
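For reference, the copy rate implied by "6GB in about 10 minutes" is easy to work out. The snippet below is just that arithmetic, assuming decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
def throughput_mb_per_s(gigabytes, minutes):
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return gigabytes * 1000 / (minutes * 60)


# 6 GB copied in about 10 minutes, as reported above.
print(throughput_mb_per_s(6, 10))  # 10.0
```

That is roughly 10 MB/s sustained, with no resets in between, which is the baseline to compare the other disks against.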
September 14, 2006 Author

There is something more going on too. I hooked my entire system back up: parity through disk5 are on PS1 (top), and disk6 through disk9 are connected to the bottom PS. I was able to copy the 200 MB file to disk1 through disk3 with no problem. The system hung while copying to disk4. So what does this mean? I received the following error while copying the data over to /disk4 with the above configuration. I don't understand this. If this is a power supply issue, what is wrong with it? Tom, don't you use 2 of the 350 Sparkle power supplies??

Sep 13 19:59:24 Tower smbd[965]: [2006/09/13 19:59:24, 0] lib/util_sock.c:get_peer_addr(1000)
Sep 13 19:59:24 Tower smbd[965]: getpeername failed. Error was Transport endpoint is not connected
Sep 13 19:59:24 Tower smbd[965]: [2006/09/13 19:59:24, 0] lib/util_sock.c:write_socket_data(430)
Sep 13 19:59:24 Tower smbd[965]: write_socket_data: write failure. Error = Connection reset by peer
Sep 13 19:59:24 Tower smbd[965]: [2006/09/13 19:59:24, 0] lib/util_sock.c:write_socket(455)
Sep 13 19:59:24 Tower smbd[965]: write_socket: Error writing 4 bytes to socket 22: ERRNO = Connection reset by peer
Sep 13 19:59:24 Tower smbd[965]: [2006/09/13 19:59:24, 0] lib/util_sock.c:send_smb(647)
Sep 13 19:59:24 Tower smbd[965]: Error writing 4 bytes to client. -1. (Connection reset by peer)
Sep 13 20:01:28 Tower kernel: hdi: dma_timer_expiry: dma status == 0x60
Sep 13 20:01:28 Tower kernel: hdi: timeout waiting for DMA
Sep 13 20:01:28 Tower kernel: PDC202XX: Primary channel reset.
Sep 13 20:01:28 Tower kernel: hdi: timeout waiting for DMA
Sep 13 20:01:28 Tower kernel: hdi: (__ide_dma_test_irq) called while not waiting
Sep 13 20:01:28 Tower kernel: hdi: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Sep 13 20:01:28 Tower kernel:
Sep 13 20:01:28 Tower kernel: hdi: drive not ready for command
Sep 13 20:01:28 Tower kernel: hdi: status timeout: status=0xd0 { Busy }
Sep 13 20:01:28 Tower kernel:
Sep 13 20:01:28 Tower kernel: PDC202XX: Primary channel reset.
Sep 13 20:01:28 Tower kernel: hdi: drive not ready for command
Sep 13 20:01:28 Tower kernel: ide4: reset: success
Sep 13 20:01:28 Tower kernel: blk: queue c033c710, I/O limit 4095Mb (mask 0xffffffff)
Sep 13 20:01:48 Tower kernel: hdi: dma_timer_expiry: dma status == 0x20
Sep 13 20:01:48 Tower kernel: hdi: timeout waiting for DMA
Sep 13 20:01:48 Tower kernel: PDC202XX: Primary channel reset.
Sep 13 20:01:48 Tower kernel: hdi: timeout waiting for DMA
Sep 13 20:01:48 Tower kernel: hdi: (__ide_dma_test_irq) called while not waiting
Sep 13 20:01:48 Tower kernel: hdi: status error: status=0x58 { DriveReady SeekCo

I don't know, but this is getting really, really annoying.
September 14, 2006 Yes, we use 2 Sparkle 350s. But you know, they're still "PC" power supplies, and you will run across bad ones from time to time. I understand how frustrating it can get... At this point, remove all the drives but disk4 (leave disk4 where it is). Reboot, reset the array config, and see if the drive fails in isolation.