June 8, 2010 Hi all, My disk15 failed, so I replaced it with a new drive. Attached are some of the errors that occurred during the rebuild. The rebuild speed dropped from 20,000 KB/sec to as low as 136 KB/sec, then sped up again (it's still rebuilding). Should I worry about these errors? And if the rebuild fails and disk15 is still disabled, will parity keep simulating disk15 until I can rebuild it again? Thanks! EDIT: I've attached syslog2, which should cover the entire period of errors (they have now ended). syslog.zip syslog2.zip
June 9, 2010 Author The rebuild completed without any more errors. I've checked the data on the rebuilt disk, including the last few files written before the failure, and it all seems fine. When the rebuild finished, a parity check was triggered automatically. I let that run, and when it finished another one was triggered. Why was that? I canceled the second check. The syslog is attached. If the experts have time to review it, I'd appreciate it. Thanks! syslog4.zip
June 14, 2010 Sorry to say, but these errors seem to indicate communication problems (cabling, noise, etc.). You might be stressing your power supply with so many drives, or it could be induced noise if you've neatly tied all the SATA cables into a bundle. Basically, the parity drive is returning data that the disk controller interprets as having a bad CRC over the SATA link.
Jun 8 16:25:57 Tower kernel: ata17.00: error: { ICRC ABRT }
Jun 8 16:25:57 Tower kernel: ata17: hard resetting link
Jun 8 16:25:58 Tower kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 8 16:25:58 Tower kernel: ata17.00: configured for UDMA/33
Jun 8 16:25:58 Tower kernel: ata17: EH complete
Jun 8 16:25:58 Tower kernel: ata17.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6
Jun 8 16:25:58 Tower kernel: ata17.00: BMDMA stat 0x26
Jun 8 16:25:58 Tower kernel: ata17: SError: { UnrecovData HostInt 10B8B BadCRC }
Jun 8 16:25:58 Tower kernel: ata17.00: failed command: READ DMA EXT
Jun 8 16:25:58 Tower kernel: ata17.00: cmd 25/00:00:b7:e6:50/00:04:04:00:00/e0 tag 0 dma 524288 in
Jun 8 16:25:58 Tower kernel: res 51/84:ff:b7:e6:50/84:00:00:00:00/e0 Emask 0x70 (host bus error)
Jun 8 16:25:58 Tower kernel: ata17.00: status: { DRDY ERR }
Jun 8 16:25:58 Tower kernel: ata17.00: error: { ICRC ABRT }
Jun 8 16:25:58 Tower kernel: ata17: hard resetting link
Jun 8 16:25:58 Tower kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 8 16:25:58 Tower kernel: ata17.00: configured for UDMA/33
Jun 8 16:25:58 Tower kernel: ata17: EH complete
Jun 8 16:25:58 Tower kernel: ata17.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6
Jun 8 16:25:58 Tower kernel: ata17.00: BMDMA stat 0x26
Jun 8 16:25:58 Tower kernel: ata17: SError: { UnrecovData 10B8B BadCRC }
Jun 8 16:25:58 Tower kernel: ata17.00: failed command: READ DMA EXT
Jun 8 16:25:58 Tower kernel: ata17.00: cmd 25/00:00:b7:e6:50/00:04:04:00:00/e0 tag 0 dma 524288 in
Jun 8 16:25:58 Tower kernel: res 51/84:0f:b7:e6:50/84:00:00:00:00/e0 Emask 0x30 (host bus error)
Jun 8 16:25:58 Tower kernel: ata17.00: status: { DRDY ERR }
Jun 8 16:25:58 Tower kernel: ata17.00: error: { ICRC ABRT }
Jun 8 16:25:58 Tower kernel: ata17: hard resetting link
Jun 8 16:25:59 Tower kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 8 16:25:59 Tower kernel: ata17.00: configured for UDMA/33
Jun 8 16:25:59 Tower kernel: sd 17:0:0:0: [sdr] Result: hostbyte=0x00 driverbyte=0x08
Jun 8 16:25:59 Tower kernel: sd 17:0:0:0: [sdr] Sense Key : 0xb [current] [descriptor]
Jun 8 16:25:59 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jun 8 16:25:59 Tower kernel: 72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 8 16:25:59 Tower kernel: 00 50 e6 b7
Jun 8 16:25:59 Tower kernel: sd 17:0:0:0: [sdr] ASC=0x47 ASCQ=0x0
Jun 8 16:25:59 Tower kernel: sd 17:0:0:0: [sdr] CDB: cdb[0]=0x28: 28 00 04 50 e6 b7 00 04 00 00
Jun 8 16:25:59 Tower kernel: end_request: I/O error, dev sdr, sector 72410807
Jun 8 16:25:59 Tower kernel: md: disk0 read error
Jun 8 16:25:59 Tower kernel: handle_stripe read error: 72410744/0, count: 1
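To see quickly which port is affected and how often, a syslog can be scanned for the ICRC and BadCRC markers shown in the excerpt above. The following is only a minimal sketch: the function name `count_crc_errors` and the regular expression are illustrative assumptions, not part of unRAID or the kernel, and the pattern is based solely on the line format seen in this thread.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt in this thread, e.g.:
#   "Jun 8 16:25:58 Tower kernel: ata17.00: error: { ICRC ABRT }"
#   "Jun 8 16:25:58 Tower kernel: ata17: SError: { UnrecovData 10B8B BadCRC }"
CRC_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: .*\b(ICRC|BadCRC)\b")

def count_crc_errors(lines):
    """Count lines reporting a CRC error, grouped by ATA port (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = CRC_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Three lines copied from the log above; only the first two mention a CRC error.
sample = [
    "Jun 8 16:25:58 Tower kernel: ata17.00: error: { ICRC ABRT }",
    "Jun 8 16:25:58 Tower kernel: ata17: SError: { UnrecovData 10B8B BadCRC }",
    "Jun 8 16:25:58 Tower kernel: ata17: hard resetting link",
]
crc_counts = count_crc_errors(sample)
print(dict(crc_counts))
```

To run it against a real log, pass an open file instead of the sample list, for example `count_crc_errors(open("syslog"))`, where the file name is whatever your extracted syslog is called. If all hits land on one port, that cable or connector is the first thing to check.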
June 14, 2010 Author Thanks Joe. I think it might have been cabling, as I checked the connections and haven't seen any errors since. Were the errors corrected? In other words, do I have to worry about the integrity of the data on that drive? Cheers.
June 14, 2010 I would do a parity check. It will read the parity disk and correct it where it needs correction. Then, if that check finds parity errors and you do not see disk errors in the syslog, do a second parity check. It should find NO errors. Joe L.
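Why the second check should come back clean can be shown with a toy model. This sketch assumes single parity computed as a byte-wise XOR across the data disks. The names `xor_parity` and `parity_check` are hypothetical, the "disks" are three-byte strings, and none of this is the actual unRAID md driver code.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR the bytes at each offset across all blocks (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_check(data_blocks, parity):
    """Return (number of mismatched bytes, corrected parity). Hypothetical helper."""
    expected = xor_parity(data_blocks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    return errors, expected

# Toy array: three tiny "data disks" and a parity block that is wrong everywhere.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
bad_parity = b"\x00\x00\x00"

# First check finds the mismatches and writes corrected parity.
first_errors, parity = parity_check(data, bad_parity)
# Second check runs against the corrected parity and should find nothing.
second_errors, parity = parity_check(data, parity)
print(first_errors, second_errors)

# The same XOR is what lets parity stand in for a missing disk:
# XOR of the remaining data disks and parity gives back the missing one.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])
```

If a second check on a real array still reports errors, something is changing the data between reads, which points back at the cabling or power issues described earlier.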