January 15, 2011
Fairly new to unRAID - have been experimenting a lot, and finally thought I had a stable system... and now this:

Went to watch a movie last night and there was no connection via the shares or the http interface. Telnet let me in, but commands didn't do anything, so I had to do a hard shutdown. After the reboot I saw that the file system replayed a couple of transactions (which I think is a good thing). I ran a parity check without updating parity and it reported 128 errors. Then I ran a parity check with updating, and the same 128 errors got corrected. However, before that last parity check completed, the same issue cropped up again. This time I was able to pull down the syslog before losing all connections. It's posted below.

I see a bunch of disk errors, but I'm not sure if that's the problem. I tried to run smartctl from the command line, and it worked for some drives but not others. Issuing reboot or powerdown from the command line had no effect; you can actually see multiple restart commands at the end of the log. Ideas?

syslog.zip
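When smartctl "works for some drives but not others", its exit status can say why. The smartctl(8) man page documents the exit code as a bitmask, so decoding it per drive separates "could not open the device or get IDENTIFY data" from "SMART reports a failing disk". The sketch below assumes smartmontools is installed; `check_drive` is a hypothetical helper, and the 30-second timeout is an arbitrary choice.

```python
import subprocess

# Meaning of each bit in smartctl's exit status, as listed in the
# "RETURN VALUES" section of the smartctl(8) man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, returned no IDENTIFY DEVICE data, or is in low-power mode",
    2: "a SMART or other ATA command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_smartctl_status(code: int) -> list[str]:
    """Return a description for every bit set in a smartctl exit code."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if code & (1 << bit)]


def check_drive(device: str, timeout_s: float = 30.0) -> list[str]:
    """Hypothetical helper: run `smartctl -H` on one device and summarize it."""
    try:
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except FileNotFoundError:
        return ["smartctl is not installed"]
    except subprocess.TimeoutExpired:
        # subprocess.run kills smartctl and then waits for it; a process stuck
        # in uninterruptible disk I/O may not die, so this can still block.
        return [f"no answer within {timeout_s:.0f} s"]
    return decode_smartctl_status(result.returncode) or ["OK"]
```

For example, `for dev in ("/dev/sda", "/dev/sdg"): print(dev, check_drive(dev))` (device names are placeholders). A drive that has stopped answering IDENTIFY, as in the log below, would be expected to show bit 1 or a timeout rather than "OK".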
January 15, 2011
You are right. One of your disks (your parity disk /dev/sdg) is failing to respond.

Jan 15 12:05:20 Hurricane kernel: hda: ide_dma_sff_timer_expiry: DMA status (0x22)
Jan 15 12:05:20 Hurricane kernel: hda: DMA timeout error
Jan 15 12:05:20 Hurricane kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 15 12:05:20 Hurricane kernel: hda: possibly failed opcode: 0x35
Jan 15 12:05:20 Hurricane kernel: hda: DMA disabled
Jan 15 12:05:20 Hurricane kernel: ide0: reset: success
Jan 15 12:05:31 Hurricane kernel: ata7.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen
Jan 15 12:05:31 Hurricane kernel: ata7: SError: { HostInt }
Jan 15 12:05:31 Hurricane kernel: ata7.00: failed command: READ DMA EXT
Jan 15 12:05:31 Hurricane kernel: ata7.00: cmd 25/00:00:c7:5a:cb/00:04:91:00:00/e0 tag 0 dma 524288 in
Jan 15 12:05:31 Hurricane kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)
Jan 15 12:05:31 Hurricane kernel: ata7.00: status: { DRDY }
Jan 15 12:05:31 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:31 Hurricane kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 15 12:05:36 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:05:36 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:05:36 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:05:36 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:37 Hurricane kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 15 12:05:47 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:05:47 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:05:47 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:05:47 Hurricane kernel: ata7: limiting SATA link speed to 1.5 Gbps
Jan 15 12:05:47 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:47 Hurricane kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 15 12:05:49 Hurricane kernel: sdf: unknown partition table
Jan 15 12:05:51 Hurricane kernel: hda: lost interrupt
Jan 15 12:06:17 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:06:17 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:06:17 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:06:17 Hurricane kernel: ata7.00: disabled
Jan 15 12:06:17 Hurricane kernel: ata7.00: device reported invalid CHS sector 0
Jan 15 12:06:17 Hurricane kernel: ata7: hard resetting link
Jan 15 12:06:18 Hurricane kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 15 12:06:18 Hurricane kernel: ata7: EH complete
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 5a c7 00 04 00 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446023367
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 5e c7 00 01 e0 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446024391
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 60 a7 00 03 20 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446024871
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023304/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023312/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023320/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023328/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023336/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023344/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023352/0, count: 1

It could be a bad disk, a bad cable, or a loose cable.

Joe L.
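A log like the one above can be summarized mechanically. The Python sketch below (not something posted in the thread) counts `end_request: I/O error` lines per device and decodes the SCSI READ(10) CDBs, where byte 0 is opcode 0x28, bytes 2-5 are the big-endian LBA and bytes 7-8 the transfer length. The regular expressions assume the exact message wording of this 2011-era kernel; newer kernels phrase these lines differently. Decoding the first CDB gives LBA 2446023367 and 1024 blocks, which matches the sector in the next `end_request` line and, at 512 bytes per sector, the 524288-byte READ DMA EXT that timed out on ata7. The three failed reads are also contiguous, each starting where the previous one ends.

```python
import re
from collections import Counter

# Message formats as they appear in the syslog quoted above.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
READ10_CDB = re.compile(r"\[(\w+)\] CDB: cdb\[0\]=0x28: ((?:[0-9a-f]{2} ?){10})")


def count_io_errors(syslog_text: str) -> Counter:
    """Count 'end_request: I/O error' lines per block device."""
    return Counter(m.group(1) for m in IO_ERROR.finditer(syslog_text))


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Decode a SCSI READ(10) CDB into (starting LBA, length in blocks).

    Byte 0 is the opcode 0x28, bytes 2-5 are the big-endian LBA and
    bytes 7-8 are the big-endian transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between hex pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")


def failed_reads(syslog_text: str) -> list[tuple[str, int, int]]:
    """List (device, LBA, length) for every READ(10) CDB logged with an error."""
    return [
        (m.group(1), *decode_read10(m.group(2)))
        for m in READ10_CDB.finditer(syslog_text)
    ]
```

Feeding the saved syslog text to `count_io_errors` and `failed_reads` points at a single device (sdg here) rather than errors spread across the array, which is what separates one failing disk or link from a controller-wide problem.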
January 15, 2011 (Author)
I'm suspecting the cable. I had another drive in there originally (same make/model, a Caviar Black) that was giving different errors. I swapped the drive and the errors went away for a while, but now this. I've replaced the cable and will run the checks again.

Interestingly, when I rebooted, it showed both the parity and cache drives as "New". Not sure why this happened. I'm rebuilding parity now.

Will a drive error make the entire box hang like that, to the point of not even responding to a reboot or powerdown command?
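One common heuristic for telling a cable fault from a failing disk (not something suggested in this thread) is to compare SMART counters from `smartctl -A`: attribute 199 (UDMA_CRC_Error_Count) counts checksum errors on the SATA link, while 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) reflect the disk surface. The Python sketch below assumes the standard ten-column attribute table that smartctl prints and only keeps rows whose raw value is a plain integer; the classification is a rough hint, not a diagnosis, and some drives do not report every attribute.

```python
# Standard ATA SMART attribute IDs used by the heuristic below.
CRC_ERRORS = 199           # UDMA_CRC_Error_Count: checksum errors on the SATA link
MEDIA_IDS = (5, 197, 198)  # Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable


def parse_smart_attributes(smartctl_a_output: str) -> dict[int, int]:
    """Map attribute ID -> raw value from the table printed by `smartctl -A`."""
    attrs = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # the header row ("ID# ATTRIBUTE_NAME ...") is skipped by this check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = int(fields[9])
    return attrs


def suspect(attrs: dict[int, int]) -> str:
    """Rough hint: link/cable problem vs. media problem."""
    crc = attrs.get(CRC_ERRORS, 0)
    media = sum(attrs.get(i, 0) for i in MEDIA_IDS)
    if crc and media:
        return "both"
    if crc:
        return "link (cable/connector/port)"
    if media:
        return "media (disk surface)"
    return "no counters set"
```

The CRC counter is cumulative and is not reset by swapping the cable, so after replacing the cable the useful signal is whether the raw value keeps increasing during the next parity check.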
January 15, 2011
Quote: Interestingly, when I rebooted, it showed both the parity and cache drive as "New" - not sure why this happened. I'm rebuilding parity now.
Because you booted with them not responding.

Quote: Will a drive error make the entire box hang like that - to the point of not even responding to a reboot or powerdown command?
It is your hardware... but yes, a single disk can lock up the entire server if it confuses the disk controller and the person who wrote the driver did not anticipate that set of conditions.