[4.6] Weird crash - disk problems or other hardware? - General Support (V5 and Older)

January 15, 2011

Fairly new to unRaid - have been experimenting a lot, and finally thought I had a stable system... and now this:

Went to watch a movie last night and had no connection via shares or the http interface. Telnet let me in, but commands didn't do anything, so I had to do a hard shutdown. After the reboot, I saw that the file system replayed a couple of transactions (which I think is a good thing). Ran a parity check without update, which reported 128 errors. Then ran a parity check with update, and the same 128 errors got corrected. However, before that last parity check completed, the same issue cropped up. This time, I was able to pull down the syslog before losing all connections. It's attached below. I see a bunch of disk errors, but I'm not sure if that's the problem. Tried to run smartctl from the command line, and it worked for some drives but not others. Issuing reboot or powerdown from the command line had no effect. You can actually see multiple restart commands at the end of the log.

Ideas?

syslog.zip


January 15, 2011

You are right.

One of your disks (your parity disk /dev/sdg) is failing to respond.

Jan 15 12:05:20 Hurricane kernel: hda: ide_dma_sff_timer_expiry: DMA status (0x22)
Jan 15 12:05:20 Hurricane kernel: hda: DMA timeout error
Jan 15 12:05:20 Hurricane kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 15 12:05:20 Hurricane kernel: hda: possibly failed opcode: 0x35
Jan 15 12:05:20 Hurricane kernel: hda: DMA disabled
Jan 15 12:05:20 Hurricane kernel: ide0: reset: success
Jan 15 12:05:31 Hurricane kernel: ata7.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen
Jan 15 12:05:31 Hurricane kernel: ata7: SError: { HostInt }
Jan 15 12:05:31 Hurricane kernel: ata7.00: failed command: READ DMA EXT
Jan 15 12:05:31 Hurricane kernel: ata7.00: cmd 25/00:00:c7:5a:cb/00:04:91:00:00/e0 tag 0 dma 524288 in
Jan 15 12:05:31 Hurricane kernel:          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)
Jan 15 12:05:31 Hurricane kernel: ata7.00: status: { DRDY }
Jan 15 12:05:31 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:31 Hurricane kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 15 12:05:36 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:05:36 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:05:36 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:05:36 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:37 Hurricane kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 15 12:05:47 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:05:47 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:05:47 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:05:47 Hurricane kernel: ata7: limiting SATA link speed to 1.5 Gbps
Jan 15 12:05:47 Hurricane kernel: ata7: hard resetting link
Jan 15 12:05:47 Hurricane kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 15 12:05:49 Hurricane kernel:  sdf: unknown partition table
Jan 15 12:05:51 Hurricane kernel: hda: lost interrupt
Jan 15 12:06:17 Hurricane kernel: ata7.00: qc timeout (cmd 0xec)
Jan 15 12:06:17 Hurricane kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 15 12:06:17 Hurricane kernel: ata7.00: revalidation failed (errno=-5)
Jan 15 12:06:17 Hurricane kernel: ata7.00: disabled
Jan 15 12:06:17 Hurricane kernel: ata7.00: device reported invalid CHS sector 0
Jan 15 12:06:17 Hurricane kernel: ata7: hard resetting link
Jan 15 12:06:18 Hurricane kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 15 12:06:18 Hurricane kernel: ata7: EH complete
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 5a c7 00 04 00 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446023367
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 5e c7 00 01 e0 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446024391
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Unhandled error code
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 60 a7 00 03 20 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446024871
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023304/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023312/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023320/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023328/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023336/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023344/0, count: 1
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
Jan 15 12:06:18 Hurricane kernel: handle_stripe read error: 2446023352/0, count: 1

It could be a bad disk or a bad cable, or a loose cable.
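
For anyone wanting to cross-check which device the errors belong to, here is a rough Python sketch (not part of unRAID, and assuming Python is available on whatever machine you copy the syslog to). It counts the end_request I/O errors per device and decodes the SCSI READ(10) CDB from the log. The function names and the SAMPLE text are made up for illustration; the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt, used only as sample input.
SAMPLE = """\
Jan 15 12:05:20 Hurricane kernel: hda: DMA timeout error
Jan 15 12:05:31 Hurricane kernel: ata7.00: failed command: READ DMA EXT
Jan 15 12:06:18 Hurricane kernel: sd 6:0:0:0: [sdg] CDB: cdb[0]=0x28: 28 00 91 cb 5a c7 00 04 00 00
Jan 15 12:06:18 Hurricane kernel: end_request: I/O error, dev sdg, sector 2446023367
Jan 15 12:06:18 Hurricane kernel: md: disk0 read error
"""

# Matches lines such as "end_request: I/O error, dev sdg, sector 2446023367".
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(syslog_text):
    """Count end_request I/O errors per block device name."""
    counts = Counter()
    for match in IO_ERROR.finditer(syslog_text):
        counts[match.group(1)] += 1
    return counts


def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex.

    Bytes 2-5 hold the big-endian starting LBA and bytes 7-8 hold the
    big-endian transfer length in blocks.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


if __name__ == "__main__":
    print(io_errors_by_device(SAMPLE))
    print(decode_read10("28 00 91 cb 5a c7 00 04 00 00"))
```

The decoded starting LBA for the first CDB is 2446023367, the same sector reported in the first end_request line for sdg, and the length is 1024 blocks. Assuming 512-byte sectors, that is 524288 bytes, which lines up with the "dma 524288 in" on the failed ata7 READ DMA EXT command, so the ata7 timeouts and the sdg read errors appear to be the same request.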

Joe L.


January 15, 2011

Author

I'm suspecting the cable. I had another drive in there originally (same make/model - Caviar Black) that was giving different errors. I swapped the drive and the errors went away for a while, but now this. I've replaced the cable and will run the checks again.

Interestingly, when I rebooted, it showed both the parity and cache drive as "New" - not sure why this happened. I'm rebuilding parity now.

Will a drive error make the entire box hang like that - to the point of not even responding to a reboot or powerdown command?


January 15, 2011

I'm suspecting the cable. I had another drive in there originally (same make/model - Caviar Black) that was giving different errors. I swapped the drive and the errors went away for a while, but now this. I've replaced the cable and will run the checks again.

Interestingly, when I rebooted, it showed both the parity and cache drive as "New" - not sure why this happened. I'm rebuilding parity now.

Because you booted with them not responding.

Will a drive error make the entire box hang like that - to the point of not even responding to a reboot or powerdown command?

It is your hardware, but yes, a single disk can lock up the entire server if it confuses the disk controller and the person who wrote the driver did not anticipate that set of conditions.
