December 23, 2014
Last night I powered off my unRAID server to reconnect an SSD (an old Windows OS drive) that I needed to copy some data from. When I started it up again, I also connected a USB hard drive to copy the data to. I completed what I needed to do and unmounted both drives. About 15 minutes later the following errors occurred (I didn't actually notice the problem until today, but I'm going through the logs now). I believe the array was being written to at the time, though almost certainly not disk1:

Dec 22 22:56:47 Tower ntfs-3g[9507]: Unmounting /dev/sdc1 (Work)
Dec 22 23:15:27 Tower kernel: ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Dec 22 23:15:27 Tower kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Dec 22 23:15:27 Tower kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
Dec 22 23:15:27 Tower kernel: ata6.00: failed command: READ DMA EXT
Dec 22 23:15:27 Tower kernel: ata6.00: cmd 25/00:00:e0:66:c4/00:04:7d:00:00/e0 tag 0 dma 524288 in
Dec 22 23:15:27 Tower kernel: res 50/00:00:df:66:c4/00:00:7d:00:00/e0 Emask 0x10 (ATA bus error)
Dec 22 23:15:27 Tower kernel: ata6.00: status: { DRDY }
Dec 22 23:15:27 Tower kernel: ata6: hard resetting link
(repeats several times)
...
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023336
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023344
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023352
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023360
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023368
(repeats many times with different sector numbers!)
...

This carries on until 23:30, then nothing happens until around 3am, when unRAID decides the disk needs to be disabled; it then emails me at 03:47. I've since stopped the array and noticed that the disk had disappeared.
I shut down, disconnected the SSD I had connected last night, and checked all the SATA connections. After powering up again the disk is visible, but still red (disabled). I've attached a SMART report along with the syslog. I assume I need to do the unassign, start, stop, reassign procedure regardless, but do the errors in the logs suggest a SATA hardware issue or a dodgy connection, or that the disk is on the way out? It all seems a bit coincidental with me connecting another drive, so hopefully the disk is not toast?

Attachments: unraid_syslog-2014-12-23.txt, smartctl.txt
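For reference, the SErr value in the kernel exception is a bitmask of the SATA SError register, and the kernel prints the set bits by name on the following "SError:" line. The sketch below decodes such a value. The bit-to-name table is reproduced from my reading of the Linux libata source (the SERR_* constants in include/linux/ata.h and the names printed by the error-reporting code), so treat it as an assumption to double-check against your kernel version. Decoding 0x280100 from the 23:15:27 exception gives the same three flags the kernel logged: UnrecovData, 10B8B and BadCRC.

```python
# SError bit positions and the labels Linux libata prints for them
# (assumed from include/linux/ata.h; verify against your kernel source).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value from the 23:15:27 exception in the log excerpt
    print(decode_serror(0x280100))
```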
December 23, 2014
I can't be completely sure of the exact problem, but the obvious candidate is a very bad SATA cable. Almost all of the exceptions (disk issues) included the BadCRC flag, which typically indicates a bad SATA cable, although it could also be bad or noisy power, or serious crosstalk or other interference with the cable signal. In your case, I think the cable is so bad that a few of the exceptions did NOT include the BadCRC flag (yet had a number of other indicators of packet corruption), because the data was so corrupted it never even reached the CRC check. In addition, your SMART report looked fine except for a large number of CRC errors, so this has probably been a problem for a while. I'm not sure, but it's possible all of the issues stem from the very bad cable. When the drive was finally marked as 'disabled', communications were so bad that the drive could not correctly transmit its identification info, so the kernel gave up on it. As you can tell, it's the cable to Disk 1 (sdg, on ata6, the Samsung 1.5 TB).
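To check the observation that almost all exceptions carried BadCRC, a rough sketch like the one below tallies SError lines per ATA port, split by whether BadCRC was flagged, along with md read errors per array disk. The regular expressions assume the exact message formats quoted in the log excerpt in the first post; other kernel or unRAID versions may word them differently, and the function name `summarize` is just illustrative.

```python
import re
from collections import Counter

# e.g. "Dec 22 23:15:27 Tower kernel: ata6: SError: { UnrecovData 10B8B BadCRC }"
SERROR_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: SError: \{([^}]*)\}")
# e.g. "Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328"
MD_READ_ERROR_RE = re.compile(r"kernel: md: (disk\d+) read error, sector=\d+")


def summarize(lines):
    """Tally SError reports per ATA port (with and without BadCRC) and
    md read errors per array disk from an iterable of syslog lines."""
    with_crc = Counter()
    without_crc = Counter()
    md_read_errors = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match:
            port, flags = match.group(1), match.group(2).split()
            if "BadCRC" in flags:
                with_crc[port] += 1
            else:
                without_crc[port] += 1
            continue
        match = MD_READ_ERROR_RE.search(line)
        if match:
            md_read_errors[match.group(1)] += 1
    return with_crc, without_crc, md_read_errors


if __name__ == "__main__":
    sample = [
        "Dec 22 23:15:27 Tower kernel: ata6: SError: { UnrecovData 10B8B BadCRC }",
        "Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328",
        "Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023336",
    ]
    crc, no_crc, md = summarize(sample)
    print("SError with BadCRC:", dict(crc))
    print("SError without BadCRC:", dict(no_crc))
    print("md read errors:", dict(md))
```

Passing an open file object (for example the attached unraid_syslog-2014-12-23.txt) also works, since iterating over a file yields its lines.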
December 23, 2014 (Author)
Thanks! The rebuild is in progress now, but I'll change the cable afterwards and keep an eye on the CRC error count for a while. Out of interest, does unRAID care if I move the drive to a different SATA port? (I might try this if the CRC error count keeps increasing after changing the cable.)
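For keeping an eye on the CRC error count after the cable swap, SMART attribute 199 (usually labelled UDMA_CRC_Error_Count) is the one that records these interface CRC errors. Below is a minimal sketch that pulls its raw value out of `smartctl -A` output; it assumes the standard ten-column attribute table that smartmontools prints for ATA drives, and the function name is hypothetical.

```python
CRC_ATTRIBUTE_ID = "199"  # SMART attribute usually labelled UDMA_CRC_Error_Count


def crc_error_count(smart_attributes):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text,
    or None if the attribute is not listed.

    Assumes the usual ATA attribute table columns:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_attributes.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th field.
        if len(fields) >= 10 and fields[0] == CRC_ATTRIBUTE_ID:
            return int(fields[9])
    return None
```

One way to feed it is `subprocess.run(["smartctl", "-A", "/dev/sdg"], capture_output=True, text=True).stdout`, run as root (the sdX letter can change between boots). The raw count is cumulative and does not reset, so the old errors will remain; what matters is whether the number keeps rising after the cable is replaced.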
December 23, 2014
Quote: "Out of interest does Unraid care if I move the drive to a different SATA port (might try this if the CRC error count keeps increasing after changing the cable)."

No problem changing it, to another port or even another controller. On boot, unRAID finds the drives by serial number, not by slot or connection.
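As a toy illustration of why the port does not matter, the sketch below resolves array slots to device nodes by serial number alone. It is not unRAID's actual code; the saved slot-to-serial assignment, the device-to-serial scan result and the serial strings are all hypothetical inputs.

```python
from typing import Dict, Optional


def resolve_slots(assignments: Dict[str, str],
                  detected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each array slot to the device node currently holding its drive.

    assignments: slot name -> serial number (hypothetical saved configuration)
    detected:    device node -> serial number found when scanning at boot
    Slots whose serial number was not detected map to None.
    """
    # Invert the scan so drives are looked up by serial, ignoring port/controller.
    device_by_serial = {serial: device for device, serial in detected.items()}
    return {slot: device_by_serial.get(serial)
            for slot, serial in assignments.items()}


if __name__ == "__main__":
    saved = {"disk1": "SERIAL-A", "disk2": "SERIAL-B"}
    # Same drives, but disk1's drive has been moved to a different port.
    before = {"/dev/sdg": "SERIAL-A", "/dev/sdh": "SERIAL-B"}
    after = {"/dev/sdc": "SERIAL-A", "/dev/sdh": "SERIAL-B"}
    print(resolve_slots(saved, before))
    print(resolve_slots(saved, after))
```

Moving disk1's drive to another port only changes the device node it resolves to; a slot whose serial is not found at all comes back as None, which is roughly the "missing disk" case.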