Last night I powered off my unRAID server to reconnect an SSD (an old Windows OS drive) that I needed to copy some data from. When I started it up again I also connected a USB hard drive to copy the data to. I copied what I needed successfully and unmounted both drives. About 15 minutes later the following errors occurred (I didn't actually notice the problem until today, but I'm looking at the logs now). I believe the array was being written to at the time (almost certainly not disk1, though):
Dec 22 22:56:47 Tower ntfs-3g[9507]: Unmounting /dev/sdc1 (Work)
Dec 22 23:15:27 Tower kernel: ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Dec 22 23:15:27 Tower kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Dec 22 23:15:27 Tower kernel: ata6: SError: { UnrecovData 10B8B BadCRC }
Dec 22 23:15:27 Tower kernel: ata6.00: failed command: READ DMA EXT
Dec 22 23:15:27 Tower kernel: ata6.00: cmd 25/00:00:e0:66:c4/00:04:7d:00:00/e0 tag 0 dma 524288 in
Dec 22 23:15:27 Tower kernel: res 50/00:00:df:66:c4/00:00:7d:00:00/e0 Emask 0x10 (ATA bus error)
Dec 22 23:15:27 Tower kernel: ata6.00: status: { DRDY }
Dec 22 23:15:27 Tower kernel: ata6: hard resetting link
(repeats several times)
...
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023336
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023344
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023352
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023360
Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023368
(repeats lots of times with different sector numbers!)
...
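To get a feel for how many sectors were affected and over what range, the "md: diskN read error" lines could be tallied with something like the following. This is only a minimal sketch that assumes every such line looks exactly like the ones in the excerpt above; the function name summarize_read_errors is made up for illustration and is not part of unRAID.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328
# (assumed format, taken from the excerpt above)
READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def summarize_read_errors(lines):
    """Return {disk_number: (error_count, lowest_sector, highest_sector)}."""
    sectors = defaultdict(list)
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            sectors[int(match.group(1))].append(int(match.group(2)))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


# Two lines copied from the excerpt, plus one unrelated line that is ignored.
sample = [
    "Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023328",
    "Dec 22 23:15:36 Tower kernel: md: disk1 read error, sector=2110023336",
    "Dec 22 23:15:27 Tower kernel: ata6: hard resetting link",
]
for disk, (count, low, high) in sorted(summarize_read_errors(sample).items()):
    print(f"disk{disk}: {count} read errors, sectors {low}..{high}")
```

To run it against the full log, pass the open file instead of the sample list, for example summarize_read_errors(open("unraid_syslog-2014-12-23.txt", errors="replace")).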
This carries on until 23:30, but then nothing happens until around 3am, when unRAID decides the disk needs to be disabled; it then e-mails me at 03:47.
I've since stopped the array, noting that the disk had disappeared. I then shut down, disconnected the SSD I had connected last night, and checked all the SATA connections. After powering up again the disk is visible, but still red. I've attached a SMART report along with the syslog.
I assume I need to do the unassign, start, stop, reassign procedure regardless, but do the errors in the logs suggest a SATA hardware issue or a dodgy connection, or that the disk is on the way out? It all seems a bit coincidental with my connecting another drive, so hopefully the disk is not toast?
unraid_syslog-2014-12-23.txt
smartctl.txt