Hi all,
I've recently started having issues with my server. It would lock up completely every day or so. When this happened, I could not reach it via SSH or the web GUI, and Plex stopped responding too.
At first I just hard reset the server, but then I started paying more attention to the syslog, and later I also made sure it was written to a file (using: tail -f /var/log/syslog > /mnt/user/data/syslog.txt). Looking through the logs, I found these errors, which seemed to occur whenever a lot of data was being written to the HDDs:
Jun 17 21:35:03 Tesla kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x280900 action 0x6 frozen
Jun 17 21:35:03 Tesla kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Jun 17 21:35:03 Tesla kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC }
Jun 17 21:35:03 Tesla kernel: ata1.00: failed command: READ FPDMA QUEUED
Jun 17 21:35:03 Tesla kernel: ata1.00: cmd 60/48:60:d0:c2:8e/00:00:0c:00:00/40 tag 12 ncq dma 36864 in
Jun 17 21:35:03 Tesla kernel: res 40/00:64:d0:c2:8e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
Jun 17 21:35:03 Tesla kernel: ata1.00: status: { DRDY }
Jun 17 21:35:03 Tesla kernel: ata1: hard resetting link
Jun 17 21:35:03 Tesla kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jun 17 21:35:03 Tesla kernel: ata1.00: configured for UDMA/133
Jun 17 21:35:03 Tesla kernel: ata1: EH complete
Sometimes there are mere seconds between these errors, and sometimes it's fine for 5 minutes. What really sets it off is backing up all Docker data and then invoking the mover.
What I'd mostly like to know is: which drive does ata1.00 correspond to? While writing this I looked through the diagnostics and found this near the beginning of the syslog:
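To put numbers on how far apart the errors are, the saved syslog file can be scanned for the "exception" lines and the time between consecutive ones computed per device. This is only a rough sketch: the function name exception_gaps and the year argument are my own inventions (syslog timestamps carry no year), the second and third sample lines use made-up timestamps purely for illustration, and the %b month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Jun 17 21:35:03 Tesla kernel: ata1.00: exception Emask 0x50 ..."
EXC_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+\.\d+): exception "
)

def exception_gaps(lines, year=2017):
    """Return {ata id: [seconds between consecutive exception lines]}."""
    last_seen = {}
    gaps = {}
    for line in lines:
        m = EXC_RE.match(line)
        if not m:
            continue
        # Syslog omits the year, so it has to be supplied by the caller.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        dev = m.group(2)
        if dev in last_seen:
            delta = (stamp - last_seen[dev]).total_seconds()
            gaps.setdefault(dev, []).append(delta)
        last_seen[dev] = stamp
    return gaps

# First line is from the log above; the other two timestamps are invented.
sample = [
    "Jun 17 21:35:03 Tesla kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x280900 action 0x6 frozen",
    "Jun 17 21:35:10 Tesla kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x280900 action 0x6 frozen",
    "Jun 17 21:40:10 Tesla kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x280900 action 0x6 frozen",
]
print(exception_gaps(sample))
```

To run it against the real capture, pass the open file instead of the sample list, e.g. exception_gaps(open("/mnt/user/data/syslog.txt")).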
Jun 17 17:30:17 Tesla kernel: ata1.00: ATA-8: SanDisk SD6SF1M128G, 133287403247, X231200, max UDMA/133
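For anyone else trying to work out which drive an ataN.NN name belongs to, the same lookup can be done for every port at once by collecting these identification lines from the syslog. A minimal sketch, assuming the lines always look like the one above; the function name map_ata_devices is hypothetical, and I'm treating the text up to the first comma as the model string without claiming what the remaining fields mean.

```python
import re

# Matches the kernel's device identification line, e.g.
# "... kernel: ata1.00: ATA-8: SanDisk SD6SF1M128G, 133287403247, ..."
IDENT_RE = re.compile(r"kernel: (ata\d+\.\d+): ATA-\d+: ([^,]+),")

def map_ata_devices(lines):
    """Return {ata id: text before the first comma} for each identified device."""
    devices = {}
    for line in lines:
        m = IDENT_RE.search(line)
        if m:
            devices[m.group(1)] = m.group(2).strip()
    return devices

# Both sample lines are taken from the syslog excerpts in this post.
sample_ident = [
    "Jun 17 17:30:17 Tesla kernel: ata1.00: ATA-8: SanDisk SD6SF1M128G, 133287403247, X231200, max UDMA/133",
    "Jun 17 21:35:03 Tesla kernel: ata1.00: failed command: READ FPDMA QUEUED",
]
print(map_ata_devices(sample_ident))  # {'ata1.00': 'SanDisk SD6SF1M128G'}
```

Only the identification line matches; the error lines for the same device are ignored, so feeding it the whole syslog is fine.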
So it seems to be one of my SSDs, and not an HDD as I would have thought. I've already tried replacing my data drive and was going to try replacing the cache next, but it looks like I'll have to replace this SSD. I have already tried different SATA cables, and my PSU is very new and probably a bit overkill for the system, so I doubt either of those is the problem.
For now I'll have to pull the SSD (and run on just one for the time being) and see if that fixes the problem. I'll post updates here once I have tested more.
My diagnostics: tesla-diagnostics-20170617-2200.zip