dirtdivr Posted September 9, 2022 (edited)

Hi everyone,

I'm relatively new to Unraid, and Linux in general, so please excuse the simplistic nature of my request. I have been having issues recently, which started out of the blue, with no changes I can think of that would have caused them. I run a basic setup of one cache SSD and 3x 16 TB WD Red Plus drives, one of which is parity.

Over the last few weeks, every few days, all the Docker containers stop. When I check the Shares section, half of the shares have vanished, leaving just three. The logs show many lines, but the first sign of trouble is:

Sep 9 03:40:16 Tower crond[1140]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Sep 9 08:40:32 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x8400 SErr 0x0 action 0x6 frozen
Sep 9 08:40:32 Tower kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 9 08:40:32 Tower kernel: ata1.00: cmd 60/00:50:d0:fe:18/01:00:2f:00:00/40 tag 10 ncq dma 131072 in
Sep 9 08:40:32 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 9 08:40:32 Tower kernel: ata1.00: status: { DRDY }
Sep 9 08:40:32 Tower kernel: ata1.00: failed command: READ FPDMA QUEUED
Sep 9 08:40:32 Tower kernel: ata1.00: cmd 60/08:78:d8:ea:e4/00:00:2c:00:00/40 tag 15 ncq dma 4096 in
Sep 9 08:40:32 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 9 08:40:32 Tower kernel: ata1.00: status: { DRDY }
Sep 9 08:40:32 Tower kernel: ata1: hard resetting link
Sep 9 08:40:33 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 9 08:40:38 Tower kernel: ata1.00: qc timeout (cmd 0xec)
Sep 9 08:40:38 Tower kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Sep 9 08:40:38 Tower kernel: ata1.00: revalidation failed (errno=-5)
Sep 9 08:40:38 Tower kernel: ata1: hard resetting link
Sep 9 08:40:38 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 9 08:40:43 Tower kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20210730/psargs-330)
Sep 9 08:40:43 Tower kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT0._GTF due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 9 08:40:43 Tower kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20210730/psargs-330)
Sep 9 08:40:43 Tower kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT0._GTF due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 9 08:40:43 Tower kernel: ata1.00: configured for UDMA/133
Sep 9 08:40:43 Tower kernel: sd 2:0:0:0: [sdc] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=40s
Sep 9 08:40:43 Tower kernel: sd 2:0:0:0: [sdc] tag#10 Sense Key : 0x5 [current]
Sep 9 08:40:43 Tower kernel: sd 2:0:0:0: [sdc] tag#10 ASC=0x21 ASCQ=0x4
Sep 9 08:40:43 Tower kernel: sd 2:0:0:0: [sdc] tag#10 CDB: opcode=0x28 28 00 2f 18 fe d0 00 01 00 00
Sep 9 08:40:43 Tower kernel: blk_update_request: I/O error, dev sdc, sector 790167248 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
Sep 9 08:40:43 Tower kernel: ata1: EH complete
Sep 9 08:42:25 Tower kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
Sep 9 08:42:25 Tower kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Sep 9 08:42:25 Tower kernel: ata1.00: cmd 61/80:00:a0:2b:2e/03:00:2f:00:00/40 tag 0 ncq dma 458752 out
Sep 9 08:42:25 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

I've attached the logs in case they provide any further help. At the moment, after this happens, a reboot fixes the issue for a few days and then it happens again. I recently updated to the newest version of Unraid hoping this would fix the issue, but it hasn't.

The logs also mention issues with sdc, which I have checked, and this is my cache drive (SSD). Could this be an indication that this drive is failing and should be replaced?
Attachment: syslog_edited.txt

Edited September 9, 2022 by dirtdivr

JorgeB Posted September 9, 2022

Please post the diagnostics.

dirtdivr Posted September 9, 2022

Here are the most recent diagnostics. Unfortunately, I only have a download of the logs, as I have since rebooted.

Attachment: tower-diagnostics-20220909-1157.zip

JorgeB Posted September 9, 2022

The cache SSD is dropping offline. SMART looks OK, so try swapping the cables (both power and SATA) with those of a different disk, or just replace the cables.
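
For anyone wanting to check their own syslog for the same pattern, here is a minimal Python sketch that counts ATA error-handling events per port and block-layer I/O errors per device. The function name `summarize` and the two regular expressions are illustrative assumptions built only from the kernel messages quoted in the first post; other kernels or controllers may word these messages differently, and the script makes no attempt to map an ATA port (such as ata1) to a device name (such as sdc).

```python
import re
from collections import Counter

# Patterns modelled on the kernel lines quoted in this thread (an assumption:
# other kernel versions may format these messages differently).
ATA_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: "
    r"(exception Emask|hard resetting link|failed to IDENTIFY|revalidation failed)"
)
BLK_ERROR = re.compile(r"kernel: blk_update_request: I/O error, dev (\w+), sector (\d+)")


def summarize(lines):
    """Return (ATA events per (port, event), I/O errors per block device)."""
    ata_events = Counter()
    io_errors = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            ata_events[(match.group(1), match.group(2))] += 1
            continue
        match = BLK_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
    return ata_events, io_errors


# A few lines copied from the log in the first post.
SAMPLE = [
    "Sep 9 08:40:32 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x8400 SErr 0x0 action 0x6 frozen",
    "Sep 9 08:40:32 Tower kernel: ata1: hard resetting link",
    "Sep 9 08:40:38 Tower kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
    "Sep 9 08:40:38 Tower kernel: ata1: hard resetting link",
    "Sep 9 08:40:43 Tower kernel: blk_update_request: I/O error, dev sdc, sector 790167248 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0",
]

if __name__ == "__main__":
    ata_events, io_errors = summarize(SAMPLE)
    for (port, event), count in sorted(ata_events.items()):
        print(f"{port}: {event} x{count}")
    for device, count in sorted(io_errors.items()):
        print(f"{device}: I/O error x{count}")
```

To run it against a real log, pass the lines of the file instead of `SAMPLE`, for example `summarize(open("syslog_edited.txt"))`. Repeated link resets on one port alongside clean SMART data are consistent with the cabling suggestion above, though the counts alone cannot prove the cause.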

dirtdivr Posted September 9, 2022 (edited)

Perfect. I've ordered a new SSD. I'll swap the cables and see if this helps. Thank you for your help.

Edited September 9, 2022 by dirtdivr