September 28, 2016
Yesterday I replaced my cache drive (single SSD) with a new matched pair, formatted as RAID0, and copied everything back to the new cache. Everything seems to be working fine; copying to and from gives no errors. Today I checked the syslog, curious to see if there was anything regarding the new additions. I saw a bunch of these "warnings", which started up about 9 AM. I replaced the SATA cables, restarted, and oddly enough I had lost the Cache config. I was able to reset the 2 disks in order and everything appeared to be okay (dockers, folders etc). Now it's 5 PM and I am seeing the warnings/errors again. Help?
Sep 27 16:03:59 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x1f000 SErr 0x0 action 0x6 frozen
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: SEND FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: cmd 64/01:60:00:00:00/00:00:00:00:00/a0 tag 12 ncq 512 out
Sep 27 16:03:59 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:03:59 Tower kernel: ata9.00: status: { DRDY }
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: cmd 61/60:68:50:3b:0b/00:00:01:00:00/40 tag 13 ncq 49152 out
Sep 27 16:03:59 Tower kernel: res 40/00:00:00:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:03:59 Tower kernel: ata9.00: status: { DRDY }
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: cmd 61/70:70:50:3a:0b/00:00:01:00:00/40 tag 14 ncq 57344 out
Sep 27 16:03:59 Tower kernel: res 40/00:00:01:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:03:59 Tower kernel: ata9.00: status: { DRDY }
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: cmd 61/80:78:c0:3a:0b/00:00:01:00:00/40 tag 15 ncq 65536 out
Sep 27 16:03:59 Tower kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:03:59 Tower kernel: ata9.00: status: { DRDY }
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: cmd 61/10:80:40:3b:0b/00:00:01:00:00/40 tag 16 ncq 8192 out
Sep 27 16:03:59 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:03:59 Tower kernel: ata9.00: status: { DRDY }
Sep 27 16:03:59 Tower kernel: ata9: hard resetting link
Sep 27 16:03:59 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 27 16:03:59 Tower kernel: ata9.00: supports DRM functions and may not be fully accessible
Sep 27 16:03:59 Tower kernel: ata9.00: supports DRM functions and may not be fully accessible
Sep 27 16:03:59 Tower kernel: ata9.00: configured for UDMA/133
Sep 27 16:03:59 Tower kernel: ata9: EH complete
Sep 27 16:03:59 Tower kernel: ata9.00: Enabling discard_zeroes_data
Sep 27 16:04:30 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x1c0000 SErr 0x0 action 0x6 frozen
Sep 27 16:04:30 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:04:30 Tower kernel: ata10.00: cmd 61/c0:90:00:99:19/00:00:00:00:00/40 tag 18 ncq 98304 out
Sep 27 16:04:30 Tower kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:04:30 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:04:30 Tower kernel: ata10.00: failed command: SEND FPDMA QUEUED
Sep 27 16:04:30 Tower kernel: ata10.00: cmd 64/01:98:00:00:00/00:00:00:00:00/a0 tag 19 ncq 512 out
Sep 27 16:04:30 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:04:30 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:04:30 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:04:30 Tower kernel: ata10.00: cmd 61/28:a0:98:a5:e9/00:00:00:00:00/40 tag 20 ncq 20480 out
Sep 27 16:04:30 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:04:30 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:04:30 Tower kernel: ata10: hard resetting link
Sep 27 16:04:30 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 27 16:04:30 Tower kernel: ata10.00: supports DRM functions and may not be fully accessible
Sep 27 16:04:30 Tower kernel: ata10.00: supports DRM functions and may not be fully accessible
Sep 27 16:04:30 Tower kernel: ata10.00: configured for UDMA/133
Sep 27 16:04:30 Tower kernel: ata10: EH complete
Sep 27 16:04:30 Tower kernel: ata10.00: Enabling discard_zeroes_data
Sep 27 16:05:01 Tower kernel: ata10.00: NCQ disabled due to excessive errors
Sep 27 16:05:01 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x780 SErr 0x0 action 0x6 frozen
Sep 27 16:05:01 Tower kernel: ata10.00: failed command: SEND FPDMA QUEUED
Sep 27 16:05:01 Tower kernel: ata10.00: cmd 64/01:38:00:00:00/00:00:00:00:00/a0 tag 7 ncq 512 out
Sep 27 16:05:01 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:05:01 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:05:01 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:05:01 Tower kernel: ata10.00: cmd 61/08:40:c0:00:00/00:00:00:00:00/40 tag 8 ncq 4096 out
Sep 27 16:05:01 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:05:01 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:05:01 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:05:01 Tower kernel: ata10.00: cmd 61/08:48:40:00:02/00:00:00:00:00/40 tag 9 ncq 4096 out
Sep 27 16:05:01 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:05:01 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:05:01 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:05:01 Tower kernel: ata10.00: cmd 61/08:50:40:00:00/00:00:20:00:00/40 tag 10 ncq 4096 out
Sep 27 16:05:01 Tower kernel: res 40/00:00:00:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 27 16:05:01 Tower kernel: ata10.00: status: { DRDY }
Sep 27 16:05:01 Tower kernel: ata10: hard resetting link
Sep 27 16:05:01 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 27 16:05:01 Tower kernel: ata10.00: supports DRM functions and may not be fully accessible
Sep 27 16:05:01 Tower kernel: ata10.00: supports DRM functions and may not be fully accessible
Sep 27 16:05:01 Tower kernel: ata10.00: configured for UDMA/133
Sep 27 16:05:01 Tower kernel: ata10: EH complete
Sep 27 16:05:01 Tower kernel: ata10.00: Enabling discard_zeroes_data
Sep 27 16:05:15 Tower root: /mnt/cache: 399.7 GiB (429127376896 bytes) trimmed
tower-diagnostics-20160927-1701.zip
September 28, 2016
Since the errors are on both SSDs and they share the Asmedia controller, I would start by trying a different slot for it, if one is available. If the errors continue, try a different controller.
September 28, 2016 Author
No new log entries since that bunch yesterday. Will keep an eye on it. Did a pretty big Mover move last night with no errors.
September 28, 2016
According to the syslog excerpt you quoted, both drives appeared to be timing out on writes for a brief period, as if something was blocking them temporarily. After a reset, they both worked fine. I noticed, though, that a trim operation completed just after this, which suggests it was running during those timeouts. Is it possible that the trim was blocking writes until it finished? You may want to make sure the trim is scheduled for off hours.
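One rough way to check the trim theory is to line up the timestamps in the excerpt. The sketch below is only an illustration: `summarize`, `parse_time`, and the regular expressions are hypothetical helpers written for this thread, not part of unRAID or any tool, and the sample is a handful of lines copied from the log in the first post rather than the whole syslog. It tallies failed commands per ATA port and reports how many seconds passed between the last exception and the "trimmed" line.

```python
import re
from collections import Counter
from datetime import datetime

# A few lines copied from the syslog excerpt in the first post (not the full log).
SAMPLE = """\
Sep 27 16:03:59 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x1f000 SErr 0x0 action 0x6 frozen
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: SEND FPDMA QUEUED
Sep 27 16:03:59 Tower kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:04:30 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x1c0000 SErr 0x0 action 0x6 frozen
Sep 27 16:04:30 Tower kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Sep 27 16:05:01 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x780 SErr 0x0 action 0x6 frozen
Sep 27 16:05:01 Tower kernel: ata10.00: failed command: SEND FPDMA QUEUED
Sep 27 16:05:15 Tower root: /mnt/cache: 399.7 GiB (429127376896 bytes) trimmed
"""

# Hypothetical patterns, based only on the line shapes seen in this thread.
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ (.*)$")
FAILED = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
EXCEPTION = re.compile(r"kernel: ata\d+\.\d+: exception ")
TRIMMED = re.compile(r"bytes\) trimmed$")


def parse_time(stamp, year=2016):
    """Syslog stamps carry no year, so one is assumed (English month names)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def summarize(log_text):
    """Return (failed commands per port, seconds from last exception to trim)."""
    failed = Counter()
    last_exception = None
    trim_done = None
    for raw in log_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        when, rest = parse_time(m.group(1)), m.group(2)
        hit = FAILED.search(rest)
        if hit:
            failed[hit.group(1)] += 1
        elif EXCEPTION.search(rest):
            last_exception = when
        elif TRIMMED.search(rest):
            trim_done = when
    gap = None
    if last_exception and trim_done:
        gap = (trim_done - last_exception).total_seconds()
    return dict(failed), gap


if __name__ == "__main__":
    per_port, gap_seconds = summarize(SAMPLE)
    print(per_port, gap_seconds)
```

On this sample both ports show failures and the trim finishes within seconds of the last exception. That is consistent with the trim theory but does not prove it; a shared controller or cabling problem would produce the same pattern.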
September 28, 2016 Author
Interesting observation. In fact, I was curious if/how Trim worked on a raid0 pool and had set it to Hourly to see. Once I saw that it appears to treat both drives as one (only one output message?), I switched it back to daily.
October 5, 2016 Author
I am now getting what look like a lot of severe errors today. Of course I am in the middle of a parity rebuild, so I can't take it offline. The errors start at about 6 AM, when TRIM ran on schedule. I got an error warning email from unRaid but didn't see it until after 8 AM.
subject: cron for user root /sbin/fstrim -v /mnt/cache | logger &> /dev/null
body: fstrim: /mnt/cache: FITRIM ioctl failed: Input/output error
I went and looked at the log and found it struggling with one of the pair of SSDs in RAID0. WTF is going on here? Is the disk bad? Is the RAID0 setup corrupt? I have turned off the Docker service to avoid any problems until I can fix this.
tower-diagnostics-20161005-0856.zip
October 5, 2016 Author
Second question: assuming the first error post is resolved, is it possible to return to the default RAID1 for Cache without backing up & restoring the contents of Cache?
October 5, 2016
You can revert back to RAID1, but one of your SSDs dropped offline; you have to fix that first. You may also want to try to fix the constant timeout errors you're getting on the SSDs, or you'll have issues again. If you haven't yet, try changing the cables; if it continues, replace the Asmedia controller.
October 5, 2016 Author
Both SSDs still show as green Cache & Cache 2 in the webgui (odd). You would think there would be some red ball action or something. I swapped cables last time. The Asmedia is the 2 onboard SATAIII ports. Oddly enough, I have always used one of them when I had 1 SSD and left the other for slightly faster preclears. I will try moving both SSDs to HBA ports once parity is done rebuilding.
October 5, 2016
That's normal for the cache pool; usually the only clue is the lack of temperature info on that device (if supported). It can appear unmountable after a reboot, but if it's raid0 it's too late to back it up. Make sure you also swap power cables, not just SATA.
October 6, 2016 Author
After much shuffling (in the process of replacing parity & 2 data drives AND getting cache to settle down), I have ended up with the 2 SSDs on SATA2 ports on the motherboard. When they were connected to the HBA as planned, cron sent me an error about TRIM*, so I moved them to the mobo SATA2 ports. Now on to converting back to the default raid1 cache pool. I ran the Balance command "-dconvert=raid1 -mconvert=raid1" and that seemed to split them back out to mirrors, and I now show 275 GB total space as expected. However, I think I am reading here that something is stuck in raid0 format? How do I fix this? Backup & reformat? Rerun Balance?
btrfs filesystem show:
Label: none uuid: f4c1a582-3eed-471c-be90-21393425b4fd
Total devices 2 FS bytes used 106.21GiB
devid 1 size 256.17GiB used 119.03GiB path /dev/sdl1
devid 2 size 256.17GiB used 119.03GiB path /dev/sdn1
btrfs filesystem df:
Data, RAID1: total=117.00GiB, used=105.81GiB
Data, RAID0: total=2.00GiB, used=0.00B
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=1.00GiB, used=408.30MiB
GlobalReserve, single: total=144.00MiB, used=0.00B
*Trim error via cron email: fstrim: /mnt/cache: the discard operation is not supported
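The "stuck in raid0" reading comes from the `Data, RAID0` row: 2.00GiB is still allocated in the old profile even though nothing is stored in it (used=0.00B). A minimal sketch of that check, assuming the `btrfs filesystem df` line format shown above; `leftover_profiles` and the `ROW` pattern are hypothetical names made up for this illustration, and the input is the output pasted in this post.

```python
import re

# Output of "btrfs filesystem df" as pasted in this post.
DF_OUTPUT = """\
Data, RAID1: total=117.00GiB, used=105.81GiB
Data, RAID0: total=2.00GiB, used=0.00B
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=1.00GiB, used=408.30MiB
GlobalReserve, single: total=144.00MiB, used=0.00B
"""

# Hypothetical pattern matching the "<type>, <profile>: total=..., used=..." rows.
ROW = re.compile(r"^(\w+), (\w+): total=(\S+), used=(\S+)$")


def leftover_profiles(df_text, target="RAID1"):
    """List rows whose profile differs from the one the balance converted to."""
    leftovers = []
    for line in df_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        kind, profile, total, used = m.groups()
        if kind == "GlobalReserve":
            # Reported as "single" in this output; it is a reserve, not a
            # data/metadata row that the conversion targets.
            continue
        if profile != target:
            leftovers.append((kind, profile, total, used))
    return leftovers


if __name__ == "__main__":
    for kind, profile, total, used in leftover_profiles(DF_OUTPUT):
        print(f"{kind} still has {total} allocated as {profile} (used {used})")
```

For the output above this flags only the empty `Data, RAID0` row, so nothing in use is reported in the old profile.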
October 6, 2016 Author
Try balancing again.
That cleared it up. My first line went down?
Data, RAID1: total=107.00GiB, used=105.83GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=407.06MiB
GlobalReserve, single: total=144.00MiB, used=0.00B
before: Data, RAID1: total=117.00GiB, used=105.81GiB
after: Data, RAID1: total=107.00GiB, used=105.83GiB
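On the first line going down: in `btrfs filesystem df`, total is the space allocated to chunks of that type and used is what is actually stored in them, so a smaller total with nearly the same used means the second balance released allocated-but-empty space rather than data. A quick arithmetic sketch with the numbers from this post (`slack_gib` is a hypothetical helper name):

```python
def slack_gib(total_gib, used_gib):
    """Allocated-but-unused space in GiB, rounded to two decimals."""
    return round(total_gib - used_gib, 2)


# Values copied from the "before" and "after" Data, RAID1 lines above.
before = slack_gib(117.00, 105.81)
after = slack_gib(107.00, 105.83)

if __name__ == "__main__":
    print(f"slack before: {before} GiB, after: {after} GiB")
```

The slack shrinks from about 11 GiB to about 1 GiB while used stays at roughly 105.8 GiB, which is the expected picture after a balance compacts partly filled chunks.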