Stubbs Posted September 6, 2022 (edited)

I noticed these errors started appearing in my log. The reads and writes on my array tab are all incrementing, except for Disk 3, which is remaining completely static at 1,375,745 reads and 581 writes. At first only disk0 (parity drive) had errors, but now multiple disks do.

Sep 6 23:24:45 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x3fc00 SErr 0x0 action 0x0
Sep 6 23:24:45 Tower kernel: ata6.00: irq_stat 0x40000008
Sep 6 23:24:45 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 6 23:24:45 Tower kernel: ata6.00: cmd 60/08:50:f8:8c:37/04:00:26:00:00/40 tag 10 ncq dma 528384 in
Sep 6 23:24:45 Tower kernel: res 41/40:00:f8:8c:37/00:00:26:00:00/40 Emask 0x409 (media error) <F>
Sep 6 23:24:45 Tower kernel: ata6.00: status: { DRDY ERR }
Sep 6 23:24:45 Tower kernel: ata6.00: error: { UNC }
Sep 6 23:24:45 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep 6 23:24:45 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep 6 23:24:45 Tower kernel: ata6.00: configured for UDMA/133
Sep 6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep 6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 Sense Key : 0x3 [current]
Sep 6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 ASC=0x11 ASCQ=0x4
Sep 6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 26 37 8c f8 00 00 04 08 00 00
Sep 6 23:24:45 Tower kernel: blk_update_request: I/O error, dev sde, sector 641174776 op 0x0:(READ) flags 0x0 phys_seg 129 prio class 0
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174712
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174720
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174728
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174736
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174744
Sep 6 23:24:45 Tower kernel: md: disk0 read error, sector=641174752

And they're continuing in intervals.

Sep 6 23:41:25 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x3fe8 SErr 0x0 action 0x0
Sep 6 23:41:25 Tower kernel: ata6.00: irq_stat 0x40000008
Sep 6 23:41:25 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep 6 23:41:25 Tower kernel: ata6.00: cmd 60/40:18:b8:54:b2/05:00:93:00:00/40 tag 3 ncq dma 688128 in
Sep 6 23:41:25 Tower kernel: res 41/40:00:b8:54:b2/00:00:93:00:00/40 Emask 0x409 (media error) <F>
Sep 6 23:41:25 Tower kernel: ata6.00: status: { DRDY ERR }
Sep 6 23:41:25 Tower kernel: ata6.00: error: { UNC }
Sep 6 23:41:25 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep 6 23:41:25 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep 6 23:41:25 Tower kernel: ata6.00: configured for UDMA/133
Sep 6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep 6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 Sense Key : 0x3 [current]
Sep 6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 ASC=0x11 ASCQ=0x4
Sep 6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 93 b2 54 b8 00 00 05 40 00 00
Sep 6 23:41:25 Tower kernel: blk_update_request: I/O error, dev sde, sector 2477937848 op 0x0:(READ) flags 0x0 phys_seg 168 prio class 0
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937784
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937792
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937800
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937808
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937816
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937824
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937832
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937840
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937848
Sep 6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937856
Sep 7 00:00:48 Tower kernel: ata9.00: cmd 60/08:18:68:71:11/05:00:9e:00:00/40 tag 3 ncq dma 659456 in
Sep 7 00:00:48 Tower kernel: res 41/40:00:68:71:11/00:00:9e:00:00/40 Emask 0x409 (media error) <F>
Sep 7 00:00:48 Tower kernel: ata9.00: status: { DRDY ERR }
Sep 7 00:00:48 Tower kernel: ata9.00: error: { UNC }
Sep 7 00:00:48 Tower kernel: ata9.00: ATA Identify Device Log not supported
Sep 7 00:00:48 Tower kernel: ata9.00: ATA Identify Device Log not supported
Sep 7 00:00:48 Tower kernel: ata9.00: configured for UDMA/133
Sep 7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep 7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 Sense Key : 0x3 [current]
Sep 7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 ASC=0x11 ASCQ=0x4
Sep 7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 CDB: opcode=0x28 28 00 9e 11 71 68 00 05 08 00
Sep 7 00:00:48 Tower kernel: blk_update_request: I/O error, dev sdf, sector 2651943272 op 0x0:(READ) flags 0x0 phys_seg 161 prio class 0
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943208
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943216
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943224
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943232
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943240
Sep 7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943248

Diagnostics attached. Also attached a SMART test for the parity drive, although it got stuck on 90% and won't complete. Says "Interrupted (host reset)".

For what it's worth, a couple of weeks ago I had some big problems with power failures and didn't have a UPS working at the time.

tower-diagnostics-20220906-1405.zip (disk0) tower-smart-20220906-2339.zip

Edited September 6, 2022 by Stubbs
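
As a cross-check on the log above, the CDB bytes on each `CDB: opcode=...` line can be decoded by hand to confirm that they point at the same sector the `blk_update_request` line reports. The following is only a minimal sketch: the function name `decode_read_cdb` is made up for illustration, it handles only the two opcodes that appear in this log (READ(10), 0x28, and READ(16), 0x88), and the 512-byte logical sector size is an assumption, although it agrees with the `ncq dma` byte counts logged here.

```python
from typing import Tuple

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def decode_read_cdb(cdb_hex: str) -> Tuple[int, int]:
    """Return (start_lba, sector_count) for a READ(10) or READ(16) CDB.

    cdb_hex is the space-separated hex dump from the kernel log line,
    e.g. "28 00 9e 11 71 68 00 05 08 00".
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28 and len(cdb) == 10:
        # READ(10): 4-byte LBA in bytes 2-5, 2-byte length in bytes 7-8
        return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")
    if opcode == 0x88 and len(cdb) == 16:
        # READ(16): 8-byte LBA in bytes 2-9, 4-byte length in bytes 10-13
        return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")
    raise ValueError(f"unsupported CDB: opcode 0x{opcode:02x}, {len(cdb)} bytes")


if __name__ == "__main__":
    # CDBs copied from the log lines in this post
    for cdb in (
        "88 00 00 00 00 00 26 37 8c f8 00 00 04 08 00 00",  # sde, 23:24:45
        "88 00 00 00 00 00 93 b2 54 b8 00 00 05 40 00 00",  # sde, 23:41:25
        "28 00 9e 11 71 68 00 05 08 00",                    # sdf, 00:00:48
    ):
        lba, count = decode_read_cdb(cdb)
        print(f"start sector {lba}, {count} sectors, {count * SECTOR_BYTES} bytes")
```

The decoded start sectors match the `sector` values in the three `I/O error` lines, and the byte counts match the `ncq dma` sizes, so each failed read is reported at the first sector of the request.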
Stubbs (Author) Posted September 6, 2022

And just one extra question which I'm not 100% sure is related: does a parity sync achieve the same things as a parity check? I performed a parity sync about a fortnight ago. Would there be any point in starting a parity check less than a month later?
trurl Posted September 6, 2022

No point doing anything until you resolve your hardware problems. Do these drives share a power cable? Parity is on one controller but disks 2,3 on another.
JorgeB Posted September 6, 2022

They are logged as actual disk errors, though possibly only parity has a problem. Run an extended SMART test on all of them, and make sure spin down is disabled.
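
For anyone who prefers the command line to the GUI, the extended tests could be started with `smartctl -t long` and the results read back later with `smartctl -l selftest`. Below is a minimal sketch that only wraps those two commands. The helper names are hypothetical, and the device paths `/dev/sde` and `/dev/sdf` are taken from the log in this thread as examples; the letters can change after a reboot, so check them first. It does not touch the spin down setting, which still has to be disabled separately.

```python
import subprocess
from typing import Callable, Iterable, List


def long_test_cmd(device: str) -> List[str]:
    """Command that starts an extended (long) SMART self-test on a device."""
    return ["smartctl", "-t", "long", device]


def selftest_log_cmd(device: str) -> List[str]:
    """Command that prints the SMART self-test log for a device."""
    return ["smartctl", "-l", "selftest", device]


def start_long_tests(devices: Iterable[str],
                     run: Callable[[List[str]], object] = subprocess.run) -> List[List[str]]:
    """Start an extended test on each device and return the commands issued.

    `run` is injectable so the function can be dry-run without smartctl.
    """
    issued = []
    for device in devices:
        cmd = long_test_cmd(device)
        run(cmd)
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Example device paths only; verify which letters your disks currently have.
    start_long_tests(["/dev/sde", "/dev/sdf"])
```

The test runs inside the drive itself, so the command returns immediately; the self-test log shows whether it completed or was interrupted.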
trurl Posted September 6, 2022

4 minutes ago, Stubbs said: does a parity sync achieve the same things as a parity check?

A parity sync should produce the same result as a correcting parity check, but a sync doesn't bother to check; it just builds parity. Not sure if your use of the word 'sync' is the same or not.
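
The difference between a sync and a check can be illustrated with a toy model. This is only a sketch assuming a single XOR parity disk, with short byte strings standing in for whole drives; the function names are made up for illustration, and it is not how Unraid's md driver is actually written.

```python
from functools import reduce
from typing import List, Tuple


def build_parity(data_disks: List[bytes]) -> bytes:
    """'Sync': compute parity from the data disks, ignoring any old parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def check_parity(data_disks: List[bytes], stored_parity: bytes,
                 correct: bool = False) -> Tuple[List[int], bytes]:
    """'Check': compare stored parity with what the data implies.

    Returns (mismatching offsets, parity after the check). With correct=True
    the returned parity is rewritten to match the data; otherwise it is left
    as it was and the mismatches are only reported.
    """
    expected = build_parity(data_disks)
    mismatches = [i for i, (e, s) in enumerate(zip(expected, stored_parity)) if e != s]
    return mismatches, (expected if correct else stored_parity)


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
    stale = b"\x07\xff"  # second byte is deliberately wrong
    print(check_parity(disks, stale))                # reports the mismatch only
    print(check_parity(disks, stale, correct=True))  # reports it and fixes it
    print(build_parity(disks))                       # same parity as the corrected check
```

In this model both paths end with identical parity; the only thing the check adds is the list of positions where the old parity disagreed with the data.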
Stubbs (Author) Posted September 6, 2022

4 minutes ago, trurl said: No point doing anything until you resolve your hardware problems. Do these drives share a power cable? Parity is on one controller but disks 2,3 on another.

I have a slightly unorthodox setup. I believe disks 2 and 3 are installed inside my server's case, whereas the parity disk and disks 1 and 4 are in the hotswap bay connected to the front of the case. This bay (and its fan) is powered by two SATA power connectors. I've had to interchange them over the years because of past problems. One time I had a defective cable; another time one of the hotswap bays had the wrong mounting screws, causing a faulty connection. I never really kept track of where each specific disk is because Unraid remembers their IDs anyway.

7 minutes ago, JorgeB said: They are logged as actual disk errors, though possibly only parity has a problem, run extended SMART test on all, make sure spin down is disabled.

The thing is, I can't (or rather, couldn't) even complete the short test. It got stuck at 90% before showing the message "Interrupted (host reset)". In an attempt to fix this, I restarted the server, and the read errors have gone away. I assume they're still actually there though, and I'll try to run an extended test. First I'm backing up everything important.
trurl Posted September 6, 2022

15 minutes ago, Stubbs said: The thing is, I can't (or rather, couldn't) even complete the short test. It got stuck at 90% before showing the message "Interrupted (host reset)".

24 minutes ago, JorgeB said: make sure spin down is disabled.