Disks with read errors

Stubbs · September 6, 2022

I noticed these errors starting to appear in my log. The read and write counters on my Array tab are all incrementing, except for Disk 3, which has stayed completely static at 1,375,745 reads and 581 writes.

At first only disk0 (the parity drive) had errors, but now multiple disks do.


Sep  6 23:24:45 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x3fc00 SErr 0x0 action 0x0
Sep  6 23:24:45 Tower kernel: ata6.00: irq_stat 0x40000008
Sep  6 23:24:45 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep  6 23:24:45 Tower kernel: ata6.00: cmd 60/08:50:f8:8c:37/04:00:26:00:00/40 tag 10 ncq dma 528384 in
Sep  6 23:24:45 Tower kernel:         res 41/40:00:f8:8c:37/00:00:26:00:00/40 Emask 0x409 (media error) <F>
Sep  6 23:24:45 Tower kernel: ata6.00: status: { DRDY ERR }
Sep  6 23:24:45 Tower kernel: ata6.00: error: { UNC }
Sep  6 23:24:45 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep  6 23:24:45 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep  6 23:24:45 Tower kernel: ata6.00: configured for UDMA/133
Sep  6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep  6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 Sense Key : 0x3 [current] 
Sep  6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 ASC=0x11 ASCQ=0x4 
Sep  6 23:24:45 Tower kernel: sd 6:0:0:0: [sde] tag#10 CDB: opcode=0x88 88 00 00 00 00 00 26 37 8c f8 00 00 04 08 00 00
Sep  6 23:24:45 Tower kernel: blk_update_request: I/O error, dev sde, sector 641174776 op 0x0:(READ) flags 0x0 phys_seg 129 prio class 0
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174712
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174720
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174728
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174736
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174744
Sep  6 23:24:45 Tower kernel: md: disk0 read error, sector=641174752
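The `cmd`/`res` lines above are libata's dump of the ATA taskfile registers, printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. As a sketch (assuming that field order and 512-byte logical sectors), the Python below decodes a READ FPDMA QUEUED (0x60) line: the 48-bit LBA is rebuilt from the low and high-order bytes, the sector count comes from the feature registers, and the NCQ tag sits in the upper five bits of the sector-count register. For the first error this recovers LBA 641174776, 1032 sectors (528384 bytes) and tag 10, which matches the `blk_update_request` sector and the `tag 10 ... 528384 in` text. `parse_taskfile` and `fpdma_read_summary` are illustrative names, not part of any tool.

```python
import re
from dataclasses import dataclass

# libata prints taskfiles as
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)
SECTOR_BYTES = 512  # assumed logical sector size


@dataclass
class Taskfile:
    kind: str         # "cmd" = command issued, "res" = drive's result
    command: int      # opcode for cmd, status register for res
    feature: int      # feature (count low byte) for cmd, error register for res
    nsect: int
    hob_feature: int
    lba: int          # 48-bit LBA rebuilt from low and high-order bytes


def parse_taskfile(line: str) -> Taskfile:
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no libata taskfile dump in line")
    kind, command, low, high, _device = m.groups()
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return Taskfile(kind, int(command, 16), feature, nsect, hob_feature, lba)


def fpdma_read_summary(tf: Taskfile) -> dict:
    """READ FPDMA QUEUED (0x60): sector count is in the feature registers,
    the NCQ tag in bits 7..3 of the sector-count register."""
    if tf.kind != "cmd" or tf.command != 0x60:
        raise ValueError("not a READ FPDMA QUEUED command line")
    sectors = (tf.hob_feature << 8) | tf.feature
    return {"lba": tf.lba, "sectors": sectors, "bytes": sectors * SECTOR_BYTES, "tag": tf.nsect >> 3}
```

Run against the matching `res` line, the same parser gives status 0x41 (the DRDY and ERR bits) and error 0x40 (UNC), with the failing LBA equal to the start of the request, which is what the `status:` and `error:` lines spell out.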

And they keep recurring at intervals:

Sep  6 23:41:25 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x3fe8 SErr 0x0 action 0x0
Sep  6 23:41:25 Tower kernel: ata6.00: irq_stat 0x40000008
Sep  6 23:41:25 Tower kernel: ata6.00: failed command: READ FPDMA QUEUED
Sep  6 23:41:25 Tower kernel: ata6.00: cmd 60/40:18:b8:54:b2/05:00:93:00:00/40 tag 3 ncq dma 688128 in
Sep  6 23:41:25 Tower kernel:         res 41/40:00:b8:54:b2/00:00:93:00:00/40 Emask 0x409 (media error) <F>
Sep  6 23:41:25 Tower kernel: ata6.00: status: { DRDY ERR }
Sep  6 23:41:25 Tower kernel: ata6.00: error: { UNC }
Sep  6 23:41:25 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep  6 23:41:25 Tower kernel: ata6.00: ATA Identify Device Log not supported
Sep  6 23:41:25 Tower kernel: ata6.00: configured for UDMA/133
Sep  6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep  6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 Sense Key : 0x3 [current] 
Sep  6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 ASC=0x11 ASCQ=0x4 
Sep  6 23:41:25 Tower kernel: sd 6:0:0:0: [sde] tag#3 CDB: opcode=0x88 88 00 00 00 00 00 93 b2 54 b8 00 00 05 40 00 00
Sep  6 23:41:25 Tower kernel: blk_update_request: I/O error, dev sde, sector 2477937848 op 0x0:(READ) flags 0x0 phys_seg 168 prio class 0
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937784
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937792
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937800
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937808
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937816
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937824
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937832
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937840
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937848
Sep  6 23:41:25 Tower kernel: md: disk0 read error, sector=2477937856
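The `sd` lines restate each failed request at the SCSI layer. Sense key 0x3 is MEDIUM ERROR, and ASC/ASCQ 0x11/0x04 is, per the T10 ASC table, UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED. The CDB bytes carry the same LBA and length as the ATA command. Below is a minimal sketch that decodes the two opcodes seen in this thread, READ(16) (0x88) and READ(10) (0x28); the lookup tables cover only the codes in this log, and `decode_read_cdb`/`describe_sense` are hypothetical helper names.

```python
import re

# Only the codes that appear in this log; not a complete SPC table.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}

CDB_RE = re.compile(r"CDB: opcode=0x[0-9a-f]+ ((?:[0-9a-f]{2} ?)+)")


def decode_read_cdb(line: str) -> tuple[str, int, int]:
    """Return (command name, LBA, transfer length in blocks) from an sd CDB line."""
    m = CDB_RE.search(line)
    if m is None:
        raise ValueError("no CDB in line")
    cdb = bytes.fromhex(m.group(1))  # fromhex skips the separating spaces
    if cdb[0] == 0x88:  # READ(16): LBA in bytes 2-9, length in bytes 10-13
        return "READ(16)", int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")
    if cdb[0] == 0x28:  # READ(10): LBA in bytes 2-5, length in bytes 7-8
        return "READ(10)", int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")
    raise ValueError(f"unhandled opcode 0x{cdb[0]:02x}")


def describe_sense(key: int, asc: int, ascq: int) -> str:
    key_name = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    asc_name = ASC_ASCQ.get((asc, ascq), f"ASC 0x{asc:02x}/ASCQ 0x{ascq:02x}")
    return f"{key_name}: {asc_name}"
```

Each CDB in this thread decodes to the same LBA and block count as the corresponding ATA `cmd` line and `blk_update_request` sector, so every layer agrees on which sectors the drive failed to read.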

Sep  7 00:00:48 Tower kernel: ata9.00: cmd 60/08:18:68:71:11/05:00:9e:00:00/40 tag 3 ncq dma 659456 in
Sep  7 00:00:48 Tower kernel:         res 41/40:00:68:71:11/00:00:9e:00:00/40 Emask 0x409 (media error) <F>
Sep  7 00:00:48 Tower kernel: ata9.00: status: { DRDY ERR }
Sep  7 00:00:48 Tower kernel: ata9.00: error: { UNC }
Sep  7 00:00:48 Tower kernel: ata9.00: ATA Identify Device Log not supported
Sep  7 00:00:48 Tower kernel: ata9.00: ATA Identify Device Log not supported
Sep  7 00:00:48 Tower kernel: ata9.00: configured for UDMA/133
Sep  7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=7s
Sep  7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 Sense Key : 0x3 [current] 
Sep  7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 ASC=0x11 ASCQ=0x4 
Sep  7 00:00:48 Tower kernel: sd 9:0:0:0: [sdf] tag#3 CDB: opcode=0x28 28 00 9e 11 71 68 00 05 08 00
Sep  7 00:00:48 Tower kernel: blk_update_request: I/O error, dev sdf, sector 2651943272 op 0x0:(READ) flags 0x0 phys_seg 161 prio class 0
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943208
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943216
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943224
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943232
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943240
Sep  7 00:00:48 Tower kernel: md: disk2 read error, sector=2651943248
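Comparing the two layers shows a fixed relationship: each `md: diskN read error` run starts exactly 64 sectors below the device sector in the preceding `blk_update_request` line (641174776 becomes 641174712, 2477937848 becomes 2477937784, and 2651943272 becomes 2651943208). That is consistent with the array partition starting at sector 64 on these drives. Here is a rough sketch for tallying md read errors per array slot from a syslog excerpt and checking that offset; it assumes each md run directly follows its `blk_update_request` line, as it does in this log, and `tally_md_errors` is a made-up name.

```python
import re
from collections import Counter

BLK_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")
MD_RE = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def tally_md_errors(syslog: str) -> tuple[Counter, set[int]]:
    """Count md read-error lines per array slot and collect the offset between
    each blk_update_request sector and the first md sector reported after it."""
    per_disk: Counter = Counter()
    offsets: set[int] = set()
    pending_blk = None  # device sector of a blk error not yet matched to an md line
    for line in syslog.splitlines():
        if (m := BLK_RE.search(line)):
            pending_blk = int(m.group(2))
        elif (m := MD_RE.search(line)):
            per_disk[m.group(1)] += 1
            if pending_blk is not None:
                offsets.add(pending_blk - int(m.group(2)))
                pending_blk = None
    return per_disk, offsets
```

Fed the excerpts in this post, it would attribute the errors to disk0 and disk2, and to no other slot, which matches the observation that errors have spread beyond the parity drive.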

Diagnostics attached.

I've also attached a SMART report for the parity drive, although the test got stuck at 90% and won't complete. It says "Interrupted (host reset)".

For what it's worth, a couple of weeks ago I had some big problems with power failures and didn't have a UPS working at the time.

tower-diagnostics-20220906-1405.zip (disk0) tower-smart-20220906-2339.zip

Edited September 6, 2022 by Stubbs

Stubbs · September 6, 2022

And just one extra question, which I'm not 100% sure is related: does a parity sync achieve the same thing as a parity check?

I performed a parity sync about a fortnight ago. Would there be any point in starting a parity check less than a month later?

trurl · September 6, 2022

No point doing anything until you resolve your hardware problems.

Do these drives share a power cable? Parity is on one controller, but disks 2 and 3 are on another.

JorgeB · September 6, 2022

They are logged as actual disk errors, though possibly only parity has a problem. Run an extended SMART test on all of them, and make sure spin down is disabled.

trurl · September 6, 2022

Stubbs said:

does a parity sync achieve the same things as a parity check?

A parity sync should produce the same result as a correcting parity check, but a sync doesn't bother checking; it just builds parity.

Not sure if your use of the word 'sync' is the same or not.
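For single parity, which is a byte-wise XOR across the data disks, the distinction can be made concrete. The sketch below is a conceptual model of one stripe, not Unraid's md driver: a sync computes parity and writes it without looking at what was stored, while a check recomputes it and compares, and only a correcting check overwrites the stored value on a mismatch. The function names are illustrative, and dual parity uses different math that this does not cover.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks: single parity for one stripe."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def parity_sync(data_blocks: list[bytes]) -> bytes:
    # Sync: compute parity and write it out; whatever was stored before is ignored.
    return xor_blocks(data_blocks)


def parity_check(data_blocks: list[bytes], stored: bytes, correcting: bool) -> tuple[bytes, bool]:
    """Check: recompute and compare. Returns (parity left on disk, mismatch found).
    A correcting check rewrites parity on a mismatch; a non-correcting one only reports it."""
    expected = xor_blocks(data_blocks)
    mismatch = expected != stored
    return (expected if mismatch and correcting else stored), mismatch
```

In this model a correcting check over a stripe with stale parity ends in the same on-disk state as a sync, which is the equivalence described above; the check additionally tells you whether the stored parity disagreed.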

Stubbs · September 6, 2022

trurl said:

No point doing anything until you resolve your hardware problems.

Do these drives share a power cable? Parity is on one controller, but disks 2 and 3 are on another.

I have a slightly unorthodox setup. I believe disks 2 and 3 are installed inside my server's case, whereas the parity disk and disks 1 and 4 are in the hotswap bay attached to the front of the case. This bay (and its fan) is powered by two SATA power connectors.

I've had to swap them around over the years because of past problems. Once I had a defective cable; another time one of the hotswap bays had the wrong mounting screws, causing a faulty connection. I never really kept track of where each specific disk is, because Unraid remembers their IDs anyway.

JorgeB said:

They are logged as actual disk errors, though possibly only parity has a problem. Run an extended SMART test on all of them, and make sure spin down is disabled.

The thing is, I can't (or rather, couldn't) even complete the short test. It got stuck at 90% before showing the message "Interrupted (host reset)".

In an attempt to fix this, I restarted the server, and the read errors have gone away. I assume the underlying errors are still there, though, and I'll try to run an extended test.

First I'm backing everything important up.

trurl · September 6, 2022

Stubbs said:

The thing is, I can't (or rather, couldn't) even complete the short test. It got stuck at 90% before showing the message "Interrupted (host reset)".

JorgeB said:

make sure spin down is disabled.
