HDD read errors - controller related?



Hi all, I woke up this morning to 46 read errors on a brand new 10TB IronWolf drive that had successfully passed a full preclear. I'm not running parity, but I used preclear on the drive to give it a good test/workout before putting data on it. Due to some shuffling of data and drives I'm doing with unBALANCE, this particular drive is currently 98% full, and I now seem to be getting errors from it. My first suspicion is a controller issue, as it wouldn't be the first time I've had a bad time with my Adaptec under unRAID, and I do have an LSI controller on the way. But right now I want to work out whether I've got a faulty drive or whether it's my controller, as I suspect. Diagnostics attached. Highlights are...

 

kernel: blk_update_request: I/O error, dev sdt, sector 6497176416 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
kernel: md: disk16 read error, sector=6497176352
kernel: XFS (md16): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x183430b20 len 8 error 5
kernel: sd 1:1:27:0: [sdt] tag#520 access beyond end of device
kernel: blk_update_request: I/O error, dev sdt, sector 10754744120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
kernel: md: disk16 read error, sector=10754744056
kernel: XFS (md16): metadata I/O error in "xfs_da_read_buf+0x9e/0xfe [xfs]" at daddr 0x281085ef8 len 8 error 5
kernel: sd 1:1:27:0: [sdt] tag#568 access beyond end of device
kernel: blk_update_request: I/O error, dev sdt, sector 19327352984 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
kernel: md: disk16 read error, sector=19327352920
kernel: md: disk16 read error, sector=19327352928
kernel: md: disk16 read error, sector=19327352936
kernel: md: disk16 read error, sector=19327352944
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x480000058 len 32 error 5
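Side note for anyone reading the log: the md sector numbers are consistently 64 below the raw sdt sectors. Assuming the standard unRAID partition layout (partition 1 starts at sector 64), both lines point at the same physical spot on the disk reported at two different layers, not at two separate faults:

```shell
# The md device sits on partition 1, which on unRAID's usual layout
# starts at sector 64 of the raw disk -- so md sector numbers are offset
# by 64 from the sdt sector numbers in the matching log entry.
raw_sector=6497176416   # from the blk_update_request line
md_sector=6497176352    # from the matching "md: disk16 read error" line
echo "offset: $(( raw_sector - md_sector )) sectors"
```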

 

I pulled the diagnostics down initially after I'd triggered an extended SMART test. I then noticed the SMART test had been stopped by the host ("host reset", it says) and more errors had appeared, so I re-downloaded the logs... the new items are...

 

rc.diskinfo[11028]: SIGHUP received, forcing refresh of disks info.
kernel: sd 1:1:27:0: [sdt] tag#678 access beyond end of device
kernel: print_req_error: 4 callbacks suppressed
kernel: blk_update_request: I/O error, dev sdt, sector 11491091352 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
kernel: md: disk16 read error, sector=11491091288
kernel: md: disk16 read error, sector=11491091296
kernel: md: disk16 read error, sector=11491091304
kernel: md: disk16 read error, sector=11491091312
kernel: XFS: metadata IO error: 21 callbacks suppressed
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: XFS (md16): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x2acec2358 len 32 error 5
kernel: sd 1:1:27:0: [sdt] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
kernel: sdt: detected capacity change from 0 to 10000831348736
kernel: GPT:Primary header thinks Alt. header is not at the end of the disk.
kernel: GPT:19524464639 != 19532873727
kernel: GPT:Alternate GPT header not at the end of the disk.
kernel: GPT:19524464639 != 19532873727
kernel: GPT: Use GNU Parted to correct GPT errors.
kernel: sdt: sdt1
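Those GPT lines are the interesting part: the on-disk partition table expects the backup GPT header at a sector well below where the kernel now says the disk ends, meaning at some point the drive was presented smaller than it really is (one common cause is a controller reserving space for its own metadata), which would also fit the "access beyond end of device" lines. Rough arithmetic on the two numbers from the log, as a sanity check rather than a diagnosis:

```shell
# The GPT complaint compares two "last sector" values; the difference is
# how much of the disk was hidden when the partition table was written.
gpt_alt_sector=19524464639     # where the primary header thinks the backup header is
real_last_sector=19532873727   # actual last sector of the 10TB drive
shortfall_bytes=$(( (real_last_sector - gpt_alt_sector) * 512 ))
echo "disk was exposed short by ${shortfall_bytes} bytes"   # ~4.3 GB
```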

 

I'm going to try triggering an extended test again now.
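For what it's worth, the same test can be driven from the CLI, which makes it easier to see why a run was aborted. A sketch (device name taken from the log above; the exit-status bits are documented in smartctl(8)):

```shell
# Start/monitor the extended self-test from a shell:
#   smartctl -t long /dev/sdt   # kick off the extended test
#   smartctl -a /dev/sdt        # progress, attributes, self-test log
#
# smartctl's exit status is a bitmask (see smartctl(8)). A small decoder
# for the bits most relevant when a test aborts or errors show up:
decode_smartctl_status() {
  local s=$1 out=""
  [ $(( s & 8 ))   -ne 0 ] && out="${out}disk-failing "
  [ $(( s & 64 ))  -ne 0 ] && out="${out}error-log-entries "
  [ $(( s & 128 )) -ne 0 ] && out="${out}selftest-errors "
  echo ${out:-ok}   # unquoted on purpose: trims the trailing space
}
decode_smartctl_status 192   # bits 6+7 -> error-log-entries selftest-errors
```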

 

Hopefully I'm right and I'll have these issues solved by replacing the controller, but I need to work it out for sure if possible.

 

Thanks

 

UPDATE: the second extended test failed with the same message as above ("kernel: GPT:Primary header thinks Alt. header is not at the end of the disk."). I still strongly suspect it's a controller issue, but I would love some feedback.

 

newbehemoth-diagnostics-20220307-0811.zip

Edited by KRiSX

So I went ahead and stopped the array and restarted it to see if that would change anything; it didn't. After finding some other threads referencing some of the errors I'm seeing, I tried sgdisk -v, which led me to running sgdisk -e. Neither seemed to do a whole lot.

I fired up Storman (the Adaptec management docker) to check the status from there and noticed the drive was "Ready" instead of JBOD like it should have been. Rather than changing that (I was pretty confident I'd destroy the data doing so), I connected the drive directly to my motherboard instead. I tested whether the drive was mountable, and it is. I tried including it in my array, but there it was unmountable and unRAID wanted to format it, so I removed it again and am now moving the data off it via Unassigned Devices.

I've managed to move off a couple of hundred gigs without issue at this point, so I still don't know for sure what the problem is, but at least data loss appears to be minimal (if any). After I get the data off I'll remove the drive and test it elsewhere.
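To recap the partition-table side of that for anyone landing here from a search (device name is an example, substitute your own; as I understand it sgdisk -e only relocates the backup GPT header and doesn't touch file data, but back up first anyway):

```shell
#   sgdisk -v /dev/sdt   # verify only: reports GPT problems, changes nothing
#   sgdisk -e /dev/sdt   # move the backup GPT header to the disk's real end
#
# Where "the real end" is, using the capacity line from the kernel log:
total_sectors=19532873728
echo "backup GPT header belongs at sector $(( total_sectors - 1 ))"
```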

 


I have the same issues with my IronWolf drives, 8TB and 14TB. I got warnings from Unraid: read errors, and the disk was disabled. My 8TB disk only started getting these warnings after I connected the 14TB drive.

 

After some testing, my only solution was to connect these IronWolf drives to the internal SATA ports. I guess I need to buy a "normal" PCIe SATA card or something?

 

S.M.A.R.T Test = All good

PreClear = All good

SeaTools = All good

"Tweak" the disk = Nothing changed

 

  • 2 weeks later...
On 3/9/2022 at 7:59 PM, eikum said:

I have the same issues with my IronWolf drives, 8TB and 14TB. I got warnings from Unraid: read errors, and the disk was disabled. My 8TB disk only started getting these warnings after I connected the 14TB drive.

 

After some testing, my only solution was to connect these IronWolf drives to the internal SATA ports. I guess I need to buy a "normal" PCIe SATA card or something?

 

S.M.A.R.T Test = All good

PreClear = All good

SeaTools = All good

"Tweak" the disk = Nothing changed

 

 

I have since replaced my Adaptec controller with an LSI 9207-8i, and so far so good: I had to move all my data and format all my disks again, and I've received absolutely no errors, including on the 10TB that gave me issues. I really feel this was a case of my Adaptec not properly handling 10TB disks, but they're on its support list, so who knows... either way I'm in a so-far-so-good situation and will continue to monitor.

The only weird thing I'm seeing now is that the Fix Common Problems plugin says write cache is disabled on all the disks connected to this controller, but speeds seem fine to me (150-250MB/s per drive). I had six drives running flat out doing moves this past week and was hitting 550-650MB/s across them, so I think it's either a false report or it simply doesn't matter.
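On the write-cache warning: rather than trusting the plugin, the flag can be read directly from the drive (the usual commands for SATA vs. SAS/HBA-attached drives; adjust device names). And as a rough sanity check on the speeds quoted above, assuming the aggregate figure was spread evenly:

```shell
# Check the drive's volatile write cache directly:
#   hdparm -W /dev/sdX           # SATA: "write-caching = 1" means enabled
#   sdparm --get=WCE /dev/sdX    # SAS / behind an HBA: WCE 1 means enabled
#
# Rough throughput sanity check: six drives moving data in parallel at an
# aggregate 550-650 MB/s is on the order of 90-110 MB/s each, which is a
# normal range for large HDDs mid-transfer, write-cache flag or not.
aggregate_mb_s=600; drives=6
echo "per-drive: ~$(( aggregate_mb_s / drives )) MB/s"
```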

 

