darkwolf Posted April 8, 2021

So I have had some cabling and controller issues and kept losing parity, so I disabled parity (I know, I know) until I could get my replacement card in. I re-cabled with new cables, then had an issue with md1 (and the physical drive as well). I rechecked the power connections, same result, then swapped in a whole new power connection for four drives and md1 came back fine. But somewhere in the ups and downs, the md4 device began having I/O errors, even though an xfs check of the underlying device (currently /dev/sdg1) looks fine (run with -n so it wouldn't change anything).

I tried the 'shrink array' method of rebuilding the array with a new config, keeping the data, to see if that would 'fix' the error. Still no. Since I do not have parity right now, I am guessing it is safe to xfs_repair the non-md device (i.e. /dev/sdg1), recover what I can, then just make a new config (keep files) and go from there?

Output from xfs_repair:

    root@media:~# xfs_repair -vn /dev/sdg1
    Phase 1 - find and verify superblock...
            - block cache size set to 6157048 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 1233498 tail block 1233498
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 2
            - agno = 1
            - agno = 3
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify link counts...
    No modify flag set, skipping filesystem flush and exiting.
    XFS_REPAIR Summary    Thu Apr  8 01:18:04 2021

    Phase           Start           End             Duration
    Phase 1:        04/08 01:17:39  04/08 01:17:39
    Phase 2:        04/08 01:17:39  04/08 01:17:40  1 second
    Phase 3:        04/08 01:17:40  04/08 01:17:56  16 seconds
    Phase 4:        04/08 01:17:56  04/08 01:17:57  1 second
    Phase 5:        Skipped
    Phase 6:        04/08 01:17:57  04/08 01:18:04  7 seconds
    Phase 7:        04/08 01:18:04  04/08 01:18:04

    Total run time: 25 seconds

    root@media:~# xfs_repair -vn /dev/md4
    Phase 1 - find and verify superblock...
    superblock read failed, offset 0, size 524288, ag 0, rval -1
    fatal error -- Input/output error

media-diagnostics-20210408-0054.zip
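For anyone following the same path, the sequence above (a read-only check on the bare partition, then a real repair while parity is disabled) can be sketched as a dry-run shell plan. The device name is taken from this thread and is an assumption for anyone else's system; the commands are only echoed, so nothing is written until you remove the `echo`s.

```shell
# A minimal sketch, assuming /dev/sdg1 is the partition that passed the -n check.
# Echo-only dry run: drop the echoes to actually execute the commands.
DEV=/dev/sdg1
CHECK="xfs_repair -n $DEV"   # read-only: reports problems, changes nothing
REPAIR="xfs_repair $DEV"     # real repair, only once the -n pass looks clean
echo "$CHECK"
echo "$REPAIR"
```

Running against the partition (`/dev/sdg1`) rather than the md device sidesteps the I/O error seen on `/dev/md4`, which is exactly why the -n check succeeded on one and failed on the other.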
darkwolf Posted April 8, 2021 (Author)

Oh, and then of course adding my parity drive back in (now that it is looking fine) and doing parity checks.
darkwolf Posted April 8, 2021 (Author)

So I did it, because most of the data is either backed up somewhere or media I can re-rip from disc. It looks like I have a few disk errors now. Suggestions?
darkwolf Posted April 8, 2021 (Author)

The drives with read errors are on separate SAS cables and different power segments, so I am thinking it is a mix of old drives and funkiness from my old controller. My plan at the moment is to remove the drives with the read errors from the drive pool, make a new config with the remaining drives, then mount -o ro,norecovery the other drives and move the data over to the array. I have some spare drives to throw in to make more space, so that should work out space-wise.

I know I may have some file corruption; anything of importance, like I said, is backed up already, so I may just restore the shares that are important and worry about the rest on a case-by-case basis. I am still open to feedback though, as this process will take a while and I won't start on it for a few days.
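The migration plan described above can be sketched as another echo-only dry run. The device names, mount point, and destination path here are illustrative assumptions, not taken from the diagnostics; `norecovery` mounts the XFS filesystem read-only without replaying its log, which is the safe way to read a drive you suspect is failing.

```shell
# Sketch of the "mount read-only, copy onto the array" plan. All paths and
# device names below are assumptions for illustration; commands are echoed,
# not executed.
MNT=/mnt/recover
DEST=/mnt/user0                      # assumed array destination
for DEV in /dev/sdg1 /dev/sdh1; do   # the drives being retired (assumed names)
    echo "mkdir -p $MNT"
    echo "mount -o ro,norecovery $DEV $MNT"   # read-only, skip XFS log replay
    echo "rsync -avP $MNT/ $DEST/"            # archive mode, resumable copy
    echo "umount $MNT"
done
```

rsync's `-avP` keeps permissions and timestamps and shows progress, so a copy interrupted by a fresh read error can be restarted without starting over.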
JorgeB Posted April 8, 2021

1 hour ago, darkwolf said: "Suggestions?"

Post new diags after the errors; the previous one didn't have those.
darkwolf Posted April 10, 2021 (Author)

I just took the faulty drives out of the array, rebuilt the new config, and copied the data over. Everything looked good. Once I get my new drives in for double parity and everything checks out, I will pre-clear the 'faulty' drives and see how that goes. If I have issues with them I will make a new post. Thanks for doing your awesome jobs! Very much appreciate everyone who helps out here in the community! Much love!
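Before trusting the pulled drives again, a SMART extended self-test is a reasonable complement to a pre-clear pass. This is a hedged sketch with a placeholder device name (`/dev/sdX` is not a real device); commands are echoed rather than run.

```shell
# Vetting a pulled drive before reuse. /dev/sdX is a placeholder assumption.
# Echo-only dry run: remove the echoes to execute for real.
DRIVE=/dev/sdX
START_TEST="smartctl -t long $DRIVE"   # extended self-test; can take hours
REPORT="smartctl -a $DRIVE"            # full attributes and self-test results
echo "$START_TEST"
echo "$REPORT"
```

In the report, growing Reallocated_Sector_Ct or Current_Pending_Sector counts are the usual signs a drive is genuinely failing rather than a victim of bad cabling or a flaky controller.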