XFS on array (md) device shows I/O error, but XFS on the underlying /dev/sd?1 device checks out fine



So I have had some cabling and controller issues and kept losing parity, so I disabled parity (I know, I know) until I could get my replacement card in.

I re-cabled with new cables and had an issue with md1 (and the physical drive as well). I rechecked the power connections with the same result, then swapped in a whole new power connection for 4 drives, and md1 came back fine. But somewhere in all the ups and downs, the md4 device began having I/O errors, even though an XFS check of the underlying device (currently /dev/sdg1) looks fine (run with -n so it wouldn't change anything).

I used the 'shrink array' method of rebuilding the array (New Config, keeping the data) to see if that would 'fix' the error. Still no luck.

Since I do not have parity right now, I am guessing it is safe to run xfs_repair on the non-md device (i.e. /dev/sdg1), recover what I can, and then just make a New Config (keeping files) and go from there?

Output from xfs_repair:


root@media:~# xfs_repair -vn /dev/sdg1
Phase 1 - find and verify superblock...
        - block cache size set to 6157048 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1233498 tail block 1233498
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Thu Apr  8 01:18:04 2021

Phase           Start           End             Duration
Phase 1:        04/08 01:17:39  04/08 01:17:39
Phase 2:        04/08 01:17:39  04/08 01:17:40  1 second
Phase 3:        04/08 01:17:40  04/08 01:17:56  16 seconds
Phase 4:        04/08 01:17:56  04/08 01:17:57  1 second
Phase 5:        Skipped
Phase 6:        04/08 01:17:57  04/08 01:18:04  7 seconds
Phase 7:        04/08 01:18:04  04/08 01:18:04

Total run time: 25 seconds
root@media:~# xfs_repair -vn /dev/md4
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error


media-diagnostics-20210408-0054.zip


The drives with read errors are on separate SAS cables and different power segments, so I think it is a mix of old drives and funkiness from my old controller.

My plan at the moment is to remove the drives with read errors from the array, make a New Config with the remaining drives, then mount the removed drives with -o ro,norecovery and move their data over to the array. I have some spare drives to add for more space, so that should work out space-wise. I know I may have some file corruption, but as I said, anything important is already backed up, so I may just restore the important shares and deal with the rest on a case-by-case basis.
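For reference, here is a minimal shell sketch of the mount-and-copy step in that plan, not a tested procedure. It assumes a removed disk's XFS partition shows up as /dev/sdX1 and that /mnt/recover and /mnt/user/recovered are free to use as the temporary mount point and destination share; all three names are hypothetical placeholders, not paths from this thread. It mounts the filesystem read-only without replaying the XFS log, copies everything with rsync, then unmounts. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: copy data off a disk that was removed from the array.
# Assumptions (hypothetical placeholders, adjust to your system):
#   /dev/sdX1            - XFS partition on the removed disk
#   /mnt/recover         - unused temporary mount point
#   /mnt/user/recovered  - destination share inside the new array
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_disk() {
    src_dev=$1
    mnt=$2
    dest=$3

    run mkdir -p "$mnt" || return 1
    # ro: never write to the suspect disk.
    # norecovery: skip XFS log replay; XFS only accepts this together with ro.
    run mount -t xfs -o ro,norecovery "$src_dev" "$mnt" || return 1
    # -a keeps permissions, ownership, timestamps and symlinks; -v lists files.
    # The trailing slash on the source copies its contents, not the directory.
    run rsync -av "$mnt/" "$dest/"
    copy_status=$?
    # Unmount even if rsync reported errors (e.g. unreadable files).
    run umount "$mnt"
    return "$copy_status"
}
```

For example, DRY_RUN=1 recover_disk /dev/sdX1 /mnt/recover /mnt/user/recovered only prints the four commands; drop DRY_RUN to run them for real. Because norecovery skips log replay, files that were being written when the disk dropped out may be missing or inconsistent, which matches the expectation of some file corruption.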

I am still open to feedback, though, as that process will take a while and I won't start on it for a few days :D


I just took the faulty drives out of the array, rebuilt with a New Config, and copied over the data. Everything looked good. Once I get my new drives in for double parity and everything is stable, I will pre-clear the 'faulty' drives and see how that goes.


If I have issues with them, I will make a new post. Thanks for doing your awesome jobs! I very much appreciate everyone who helps out here in the community!

Much love! 
