XFS file system corruption questions


First off, thanks for looking.

 

Now, I have a multi-disk system which reports during boot 'Metadata I/O error in "xfs_trans_read_buf_map" at daddr 0xb780a130' and then notes FS /dev/md3.

 

This happened after I had two old disks fail on me at once and I replaced them with 3TB (same capacity) white-label drives, which I pre-cleared beforehand. The array rebuilt but reported literally millions of reads and errors on a drive. I formatted the disk it reported and pre-cleared it again, and it came back clear. There is not a single SMART error, which leads me to believe this is file system corruption and not a failed disk. But if you believe otherwise I'll be happy to listen.

 

So I've booted up the system, started the array in maintenance mode (online but not mounted) and ran 'xfs_repair -v /dev/md3', since that is where it reports the error. The problem is that it gets all the way to phase 7 and then does this:

 

Phase 7 - verify and correct link counts...
resetting inode 99 nlinks from 5 to 4
resetting inode 117 nlinks from 27 to 8
resetting inode 16197945 nlinks from 3 to 2
resetting inode 16197967 nlinks from 2 to 71

---- and it freezes here, going no further even after 10 hours.

 

I am fine with losing data, as there is nothing I can't replace on this box, but how can I get this array back to a healthy state? I'd like to save my Docker images if possible.

 

Any help would be welcome. I am new to Unraid, so I've attached my diagnostics file as I see people do here. Thank you all.

tower-diagnostics-20181118-2120.zip


A successful rebuild relies on there being zero read errors on any drive other than the one(s) being rebuilt, and zero write errors on the rebuilt drives. Every read error (or write error) is likely to result in a corrupt sector on the rebuilt drive. Also, file system corruption cannot cause read errors, as a rebuild works at the physical sector level and not the file system level. It is the other way around: read errors (or write errors) will cause file system corruption.
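
To illustrate the point above, here is a minimal Python sketch of a sector-level rebuild. It assumes single XOR parity and a toy 8-byte "sector"; the function names (`xor_blocks`, `rebuild_sector`) and the disk contents are made up for illustration and this is not Unraid's actual code. Rebuilding two disks at once needs a second parity that uses different math, but a bad read has the same consequence there.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_sector(surviving_sectors, parity_sector):
    """Reconstruct the missing disk's sector from the surviving disks plus parity."""
    return xor_blocks(list(surviving_sectors) + [parity_sector])


# Three hypothetical data disks, one 8-byte "sector" each.
disk1 = b"AAAAAAAA"
disk2 = b"BBBBBBBB"
disk3 = b"CCCCCCCC"  # this is the disk that gets replaced and rebuilt
parity = xor_blocks([disk1, disk2, disk3])

# Clean rebuild: every surviving disk is read correctly.
good = rebuild_sector([disk1, disk2], parity)

# Rebuild with a read error on disk2, modelled here as one wrong byte.
bad_read = b"BBBB\x00BBB"
corrupt = rebuild_sector([disk1, bad_read], parity)

print(good == disk3)     # the clean rebuild matches the original data
print(corrupt == disk3)  # the rebuild with a bad read does not
```

Note that nothing in the rebuild looks at files or directories. The bad byte simply lands wherever that sector happens to sit on the rebuilt disk, and if that is XFS metadata, the file system ends up corrupt.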

 

Exactly which drives were you replacing, and which was the one with read errors during the rebuild?


Hey there. 

 

I got the errors on 1CH166_W1F4TC3V, Disk 5 in the array. When I took down the array I replaced two older Seagate drives with the two Hitachi drives now in the array. I checked the cable on 1CH166_W1F4TC3V, made sure everything was okay as far as I could see and then turned the system back on and it reported no errors on a SMART test. I did pre-clear both of the Hitachi drives. 

 

So is my file system pretty well beyond repair now? Should I just nuke it and restart? I feel like I am close to that point here. 
