XFS filesystem corruption but parity version seems fine - best course of action?


Solved by itimpi



Hi, I'm running Unraid 6.10.3.  I got the dreaded "file system missing/unmountable" error on a drive, along with I/O and CRC errors in the log.  The filesystem is XFS with encryption enabled.

 

I ran a check with -nv and saw quite a bit of corruption.  I've slimmed down the logs, but the corrupted data basically falls into three categories (the exact command I used is shown after the excerpts):

 

 

entry "FILE1" at block 0 offset 96 in directory inode 276415028 references non-existent inode 2147483800
    would clear inode number in entry at offset 96

 

These two are similar, but one references a file and the other a folder, so I'm curious what the consequences are for each.

entry "FILE2" in directory inode 276415028 points to non-existent inode 2147483800, would junk entry



entry "FOLDER1" in directory inode 616472519 points to non-existent inode 2750238235, would junk entry
bad hash table for directory inode 616472519 (no data entry): would rebuild
would rebuild directory inode 616472519
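
For reference, the check I ran is essentially the xfs_repair dry run below. The device path is illustrative, not necessarily what your system shows; on an encrypted array the XFS filesystem sits on the LUKS mapping (e.g. /dev/mapper/md1 for disk 1) rather than the raw device:

    xfs_repair -nv /dev/mapper/md1

The -n flag makes it a no-modify run, so it only reports what a repair would change.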

 

 

1.  What does 'clear inode number in entry' indicate if I were to run an actual repair?  Same question for 'junk entry.'  Would those entries be discarded, or end up in lost+found?

 

2.  I removed the device from the array and spot-checked the files listed in the log against the parity-emulated copies.  They all seem to work fine.  Should I just swap the disk and rebuild, as opposed to running an xfs_repair?  There's nothing too critical on there, but if it would save the hassle of lost+found fragmentation/data loss, I'm all for it.  But I also don't know if I'm just misunderstanding FS corruption and the parity version is going to have the same issues on a rebuilt drive.
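
For context, my understanding is that an actual repair is the same command without the no-modify flag, along the lines of the sketch below (device path again illustrative).  xfs_repair can also refuse to run if the journal is dirty and suggest its -L option, which zeroes the log and can lose recent metadata:

    xfs_repair -v /dev/mapper/md1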

  • Solution

Was the disk disabled (had a red 'x' showing on the main tab)?  If it was, then the disk would have been emulated and the file system check run against the emulated disk, whereas if not it was against the real disk.  The reason I ask is that the rebuild process simply makes a physical drive correspond to the emulated one (including any file system corruption).  If the drive was NOT disabled, then it could be worth running the check again, this time against the emulated drive, to see what that reports.
 

Do you have a spare disk available to rebuild to?   This is desirable as it means for the time being you can keep the physical problem drive with its contents intact so you have more recovery options if a rebuild is not completely successful.
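
For reference, when a disk is disabled Unraid reconstructs its contents from parity and presents them on the same device nodes, so the identical no-modify check can be run against the emulated disk, either from the drive's page in the GUI or from the console with something like this (the slot number is just an example for an encrypted disk 1):

    xfs_repair -nv /dev/mapper/md1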

Link to comment
11 minutes ago, itimpi said:

It didn't have an X when I ran the check.

 

I just ran another check in emulated mode and it shows no errors (log attached at the end of this post), but I just want to make sure I did the right thing.  The drive shows 'not installed' with an X, but I could still click into it and run the check with -nv.  Was that the correct way to run it against the emulated drive?

 

And yes, I do have a spare disk available.  I'm thinking I'll do the rebuild and, if that fails, I can put the original into a file recovery program on another computer.
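
If it does come to that, the plan would be to image the failing drive first and point the recovery tools at the image rather than the raw disk; a sketch using GNU ddrescue, where the device and file names are placeholders:

    ddrescue /dev/sdX disk.img disk.map

The map file lets ddrescue resume later and retry just the bad areas.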

 

 

Phase 1 - find and verify superblock...
        - block cache size set to 1473600 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 149292 tail block 149292
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sun Aug  7 09:19:20 2022

Phase        Start        End        Duration
Phase 1:    08/07 09:19:14    08/07 09:19:15    1 second
Phase 2:    08/07 09:19:15    08/07 09:19:15
Phase 3:    08/07 09:19:15    08/07 09:19:20    5 seconds
Phase 4:    08/07 09:19:20    08/07 09:19:20
Phase 5:    Skipped
Phase 6:    08/07 09:19:20    08/07 09:19:20
Phase 7:    08/07 09:19:20    08/07 09:19:20

Total run time: 6 seconds
7 minutes ago, itimpi said:

 

That is very promising - it looks like the emulated drive is showing no corruption, so I would expect a successful rebuild to result in all your data being intact.  Since you have a spare disk, rebuilding to another disk is the right way to proceed.

 

Great, thanks!  My first drive failure and first rebuild, so hopefully it all goes well.  Unfortunately, most of the drives in my array are around the same age, so hopefully nothing else fails.

Just now, oliver said:

It is quite possible that there is nothing wrong with the 'failed' drive.  External factors such as cabling, power supply, etc. are much more common causes of problems than the physical drive itself failing.  After a successful rebuild you can test the 'failed' drive to see if it appears to be OK, in which case you can keep it as a spare.
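
One way to test it once the rebuild is done (the /dev/sdX name is just a placeholder for wherever the drive appears):

    smartctl -t long /dev/sdX    # start an extended SMART self-test
    smartctl -a /dev/sdX         # check attributes and the self-test result when it finishes

If the extended test passes and the reallocated/pending sector counts are clean, the drive is a reasonable candidate for a spare.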

2 minutes ago, itimpi said:

Ah, thanks, I was wondering what to do with it.
