seneo Posted December 18, 2020 (edited)

I'm trying to troubleshoot recent issues with my Unraid server (version 6.8.3, 2020-03-05). I have many problems with Docker (containers crashing or stopping, the whole Docker service crashing, etc.), but right now I'm focusing on a warning that pops up when I start the array:

kernel: XFS (dm-0): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x18c72912a dinode

I'm not able to identify the disk in question. I assume dm-0 corresponds to md0, but looking at the logs I don't see that mount point. The only disks I don't see mounted in the logs are the parity disk and the flash drive, but neither of those uses XFS. The warning seems to appear only when starting the array after booting, not after stopping/starting the array.

Does anybody have any idea?

intersect-syslog-20201218-1836.zip

Edited December 18, 2020 by seneo
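For anyone else hunting down which disk a dm-N warning refers to, a small sketch. The log line is the one from the post above; the commented commands at the end are the standard Linux way to map a device-mapper node to its backing device (dm numbers are assigned in creation order, so dm-0 does not necessarily correspond to md0, which is exactly the confusion here):

```shell
# Pull the device name out of the kernel warning (log line from the post above).
msg='kernel: XFS (dm-0): Metadata corruption detected at xfs_dinode_verify+0xa5/0x52e [xfs], inode 0x18c72912a dinode'
dev=$(printf '%s\n' "$msg" | sed -n 's/.*XFS (\([^)]*\)).*/\1/p')
echo "$dev"    # dm-0

# On the live server, you could then map dm-0 back to what it sits on, e.g.:
#   ls -l /sys/block/dm-0/slaves    # lists the backing device (e.g. md1)
#   lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT
```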
seneo Posted December 18, 2020 Author

OK, so when I use the Check button in the Check Filesystem Status section of disk1, I see the following log:

Phase 1 - find and verify superblock...
        - block cache size set to 736264 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 3726195 tail block 3726191
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used.  Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 75199576, counted 76180631
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

XFS_REPAIR Summary    Fri Dec 18 20:18:39 2020

Phase           Start           End             Duration
Phase 1:        12/18 20:15:59  12/18 20:15:59
Phase 2:        12/18 20:15:59  12/18 20:16:05  6 seconds
Phase 3:        12/18 20:16:05  12/18 20:17:42  1 minute, 37 seconds
Phase 4:        12/18 20:17:42  12/18 20:17:42
Phase 5:        Skipped
Phase 6:        12/18 20:17:42  12/18 20:18:39  57 seconds
Phase 7:        12/18 20:18:39  12/18 20:18:39

Total run time: 2 minutes, 40 seconds

Should I retry without the -n option, as mentioned in Phase 2? I'm afraid of doing more harm than good without the 'no modification' option.
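For reference, the GUI Check button is equivalent to running xfs_repair read-only from the console. A hypothetical little helper to build the command for a given disk number, assuming Unraid's convention that /dev/mdN corresponds to disk N in the GUI (the array must be started in maintenance mode; an encrypted disk would use its /dev/mapper device instead):

```shell
# Hypothetical helper: build an xfs_repair invocation for Unraid disk N.
# Assumes the stock /dev/mdN naming; adjust for encrypted (/dev/mapper) disks.
repair_cmd() {
  disk="$1"
  shift
  printf 'xfs_repair %s /dev/md%s\n' "$*" "$disk"
}

repair_cmd 1 -n    # read-only check of disk1: prints "xfs_repair -n /dev/md1"
```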
JorgeB Posted December 18, 2020

Run without -n or nothing will be done.
seneo Posted December 18, 2020 Author

The output without the -n option is as follows:

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.  Mount the filesystem to replay the log, and unmount it before re-running xfs_repair.  If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.  Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

I'm not sure what 'Mount the filesystem to replay the log, and unmount it before re-running xfs_repair' means in this context. Does it mean I should start/stop the array before re-running xfs_repair in maintenance mode?
itimpi Posted December 18, 2020

You need to run without the -n option and with the -L option. The warning always pops up but virtually never leads to any data loss, and at worst only affects the last file being written.
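To spell out the sequence being discussed, a sketch assuming disk1 maps to /dev/md1 as in this thread (the mount-first step is what the xfs_repair error message itself recommends, with -L as the fallback):

```shell
# 1. If the disk still mounts, let XFS replay its own journal first:
#    start the array normally, then stop it again
#    (mounting an XFS filesystem replays the log automatically).
# 2. Back in maintenance mode, retry the plain repair:
#      xfs_repair /dev/md1
# 3. Only if mounting is impossible, zero the log and repair:
#      xfs_repair -L /dev/md1
```

As noted above, -L rarely loses more than the last file being written, but it does discard any unreplayed journal entries, which is why a mount is worth attempting first.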
seneo Posted December 18, 2020 Author

Done.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_fdblocks 75199576, counted 76180631
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (109:3726193) is ahead of log (1:2).
Format log to cycle 112.
done

Does this mean it's back to normal now and I can restart the array? At least for this issue.
itimpi Posted December 18, 2020

Yes - restart in normal mode and the disk should mount fine with its data intact.
seneo Posted December 18, 2020 Author

Machine rebooted, array started, and no more warnings for now. Thank you both for your help and your quick answers. I'll mark the topic as solved in a couple of hours just to be sure (the warning didn't always appear immediately after starting the array).
bschaeff18 Posted April 2

Hello, I'm getting the same error message. I was hoping you could help me figure out which drive I need to repair. These are the logs:

unraidnas kernel: XFS (md5p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x13a6acf62 dinode
unraidnas kernel: XFS (md5p1): Unmount and run xfs_repair

I gathered from the posts above that (md5p1) is the disk in question, but which disk is that supposed to represent?
JonathanM Posted April 3

45 minutes ago, bschaeff18 said:
[asking which drive md5p1 refers to]

Diagnostics will contain that info.
itimpi Posted April 3

8 hours ago, bschaeff18 said:
md5p1

This will be disk5. Any time you see an 'md' type device name, the number refers to the disk with the corresponding number in the Unraid GUI.
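That mapping rule is mechanical enough to script. A small sketch (the 'p1' suffix is a partition number that newer Unraid releases include in the device name; older releases log plain mdN, as earlier in this thread):

```shell
# Strip the 'md' prefix and any trailing 'pN' partition suffix
# to get the disk number shown in the Unraid GUI.
disk_number() {
  printf '%s\n' "$1" | sed -n 's/^md\([0-9][0-9]*\).*$/\1/p'
}

disk_number md5p1    # -> 5
disk_number md12     # -> 12
```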