Apeiron
Posted July 4

The Problem: I experienced a lockup during an array stop procedure, followed by an unclean shutdown, and now have an unmountable disk. There were no known issues prior to this error.

What Happened: I was preparing for a manual parity check and went through my normal procedure: shut down running services, stop the array, reboot, and run the parity check. I shut down all VMs and containers and clicked Stop Array; the system hung while unmounting the disks. I left the system overnight and came back in the morning to no progress. I did a hard shutdown via the power button, unplugged the server power, then booted back up. I started the array in maintenance mode and issued a parity check. There was a notification for the unclean shutdown. The parity check completed ~24 hours later (normal) with no issues. I stopped the array and started it back in normal mode. Disk 3 in my array was detected as unmountable. I stopped the array, shut down, unplugged power, and booted back up. Disk 3 was still unmountable. I stopped the array again, pulled the diagnostic log, and now I'm here.

syslog entries for disk 3:

Jul 4 11:49:06 solidsnake emhttpd: mounting /mnt/disk3
Jul 4 11:49:06 solidsnake emhttpd: shcmd (115): mkdir -p /mnt/disk3
Jul 4 11:49:06 solidsnake emhttpd: shcmd (116): mount -t xfs -o noatime,nouuid /dev/mapper/md3p1 /mnt/disk3
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Mounting V5 Filesystem
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Starting recovery (logdev: internal)
Jul 4 11:49:06 solidsnake kernel: 00000000: 36 12 01 00 01 00 00 00 40 d4 b7 3c 84 88 ff ff 6.......@..<....
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Internal error xfs_efi_item_recover at line 614 of file fs/xfs/xfs_extfree_item.c. Caller xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: CPU: 12 PID: 14282 Comm: mount Tainted: P O 6.1.64-Unraid #1
Jul 4 11:49:06 solidsnake kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Taichi Ultimate, BIOS P3.10 04/25/2019
Jul 4 11:49:06 solidsnake kernel: Call Trace:
Jul 4 11:49:06 solidsnake kernel: <TASK>
Jul 4 11:49:06 solidsnake kernel: dump_stack_lvl+0x44/0x5c
Jul 4 11:49:06 solidsnake kernel: xfs_corruption_error+0x63/0x83 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_efi_item_recover+0x92/0x1a8 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: ? preempt_latency_start+0x2b/0x46
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jul 4 11:49:06 solidsnake kernel: xlog_recover_finish+0x2b/0x290 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xfs_ag_resv_init+0x164/0x1af [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_log_mount_finish+0x5a/0x111 [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_mountfs+0x5c6/0x73b [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_fs_fill_super+0x683/0x761 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xfs_open_devices+0x184/0x184 [xfs]
Jul 4 11:49:06 solidsnake kernel: get_tree_bdev+0x1d5/0x229
Jul 4 11:49:06 solidsnake kernel: vfs_get_tree+0x1c/0x8a
Jul 4 11:49:06 solidsnake kernel: path_mount+0x62f/0x70d
Jul 4 11:49:06 solidsnake kernel: do_mount+0x5c/0x8d
Jul 4 11:49:06 solidsnake kernel: __do_sys_mount+0x100/0x12e
Jul 4 11:49:06 solidsnake kernel: do_syscall_64+0x6b/0x81
Jul 4 11:49:06 solidsnake kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Jul 4 11:49:06 solidsnake kernel: RIP: 0033:0x14780a0c9eea
Jul 4 11:49:06 solidsnake kernel: Code: 48 8b 0d 31 1f 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fe 1e 0d 00 f7 d8 64 89 01 48
Jul 4 11:49:06 solidsnake kernel: RSP: 002b:00007ffcbd4cdd18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
Jul 4 11:49:06 solidsnake kernel: RAX: ffffffffffffffda RBX: 000000000040f380 RCX: 000014780a0c9eea
Jul 4 11:49:06 solidsnake kernel: RDX: 000000000040f5d0 RSI: 000000000040f650 RDI: 000000000040f5b0
Jul 4 11:49:06 solidsnake kernel: RBP: 0000000000000000 R08: 000000000040f610 R09: 0000000000000060
Jul 4 11:49:06 solidsnake kernel: R10: 0000000000000400 R11: 0000000000000206 R12: 000000000040f5b0
Jul 4 11:49:06 solidsnake kernel: R13: 000000000040f5d0 R14: 000014780a25efa4 R15: 000000000040f498
Jul 4 11:49:06 solidsnake kernel: </TASK>
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Corruption detected. Unmount and run xfs_repair
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Failed to recover intents
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Filesystem has been shut down due to log error (0x2).
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Please unmount the filesystem and rectify the problem(s).
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Ending recovery (logdev: internal)
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): log mount finish failed
Jul 4 11:49:06 solidsnake root: mount: /mnt/disk3: mount(2) system call failed: Structure needs cleaning.
Jul 4 11:49:06 solidsnake root: dmesg(1) may have more information after failed mount system call.
Jul 4 11:49:06 solidsnake emhttpd: shcmd (116): exit status: 32
Jul 4 11:49:06 solidsnake emhttpd: /mnt/disk3 mount error: Unsupported or no file system
Jul 4 11:49:06 solidsnake emhttpd: shcmd (117): rmdir /mnt/disk3

Next steps: Looking through some of the other posts, it would appear that I need to run a file system check, and the logs mention running xfs_repair. Before I proceed with anything else, I wanted to confirm what exactly my next steps should be to maximize the chances of recovering the disk intact. Thank you for any help you can provide.

solidsnake-diagnostics-20240704-1153.zip
itimpi
Posted July 4

The first thing is to run a filesystem check via the GUI. If run with -n (the default), nothing is changed, but the check results might give a clue as to how well a repair would go, so post those results here for feedback.
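For reference, the GUI check described above is, to my understanding, equivalent to a read-only xfs_repair run. A rough console equivalent, with the array started in maintenance mode, is sketched below; the device path /dev/mapper/md3p1 is taken from the syslog in the first post, so adjust it for the disk in question.

```shell
# Read-only check: -n reports problems but modifies nothing on disk.
# /dev/mapper/md3p1 is the device shown in the syslog above; on an
# unencrypted array the path may be /dev/md3p1 instead.
xfs_repair -n /dev/mapper/md3p1
```

Running it read-only first is the safe move here: it previews what a real repair would touch without committing any changes.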
Apeiron (Author)
Posted July 4

20 minutes ago, itimpi said:
    First thing is to run a check filesystem via the GUI. If run with -n (the default) nothing is done but the check results might give a clue as to how well a repair would go so post those results here for feedback.

I started the array in maintenance mode per the XFS instructions. Here is the output from the filesystem check (-n) on Disk 3. If I'm reading this correctly, it looks like a single file has an issue and would be rebuilt if modify were allowed?

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 1489953900, counted 1503578523
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
correcting nextents for inode 15068100473
bad data fork in inode 15068100473
would have cleared inode 15068100473
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 7
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 6
        - agno = 5
entry "Phil Hine - Tantrum Magick.pdf" at block 0 offset 2672 in directory inode 15066855983 references free inode 15068100473
would clear inode number in entry at offset 2672...
inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
correcting nextents for inode 15068100473
bad data fork in inode 15068100473
would have cleared inode 15068100473
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "Phil Hine - Tantrum Magick.pdf" in directory inode 15066855983 points to free inode 15068100473, would junk entry
bad hash table for directory inode 15066855983 (no data entry): would rebuild
would rebuild directory inode 15066855983
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
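As an aside, when a check report like the one above is long, it can help to pull out the unique inode numbers it flags so the affected files can be traced afterwards (for example with find -inum). This is just a hypothetical helper, not part of the repair itself; the sample text is excerpted from the output above.

```shell
# Hypothetical helper: list the unique inode numbers mentioned in a saved
# xfs_repair -n report. Here the report text is inlined for illustration;
# normally it would be redirected to a file and read from there.
report='inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
bad data fork in inode 15068100473
would have cleared inode 15068100473'

# grep -Eo extracts each "inode <number>" occurrence, awk keeps the number,
# sort -u deduplicates.
printf '%s\n' "$report" | grep -Eo 'inode [0-9]+' | awk '{print $2}' | sort -u
# prints: 15068100473
```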
JorgeB (Solution)
Posted July 4

Run it again without -n, and if it asks for -L, use it.
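For anyone finding this thread later, the sequence described above would look roughly like the following when run from a console with the array still in maintenance mode. The device path is taken from the syslog in the first post; the GUI repair option issues an equivalent command.

```shell
# Attempt the repair. With a dirty journal, xfs_repair typically refuses
# and asks for -L (force log zeroing) rather than proceed silently.
xfs_repair /dev/mapper/md3p1

# Only if prompted: -L zeroes the journal. Any metadata transactions that
# were in flight at the crash are discarded, which is why the read-only
# check is reviewed first, but it allows the repair to proceed.
xfs_repair -L /dev/mapper/md3p1
```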
Apeiron (Author)
Posted July 4

Reran the check with no options, and it would not proceed. Did as suggested and ran it with -L. The repair appears successful; I stopped the array and started it back in normal mode. Disk 3 mounted successfully, and everything looks normal. Thank you both for your responses and help. Unless further checks are necessary, I'll mark this as solved.
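One follow-up check worth doing after a -L repair, sketched here under the assumption that the disk is mounted again at /mnt/disk3: see whether anything landed in lost+found and whether the file flagged during the check survived.

```shell
# After a repair, any orphaned files xfs_repair recovered end up in
# lost+found at the root of the disk (the directory may not exist if
# nothing was orphaned, hence the suppressed error).
ls -la /mnt/disk3/lost+found 2>/dev/null

# The read-only check flagged the inode referenced by this entry; confirm
# whether the file is still present anywhere on the disk.
find /mnt/disk3 -name 'Phil Hine - Tantrum Magick.pdf'
```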