Apeiron
Posted July 4

The Problem: I experienced a lockup during an array stop procedure, followed by an unclean shutdown, and now have an unmountable disk. There were no known issues prior to this error.

What Happened: I was preparing for a manual parity check and went through my normal procedure: shut down running services, stop the array, reboot, and run the parity check. I shut down all VMs and containers and clicked Stop Array; the system hung while unmounting the disks. I left the system overnight and came back in the morning to no progress. I did a hard shutdown via the power button, unplugged the server power, then booted back up. I started the array in maintenance mode and issued a parity check. There was a notification for the unclean shutdown. The parity check completed ~24 hours later (normal) with no issues. I stopped the array and started it back in normal mode. Disk 3 in my array was detected as unmountable. I stopped the array, shut down, unplugged power, and booted back up. Disk 3 was still unmountable. I stopped the array again, pulled the diagnostic log, and now I'm here.

syslog entries for disk 3:

Jul 4 11:49:06 solidsnake emhttpd: mounting /mnt/disk3
Jul 4 11:49:06 solidsnake emhttpd: shcmd (115): mkdir -p /mnt/disk3
Jul 4 11:49:06 solidsnake emhttpd: shcmd (116): mount -t xfs -o noatime,nouuid /dev/mapper/md3p1 /mnt/disk3
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Mounting V5 Filesystem
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Starting recovery (logdev: internal)
Jul 4 11:49:06 solidsnake kernel: 00000000: 36 12 01 00 01 00 00 00 40 d4 b7 3c 84 88 ff ff 6.......@..<....
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Internal error xfs_efi_item_recover at line 614 of file fs/xfs/xfs_extfree_item.c. Caller xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: CPU: 12 PID: 14282 Comm: mount Tainted: P O 6.1.64-Unraid #1
Jul 4 11:49:06 solidsnake kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Taichi Ultimate, BIOS P3.10 04/25/2019
Jul 4 11:49:06 solidsnake kernel: Call Trace:
Jul 4 11:49:06 solidsnake kernel: <TASK>
Jul 4 11:49:06 solidsnake kernel: dump_stack_lvl+0x44/0x5c
Jul 4 11:49:06 solidsnake kernel: xfs_corruption_error+0x63/0x83 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_efi_item_recover+0x92/0x1a8 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: xlog_recover_process_intents+0x9c/0x25e [xfs]
Jul 4 11:49:06 solidsnake kernel: ? preempt_latency_start+0x2b/0x46
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Jul 4 11:49:06 solidsnake kernel: xlog_recover_finish+0x2b/0x290 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xfs_ag_resv_init+0x164/0x1af [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_log_mount_finish+0x5a/0x111 [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_mountfs+0x5c6/0x73b [xfs]
Jul 4 11:49:06 solidsnake kernel: xfs_fs_fill_super+0x683/0x761 [xfs]
Jul 4 11:49:06 solidsnake kernel: ? xfs_open_devices+0x184/0x184 [xfs]
Jul 4 11:49:06 solidsnake kernel: get_tree_bdev+0x1d5/0x229
Jul 4 11:49:06 solidsnake kernel: vfs_get_tree+0x1c/0x8a
Jul 4 11:49:06 solidsnake kernel: path_mount+0x62f/0x70d
Jul 4 11:49:06 solidsnake kernel: do_mount+0x5c/0x8d
Jul 4 11:49:06 solidsnake kernel: __do_sys_mount+0x100/0x12e
Jul 4 11:49:06 solidsnake kernel: do_syscall_64+0x6b/0x81
Jul 4 11:49:06 solidsnake kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Jul 4 11:49:06 solidsnake kernel: RIP: 0033:0x14780a0c9eea
Jul 4 11:49:06 solidsnake kernel: Code: 48 8b 0d 31 1f 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fe 1e 0d 00 f7 d8 64 89 01 48
Jul 4 11:49:06 solidsnake kernel: RSP: 002b:00007ffcbd4cdd18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
Jul 4 11:49:06 solidsnake kernel: RAX: ffffffffffffffda RBX: 000000000040f380 RCX: 000014780a0c9eea
Jul 4 11:49:06 solidsnake kernel: RDX: 000000000040f5d0 RSI: 000000000040f650 RDI: 000000000040f5b0
Jul 4 11:49:06 solidsnake kernel: RBP: 0000000000000000 R08: 000000000040f610 R09: 0000000000000060
Jul 4 11:49:06 solidsnake kernel: R10: 0000000000000400 R11: 0000000000000206 R12: 000000000040f5b0
Jul 4 11:49:06 solidsnake kernel: R13: 000000000040f5d0 R14: 000014780a25efa4 R15: 000000000040f498
Jul 4 11:49:06 solidsnake kernel: </TASK>
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Corruption detected. Unmount and run xfs_repair
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Failed to recover intents
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Filesystem has been shut down due to log error (0x2).
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Please unmount the filesystem and rectify the problem(s).
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): Ending recovery (logdev: internal)
Jul 4 11:49:06 solidsnake kernel: XFS (dm-2): log mount finish failed
Jul 4 11:49:06 solidsnake root: mount: /mnt/disk3: mount(2) system call failed: Structure needs cleaning.
Jul 4 11:49:06 solidsnake root: dmesg(1) may have more information after failed mount system call.
Jul 4 11:49:06 solidsnake emhttpd: shcmd (116): exit status: 32
Jul 4 11:49:06 solidsnake emhttpd: /mnt/disk3 mount error: Unsupported or no file system
Jul 4 11:49:06 solidsnake emhttpd: shcmd (117): rmdir /mnt/disk3

Next steps: Looking through some of the other posts, it would appear that I need to run a file system check, and the logs mention running xfs_repair. Before I proceed with anything else, I wanted to confirm what exactly my next steps should be to maximize the chances of recovering the disk intact. Thank you for any help you can provide.

solidsnake-diagnostics-20240704-1153.zip
itimpi
Posted July 4

The first thing is to run a filesystem check via the GUI. If run with -n (the default), nothing is changed, but the check results might give a clue as to how well a repair would go, so post those results here for feedback.
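For reference, the GUI check described above is, to my understanding, equivalent to a read-only xfs_repair run. A rough console equivalent, with the array started in maintenance mode, is sketched below; the device path /dev/mapper/md3p1 is taken from the syslog in the first post, so adjust it for the disk in question.

```shell
# Read-only check: -n reports problems but modifies nothing on disk.
# /dev/mapper/md3p1 is the device shown in the syslog above; on an
# unencrypted array the path may be /dev/md3p1 instead.
xfs_repair -n /dev/mapper/md3p1
```

Running it read-only first is the safe move here: it previews what a real repair would touch without committing any changes.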
Apeiron (Author)
Posted July 4

20 minutes ago, itimpi said:
    First thing is to run a check filesystem via the GUI. If run with -n (the default) nothing is done but the check results might give a clue as to how well a repair would go so post those results here for feedback.

I started the array in maintenance mode per the XFS instructions. Here is the output from the filesystem check (-n) on Disk 3. If I'm reading this correctly, it looks like a single file has an issue and would be rebuilt if modify were allowed?

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
sb_fdblocks 1489953900, counted 1503578523
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
correcting nextents for inode 15068100473
bad data fork in inode 15068100473
would have cleared inode 15068100473
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 7
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 6
        - agno = 5
entry "Phil Hine - Tantrum Magick.pdf" at block 0 offset 2672 in directory inode 15066855983 references free inode 15068100473
would clear inode number in entry at offset 2672...
inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
correcting nextents for inode 15068100473
bad data fork in inode 15068100473
would have cleared inode 15068100473
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
entry "Phil Hine - Tantrum Magick.pdf" in directory inode 15066855983 points to free inode 15068100473, would junk entry
bad hash table for directory inode 15066855983 (no data entry): would rebuild
would rebuild directory inode 15066855983
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
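As an aside, when a check report like the one above is long, it can help to pull out the unique inode numbers it flags so the affected files can be traced afterwards (for example with find -inum). This is just a hypothetical helper, not part of the repair itself; the sample text is excerpted from the output above.

```shell
# Hypothetical helper: list the unique inode numbers mentioned in a saved
# xfs_repair -n report. Here the report text is inlined for illustration;
# normally it would be redirected to a file and read from there.
report='inode 15068100473 - bad extent starting block number 4503567550935200, offset 0
bad data fork in inode 15068100473
would have cleared inode 15068100473'

# grep -Eo extracts each "inode <number>" occurrence, awk keeps the number,
# sort -u deduplicates.
printf '%s\n' "$report" | grep -Eo 'inode [0-9]+' | awk '{print $2}' | sort -u
# prints: 15068100473
```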
JorgeB (Solution)
Posted July 4

Run it again without -n, and if it asks for -L, use it.
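For anyone finding this thread later, the sequence described above would look roughly like the following when run from a console with the array still in maintenance mode. The device path is taken from the syslog in the first post; the GUI repair option issues an equivalent command.

```shell
# Attempt the repair. With a dirty journal, xfs_repair typically refuses
# and asks for -L (force log zeroing) rather than proceed silently.
xfs_repair /dev/mapper/md3p1

# Only if prompted: -L zeroes the journal. Any metadata transactions that
# were in flight at the crash are discarded, which is why the read-only
# check is reviewed first, but it allows the repair to proceed.
xfs_repair -L /dev/mapper/md3p1
```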
Apeiron (Author)
Posted July 4

Reran the check with no options, and it would not proceed. Did as suggested and ran it with -L. The repair appears successful; I stopped the array and started it back in normal mode. Disk 3 mounted successfully, and everything looks normal. Thank you both for your responses and help. Unless further checks are necessary, I'll mark this as solved.
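One follow-up check worth doing after a -L repair, sketched here under the assumption that the disk is mounted again at /mnt/disk3: see whether anything landed in lost+found and whether the file flagged during the check survived.

```shell
# After a repair, any orphaned files xfs_repair recovered end up in
# lost+found at the root of the disk (the directory may not exist if
# nothing was orphaned, hence the suppressed error).
ls -la /mnt/disk3/lost+found 2>/dev/null

# The read-only check flagged the inode referenced by this entry; confirm
# whether the file is still present anywhere on the disk.
find /mnt/disk3 -name 'Phil Hine - Tantrum Magick.pdf'
```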