campfred Posted July 11, 2021

Hello everyone! I would like some assistance with a drive suffering from data corruption. I noticed it not because of a notification but because I wasn't able to write to Array-powered shares (cache-only ones were working fine). So, I went to check the syslog and noticed this:

Jul 11 15:45:02 Alfred kernel: XFS (md3): Corruption detected! Free inode 0x1800d6ba7 not marked free! (mode 0x41ed)
Jul 11 15:45:02 Alfred kernel: XFS (md3): Internal error xfs_trans_cancel at line 954 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x280/0x2ea [xfs]
Jul 11 15:45:02 Alfred kernel: CPU: 0 PID: 32201 Comm: shfs Tainted: P U O 5.10.28-Unraid #1
Jul 11 15:45:02 Alfred kernel: Hardware name: ASUS All Series/Z87-C, BIOS 2103 08/15/2014
Jul 11 15:45:02 Alfred kernel: Call Trace:
Jul 11 15:45:02 Alfred kernel: dump_stack+0x6b/0x83
Jul 11 15:45:02 Alfred kernel: xfs_trans_cancel+0x52/0xc9 [xfs]
Jul 11 15:45:02 Alfred kernel: xfs_create+0x280/0x2ea [xfs]
Jul 11 15:45:02 Alfred kernel: xfs_generic_create+0xc9/0x1ed [xfs]
Jul 11 15:45:02 Alfred kernel: vfs_mkdir+0x55/0x77
Jul 11 15:45:02 Alfred kernel: do_mkdirat+0x7a/0xc7
Jul 11 15:45:02 Alfred kernel: do_syscall_64+0x5d/0x6a
Jul 11 15:45:02 Alfred kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 11 15:45:02 Alfred kernel: RIP: 0033:0x14d104ab8467
Jul 11 15:45:02 Alfred kernel: Code: 1f 40 00 48 8b 05 29 8a 0d 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 89 0d 00 f7 d8 64 89 01 48
Jul 11 15:45:02 Alfred kernel: RSP: 002b:000014d0fdc18bb8 EFLAGS: 00000206 ORIG_RAX: 0000000000000053
Jul 11 15:45:02 Alfred kernel: RAX: ffffffffffffffda RBX: 000014d0e8083cc0 RCX: 000014d104ab8467
Jul 11 15:45:02 Alfred kernel: RDX: 00000000000001c0 RSI: 00000000000001c0 RDI: 000014d0e807aae0
Jul 11 15:45:02 Alfred kernel: RBP: 000014d0fdc18bf0 R08: 000014d0e8561820 R09: 0065766973756c63
Jul 11 15:45:02 Alfred kernel: R10: 000014d0e807fe80 R11: 0000000000000206 R12: 0000000000000000
Jul 11 15:45:02 Alfred kernel: R13: 000000000000a67d R14: 000014d0e8087040 R15: 00000000000001c0
Jul 11 15:45:02 Alfred kernel: XFS (md3): xfs_do_force_shutdown(0x8) called from line 955 of file fs/xfs/xfs_trans.c. Return address = 00000000a737bb2b
Jul 11 15:45:02 Alfred kernel: XFS (md3): Corruption of in-memory data detected. Shutting down filesystem
Jul 11 15:45:02 Alfred kernel: XFS (md3): Please unmount the filesystem and rectify the problem(s)

What I understood from this message: data corruption has been found on Drive 3 (md3), and unRAID is stopping all I/O to the Array and requesting that I unmount and check the drive. Fine, I'll follow the « Check Disk Filesystems » guide in the wiki and I should be good. Except I ran the check in verbose, no-modify mode (so, with « xfs_repair -nv /dev/md3 ») and I don't understand what the error is. Here's the output from it:

Phase 1 - find and verify superblock...
        - block cache size set to 1040488 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 5123 tail block 5119
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
agi_freecount 128, counted 105 in ag 9
agi_freecount 128, counted 105 in ag 9 finobt
agi_freecount 63, counted 61 in ag 10
agi_freecount 63, counted 61 in ag 10 finobt
sb_fdblocks 331469875, counted 345095662
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
imap claims in-use inode 6443330471 is free, correcting imap
imap claims in-use inode 6443330472 is free, correcting imap
imap claims in-use inode 6443330473 is free, correcting imap
imap claims in-use inode 6443330474 is free, correcting imap
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 5
        - agno = 3
        - agno = 7
        - agno = 6
        - agno = 4
        - agno = 8
        - agno = 9
        - agno = 10
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 10856320197, would move to lost+found
disconnected dir inode 21596214240, would move to lost+found
Phase 7 - verify link counts...
Maximum metadata LSN (11:7968) is ahead of log (11:5123).
Would format log to cycle 14.
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sun Jul 11 17:33:36 2021

Phase           Start           End             Duration
Phase 1:        07/11 17:33:18  07/11 17:33:18
Phase 2:        07/11 17:33:18  07/11 17:33:19  1 second
Phase 3:        07/11 17:33:19  07/11 17:33:31  12 seconds
Phase 4:        07/11 17:33:31  07/11 17:33:31
Phase 5:        Skipped
Phase 6:        07/11 17:33:31  07/11 17:33:36  5 seconds
Phase 7:        07/11 17:33:36  07/11 17:33:36

Total run time: 18 seconds

Okay, there's an alert about the filesystem's log telling me to mount the disk to resolve the log inconsistencies. Except that after I mounted the Array again, I was back to square one, with the array I/O-blocked because of corruption.
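A dry run like the one above is easier to read once the messages are grouped by type. The Python sketch below is only an illustration: the category names and the `summarize_repair_output` helper are made up for this example, and the patterns cover only the message types that appear in this particular output.

```python
import re
from collections import Counter

# Patterns for the message types seen in the dry-run output above.
# The category names are hypothetical labels chosen for this sketch.
PATTERNS = {
    "free_inode_count_mismatch": re.compile(r"agi_freecount \d+, counted \d+ in ag \d+"),
    "free_block_count_mismatch": re.compile(r"sb_fdblocks \d+, counted \d+"),
    "imap_correction": re.compile(r"imap claims in-use inode \d+ is free"),
    "disconnected_inode": re.compile(r"disconnected (?:dir )?inode \d+"),
}


def summarize_repair_output(text: str) -> Counter:
    """Count how often each known message type occurs in xfs_repair output."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    return counts


# A few lines copied from the output in this post.
sample = """\
agi_freecount 128, counted 105 in ag 9
agi_freecount 128, counted 105 in ag 9 finobt
sb_fdblocks 331469875, counted 345095662
imap claims in-use inode 6443330471 is free, correcting imap
disconnected dir inode 10856320197, would move to lost+found
"""

summary = summarize_repair_output(sample)
for category, count in summary.items():
    print(f"{category}: {count}")
```

Because the dry run ignored a dirty log, some of these counts may be the "spurious inconsistencies" the ALERT warns about, so they are a rough guide rather than a final damage report.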
So, I went back into Maintenance mode and tried the repair anyway to see if it would do something with the log, but nope, it asks me to mount the drive first.

Phase 1 - find and verify superblock...
        - block cache size set to 1040488 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 5123 tail block 5119
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

After that, I powered down the server and replaced the power and data cables for all my drives, just in case a failing cable was causing this; there are no bends or kinks in them. Still getting that state, though.

Now I don't know what else I can do to resolve this issue. Does someone have an idea or a pointer that could help me solve this? Of course, diagnostics data is attached to this post.

Thank you very much for taking the time to read this!

alfred-diagnostics-20210711-1735.zip
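When digging through a syslog like the one in the attached diagnostics, it can help to pull out only the XFS error lines and see which device they name. The following Python sketch shows one way to do that; the `xfs_problems` helper and the keyword list are assumptions made for this example, and the sample lines are copied from the post rather than read from a real log file.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "Jul 11 15:45:02 Alfred kernel: XFS (md3): Corruption detected! ..."
XFS_LINE = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): (?P<msg>.*)")

# Keywords chosen for this sketch; they are not an official list.
KEYWORDS = ("corruption", "internal error", "shutdown", "shutting down")


def xfs_problems(lines):
    """Group XFS error messages by the device named in the log line."""
    problems = defaultdict(list)
    for line in lines:
        match = XFS_LINE.search(line)
        if match and any(k in match.group("msg").lower() for k in KEYWORDS):
            problems[match.group("dev")].append(match.group("msg").strip())
    return dict(problems)


sample_lines = [
    "Jul 11 15:45:02 Alfred kernel: XFS (md3): Corruption detected! "
    "Free inode 0x1800d6ba7 not marked free! (mode 0x41ed)",
    "Jul 11 15:45:02 Alfred kernel: Call Trace:",
    "Jul 11 15:45:02 Alfred kernel: XFS (md3): Corruption of in-memory data "
    "detected. Shutting down filesystem",
    "Jul 11 15:45:02 Alfred kernel: XFS (md3): Please unmount the filesystem "
    "and rectify the problem(s)",
]

problems = xfs_problems(sample_lines)
for device, messages in problems.items():
    print(device, len(messages))
```

Here every flagged line points at md3, which matches the poster's reading that Drive 3 is the one to check.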
Squid Posted July 12, 2021

Just do the -L flag. Usually there's no corruption.
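The order of operations described by the xfs_repair ERROR message (try a mount to replay the log first, fall back to -L only if that does not clear it) can be written down as a small decision helper. This Python sketch is an illustration only: `next_step` and its return values are invented here, it recognizes only the ERROR wording quoted in this thread, and it does not run any command.

```python
def next_step(repair_output: str, mount_already_tried: bool) -> str:
    """Suggest a next step for a piece of xfs_repair output.

    Returns one of three labels made up for this sketch:
      "none"     - the log-replay ERROR is not present
      "mount"    - mount the filesystem to replay the log, unmount, re-run
      "zero-log" - a mount was already tried, so -L is the remaining option
    """
    # Collapse whitespace so the check still works if the message was wrapped.
    text = " ".join(repair_output.split())
    if "log which needs to be replayed" not in text:
        return "none"
    return "zero-log" if mount_already_tried else "mount"


error_text = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs "
    "to be replayed. Mount the filesystem to replay the log, and unmount it "
    "before re-running xfs_repair."
)

print(next_step(error_text, mount_already_tried=False))
print(next_step(error_text, mount_already_tried=True))
```

In this thread the array had already been mounted once and the filesystem shut down again, which corresponds to the `mount_already_tried=True` case.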
campfred Posted July 13, 2021 (Author)

On 7/11/2021 at 9:49 PM, Squid said:
    Just do the -L flag. Usually there's no corruption.

Thank you for the pointer! It looks like it redid the log on the filesystem, and it's mounting properly now! I'll wait until the end of the week to see if something comes up and the array locks up the drive again. If everything's fine by the weekend, I'll mark the thread as solved.

Command output for anyone who's interested or in the same situation:

root@Alfred:~# xfs_repair /dev/md3 -Lv
Phase 1 - find and verify superblock...
        - block cache size set to 1040488 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 5123 tail block 5119
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
agi_freecount 128, counted 105 in ag 9
agi_freecount 128, counted 105 in ag 9 finobt
agi_freecount 63, counted 61 in ag 10
agi_freecount 63, counted 61 in ag 10 finobt
sb_fdblocks 331469875, counted 345095662
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
imap claims in-use inode 6443330471 is free, correcting imap
imap claims in-use inode 6443330472 is free, correcting imap
imap claims in-use inode 6443330473 is free, correcting imap
imap claims in-use inode 6443330474 is free, correcting imap
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 6
        - agno = 1
        - agno = 7
        - agno = 5
        - agno = 4
        - agno = 8
        - agno = 9
        - agno = 10
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 10856320197, moving to lost+found
disconnected dir inode 21596214240, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 1181372 nlinks from 3 to 5
Maximum metadata LSN (11:7988) is ahead of log (1:2).
Format log to cycle 14.

        XFS_REPAIR Summary    Tue Jul 13 09:33:48 2021

Phase           Start           End             Duration
Phase 1:        07/13 09:31:14  07/13 09:31:14
Phase 2:        07/13 09:31:14  07/13 09:31:46  32 seconds
Phase 3:        07/13 09:31:46  07/13 09:31:58  12 seconds
Phase 4:        07/13 09:31:58  07/13 09:31:58
Phase 5:        07/13 09:31:58  07/13 09:31:59  1 second
Phase 6:        07/13 09:31:59  07/13 09:32:05  6 seconds
Phase 7:        07/13 09:32:05  07/13 09:32:05

Total run time: 51 seconds
done
root@Alfred:~#
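The "Maximum metadata LSN ... is ahead of log" line near the end is the part of this output that tends to puzzle people. An XFS log sequence number is printed as cycle:block, so after -L reset the log to (1:2) the metadata on disk still carried newer sequence numbers, and xfs_repair reformatted the log to a higher cycle. The Python sketch below just parses that line and compares the two values; `parse_lsn_warning` is a name invented for this example.

```python
import re

LSN_LINE = re.compile(
    r"Maximum metadata LSN \((\d+):(\d+)\) is ahead of log \((\d+):(\d+)\)"
)


def parse_lsn_warning(text):
    """Return ((meta_cycle, meta_block), (log_cycle, log_block)) or None."""
    match = LSN_LINE.search(text)
    if match is None:
        return None
    meta_cycle, meta_block, log_cycle, log_block = map(int, match.groups())
    return (meta_cycle, meta_block), (log_cycle, log_block)


line = "Maximum metadata LSN (11:7988) is ahead of log (1:2). Format log to cycle 14."
metadata_lsn, log_lsn = parse_lsn_warning(line)

# Tuples compare cycle first, then block, which is the ordering needed here.
print(metadata_lsn, log_lsn, metadata_lsn > log_lsn)
```

The same helper works on the dry-run line from the first post, where the log was still at (11:5123).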
trurl Posted July 13, 2021

Be sure to check your lost+found share for anything the repair might have put there.
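Checking lost+found can also be done from a script. The Python sketch below lists whatever sits under a disk's lost+found directory; `list_lost_and_found` is a helper written for this example, and the `/mnt/disk3` path mentioned afterwards is an assumption about where md3 is mounted on this server. The demo runs against a temporary directory so it does not touch any real disk.

```python
import os
import tempfile


def list_lost_and_found(mount_point):
    """Return sorted paths (relative to lost+found) of everything found there."""
    root = os.path.join(mount_point, "lost+found")
    found = []
    if not os.path.isdir(root):
        return found  # nothing was moved there, or the disk is not mounted
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


# Demo on a throwaway directory. xfs_repair names recovered entries after
# their inode number, so the demo uses one of the numbers from this thread.
with tempfile.TemporaryDirectory() as demo_mount:
    recovered_dir = os.path.join(demo_mount, "lost+found", "10856320197")
    os.makedirs(recovered_dir)
    with open(os.path.join(recovered_dir, "file.txt"), "w") as handle:
        handle.write("example")
    demo_entries = list_lost_and_found(demo_mount)

print(demo_entries)
```

On the server itself the call would be something like `list_lost_and_found("/mnt/disk3")`, adjusted to wherever the repaired disk is mounted.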