October 29, 2016

I wanted to share my recent experience with the unraid community. After copying 10K pictures and a few dozen movies overnight, my unraid system crashed. The system was unresponsive, and I had to power it down to bring it back. After powering up and restarting the array, the OS crashed again. On the next power-up, I tailed /var/log/syslog on the CLI and found the following: Internal error XFS_WANT_CORRUPTED_GOTO. Log below...

Oct 29 10:31:59 Tower emhttp: shcmd (57): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1 |& logger
Oct 29 10:31:59 Tower kernel: XFS (md1): Mounting V5 Filesystem
Oct 29 10:32:01 Tower kernel: XFS (md1): Starting recovery (logdev: internal)
Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x419/0x558
Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1

Damn.. It was starting to be a bad day. I searched the forums and found one example on CentOS where the recovery was done with xfs_repair. For your future reference, here is what I did to bring my unraid back alive again.

Based on the first line in the log, I knew there was a problem mounting /dev/md1 as xfs. I wanted to be 100% sure that was the only disk with the problem, so I started to troubleshoot one disk at a time.

First, I started the array in maintenance mode. Second, on the CLI, I made mount points for each disk:

root@Tower:~# mkdir /mnt/disk1
root@Tower:~# mkdir /mnt/disk2
root@Tower:~# mkdir /mnt/disk3
root@Tower:~# mkdir /mnt/disk4
root@Tower:~# mkdir /mnt/disk5

Then I started to mount them with:

mount -t xfs -o noatime,nodiratime /dev/mdX /mnt/diskX

As soon as I ran mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1, the system crashed again. I pulled the plug and powered the box back on. I started the process all over again, this time starting with the second disk:

1. Start the array in maintenance mode
2. Make the mount points
3. Mount the devices as xfs at the mount points

Success:

Oct 29 12:12:10 Tower kernel: XFS (md2): Mounting V5 Filesystem
Oct 29 12:12:11 Tower kernel: XFS (md2): Ending clean mount

The next three disks mounted successfully as well. I had some luck. Now I knew the problem was limited to md1.

The next step was to use xfs_repair to fix the XFS corruption:

root@Tower:~# xfs_repair -L /dev/md1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x74704441/0x200
flfirst 118 in agf 2 too large (max = 118)
agi unlinked bucket 33 is 11992225 in ag 2 (inode=2159475873)
sb_icount 401216, counted 401152
sb_ifree 3630, counted 6458
sb_fdblocks 125505044, counted 125513505
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 2159475873, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (10:416611) is ahead of log (1:2).
Format log to cycle 13.
done

Next, I mounted the drive:

mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1

Success:

Oct 29 12:12:10 Tower kernel: XFS (md1): Mounting V5 Filesystem
Oct 29 12:12:11 Tower kernel: XFS (md1): Ending clean mount

Great!! All the data is there. On to the final steps to bring the unraid back alive. I had to unmount all the drives, since I had manually mounted them:

root@Tower:~# umount /mnt/disk1
root@Tower:~# umount /mnt/disk2
root@Tower:~# umount /mnt/disk3
root@Tower:~# umount /mnt/disk4
root@Tower:~# umount /mnt/disk5

I went back to the GUI and stopped the array, then restarted the array. Success... Happy again.

Error Log:

Oct 29 10:31:59 Tower emhttp: shcmd (57): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1 |& logger
Oct 29 10:31:59 Tower kernel: XFS (md1): Mounting V5 Filesystem
Oct 29 10:32:01 Tower kernel: XFS (md1): Starting recovery (logdev: internal)
Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x419/0x558
Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1
Oct 29 10:32:03 Tower kernel: Hardware name: /DQ77KB, BIOS KBQ7710H.86A.0038.2012.0425.1537 04/25/2012
Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfa38 ffffffff8136ad4c ffff880409142820
Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfa50 ffffffff81275975 ffffffff81245f52
Oct 29 10:32:03 Tower kernel: ffff8800d54bfac0 ffffffff8125a119 ffffffff8125929e 00000000e6fef1d0
Oct 29 10:32:03 Tower kernel: Call Trace:
Oct 29 10:32:03 Tower kernel: [<ffffffff8136ad4c>] dump_stack+0x61/0x7e
Oct 29 10:32:03 Tower kernel: [<ffffffff81275975>] xfs_error_report+0x32/0x35
Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] ? xfs_free_ag_extent+0x419/0x558
Oct 29 10:32:03 Tower kernel: [<ffffffff8125a119>] xfs_btree_insert+0xba/0x152
Oct 29 10:32:03 Tower kernel: [<ffffffff8125929e>] ? xfs_btree_lookup+0x307/0x4a1
Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] xfs_free_ag_extent+0x419/0x558
Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] ? xfs_free_ag_extent+0x419/0x558
Oct 29 10:32:03 Tower kernel: [<ffffffff81246b8e>] xfs_free_extent+0xbd/0xed
Oct 29 10:32:03 Tower kernel: [<ffffffff81295aa2>] xfs_trans_free_extent+0x21/0x58
Oct 29 10:32:03 Tower kernel: [<ffffffff81291479>] xlog_recover_process_efi+0x125/0x155
Oct 29 10:32:03 Tower kernel: [<ffffffff8129151a>] xlog_recover_process_efis+0x71/0xb5
Oct 29 10:32:03 Tower kernel: [<ffffffff81076168>] ? wake_up_bit+0x1d/0x1f
Oct 29 10:32:03 Tower kernel: [<ffffffff8127a657>] ? xfs_iget+0x50f/0x54e
Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] xlog_recover_finish+0x18/0x8b
Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] ? xlog_recover_finish+0x18/0x8b
Oct 29 10:32:03 Tower kernel: [<ffffffff8128bbaf>] xfs_log_mount_finish+0x20/0x36
Oct 29 10:32:03 Tower kernel: [<ffffffff81284e24>] xfs_mountfs+0x601/0x6a8
Oct 29 10:32:03 Tower kernel: [<ffffffff81287724>] xfs_fs_fill_super+0x3fd/0x489
Oct 29 10:32:03 Tower kernel: [<ffffffff8110c53b>] mount_bdev+0x141/0x195
Oct 29 10:32:03 Tower kernel: [<ffffffff81287327>] ? xfs_parseargs+0x8c1/0x8c1
Oct 29 10:32:03 Tower kernel: [<ffffffff81285ce2>] xfs_fs_mount+0x10/0x12
Oct 29 10:32:03 Tower kernel: [<ffffffff8110d1ac>] mount_fs+0xf/0x84
Oct 29 10:32:03 Tower kernel: [<ffffffff81121cc9>] vfs_kern_mount+0x65/0xf7
Oct 29 10:32:03 Tower kernel: [<ffffffff8112463f>] do_mount+0x91c/0xa72
Oct 29 10:32:03 Tower kernel: [<ffffffff810ce50e>] ? strndup_user+0x3a/0x82
Oct 29 10:32:03 Tower kernel: [<ffffffff81124984>] SyS_mount+0x70/0x9c
Oct 29 10:32:03 Tower kernel: [<ffffffff816213ae>] entry_SYSCALL_64_fastpath+0x12/0x6d
Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error xfs_trans_cancel at line 990 of file fs/xfs/xfs_trans.c. Caller xlog_recover_process_efi+0x148/0x155
Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1
Oct 29 10:32:03 Tower kernel: Hardware name: /DQ77KB, BIOS KBQ7710H.86A.0038.2012.0425.1537 04/25/2012
Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfbd8 ffffffff8136ad4c ffff8803e6fef000
Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfbf0 ffffffff81275975 ffffffff8129149c
Oct 29 10:32:03 Tower kernel: ffff8800d54bfc18 ffffffff81289b8b ffff8803e6c56000 ffff8803e6c56190
Oct 29 10:32:03 Tower kernel: Call Trace:
Oct 29 10:32:03 Tower kernel: [<ffffffff8136ad4c>] dump_stack+0x61/0x7e
Oct 29 10:32:03 Tower kernel: [<ffffffff81275975>] xfs_error_report+0x32/0x35
Oct 29 10:32:03 Tower kernel: [<ffffffff8129149c>] ? xlog_recover_process_efi+0x148/0x155
Oct 29 10:32:03 Tower kernel: [<ffffffff81289b8b>] xfs_trans_cancel+0x49/0xbf
Oct 29 10:32:03 Tower kernel: [<ffffffff8129149c>] xlog_recover_process_efi+0x148/0x155
Oct 29 10:32:03 Tower kernel: [<ffffffff8129151a>] xlog_recover_process_efis+0x71/0xb5
Oct 29 10:32:03 Tower kernel: [<ffffffff81076168>] ? wake_up_bit+0x1d/0x1f
Oct 29 10:32:03 Tower kernel: [<ffffffff8127a657>] ? xfs_iget+0x50f/0x54e
Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] xlog_recover_finish+0x18/0x8b
Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] ? xlog_recover_finish+0x18/0x8b
Oct 29 10:32:03 Tower kernel: [<ffffffff8128bbaf>] xfs_log_mount_finish+0x20/0x36
Oct 29 10:32:03 Tower kernel: [<ffffffff81284e24>] xfs_mountfs+0x601/0x6a8
Oct 29 10:32:03 Tower kernel: [<ffffffff81287724>] xfs_fs_fill_super+0x3fd/0x489
Oct 29 10:32:03 Tower kernel: [<ffffffff8110c53b>] mount_bdev+0x141/0x195
Oct 29 10:32:03 Tower kernel: [<ffffffff81287327>] ? xfs_parseargs+0x8c1/0x8c1
Oct 29 10:32:03 Tower kernel: [<ffffffff81285ce2>] xfs_fs_mount+0x10/0x12
Oct 29 10:32:03 Tower kernel: [<ffffffff8110d1ac>] mount_fs+0xf/0x84
Oct 29 10:32:03 Tower kernel: [<ffffffff81121cc9>] vfs_kern_mount+0x65/0xf7
Oct 29 10:32:03 Tower kernel: [<ffffffff8112463f>] do_mount+0x91c/0xa72
Oct 29 10:32:03 Tower kernel: [<ffffffff810ce50e>] ? strndup_user+0x3a/0x82
Oct 29 10:32:03 Tower kernel: [<ffffffff81124984>] SyS_mount+0x70/0x9c
Oct 29 10:32:03 Tower kernel: [<ffffffff816213ae>] entry_SYSCALL_64_fastpath+0x12/0x6d
Oct 29 10:32:03 Tower kernel: XFS (md1): xfs_do_force_shutdown(0x8) called from line 991 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff81289ba4
Oct 29 10:32:03 Tower kernel: XFS (md1): Corruption of in-memory data detected. Shutting down filesystem
Oct 29 10:32:03 Tower kernel: XFS (md1): Please umount the filesystem and rectify the problem(s)
Oct 29 10:32:03 Tower kernel: XFS (md1): Failed to recover EFIs
Oct 29 10:32:03 Tower kernel: XFS (md1): log mount finish failed
Oct 29 10:32:03 Tower kernel: XFS (md1): xfs_log_force: error -5 returned.
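For anyone facing a long syslog like the one above, a quick way to see which devices are affected is to filter for the XFS error lines. The following is only a rough sketch: the function name xfs_error_devices is made up for illustration, and it assumes the kernel messages use the same "XFS (mdN): Internal error" and "XFS (mdN): Corruption" wording shown in this log.

```shell
# Hypothetical helper (not part of unraid): reads syslog text on stdin and
# prints each device that logged an XFS internal error or corruption message.
xfs_error_devices() {
    # Keep only lines such as "XFS (md1): Internal error ..." or
    # "XFS (md1): Corruption of in-memory data detected ...",
    # then extract the device name between the parentheses and de-duplicate.
    grep -E 'XFS \([^)]+\): (Internal error|Corruption)' \
        | sed -E 's/.*XFS \(([^)]+)\):.*/\1/' \
        | sort -u
}

# Example use on a live system (path as used earlier in this post):
#   xfs_error_devices < /var/log/syslog
```

Clean mounts such as "XFS (md2): Ending clean mount" are ignored, so a single device in the output suggests, but does not prove, that the damage is limited to that disk.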
October 29, 2016

And I'm sure it was a great learning experience! But I had to comment that almost all of that is built in and was not necessary, especially all the mounting and unmounting. That first syslog excerpt indicated XFS corruption on md1, which is Disk 1. If you had restarted the array in Maintenance mode and then clicked on Disk 1, you would have seen xfs_repair all ready to run on the disk. It first offers to do a check only, but if you remove the -n option, it makes any changes necessary.

You used the -L option, which is sometimes necessary, but it is not wise to use it unless xfs_repair instructs you to. Also, it looked like you mounted before running xfs_repair. The instructions for xfs_repair say it should be run with the file system unmounted. There's a wiki page to help: Check Disk File systems.
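For those who prefer the command line, the check-first order described in this reply could be sketched roughly like this. It is only an illustration under assumptions: xfs_check_then_repair and the XFS_REPAIR override variable are hypothetical names (the override exists only so the logic can be tried against a stand-in command), and the array is assumed to be started in Maintenance mode with the disk not mounted.

```shell
# Sketch only: check an unmounted XFS device first, repair only if needed.
# xfs_check_then_repair and XFS_REPAIR are hypothetical names for illustration.
xfs_check_then_repair() {
    local dev="$1"
    local repair="${XFS_REPAIR:-xfs_repair}"

    # xfs_repair should be run with the file system unmounted.
    if grep -q "^${dev} " /proc/mounts 2>/dev/null; then
        echo "${dev} is mounted; unmount it first" >&2
        return 2
    fi

    # -n is no-modify mode: it reports problems and changes nothing.
    # In this mode xfs_repair exits 0 when no corruption was detected.
    if "$repair" -n "$dev"; then
        echo "${dev}: no corruption reported"
        return 0
    fi

    # Corruption was reported: run the actual repair, deliberately without -L.
    # If xfs_repair stops and asks for -L, that decision is left to the user.
    "$repair" "$dev"
}

# Example use, with the device name from this thread:
#   xfs_check_then_repair /dev/md1
```

The sketch never adds -L on its own, since zeroing the log discards metadata changes and is best done only when xfs_repair itself asks for it.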