RESOLVED: Internal error XFS_WANT_CORRUPTED_GOTO

October 29, 2016

I wanted to share my recent experience with the unraid community. After I copied 10K pictures and a few dozen movies overnight, my unraid system crashed.

The system was unresponsive, and I had to power it down to bring it back. After the power-up and restart of the array, the OS crashed again. On the next power-up, I tailed /var/log/syslog on the CLI and found the following: Internal error XFS_WANT_CORRUPTED_GOTO - Log below...

Oct 29 10:31:59 Tower emhttp: shcmd (57): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1 |& logger

Oct 29 10:31:59 Tower kernel: XFS (md1): Mounting V5 Filesystem

Oct 29 10:32:01 Tower kernel: XFS (md1): Starting recovery (logdev: internal)

Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x419/0x558

Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1

Damn.. It was starting to be a bad day. I searched the forums and found one example on CentOS where the recovery was done with xfs_repair. For your future reference, here is what I did to bring my unraid back alive again..

Based on the first line in the log, I knew there was a problem mounting /dev/md1 as xfs. I wanted to be 100% sure that was the only disk with the problem. So I started to troubleshoot one disk at a time...
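A minimal sketch of how the affected device could be pulled out of the syslog instead of read by eye. It assumes the kernel lines look like the ones quoted in this post ("XFS (md1): Internal error ..."); the function name xfs_error_devices is made up for illustration.

```shell
#!/bin/sh
# List the XFS devices that logged an "Internal error" in a syslog file.
# Hypothetical helper; the line format is assumed from the log in this post.
# Usage: xfs_error_devices /var/log/syslog
xfs_error_devices() {
    # Kernel lines look like: "... kernel: XFS (md1): Internal error ..."
    # Capture the device name inside the parentheses, one per matching line,
    # then collapse repeats.
    sed -n 's/.*XFS (\([^)]*\)): Internal error.*/\1/p' "$1" | sort -u
}
```

This only shows which devices have complained so far. A disk that has not been mounted yet will not appear, which is why mounting each disk by hand is still worth doing.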

First, I started the array in maintenance mode

Second, on the CLI, I made mount points for each disk.

root@Tower:~# mkdir /mnt/disk1

root@Tower:~# mkdir /mnt/disk2

root@Tower:~# mkdir /mnt/disk3

root@Tower:~# mkdir /mnt/disk4

root@Tower:~# mkdir /mnt/disk5

Then, I started to mount them with mount -t xfs -o noatime,nodiratime /dev/mdX /mnt/diskX

As soon as I ran mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1, the system crashed again.

I pulled the plug and powered the box back on..

I started the process all over again, but starting with the second disk.

1. Start the array in maintenance mode

2. make the mount points

3. mount the devices as xfs to the mount point

Success:

Oct 29 12:12:10 Tower kernel: XFS (md2): Mounting V5 Filesystem

Oct 29 12:12:11 Tower kernel: XFS (md2): Ending clean mount

And success for the next three disks as well. I had some luck.
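The per-disk check above can be sketched as a small helper that only prints the commands, so they can be reviewed before anything is run. The function name mount_cmds is made up for illustration, and mkdir -p is used instead of plain mkdir so an existing mount point is not an error; the mount options are the ones used in this post.

```shell
#!/bin/sh
# Print (do not run) the commands to mount each data disk by hand.
# Hypothetical helper. Arguments are disk numbers, e.g.: mount_cmds 2 3 4 5
mount_cmds() {
    for n in "$@"; do
        echo "mkdir -p /mnt/disk$n"
        echo "mount -t xfs -o noatime,nodiratime /dev/md$n /mnt/disk$n"
    done
}
```

For example, mount_cmds 2 3 4 5 prints eight lines, and piping that output to sh would run them. Leave the suspect disk out of the list, since mounting it is what crashed the box here.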

Now I knew the problem was limited to md1. The next step was to use xfs_repair to fix the xfs corruption.

root@Tower:~# xfs_repair -L /dev/md1

Phase 1 - find and verify superblock...

Phase 2 - using internal log

- zero log...

ALERT: The filesystem has valuable metadata changes in a log which is being

destroyed because the -L option was used.

- scan filesystem freespace and inode maps...

Metadata corruption detected at xfs_agf block 0x74704441/0x200

flfirst 118 in agf 2 too large (max = 118)

agi unlinked bucket 33 is 11992225 in ag 2 (inode=2159475873)

sb_icount 401216, counted 401152

sb_ifree 3630, counted 6458

sb_fdblocks 125505044, counted 125513505

- found root inode chunk

Phase 3 - for each AG...

- scan and clear agi unlinked lists...

- process known inodes and perform inode discovery...

- agno = 0

- agno = 1

- agno = 2

- agno = 3

- process newly discovered inodes...

Phase 4 - check for duplicate blocks...

- setting up duplicate extent list...

- check for inodes claiming duplicate blocks...

- agno = 0

- agno = 1

- agno = 2

- agno = 3

Phase 5 - rebuild AG headers and trees...

- reset superblock...

Phase 6 - check inode connectivity...

- resetting contents of realtime bitmap and summary inodes

- traversing filesystem ...

- traversal finished ...

- moving disconnected inodes to lost+found ...

disconnected inode 2159475873, moving to lost+found

Phase 7 - verify and correct link counts...

Maximum metadata LSN (10:416611) is ahead of log (1:2).

Format log to cycle 13.

done
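As the ALERT in the output says, -L throws away whatever was still in the log. A more cautious order, sketched below as a helper that only prints the commands, would be to look first with -n (no-modify mode) and fall back to -L only when a plain xfs_repair refuses to run because of the dirty log. In this case mounting to replay the log crashed the system, so -L was probably the only way forward. The function name repair_plan is made up for illustration.

```shell
#!/bin/sh
# Print (do not run) a cautious xfs_repair sequence for one device.
# Hypothetical helper. Usage: repair_plan /dev/md1
repair_plan() {
    dev=$1
    echo "xfs_repair -n $dev   # no-modify mode: only report what would be fixed"
    echo "xfs_repair $dev      # normal repair; refuses to run if the log is dirty"
    echo "xfs_repair -L $dev   # last resort: zero the log and lose its pending changes"
}
```

Run the array in maintenance mode and keep the filesystem unmounted while any of these are executed, as in the steps above.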

Next, I mounted the drive - mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1

Success:

Oct 29 12:12:10 Tower kernel: XFS (md1): Mounting V5 Filesystem

Oct 29 12:12:11 Tower kernel: XFS (md1): Ending clean mount

Great !! All the data is there..
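Since Phase 6 moved a disconnected inode to lost+found, it is worth looking in that directory on the repaired disk. A minimal sketch, assuming the disk is mounted at the path passed in; the function name list_lost_found is made up for illustration. xfs_repair names the recovered entries after their inode numbers, so the original file names are not preserved.

```shell
#!/bin/sh
# Print the entries xfs_repair left in a disk's lost+found directory, if any.
# Hypothetical helper. Usage: list_lost_found /mnt/disk1
list_lost_found() {
    dir="$1/lost+found"
    # Print nothing when the directory does not exist.
    if [ -d "$dir" ]; then
        ls -A "$dir"
    fi
}
```

With the repair output shown above, list_lost_found /mnt/disk1 would be expected to show an entry named 2159475873.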

On to the final steps to bring the unraid back alive

I had to unmount all the drives, since I had manually mounted them:

root@Tower:~# umount /mnt/disk1

root@Tower:~# umount /mnt/disk2

root@Tower:~# umount /mnt/disk3

root@Tower:~# umount /mnt/disk4

root@Tower:~# umount /mnt/disk5
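The same cleanup can be sketched as a loop over whatever diskN directories exist, printing the umount commands instead of running them. The function name umount_cmds is made up for illustration, and it assumes the manual mount points follow the /mnt/diskN naming used in this post.

```shell
#!/bin/sh
# Print (do not run) an umount command for every diskN directory under a base path.
# Hypothetical helper. Usage: umount_cmds /mnt
umount_cmds() {
    for d in "$1"/disk[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$d" ] && echo "umount $d"
    done
    return 0
}
```

Piping the output of umount_cmds /mnt to sh would run the commands. umount simply reports an error for any directory that is not currently mounted.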

I went back to the GUI and stopped the array, then restarted the array..

Success... Happy again

Error Log:

Oct 29 10:31:59 Tower emhttp: shcmd (57): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1 |& logger

Oct 29 10:31:59 Tower kernel: XFS (md1): Mounting V5 Filesystem

Oct 29 10:32:01 Tower kernel: XFS (md1): Starting recovery (logdev: internal)

Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x419/0x558

Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1

Oct 29 10:32:03 Tower kernel: Hardware name: /DQ77KB, BIOS KBQ7710H.86A.0038.2012.0425.1537 04/25/2012

Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfa38 ffffffff8136ad4c ffff880409142820

Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfa50 ffffffff81275975 ffffffff81245f52

Oct 29 10:32:03 Tower kernel: ffff8800d54bfac0 ffffffff8125a119 ffffffff8125929e 00000000e6fef1d0

Oct 29 10:32:03 Tower kernel: Call Trace:

Oct 29 10:32:03 Tower kernel: [<ffffffff8136ad4c>] dump_stack+0x61/0x7e

Oct 29 10:32:03 Tower kernel: [<ffffffff81275975>] xfs_error_report+0x32/0x35

Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] ? xfs_free_ag_extent+0x419/0x558

Oct 29 10:32:03 Tower kernel: [<ffffffff8125a119>] xfs_btree_insert+0xba/0x152

Oct 29 10:32:03 Tower kernel: [<ffffffff8125929e>] ? xfs_btree_lookup+0x307/0x4a1

Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] xfs_free_ag_extent+0x419/0x558

Oct 29 10:32:03 Tower kernel: [<ffffffff81245f52>] ? xfs_free_ag_extent+0x419/0x558

Oct 29 10:32:03 Tower kernel: [<ffffffff81246b8e>] xfs_free_extent+0xbd/0xed

Oct 29 10:32:03 Tower kernel: [<ffffffff81295aa2>] xfs_trans_free_extent+0x21/0x58

Oct 29 10:32:03 Tower kernel: [<ffffffff81291479>] xlog_recover_process_efi+0x125/0x155

Oct 29 10:32:03 Tower kernel: [<ffffffff8129151a>] xlog_recover_process_efis+0x71/0xb5

Oct 29 10:32:03 Tower kernel: [<ffffffff81076168>] ? wake_up_bit+0x1d/0x1f

Oct 29 10:32:03 Tower kernel: [<ffffffff8127a657>] ? xfs_iget+0x50f/0x54e

Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] xlog_recover_finish+0x18/0x8b

Oct 29 10:32:03 Tower kernel: [<ffffffff812948bc>] ? xlog_recover_finish+0x18/0x8b

Oct 29 10:32:03 Tower kernel: [<ffffffff8128bbaf>] xfs_log_mount_finish+0x20/0x36

Oct 29 10:32:03 Tower kernel: [<ffffffff81284e24>] xfs_mountfs+0x601/0x6a8

Oct 29 10:32:03 Tower kernel: [<ffffffff81287724>] xfs_fs_fill_super+0x3fd/0x489

Oct 29 10:32:03 Tower kernel: [<ffffffff8110c53b>] mount_bdev+0x141/0x195

Oct 29 10:32:03 Tower kernel: [<ffffffff81287327>] ? xfs_parseargs+0x8c1/0x8c1

Oct 29 10:32:03 Tower kernel: [<ffffffff81285ce2>] xfs_fs_mount+0x10/0x12

Oct 29 10:32:03 Tower kernel: [<ffffffff8110d1ac>] mount_fs+0xf/0x84

Oct 29 10:32:03 Tower kernel: [<ffffffff81121cc9>] vfs_kern_mount+0x65/0xf7

Oct 29 10:32:03 Tower kernel: [<ffffffff8112463f>] do_mount+0x91c/0xa72

Oct 29 10:32:03 Tower kernel: [<ffffffff810ce50e>] ? strndup_user+0x3a/0x82

Oct 29 10:32:03 Tower kernel: [<ffffffff81124984>] SyS_mount+0x70/0x9c

Oct 29 10:32:03 Tower kernel: [<ffffffff816213ae>] entry_SYSCALL_64_fastpath+0x12/0x6d

Oct 29 10:32:03 Tower kernel: XFS (md1): Internal error xfs_trans_cancel at line 990 of file fs/xfs/xfs_trans.c. Caller xlog_recover_process_efi+0x148/0x155

Oct 29 10:32:03 Tower kernel: CPU: 1 PID: 5433 Comm: mount Not tainted 4.4.26-unRAID #1

Oct 29 10:32:03 Tower kernel: Hardware name: /DQ77KB, BIOS KBQ7710H.86A.0038.2012.0425.1537 04/25/2012

Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfbd8 ffffffff8136ad4c ffff8803e6fef000

Oct 29 10:32:03 Tower kernel: 0000000000000000 ffff8800d54bfbf0 ffffffff81275975 ffffffff8129149c

Oct 29 10:32:03 Tower kernel: ffff8800d54bfc18 ffffffff81289b8b ffff8803e6c56000 ffff8803e6c56190

Oct 29 10:32:03 Tower kernel: Call Trace:

Oct 29 10:32:03 Tower kernel: [<ffffffff8136ad4c>] dump_stack+0x61/0x7e

Oct 29 10:32:03 Tower kernel: [<ffffffff81275975>] xfs_error_report+0x32/0x35

Oct 29 10:32:03 Tower kernel: [<ffffffff8129149c>] ? xlog_recover_process_efi+0x148/0x155

Oct 29 10:32:03 Tower kernel: [<ffffffff81289b8b>] xfs_trans_cancel+0x49/0xbf

Oct 29 10:32:03 Tower kernel: [<ffffffff8129149c>] xlog_recover_process_efi+0x148/0x155