HELP! sync process hard locks on boot


KcWeBBy


Hey there. First off, I committed the first mortal sin: I did not get a copy of the logs before my first hard reboot.

My system hung during a mover operation. I left it for 18 hours, since I could still use the shares and manipulate files, but the mover was not actually moving anything and did not respond to SIGTERM.

So I hard rebooted.

When it came back up, I noticed that the web interface (emhttp) was running but not accessible. I looked in htop and found it waiting on a hung "sync" process. I am unable to powerdown, reboot, or shutdown -h now the machine: it posts the broadcast message but does not actually effect any change. I can only assume the sync is stuck in some kind of hardware I/O wait and is holding other processes up.

Now, upon reboot, I did capture a set of logs, but they are very basic (they only cover the time since the last boot).

You will see that disk8 is showing XFS errors, and what looks like a SEGFAULT message, but I'm not too sure about that.

I booted into safe mode and ran xfs_repair -nv to see what would happen, and got absolutely no output after 2 hours of running (it was still on the first line of output).

In fact, I could not SIGTERM the process and had to hard reboot once again. While it was running, the sync process was also still running and showing "D" for status in htop ("D" means uninterruptible sleep, typically a process blocked on I/O, rather than a zombie, which shows as "Z").
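A process in state "D" is blocked inside the kernel, and signals such as SIGTERM are not acted on until the blocking call returns, which would explain why neither sync nor xfs_repair could be killed. A minimal sketch for listing such processes from the command line follows. The helper name filter_d_state and the sample rows are made up for illustration, and it assumes a procps-style ps.

```shell
#!/bin/sh
# filter_d_state: hypothetical helper. Reads `ps` output on stdin and keeps
# the header line plus every row whose STAT column (field 2) starts with D,
# i.e. processes in uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Canned, made-up sample rows so the filter can be tried without a hung box.
printf '%s\n' \
    '  PID STAT WCHAN   COMMAND' \
    '    1 Ss   -       init' \
    ' 5625 D+   example mount' \
    ' 5630 D    example sync' \
    ' 5700 Z    -       defunct' |
    filter_d_state
```

On a live system the same filter could be fed from ps, for example: ps -eo pid,stat,wchan:32,comm | filter_d_state. The WCHAN column then hints at which kernel function each stuck process is waiting in.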

 

I have a large amount of data on this drive and would like to recover most of it.

Is there a way to repair the XFS filesystem on the drive without the sync process starting up when the machine boots?

BTW, the whole time the sync process is running, there are no I/O lights on my drive, so I assume it's truly hung.

I'm not a novice administrator, but this one has me baffled. I'm pretty new to XFS and unRAID.

Any help would be appreciated.

A couple of areas where I could use help:

With no GUI, I'm limited to the command line.

How can I identify disk 8? If I can identify it, can I then remove it and ask parity to rebuild its contents?
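One possible way to approach the identification question from the command line is sketched below. It rests on an assumption that should be checked on the actual system: that unRAID's mdcmd status prints key=value lines with per-slot keys such as rdevName.8 (kernel device name) and rdevId.8 (model and serial). The helper name rdev_for_slot and the sample values are made up.

```shell
#!/bin/sh
# rdev_for_slot: hypothetical helper. Reads key=value lines on stdin and
# prints the rdevName/rdevId entries for one array slot.
# ASSUMPTION: `mdcmd status` emits keys named rdevName.N and rdevId.N.
rdev_for_slot() {
    slot=$1
    awk -F= -v slot="$slot" '
        $1 == ("rdevName." slot) || $1 == ("rdevId." slot) { print }
    '
}

# Canned, made-up sample input standing in for `mdcmd status`.
printf '%s\n' \
    'rdevName.7=sdh' \
    'rdevId.7=EXAMPLE_MODEL_SERIAL7' \
    'rdevName.8=sdi' \
    'rdevId.8=EXAMPLE_MODEL_SERIAL8' |
    rdev_for_slot 8
```

If the assumption holds, mdcmd status | rdev_for_slot 8 would give the device name and serial to match against the label on the physical drive; ls -l /dev/disk/by-id/ is a standard Linux cross-check for the same mapping.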

 

How can I prevent the sync process from locking up on me? What does it do, and is it essential to the functioning of the array?

Is there a way to fix the filesystem directly on the drive, even if it destroys the parity? I think maintaining the parity is what's causing the problems.

Am I on the right track? Is there an easier fix?

Your help is appreciated.

Thanks.

unraidbm-diagnostics-20161020-2224.zip


Quick update.

I edited disks.cfg offline (on another computer) so that the array would not auto-start, and then attempted to mount md8 (the affected drive).
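For anyone wanting to script the same disks.cfg change, here is a minimal sketch. The key name is an assumption: it defaults to startArray, which is how the auto-start setting is believed to be stored, so check the actual file first and pass a different key if needed. The helper name disable_autostart and the scratch-file contents are made up.

```shell
#!/bin/sh
# disable_autostart: hypothetical helper. Rewrites KEY="..." to KEY="no" in a
# config file and keeps the original as FILE.bak.
# ASSUMPTION: the auto-start key is called startArray (second argument
# overrides it).
disable_autostart() {
    cfg=$1
    key=${2:-startArray}
    cp "$cfg" "$cfg.bak" || return 1
    sed "s/^${key}=.*/${key}=\"no\"/" "$cfg.bak" > "$cfg"
}

# Demonstration on a scratch file rather than the real flash drive.
tmp=$(mktemp)
printf '%s\n' 'startArray="yes"' 'otherSetting="1"' > "$tmp"
disable_autostart "$tmp"
cat "$tmp"
rm -f "$tmp" "$tmp.bak"
```

With the flash drive mounted on another machine, the call would point at its config/disks.cfg instead of the scratch file.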

 

The mount attempt produced the following in syslog, plus a "Killed" message on my terminal:

Oct 23 22:40:53 unRAIDBM kernel: XFS (md8): Mounting V5 Filesystem

Oct 23 22:40:53 unRAIDBM kernel: XFS (md8): Starting recovery (logdev: internal)

Oct 23 22:41:15 unRAIDBM kernel: XFS (md8): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0xe8e08870

Oct 23 22:41:15 unRAIDBM kernel: ------------[ cut here ]------------

Oct 23 22:41:15 unRAIDBM kernel: WARNING: CPU: 1 PID: 5625 at fs/xfs/xfs_buf.c:472 _xfs_buf_find+0x7f/0x28c()

Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]

Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Not tainted 4.4.23-unRAID #1

Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010

Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880213dc3950 ffffffff8136ad0c 0000000000000000

Oct 23 22:41:15 unRAIDBM kernel: 00000000000001d8 ffff880213dc3988 ffffffff8104a486 ffffffff81273961

Oct 23 22:41:15 unRAIDBM kernel: ffff88007c8f8000 ffff880223880c00 ffff880223880c00 0000000000000000

Oct 23 22:41:15 unRAIDBM kernel: Call Trace:

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8136ad0c>] dump_stack+0x61/0x7e

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a486>] warn_slowpath_common+0x8f/0xa8

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] ? _xfs_buf_find+0x7f/0x28c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a543>] warn_slowpath_null+0x15/0x17

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] _xfs_buf_find+0x7f/0x28c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273b92>] xfs_buf_get_map+0x24/0x12b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295533>] xfs_trans_get_buf_map+0x80/0xa7

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81255b78>] xfs_btree_get_bufs+0x4b/0x4d

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8124656e>] xfs_alloc_fix_freelist+0x174/0x2de

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d

Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d3 ]---

Oct 23 22:41:15 unRAIDBM kernel: XFS (md8): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0xe8e08870

Oct 23 22:41:15 unRAIDBM kernel: ------------[ cut here ]------------

Oct 23 22:41:15 unRAIDBM kernel: WARNING: CPU: 1 PID: 5625 at fs/xfs/xfs_buf.c:472 _xfs_buf_find+0x7f/0x28c()

Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]

Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Tainted: G W 4.4.23-unRAID #1

Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010

Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880213dc3950 ffffffff8136ad0c 0000000000000000

Oct 23 22:41:15 unRAIDBM kernel: 00000000000001d8 ffff880213dc3988 ffffffff8104a486 ffffffff81273961

Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880223880c00 ffff880223880c00 ffff8800cac29680

Oct 23 22:41:15 unRAIDBM kernel: Call Trace:

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8136ad0c>] dump_stack+0x61/0x7e

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a486>] warn_slowpath_common+0x8f/0xa8

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] ? _xfs_buf_find+0x7f/0x28c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a543>] warn_slowpath_null+0x15/0x17

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] _xfs_buf_find+0x7f/0x28c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81272a36>] ? xfs_buf_allocate_memory+0x161/0x299

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273bde>] xfs_buf_get_map+0x70/0x12b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295533>] xfs_trans_get_buf_map+0x80/0xa7

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81255b78>] xfs_btree_get_bufs+0x4b/0x4d

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8124656e>] xfs_alloc_fix_freelist+0x174/0x2de

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d

Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d4 ]---

Oct 23 22:41:15 unRAIDBM kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8

Oct 23 22:41:15 unRAIDBM kernel: IP: [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80

Oct 23 22:41:15 unRAIDBM kernel: PGD 213ce8067 PUD 213dbe067 PMD 0

Oct 23 22:41:15 unRAIDBM kernel: Oops: 0000 [#1] PREEMPT SMP

Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]

Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Tainted: G W 4.4.23-unRAID #1

Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010

Oct 23 22:41:15 unRAIDBM kernel: task: ffff8800c9af0ac0 ti: ffff880213dc0000 task.ti: ffff880213dc0000

Oct 23 22:41:15 unRAIDBM kernel: RIP: 0010:[<ffffffff8129583d>] [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80

Oct 23 22:41:15 unRAIDBM kernel: RSP: 0018:ffff880213dc3a68 EFLAGS: 00010246

Oct 23 22:41:15 unRAIDBM kernel: RAX: 0000000000000000 RBX: ffff880213dc3b58 RCX: 00000000000e3101

Oct 23 22:41:15 unRAIDBM kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff88007c8f8000

Oct 23 22:41:15 unRAIDBM kernel: RBP: ffff880213dc3a78 R08: 0000000000018ed0 R09: ffffffff81273417

Oct 23 22:41:15 unRAIDBM kernel: R10: ffffea00032b0a00 R11: 00000000000c9aab R12: ffff88007c8f8000

Oct 23 22:41:15 unRAIDBM kernel: R13: ffff8802238f46c0 R14: ffff880225f74800 R15: 0000000000000006

Oct 23 22:41:15 unRAIDBM kernel: FS: 00002b1592f88d80(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000

Oct 23 22:41:15 unRAIDBM kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b

Oct 23 22:41:15 unRAIDBM kernel: CR2: 00000000000000f8 CR3: 0000000213c82000 CR4: 00000000000006e0

Oct 23 22:41:15 unRAIDBM kernel: Stack:

Oct 23 22:41:15 unRAIDBM kernel: ffff880213dc3b58 ffff88007c8f8000 ffff880213dc3b48 ffffffff81246579

Oct 23 22:41:15 unRAIDBM kernel: 0000000200000002 ffffffff13dc3b08 ffff88022611b380 0000000000000000

Oct 23 22:41:15 unRAIDBM kernel: ffff8800c9af0ac0 0000000013dc3ab0 0000000000000001 ffff8800c9af0ac0

Oct 23 22:41:15 unRAIDBM kernel: Call Trace:

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246579>] xfs_alloc_fix_freelist+0x17f/0x2de

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c

Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d

Oct 23 22:41:15 unRAIDBM kernel: Code: 01 8b 50 78 89 d6 83 ce 0a 80 e2 80 89 70 78 75 12 55 89 ca 44 89 ce 48 89 c7 48 89 e5 e8 c8 94 ff ff 5d c3 55 48 89 e5 41 54 53 <48> 8b 9e f8 00 00 00 f6 43 78 04 75 67 49 89 fc 48 89 f7 e8 fd

Oct 23 22:41:15 unRAIDBM kernel: RIP [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80

Oct 23 22:41:15 unRAIDBM kernel: RSP <ffff880213dc3a68>

Oct 23 22:41:15 unRAIDBM kernel: CR2: 00000000000000f8

Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d5 ]---
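To make the "Block out of range" lines in the trace easier to read, here is a small arithmetic sketch comparing the two hex values from the message. It assumes a shell with 64-bit arithmetic; the helper name daddr_in_range is made up. Both values appear to be counted in 512-byte units, but treat that as an assumption.

```shell
#!/bin/sh
# daddr_in_range: hypothetical helper. Succeeds (exit 0) when BLOCK is below
# EOFS; both values are taken from the "Block out of range" message.
daddr_in_range() {
    [ "$(( $1 ))" -lt "$(( $2 ))" ]
}

block=0x7fffffff8   # requested block, from the log line
eofs=0xe8e08870     # end of filesystem, from the log line

printf 'block = %d\n' "$(( $block ))"
printf 'EOFS  = %d\n' "$(( $eofs ))"

if daddr_in_range "$block" "$eofs"; then
    echo "block is inside the filesystem"
else
    echo "block is past the end of the filesystem"
fi
```

The requested block works out to 2^35 - 8, roughly nine times the EOFS value, so log recovery was asking for a block far beyond the end of the device. That fits a corrupted block number in the journaled metadata rather than a read error at a valid location.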

 

After this, I see no option but to run xfs_repair -L /dev/md8 after a fresh reboot and see whether that can repair the drive.

Wish me luck.


And success.

One zero-byte inode was lost, but there is no other evidence of corruption.

Just to recap, in case anyone else has this issue:

I edited /boot/config/disks.cfg to set Autostart = "no".

I rebooted and started the array in maintenance mode via the GUI.

I ran xfs_repair -v /dev/md8

...the results indicated that nothing could be done because the log contained metadata changes that still needed to be replayed. It suggested mounting the filesystem to replay the log and, if that didn't work, the dreaded -L.

I attempted (and failed) to mount the affected disk, in this case /dev/md8.

...the failure forced me to hard reboot (after the dump I copied in my last post).

I started the array in maintenance mode again via the GUI, and this time, in a terminal, I ran:

xfs_repair -L /dev/md8

20 minutes later it reported success, and I mounted the drive and checked the files.
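The command-line part of the recap can be sketched as a dry-run script. It only prints the commands unless DRY_RUN=0 is set, it assumes the array is already started in maintenance mode, and the mount point /mnt/test plus the helper names run and repair_sequence are made up. In my case the mount step crashed the kernel, so in practice the steps were run by hand with a reboot in between rather than as one script.

```shell
#!/bin/sh
# Dry-run sketch of the repair order described above.
# run() prints the command by default and only executes it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_sequence() {
    dev=$1
    mnt=${2:-/mnt/test}   # hypothetical scratch mount point

    # 1. Verbose repair attempt; stops if the log still needs replaying.
    run xfs_repair -v "$dev"

    # 2. Mounting replays the log. If this works, unmount and repeat step 1
    #    instead of continuing to step 3.
    run mount -t xfs "$dev" "$mnt"

    # 3. Last resort: -L zeroes the log, discarding unreplayed metadata.
    run xfs_repair -L "$dev"
}

repair_sequence /dev/md8
```

Running against the md device rather than the raw disk is what keeps parity in sync with the repair.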

 

After this check was successful, I rebooted, upgraded to 6.2.2, and am now bringing my array online, for good this time.

Hope this helps someone else.

Until next time.
