KcWeBBy Posted October 24, 2016

Hey there. First off, I committed the first mortal sin: I wasn't able to grab a copy of the logs before my first hard reboot. My system hung during a mover operation. I left it alone for 18 hours, since I could still use the shares and manipulate files, but the mover wasn't actually moving anything and wouldn't respond to a SIGTERM. So I hard booted.

When it came back up, the web interface process (emhttp) was running but not accessible. Looking in htop, I found it waiting on a hung "sync" process. I am unable to powerdown / reboot / shutdown -h now the machine; it posts the broadcast message but doesn't actually do anything. I can only assume the sync is stuck in some kind of hardware I/O wait and is holding other processes up.

Upon reboot I did capture a set of logs, but it's the most basic set (only since the last boot). You will see that disk8 is showing XFS errors, along with what looks like a segfault message, though I'm not sure about that.

I booted into safe mode and ran xfs_repair -nv to see what would happen, and got absolutely zero output after 2 hours of running (still on the first line of output). In fact, I could not SIGTERM the process and had to hard reboot once again. While this was running, the sync process was still there, showing "D" status in htop (uninterruptible sleep, waiting on I/O). The whole time sync runs there are no I/O lights on my drives, so I assume it's truly hung.

I have a large amount of data on this drive and would like to recover most of it. Is there a way to fix the XFS filesystem on the drive without the sync process starting up when the machine does? I'm not a novice administrator, but this one has me baffled; I'm pretty new to XFS and unRAID. A few areas where I could use help (with no GUI, I'm limited to the command line):

- How can I identify disk 8?
- If I can identify it, can I then remove it and have parity rebuild its contents?
- How can I prevent the sync process from locking up on me? What does it do, and is it essential to the functioning of the array?
- Is there a way to fix the filesystem directly on the drive, even if it invalidates parity? I suspect maintaining parity is what's causing the problems.
- Am I on the right track, or is there an easier fix?

Any help would be appreciated. Thanks.

unraidbm-diagnostics-20161020-2224.zip
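For the "how can I identify disk 8" question, here is a rough command-line sketch. The ini-style layout and the /tmp sample file below are invented for illustration (on a live unRAID server the slot assignments live under /var/local/emhttp); the general idea is to pull the device name for the slot, then match it to a physical drive by serial number.

```shell
#!/bin/sh
# Rough sketch: map an unRAID array slot (e.g. disk8) to a physical device.
# The ini layout below is a made-up sample for illustration only.

slot_device() {
  # $1 = ini-style file, $2 = slot name; prints the device= value for it.
  awk -v slot="[\"$2\"]" '
    $0 == slot          { found = 1; next }   # entered the target section
    /^\[/               { found = 0 }         # any other section ends it
    found && /^device=/ { gsub(/"/, ""); sub(/^device=/, ""); print; exit }
  ' "$1"
}

# Invented sample data mimicking an ini-style slot assignment file:
cat > /tmp/disks.ini.sample <<'EOF'
["disk7"]
device="sdg"
["disk8"]
device="sdh"
EOF

dev=$(slot_device /tmp/disks.ini.sample disk8)
echo "disk8 is /dev/$dev"        # prints: disk8 is /dev/sdh

# With the device name in hand, match it to a physical drive by serial
# (the /dev/disk/by-id symlink names encode model and serial):
#   ls -l /dev/disk/by-id/ | grep "$dev"
#   smartctl -i "/dev/$dev"     # compare against the drive's label
```

Once the serial is known, the physical tray can be identified without guessing at bay ordering.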
KcWeBBy Posted October 24, 2016 (Author)

Quick update: I edited disks.cfg offline (on another computer) so the array would not auto-start, then attempted to mount md8 (the affected drive). I got the following in syslog, plus a "Killed" message at my terminal.

Oct 23 22:40:53 unRAIDBM kernel: XFS (md8): Mounting V5 Filesystem
Oct 23 22:40:53 unRAIDBM kernel: XFS (md8): Starting recovery (logdev: internal)
Oct 23 22:41:15 unRAIDBM kernel: XFS (md8): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0xe8e08870
Oct 23 22:41:15 unRAIDBM kernel: ------------[ cut here ]------------
Oct 23 22:41:15 unRAIDBM kernel: WARNING: CPU: 1 PID: 5625 at fs/xfs/xfs_buf.c:472 _xfs_buf_find+0x7f/0x28c()
Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]
Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Not tainted 4.4.23-unRAID #1
Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010
Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880213dc3950 ffffffff8136ad0c 0000000000000000
Oct 23 22:41:15 unRAIDBM kernel: 00000000000001d8 ffff880213dc3988 ffffffff8104a486 ffffffff81273961
Oct 23 22:41:15 unRAIDBM kernel: ffff88007c8f8000 ffff880223880c00 ffff880223880c00 0000000000000000
Oct 23 22:41:15 unRAIDBM kernel: Call Trace:
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8136ad0c>] dump_stack+0x61/0x7e
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a486>] warn_slowpath_common+0x8f/0xa8
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] ? _xfs_buf_find+0x7f/0x28c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a543>] warn_slowpath_null+0x15/0x17
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] _xfs_buf_find+0x7f/0x28c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273b92>] xfs_buf_get_map+0x24/0x12b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295533>] xfs_trans_get_buf_map+0x80/0xa7
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81255b78>] xfs_btree_get_bufs+0x4b/0x4d
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8124656e>] xfs_alloc_fix_freelist+0x174/0x2de
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d
Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d3 ]---
Oct 23 22:41:15 unRAIDBM kernel: XFS (md8): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0xe8e08870
Oct 23 22:41:15 unRAIDBM kernel: ------------[ cut here ]------------
Oct 23 22:41:15 unRAIDBM kernel: WARNING: CPU: 1 PID: 5625 at fs/xfs/xfs_buf.c:472 _xfs_buf_find+0x7f/0x28c()
Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]
Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Tainted: G W 4.4.23-unRAID #1
Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010
Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880213dc3950 ffffffff8136ad0c 0000000000000000
Oct 23 22:41:15 unRAIDBM kernel: 00000000000001d8 ffff880213dc3988 ffffffff8104a486 ffffffff81273961
Oct 23 22:41:15 unRAIDBM kernel: 0000000000000000 ffff880223880c00 ffff880223880c00 ffff8800cac29680
Oct 23 22:41:15 unRAIDBM kernel: Call Trace:
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8136ad0c>] dump_stack+0x61/0x7e
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a486>] warn_slowpath_common+0x8f/0xa8
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] ? _xfs_buf_find+0x7f/0x28c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8104a543>] warn_slowpath_null+0x15/0x17
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273961>] _xfs_buf_find+0x7f/0x28c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81272a36>] ? xfs_buf_allocate_memory+0x161/0x299
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81273bde>] xfs_buf_get_map+0x70/0x12b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295533>] xfs_trans_get_buf_map+0x80/0xa7
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81255b78>] xfs_btree_get_bufs+0x4b/0x4d
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8124656e>] xfs_alloc_fix_freelist+0x174/0x2de
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d
Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d4 ]---
Oct 23 22:41:15 unRAIDBM kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
Oct 23 22:41:15 unRAIDBM kernel: IP: [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80
Oct 23 22:41:15 unRAIDBM kernel: PGD 213ce8067 PUD 213dbe067 PMD 0
Oct 23 22:41:15 unRAIDBM kernel: Oops: 0000 [#1] PREEMPT SMP
Oct 23 22:41:15 unRAIDBM kernel: Modules linked in: md_mod tun bonding bnx2 hid_logitech_hidpp ipmi_devintf coretemp kvm_intel kvm ata_piix hid_logitech_dj mptsas mptscsih mptbase scsi_transport_sas ipmi_si acpi_cpufreq [last unloaded: md_mod]
Oct 23 22:41:15 unRAIDBM kernel: CPU: 1 PID: 5625 Comm: mount Tainted: G W 4.4.23-unRAID #1
Oct 23 22:41:15 unRAIDBM kernel: Hardware name: Dell Inc. PowerEdge 2900/0NX642, BIOS 2.7.0 10/30/2010
Oct 23 22:41:15 unRAIDBM kernel: task: ffff8800c9af0ac0 ti: ffff880213dc0000 task.ti: ffff880213dc0000
Oct 23 22:41:15 unRAIDBM kernel: RIP: 0010:[<ffffffff8129583d>] [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80
Oct 23 22:41:15 unRAIDBM kernel: RSP: 0018:ffff880213dc3a68 EFLAGS: 00010246
Oct 23 22:41:15 unRAIDBM kernel: RAX: 0000000000000000 RBX: ffff880213dc3b58 RCX: 00000000000e3101
Oct 23 22:41:15 unRAIDBM kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff88007c8f8000
Oct 23 22:41:15 unRAIDBM kernel: RBP: ffff880213dc3a78 R08: 0000000000018ed0 R09: ffffffff81273417
Oct 23 22:41:15 unRAIDBM kernel: R10: ffffea00032b0a00 R11: 00000000000c9aab R12: ffff88007c8f8000
Oct 23 22:41:15 unRAIDBM kernel: R13: ffff8802238f46c0 R14: ffff880225f74800 R15: 0000000000000006
Oct 23 22:41:15 unRAIDBM kernel: FS: 00002b1592f88d80(0000) GS:ffff88022fc40000(0000) knlGS:0000000000000000
Oct 23 22:41:15 unRAIDBM kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 23 22:41:15 unRAIDBM kernel: CR2: 00000000000000f8 CR3: 0000000213c82000 CR4: 00000000000006e0
Oct 23 22:41:15 unRAIDBM kernel: Stack:
Oct 23 22:41:15 unRAIDBM kernel: ffff880213dc3b58 ffff88007c8f8000 ffff880213dc3b48 ffffffff81246579
Oct 23 22:41:15 unRAIDBM kernel: 0000000200000002 ffffffff13dc3b08 ffff88022611b380 0000000000000000
Oct 23 22:41:15 unRAIDBM kernel: ffff8800c9af0ac0 0000000013dc3ab0 0000000000000001 ffff8800c9af0ac0
Oct 23 22:41:15 unRAIDBM kernel: Call Trace:
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246579>] xfs_alloc_fix_freelist+0x17f/0x2de
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81065391>] ? wake_up_q+0x51/0x51
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127377b>] ? xfs_buf_rele+0x3d/0xe9
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81246b01>] xfs_free_extent+0x86/0xed
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81295a4c>] xfs_trans_free_extent+0x21/0x58
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81291423>] xlog_recover_process_efi+0x125/0x155
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812914c4>] xlog_recover_process_efis+0x71/0xb5
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8107610b>] ? wake_up_bit+0x1d/0x1f
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8127a601>] ? xfs_iget+0x50f/0x54e
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81294866>] ? xlog_recover_finish+0x18/0x8b
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8128bb59>] xfs_log_mount_finish+0x20/0x36
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81284dce>] xfs_mountfs+0x601/0x6a8
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812876ce>] xfs_fs_fill_super+0x3fd/0x489
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110c2c3>] mount_bdev+0x141/0x195
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff812872d1>] ? xfs_parseargs+0x8c1/0x8c1
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81285c8c>] xfs_fs_mount+0x10/0x12
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8110cf34>] mount_fs+0xf/0x84
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff81121a51>] vfs_kern_mount+0x65/0xf7
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff811243c7>] do_mount+0x91c/0xa72
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff810ce2ab>] ? strndup_user+0x3a/0x82
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8112470c>] SyS_mount+0x70/0x9c
Oct 23 22:41:15 unRAIDBM kernel: [<ffffffff8162132e>] entry_SYSCALL_64_fastpath+0x12/0x6d
Oct 23 22:41:15 unRAIDBM kernel: Code: 01 8b 50 78 89 d6 83 ce 0a 80 e2 80 89 70 78 75 12 55 89 ca 44 89 ce 48 89 c7 48 89 e5 e8 c8 94 ff ff 5d c3 55 48 89 e5 41 54 53 <48> 8b 9e f8 00 00 00 f6 43 78 04 75 67 49 89 fc 48 89 f7 e8 fd
Oct 23 22:41:15 unRAIDBM kernel: RIP [<ffffffff8129583d>] xfs_trans_binval+0x7/0x80
Oct 23 22:41:15 unRAIDBM kernel: RSP <ffff880213dc3a68>
Oct 23 22:41:15 unRAIDBM kernel: CR2: 00000000000000f8
Oct 23 22:41:15 unRAIDBM kernel: ---[ end trace df2c473847fd23d5 ]---

After this, my only apparent option is to run xfs_repair -L /dev/md8 after a fresh reboot and see if that can repair the drive. Wish me luck.
KcWeBBy Posted October 24, 2016 (Author)

And success. One zero-byte inode was lost, but there is no other evidence of corruption. To recap, in case anyone else hits this:

1. I edited /boot/config/disks.cfg to set the array auto-start to "no".
2. I rebooted and started the array in maintenance mode via the GUI.
3. I ran xfs_repair -v /dev/md8. It reported that nothing could be done because log metadata needed to be replayed; it suggested trying a mount to replay the log, and if that failed, the dreaded -L.
4. I attempted (and failed) to mount the affected disk, /dev/md8. The failure forced another hard boot (after the dump I copied in my last post).
5. I started the array in maintenance mode again via the GUI, and this time ran from the terminal: xfs_repair -L /dev/md8
6. Twenty minutes later it reported success; I mounted the drive and checked my files.

Once everything checked out, I rebooted, upgraded to 6.2.2, and am now bringing my array online for good this time. Hope this helps someone else. Until next time.
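The recap above, condensed into a shell sketch. The startArray key name in disks.cfg and the mount point are assumptions based on this thread; verify them on your own server before running anything, and remember that xfs_repair -L discards uncommitted journal entries.

```shell
#!/bin/sh
# Rough sketch of the recovery sequence described in this thread.
# The disks.cfg key name (startArray) and all paths are assumptions.

disable_autostart() {
  # Flip the auto-start flag so a wedged disk cannot hang the next boot.
  # $1 = path to disks.cfg.
  sed -i 's/^startArray="yes"/startArray="no"/' "$1"
}

if [ -b /dev/md8 ]; then              # guard: only on the live unRAID box
  disable_autostart /boot/config/disks.cfg
  # ...reboot here, then start the array in maintenance mode via the GUI...

  xfs_repair -nv /dev/md8             # dry run: reports problems, changes nothing

  mkdir -p /mnt/test
  if ! mount -t xfs /dev/md8 /mnt/test; then
    # Mount failed, so the journal cannot be replayed. -L zeroes the log;
    # uncommitted metadata changes are lost, so this is a last resort.
    xfs_repair -L /dev/md8
  fi
fi
```

The -b guard keeps the destructive steps from running anywhere except on a machine where the md8 block device actually exists.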