TheWooginator Posted January 21, 2018 (edited)
Hey guys. I'm moving data from an external drive onto my array via Krusader, and I've been getting intermittent freezes on the system. Parts of it still seem to respond (I just successfully downloaded and installed the SSD Trim plugin), but the system simply won't respond when I try to do a clean reboot. I was able to collect diagnostics, and I'm waiting to see if there's something I can do short of a push-button reset, which would restart my 10-hour parity check.

At first it seemed like some of the freezing was caused by the cache drive running out of space while moving data into the array, so I ran the mover overnight to free up a solid 200 GB to "move into". My last dump was about 100 GB, and it froze unexpectedly at roughly the 95% complete mark. Cycling Docker off and on didn't help, and restarting or shutting down Krusader doesn't appear to do anything.

I've attached my diagnostics zip file. Please let me know if there's anything else that would be helpful to have.

tower-diagnostics-20180121-1110.zip

EDIT: Adding this to the top post for visibility: I snagged the crash log, and it's in my "Snagged it!" post of January 24 further down this thread.

Edited January 24, 2018 by TheWooginator
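The reasoning in the first post (free up about 200 GB on the cache with the mover before a roughly 100 GB copy) can be checked before a transfer starts. A minimal Python sketch, assuming hypothetical source and destination paths and an arbitrary 10% safety margin; `tree_size` and `has_room` are illustrative names, not Unraid tools. Note that the free figure a multi-device btrfs pool reports may only approximate what can actually be written.

```python
import os
import shutil


def tree_size(path: str) -> int:
    """Total bytes of regular files under path; symlinks are skipped."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room(source: str, dest: str, margin: float = 1.1) -> bool:
    """True if the filesystem holding dest reports at least margin x the
    size of source as free. The 10% default margin is an assumption."""
    needed = tree_size(source) * margin
    return shutil.disk_usage(dest).free >= needed
```

For example, `has_room("/mnt/disks/external/Movies", "/mnt/cache")` (both paths hypothetical) would say whether the cache reports enough free space for that folder before the copy is started.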
TheWooginator Posted January 23, 2018 (Author)
So I went ahead and made sure the system has a static IP from the router, and I'm still getting errors where the system becomes unresponsive or shuts me out entirely. It just did it again with no warning while I was moving some data from my iMac to the array over the network. This is getting really frustrating. It seems to be working fine (running a parity check, moving data off the cache drives, moving data onto the array over the network), and then it just takes a $h*t. Please help, guys. If I can't make this work before the trial period is over, I'm likely going to mothball this project and get a QNAP or something.

tower-diagnostics-20180122-2205.zip
JorgeB Posted January 23, 2018
You have file corruption on your cache drive. This would point to a hardware problem, like bad RAM or possibly a cable problem with one of your SSDs. If you haven't yet, run memtest, and also check the output of the command below for errors:

btrfs dev stats /mnt/cache
TheWooginator Posted January 23, 2018 (Author)
13 hours ago, johnnie.black said: "You have file corruption on your cache drive, this would point to a hardware problem, like bad RAM or possibly a cable problem with one of your SSDs, if you haven't yet run memtest and also check the output of the below for errors: btrfs dev stats /mnt/cache"

Thanks for that. I went ahead and updated the firmware on both SSDs and rebuilt the cache pool, so now we'll see if that helps.
JorgeB Posted January 23, 2018
6 minutes ago, TheWooginator said: "I went ahead and updated the firmware on both SSDs"

I'd be surprised if that was related. Did btrfs dev stats show any errors besides the checksum errors?
TheWooginator Posted January 23, 2018 (Author)
15 minutes ago, johnnie.black said: "I'd be surprised if that was related, did btrfs dev stats show any errors besides the checksum errors?"

I'm really new to this. Where do I look to see these errors?
JorgeB Posted January 23, 2018
3 minutes ago, TheWooginator said: "I'm really new to this. Where am I looking to see these errors?"

SSH into your server or use the console and type:

btrfs dev stats /mnt/cache

Post the output here.
TheWooginator Posted January 23, 2018 (Author)
3 minutes ago, johnnie.black said: "SSH into your server or use the console and type: btrfs dev stats /mnt/cache Post the output here."

Linux 4.14.13-unRAID.
root@Tower:~# btrfs dev stats /mnt/cache
[/dev/sdf1].write_io_errs 0
[/dev/sdf1].read_io_errs 0
[/dev/sdf1].flush_io_errs 0
[/dev/sdf1].corruption_errs 0
[/dev/sdf1].generation_errs 0
[/dev/sde1].write_io_errs 0
[/dev/sde1].read_io_errs 0
[/dev/sde1].flush_io_errs 0
[/dev/sde1].corruption_errs 0
[/dev/sde1].generation_errs 0
root@Tower:~#
JorgeB Posted January 23, 2018
Yeah, I forgot you redid your cache. If you start having issues again, run that command again and look for any non-zero values. You can post them in this thread, or start a new one if you need more help.
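The check JorgeB describes (rerun `btrfs dev stats /mnt/cache` and look for non-zero values) is easy to eyeball with two devices and five counters each, but a small filter makes it mechanical. A minimal Python sketch, assuming the one-counter-per-line output format shown above; `read_stats` and `nonzero_stats` are hypothetical helper names, not part of btrfs-progs, and `read_stats` needs root on the server.

```python
import re
import subprocess
from typing import Dict, Tuple

# One counter per line, e.g. "[/dev/sdf1].corruption_errs   0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def read_stats(mount: str = "/mnt/cache") -> str:
    """Run `btrfs dev stats` on a mounted btrfs filesystem and return its output."""
    result = subprocess.run(["btrfs", "dev", "stats", mount],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout


def nonzero_stats(output: str) -> Dict[Tuple[str, str], int]:
    """Map (device, counter) -> value for every counter that is not zero."""
    found: Dict[Tuple[str, str], int] = {}
    for line in output.splitlines():
        m = STAT_LINE.match(line.strip())
        if m and int(m.group("value")) != 0:
            found[(m.group("dev"), m.group("counter"))] = int(m.group("value"))
    return found
```

On the server, `nonzero_stats(read_stats())` returns an empty dict for all-zero output like the one posted above; any entry it does return is the kind of non-zero value worth posting in the thread.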
TheWooginator Posted January 23, 2018 (Author)
Roger. The array is up and running, so we'll see if it lasts the night. Is there anything I should be doing or logging in the meantime so I can capture the actual event if it happens again? Every time it quits on me I lose access and have to do a hard reset, so I can't grab the live errors.
JorgeB Posted January 23, 2018
If/when it happens again, see if typing diagnostics on the console or over SSH works. If it does, it will save them to the flash drive.
TheWooginator Posted January 24, 2018 (Author)
Well, it lasted longer this time than any previous time, so... progress? It bugged out shortly after I decided to experiment with a VM, but I never got past enabling VMs in the settings. I was prepping to set up a Win 10 VM when it crapped out. The really weird part, which I still can't figure out, is that when it crashes it also locks up my modem, which is connected directly to it. Once I power down the array, the modem magically comes back and re-enables the Wi-Fi. I'm set up as static at both ends soooooo....

tower-diagnostics-20180124-0023.zip
JorgeB Posted January 24, 2018
Those diagnostics were taken after rebooting, so no help there.
TheWooginator Posted January 24, 2018 (Author) (edited)
Snagged it! I ran a logger on my iMac to keep an eye on it in case it crashed on me, and I was able to grab this before it froze up and forced me to reset it:

Jan 24 10:39:31 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:39:32 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:39:32 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 10:49:32 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:49:33 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:49:33 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 10:59:33 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:59:34 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:59:34 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 11:00:01 Tower root: mover: started
Jan 24 11:02:28 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
Jan 24 11:02:28 Tower kernel: IP: account_page_dirtied+0xaf/0x13b
Jan 24 11:02:28 Tower kernel: PGD 0 P4D 0
Jan 24 11:02:28 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 24 11:02:28 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf e1000e i2c_i801 i2c_core ahci mxm_wmi wmi_bmof libahci ptp wmi pps_core button
Jan 24 11:02:28 Tower kernel: CPU: 4 PID: 12156 Comm: kworker/u24:6 Not tainted 4.14.13-unRAID #1
Jan 24 11:02:28 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 11:02:28 Tower kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
Jan 24 11:02:28 Tower kernel: task: ffff8807005d7000 task.stack: ffffc9000fafc000
Jan 24 11:02:28 Tower kernel: RIP: 0010:account_page_dirtied+0xaf/0x13b
Jan 24 11:02:28 Tower kernel: RSP: 0018:ffffc9000faff9e8 EFLAGS: 00010047
Jan 24 11:02:28 Tower kernel: RAX: 0000000000000000 RBX: ffffea0009172900 RCX: 0000000000024b90
Jan 24 11:02:28 Tower kernel: RDX: ffff8807fcc19400 RSI: 000000000000000f RDI: ffff88081fff9000
Jan 24 11:02:28 Tower kernel: RBP: ffff8807f79c8458 R08: 0000000000024b80 R09: 0000000000000000
Jan 24 11:02:28 Tower kernel: R10: ffff8807f69f8368 R11: 000000000007c439 R12: ffff8807f69f8378
Jan 24 11:02:28 Tower kernel: R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
Jan 24 11:02:28 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff300000(0000) knlGS:0000000000000000
Jan 24 11:02:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088 CR3: 0000000004c0a006 CR4: 00000000000606e0
Jan 24 11:02:28 Tower kernel: Call Trace:
Jan 24 11:02:28 Tower kernel: __set_page_dirty_nobuffers+0x98/0x12c
Jan 24 11:02:28 Tower kernel: set_extent_buffer_dirty+0x6a/0x76
Jan 24 11:02:28 Tower kernel: btrfs_mark_buffer_dirty+0x75/0x98
Jan 24 11:02:28 Tower kernel: __btrfs_cow_block+0x49e/0x4b8
Jan 24 11:02:28 Tower kernel: btrfs_cow_block+0x106/0x114
Jan 24 11:02:28 Tower kernel: btrfs_search_slot+0x330/0x83c
Jan 24 11:02:28 Tower kernel: btrfs_del_csums+0xaa/0x340
Jan 24 11:02:28 Tower kernel: ? release_extent_buffer+0x7e/0x85
Jan 24 11:02:28 Tower kernel: __btrfs_free_extent+0x8dc/0x9e8
Jan 24 11:02:28 Tower kernel: __btrfs_run_delayed_refs+0xa7f/0xc84
Jan 24 11:02:28 Tower kernel: ? kmem_cache_free+0x12e/0x131
Jan 24 11:02:28 Tower kernel: btrfs_run_delayed_refs+0x68/0x1e9
Jan 24 11:02:28 Tower kernel: delayed_ref_async_start+0x54/0x90
Jan 24 11:02:28 Tower kernel: btrfs_worker_helper+0xbc/0x16f
Jan 24 11:02:28 Tower kernel: process_one_work+0x146/0x239
Jan 24 11:02:28 Tower kernel: ? rescuer_thread+0x258/0x258
Jan 24 11:02:28 Tower kernel: worker_thread+0x1c3/0x292
Jan 24 11:02:28 Tower kernel: kthread+0x10f/0x117
Jan 24 11:02:28 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 24 11:02:28 Tower kernel: ? SyS_exit_group+0xb/0xb
Jan 24 11:02:28 Tower kernel: ret_from_fork+0x1f/0x30
Jan 24 11:02:28 Tower kernel: Code: 43 38 48 85 c0 74 30 66 66 66 66 90 48 8b 80 c0 02 00 00 65 48 ff 40 78 48 8b 03 48 8b 53 38 48 c1 e8 3a 48 8b 84 c2 b0 03 00 00 <48> 8b 80 88 00 00 00 65 48 ff 40 78 48 89 df be 06 00 00 00 e8
Jan 24 11:02:28 Tower kernel: RIP: account_page_dirtied+0xaf/0x13b RSP: ffffc9000faff9e8
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088
Jan 24 11:02:28 Tower kernel: ---[ end trace a3e8bf5e33c2ee49 ]---
Jan 24 11:02:28 Tower kernel: note: kworker/u24:6[12156] exited with preempt_count 1
Jan 24 11:02:28 Tower kernel: ------------[ cut here ]------------
Jan 24 11:02:28 Tower kernel: WARNING: CPU: 4 PID: 12156 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x27/0x281
Jan 24 11:02:28 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf e1000e i2c_i801 i2c_core ahci mxm_wmi wmi_bmof libahci ptp wmi pps_core button
Jan 24 11:02:28 Tower kernel: CPU: 4 PID: 12156 Comm: kworker/u24:6 Tainted: G D 4.14.13-unRAID #1
Jan 24 11:02:28 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 11:02:28 Tower kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
Jan 24 11:02:28 Tower kernel: task: ffff8807005d7000 task.stack: ffffc9000fafc000
Jan 24 11:02:28 Tower kernel: RIP: 0010:rcu_note_context_switch+0x27/0x281
Jan 24 11:02:28 Tower kernel: RSP: 0018:ffffc9000faffe68 EFLAGS: 00010002
Jan 24 11:02:28 Tower kernel: RAX: 0000000000020000 RBX: ffff8807005d7000 RCX: ffff8807005d7330
Jan 24 11:02:28 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 24 11:02:28 Tower kernel: RBP: ffff8807005d7000 R08: 0000000000000001 R09: ffffffff8104a700
Jan 24 11:02:28 Tower kernel: R10: ffffea0004724d80 R11: ffffffff8200fe01 R12: 0000000000000000
Jan 24 11:02:28 Tower kernel: R13: 0000000000000000 R14: ffff8807005d75a0 R15: 0000000000020900
Jan 24 11:02:28 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff300000(0000) knlGS:0000000000000000
Jan 24 11:02:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088 CR3: 0000000004c0a006 CR4: 00000000000606e0
Jan 24 11:02:28 Tower kernel: Call Trace:
Jan 24 11:02:28 Tower kernel: __schedule+0x88/0x4e9
Jan 24 11:02:28 Tower kernel: do_task_dead+0x38/0x3a
Jan 24 11:02:28 Tower kernel: do_exit+0x896/0x896
Jan 24 11:02:28 Tower kernel: rewind_stack_do_exit+0x17/0x20
Jan 24 11:02:28 Tower kernel: Code: 5c 41 5d c3 41 56 41 55 41 54 41 89 fc 55 53 65 48 8b 2c 25 00 5c 01 00 e8 c1 e6 ff ff 45 84 e4 75 0b 83 bd 28 03 00 00 00 7e 02 <0f> ff 83 bd 28 03 00 00 00 0f 8e ce 01 00 00 80 bd 2c 03 00 00
Jan 24 11:02:28 Tower kernel: ---[ end trace a3e8bf5e33c2ee4a ]---
Jan 24 11:02:35 Tower kernel: traps: emhttpd[6918] trap divide error ip:419f15 sp:14596af71e00 error:0 in emhttpd[400000+26000]

^^ This was the last transmission that went through before it froze. If it kept spitting out info after that, I don't think there's a way to grab it, since the freeze also locks up the GUI I have set up on the array. It ran all night just fine and got fussy as soon as I tried to move some data onto the array over the network. That's what I was in the middle of doing when it locked up this time.

Edited January 24, 2018 by TheWooginator
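Capturing the log on a second machine is what finally caught this crash, because anything the server keeps only locally is unreachable once it hangs and needs a hard reset. The post doesn't say which logger ran on the iMac; one generic approach is to have the server forward syslog messages over UDP to another machine and append them to a file there. A minimal Python receiver sketch, assuming the server has already been configured to forward to this machine (that setup is not covered here) and using UDP port 5514 as a hypothetical, non-privileged choice (the standard syslog port 514 needs root on most systems). `make_listener` and `serve_syslog` are illustrative names.

```python
import socket
from typing import Optional


def make_listener(host: str = "0.0.0.0", port: int = 5514) -> socket.socket:
    """Bind a UDP socket for incoming syslog datagrams (port is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve_syslog(sock: socket.socket, out_path: str,
                 max_messages: Optional[int] = None) -> int:
    """Append each received datagram to out_path as one line, prefixed with
    the sender's IP. Runs forever unless max_messages is given; returns the
    number of messages written."""
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (addr, _port) = sock.recvfrom(65535)
            # Syslog payloads are usually ASCII/UTF-8; replace anything else.
            line = data.decode("utf-8", errors="replace").rstrip("\r\n")
            out.write(f"{addr} {line}\n")
            # Flush per message so the file can be tailed live while running.
            out.flush()
            count += 1
    return count
```

Started as `serve_syslog(make_listener(), "tower-syslog.log")` and left running, it keeps writing until stopped with Ctrl+C, so the last lines before a freeze stay on the receiving machine.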
JorgeB Posted January 24, 2018
After rebooting, post the output of:

btrfs dev stats /mnt/cache

Any errors will survive a reboot.
TheWooginator Posted January 24, 2018 (Author)
Same as before.

root@Tower:~# btrfs dev stats /mnt/cache
[/dev/sdf1].write_io_errs 0
[/dev/sdf1].read_io_errs 0
[/dev/sdf1].flush_io_errs 0
[/dev/sdf1].corruption_errs 0
[/dev/sdf1].generation_errs 0
[/dev/sde1].write_io_errs 0
[/dev/sde1].read_io_errs 0
[/dev/sde1].flush_io_errs 0
[/dev/sde1].corruption_errs 0
[/dev/sde1].generation_errs 0
JorgeB Posted January 24, 2018
They still look like hardware errors to me; memory or the board would be my first candidates.
TheWooginator Posted January 24, 2018 (Author)
This hardware ran fine as a normal PC before it became an Unraid array. I'm not ruling it out, but I don't think it's hardware. Memtest86+ on the boot flash doesn't seem to want to run for me, but the BIOS seems to think the memory is OK.
Chad Kunsman Posted January 24, 2018 (edited)
Chiming in with a similar (if not the exact same) problem. My cache drive is very healthy: I ran the clean/clear plugin before assigning it as cache, with no issues at all, and the output of btrfs dev stats /mnt/cache is clean. Memtest, run overnight before I set up Unraid a couple of weeks ago, was also clean.

What I'm doing is an rsync of data from a USB external drive (an unassigned device) over to the array (which is hitting the cache drive first for now). While the rsync is running, parts of the Unraid GUI become very unresponsive: the Dashboard and Docker container tabs often stop responding, and running dockers (Radarr and Sonarr) simply won't load while the copy is running. Cancel the copy and things go back to normal. Restart the copy, and it goes to hell again. Appdata for all dockers is on the cache drive, by the way.

Edited January 24, 2018 by Chad Kunsman
TheWooginator Posted January 24, 2018 (Author)
Did it again. I was moving files onto the array from my iMac over the network. Nothing special. I'm going to keep posting these until someone can give me a definitive answer on what the hell is going on. I'm not a Linux expert, so this reads like Swahili to me.

Jan 24 12:03:49 Tower root: move: file /mnt/cache/MyMedia/Movies/Power Rangers (2017) [1080p] [YTS.AG]/WWW.YTS.AG.jpg
Jan 24 12:03:49 Tower root: move_object: /mnt/cache/MyMedia/Movies: Directory not empty
Jan 24 12:03:49 Tower root: move_object: /mnt/cache/MyMedia: Directory not empty
Jan 24 12:03:49 Tower root: mover: finished
Jan 24 12:11:16 Tower emhttpd: req (3): shareMoverSchedule=0+0+*+*+*&shareMoverLogging=yes&changeMover=Apply&csrf_token=****************
Jan 24 12:11:16 Tower emhttpd: shcmd (229): /usr/local/sbin/update_cron
Jan 24 16:08:42 Tower shfs: error: shfs_rmdir, 1517: Directory not empty (39): rmdir: /mnt/cache/appdata/binhex-plexpass/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-fab9f93d4063e672-com-plexapp-android-24983809-a3ee-4af3-bcf6-231800306c36
Jan 24 16:28:21 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
Jan 24 16:28:21 Tower kernel: IP: workingset_eviction+0x40/0x85
Jan 24 16:28:21 Tower kernel: PGD 0 P4D 0
Jan 24 16:28:21 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 24 16:28:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd e1000e intel_cstate intel_uncore intel_rapl_perf i2c_i801 i2c_core ahci libahci ptp mxm_wmi wmi_bmof wmi pps_core button
Jan 24 16:28:21 Tower kernel: CPU: 9 PID: 864 Comm: kswapd0 Not tainted 4.14.13-unRAID #1
Jan 24 16:28:21 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 16:28:21 Tower kernel: task: ffff8807fa3b3800 task.stack: ffffc9000364c000
Jan 24 16:28:21 Tower kernel: RIP: 0010:workingset_eviction+0x40/0x85
Jan 24 16:28:21 Tower kernel: RSP: 0018:ffffc9000364fb88 EFLAGS: 00010047
Jan 24 16:28:21 Tower kernel: RAX: 0000000000000000 RBX: ffffea0009172900 RCX: 0000000000000000
Jan 24 16:28:21 Tower kernel: RDX: 0000000000000000 RSI: ffff88081fff9000 RDI: ffff8807c03c7470
Jan 24 16:28:21 Tower kernel: RBP: ffff8807c03c7488 R08: 0000000000024b80 R09: ffffea0009172801
Jan 24 16:28:21 Tower kernel: R10: 0000000000000001 R11: ffffea0009172900 R12: 0000000000000286
Jan 24 16:28:21 Tower kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff8807c03c7470
Jan 24 16:28:21 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff440000(0000) knlGS:0000000000000000
Jan 24 16:28:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 16:28:21 Tower kernel: CR2: 0000000000000080 CR3: 0000000004c0a002 CR4: 00000000000606e0
Jan 24 16:28:21 Tower kernel: Call Trace:
Jan 24 16:28:21 Tower kernel: __remove_mapping+0x177/0x1bc
Jan 24 16:28:21 Tower kernel: shrink_page_list+0x8a5/0xa8f
Jan 24 16:28:21 Tower kernel: shrink_inactive_list+0x25f/0x3d5
Jan 24 16:28:21 Tower kernel: shrink_node_memcg+0x4c9/0x680
Jan 24 16:28:21 Tower kernel: ? shrink_node+0xce/0x29b
Jan 24 16:28:21 Tower kernel: shrink_node+0xce/0x29b
Jan 24 16:28:21 Tower kernel: kswapd+0x437/0x55a
Jan 24 16:28:21 Tower kernel: ? __switch_to+0xd4/0x2e8
Jan 24 16:28:21 Tower kernel: ? mem_cgroup_shrink_node+0x89/0x89
Jan 24 16:28:21 Tower kernel: kthread+0x10f/0x117
Jan 24 16:28:21 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 24 16:28:21 Tower kernel: ret_from_fork+0x1f/0x30
Jan 24 16:28:21 Tower kernel: Code: 66 66 90 0f b7 91 b8 00 00 00 eb 02 31 d2 66 66 66 66 90 48 63 86 40 3a 00 00 48 8b 8c c1 b0 03 00 00 eb 07 48 8d 8e 20 3b 00 00 <48> 39 b1 80 00 00 00 74 07 48 89 b1 80 00 00 00 b8 01 00 00 00
Jan 24 16:28:21 Tower kernel: RIP: workingset_eviction+0x40/0x85 RSP: ffffc9000364fb88
Jan 24 16:28:21 Tower kernel: CR2: 0000000000000080
Jan 24 16:28:21 Tower kernel: ---[ end trace 68f0b4a0a40ba233 ]---
Jan 24 16:28:21 Tower kernel: note: kswapd0[864] exited with preempt_count 1
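For anyone who finds these dumps hard to read: the most informative lines of each kernel oops are the BUG: line (what went wrong), the IP: line (the function it happened in), and the CPU/Comm line (which task was running and whether the kernel was already tainted). A minimal Python sketch that pulls just those fields out of a log with one entry per line, assuming the "Mon DD HH:MM:SS host kernel: message" layout shown above. It only handles BUG: reports, not WARNING: splats, and `Oops`/`find_oopses` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Classic syslog kernel line, e.g.
#   Jan 24 16:28:21 Tower kernel: IP: workingset_eviction+0x40/0x85
KERNEL_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (?P<msg>.*)$")
# e.g. "CPU: 9 PID: 864 Comm: kswapd0 Not tainted 4.14.13-unRAID #1"
CPU_LINE = re.compile(
    r"^CPU: \d+ PID: \d+ Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted: [A-Z ]+?) \d")


@dataclass
class Oops:
    time: str                   # timestamp of the BUG: line
    bug: str                    # text after "BUG: "
    ip: Optional[str] = None    # faulting function from the "IP:" line
    comm: Optional[str] = None  # task that was running
    taint: Optional[str] = None


def find_oopses(log_text: str) -> List[Oops]:
    """Summarise each kernel 'BUG:' report in a one-entry-per-line log."""
    found: List[Oops] = []
    current: Optional[Oops] = None
    for raw in log_text.splitlines():
        m = KERNEL_LINE.match(raw.strip())
        if not m:
            continue  # not a kernel message
        msg = m.group("msg")
        if msg.startswith("BUG: "):
            current = Oops(time=m.group("ts"), bug=msg[len("BUG: "):])
            found.append(current)
        elif current is None:
            continue  # kernel line outside any BUG report
        elif msg.startswith("---[ end trace"):
            current = None  # this report is finished
        elif msg.startswith("IP: ") and current.ip is None:
            current.ip = msg[len("IP: "):]
        else:
            cpu = CPU_LINE.match(msg)
            if cpu and current.comm is None:
                current.comm = cpu.group("comm")
                current.taint = cpu.group("taint")
    return found
```

Run on the two logs in this thread (once split into one entry per line), it would report a NULL pointer dereference in account_page_dirtied (task kworker/u24:6) and one in workingset_eviction (task kswapd0), both on a kernel reported as "Not tainted" at that point.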
trurl Posted January 24, 2018
4 hours ago, Chad Kunsman said: "Chiming in with a similar (if not the exact same) problem."

If you want support, you should start your own thread and leave this one for responses to the original poster.
Chad Kunsman Posted January 25, 2018 (edited)
55 minutes ago, trurl said: "If you want support you should start your own thread and leave this one for responses to the original poster."

Yes and no. I don't need anyone in this thread to specifically help me or answer my questions at this time. I was simply saying, "I think the same thing is happening to me, and perhaps the OP's issue isn't a one-off limited to just them." If this turns into nothing, I'll post again in my own thread.

Edited January 25, 2018 by Chad Kunsman
TheWooginator Posted January 26, 2018 (Author) (edited)
So, no crashes overnight other than when my power briefly went out at the house, which was odd. But these errors popped up multiple times in the middle of the night without causing a meltdown:

Jan 25 04:49:15 Tower kernel: CPU: 1 PID: 5069 Comm: shfs Tainted: G D W 4.14.13-unRAID #1
Jan 25 04:47:34 Tower kernel: CPU: 4 PID: 350 Comm: khugepaged Tainted: G D W 4.14.13-unRAID #1

It seems that as long as I'm not moving chunks of data onto or around inside the array, it stays happy. Baby steps.

Edited January 26, 2018 by TheWooginator
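The "Tainted: G D W" part of these lines is the kernel's taint summary. According to the Linux kernel's tainted-kernels documentation, G means only GPL-compatible modules are loaded, D means the kernel has already died (hit an oops or BUG) at least once since boot, and W means it has issued a warning. Read that way, by 04:47 the kernel had already recorded an oops and a warning since its last boot, even though nothing visibly melted down. A small Python decoder as a sketch: the table transcribes that documentation's letter list and may be incomplete for other kernel versions, and `decode_taint` is a hypothetical helper name.

```python
from typing import List, Tuple

# Letter meanings from the Linux kernel's "Tainted kernels" admin-guide
# documentation. Not exhaustive: newer kernels define additional letters.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded (shown instead of P)",
    "F": "module force-loaded",
    "S": "kernel running on an out-of-spec or unsafe CPU configuration",
    "R": "module force-unloaded",
    "M": "machine check exception (hardware error) reported",
    "B": "bad page referenced or unexpected page flags seen",
    "U": "taint requested by a userspace application",
    "D": "kernel died recently (an oops or BUG occurred)",
    "A": "ACPI table overridden by user",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "I": "workaround for a platform firmware bug applied",
    "O": "externally built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel live-patched",
}


def decode_taint(flags: str) -> List[Tuple[str, str]]:
    """Explain each letter in a taint string such as 'G D W'.

    Blanks are ignored, so both the kernel's fixed-width form ('G      D W')
    and the space-collapsed form seen in forum posts decode the same way."""
    return [(ch, TAINT_FLAGS.get(ch, "unknown flag (not in this table)"))
            for ch in flags if not ch.isspace()]
```

For example, `decode_taint("G D W")` returns the G, D and W entries in that order, which is the combination shown in both overnight lines.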