
UnRaid Freezing During Data Transfer - Syslogs Posted



Hey guys. I'm working through moving data from an external drive onto my array via Krusader and I've been getting intermittent freezes on the system. Some of it still seems to respond (I just successfully downloaded and installed the SSD Trim plugin), but the system simply won't respond when I try to do a clean reboot. I was able to collect diagnostic data, and I'm waiting to see if there's something I can do short of a push-button reset, which would restart my 10 hr parity check. Initially it seemed like some of the freezing was due to the cache drive running out of space while moving data into the array, so I ran the mover function overnight to give me a solid 200 GB of space to "move into". My last dump was about 100 GB and it froze up unexpectedly somewhere around the 95% complete mark. Cycling Docker on/off didn't work, and restarting/shutting down Krusader doesn't appear to do anything. I've attached my diagnostic zip file. Please let me know if there's anything else that would be beneficial to have.

tower-diagnostics-20180121-1110.zip

 

EDIT: Adding this to top post for visibility

 

Snagged it! I ran a logger on my iMac to keep an eye on it in case it crashed on me and I was able to grab this before it froze up and forced me to reset it:

 

Jan 24 10:39:31 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:39:32 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:39:32 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 10:49:32 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:49:33 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:49:33 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 10:59:33 Tower root: Fix Common Problems Version 2018.01.21
Jan 24 10:59:34 Tower root: Fix Common Problems: /var/log currently 2 % full
Jan 24 10:59:34 Tower root: Fix Common Problems: rootfs (/) currently 5 % full
Jan 24 11:00:01 Tower root: mover: started
Jan 24 11:02:28 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
Jan 24 11:02:28 Tower kernel: IP: account_page_dirtied+0xaf/0x13b
Jan 24 11:02:28 Tower kernel: PGD 0 P4D 0 
Jan 24 11:02:28 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 24 11:02:28 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf e1000e i2c_i801 i2c_core ahci mxm_wmi wmi_bmof libahci ptp wmi pps_core button
Jan 24 11:02:28 Tower kernel: CPU: 4 PID: 12156 Comm: kworker/u24:6 Not tainted 4.14.13-unRAID #1
Jan 24 11:02:28 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 11:02:28 Tower kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
Jan 24 11:02:28 Tower kernel: task: ffff8807005d7000 task.stack: ffffc9000fafc000
Jan 24 11:02:28 Tower kernel: RIP: 0010:account_page_dirtied+0xaf/0x13b
Jan 24 11:02:28 Tower kernel: RSP: 0018:ffffc9000faff9e8 EFLAGS: 00010047
Jan 24 11:02:28 Tower kernel: RAX: 0000000000000000 RBX: ffffea0009172900 RCX: 0000000000024b90
Jan 24 11:02:28 Tower kernel: RDX: ffff8807fcc19400 RSI: 000000000000000f RDI: ffff88081fff9000
Jan 24 11:02:28 Tower kernel: RBP: ffff8807f79c8458 R08: 0000000000024b80 R09: 0000000000000000
Jan 24 11:02:28 Tower kernel: R10: ffff8807f69f8368 R11: 000000000007c439 R12: ffff8807f69f8378
Jan 24 11:02:28 Tower kernel: R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000000
Jan 24 11:02:28 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff300000(0000) knlGS:0000000000000000
Jan 24 11:02:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088 CR3: 0000000004c0a006 CR4: 00000000000606e0
Jan 24 11:02:28 Tower kernel: Call Trace:
Jan 24 11:02:28 Tower kernel: __set_page_dirty_nobuffers+0x98/0x12c
Jan 24 11:02:28 Tower kernel: set_extent_buffer_dirty+0x6a/0x76
Jan 24 11:02:28 Tower kernel: btrfs_mark_buffer_dirty+0x75/0x98
Jan 24 11:02:28 Tower kernel: __btrfs_cow_block+0x49e/0x4b8
Jan 24 11:02:28 Tower kernel: btrfs_cow_block+0x106/0x114
Jan 24 11:02:28 Tower kernel: btrfs_search_slot+0x330/0x83c
Jan 24 11:02:28 Tower kernel: btrfs_del_csums+0xaa/0x340
Jan 24 11:02:28 Tower kernel: ? release_extent_buffer+0x7e/0x85
Jan 24 11:02:28 Tower kernel: __btrfs_free_extent+0x8dc/0x9e8
Jan 24 11:02:28 Tower kernel: __btrfs_run_delayed_refs+0xa7f/0xc84
Jan 24 11:02:28 Tower kernel: ? kmem_cache_free+0x12e/0x131
Jan 24 11:02:28 Tower kernel: btrfs_run_delayed_refs+0x68/0x1e9
Jan 24 11:02:28 Tower kernel: delayed_ref_async_start+0x54/0x90
Jan 24 11:02:28 Tower kernel: btrfs_worker_helper+0xbc/0x16f
Jan 24 11:02:28 Tower kernel: process_one_work+0x146/0x239
Jan 24 11:02:28 Tower kernel: ? rescuer_thread+0x258/0x258
Jan 24 11:02:28 Tower kernel: worker_thread+0x1c3/0x292
Jan 24 11:02:28 Tower kernel: kthread+0x10f/0x117
Jan 24 11:02:28 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 24 11:02:28 Tower kernel: ? SyS_exit_group+0xb/0xb
Jan 24 11:02:28 Tower kernel: ret_from_fork+0x1f/0x30
Jan 24 11:02:28 Tower kernel: Code: 43 38 48 85 c0 74 30 66 66 66 66 90 48 8b 80 c0 02 00 00 65 48 ff 40 78 48 8b 03 48 8b 53 38 48 c1 e8 3a 48 8b 84 c2 b0 03 00 00 <48> 8b 80 88 00 00 00 65 48 ff 40 78 48 89 df be 06 00 00 00 e8 
Jan 24 11:02:28 Tower kernel: RIP: account_page_dirtied+0xaf/0x13b RSP: ffffc9000faff9e8
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088
Jan 24 11:02:28 Tower kernel: ---[ end trace a3e8bf5e33c2ee49 ]---
Jan 24 11:02:28 Tower kernel: note: kworker/u24:6[12156] exited with preempt_count 1
Jan 24 11:02:28 Tower kernel: ------------[ cut here ]------------
Jan 24 11:02:28 Tower kernel: WARNING: CPU: 4 PID: 12156 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x27/0x281
Jan 24 11:02:28 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_uncore intel_rapl_perf e1000e i2c_i801 i2c_core ahci mxm_wmi wmi_bmof libahci ptp wmi pps_core button
Jan 24 11:02:28 Tower kernel: CPU: 4 PID: 12156 Comm: kworker/u24:6 Tainted: G D 4.14.13-unRAID #1
Jan 24 11:02:28 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 11:02:28 Tower kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
Jan 24 11:02:28 Tower kernel: task: ffff8807005d7000 task.stack: ffffc9000fafc000
Jan 24 11:02:28 Tower kernel: RIP: 0010:rcu_note_context_switch+0x27/0x281
Jan 24 11:02:28 Tower kernel: RSP: 0018:ffffc9000faffe68 EFLAGS: 00010002
Jan 24 11:02:28 Tower kernel: RAX: 0000000000020000 RBX: ffff8807005d7000 RCX: ffff8807005d7330
Jan 24 11:02:28 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 24 11:02:28 Tower kernel: RBP: ffff8807005d7000 R08: 0000000000000001 R09: ffffffff8104a700
Jan 24 11:02:28 Tower kernel: R10: ffffea0004724d80 R11: ffffffff8200fe01 R12: 0000000000000000
Jan 24 11:02:28 Tower kernel: R13: 0000000000000000 R14: ffff8807005d75a0 R15: 0000000000020900
Jan 24 11:02:28 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff300000(0000) knlGS:0000000000000000
Jan 24 11:02:28 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 11:02:28 Tower kernel: CR2: 0000000000000088 CR3: 0000000004c0a006 CR4: 00000000000606e0
Jan 24 11:02:28 Tower kernel: Call Trace:
Jan 24 11:02:28 Tower kernel: __schedule+0x88/0x4e9
Jan 24 11:02:28 Tower kernel: do_task_dead+0x38/0x3a
Jan 24 11:02:28 Tower kernel: do_exit+0x896/0x896
Jan 24 11:02:28 Tower kernel: rewind_stack_do_exit+0x17/0x20
Jan 24 11:02:28 Tower kernel: Code: 5c 41 5d c3 41 56 41 55 41 54 41 89 fc 55 53 65 48 8b 2c 25 00 5c 01 00 e8 c1 e6 ff ff 45 84 e4 75 0b 83 bd 28 03 00 00 00 7e 02 <0f> ff 83 bd 28 03 00 00 00 0f 8e ce 01 00 00 80 bd 2c 03 00 00 
Jan 24 11:02:28 Tower kernel: ---[ end trace a3e8bf5e33c2ee4a ]---
Jan 24 11:02:35 Tower kernel: traps: emhttpd[6918] trap divide error ip:419f15 sp:14596af71e00 error:0 in emhttpd[400000+26000]

 

^^This was the last transmission that went through before it froze. If it was continuing to spit out info after that, I don't think there's a way to grab it as it locks up the GUI interface I have set up on the array as well. It ran all night just fine and got fussy as soon as I tried to move some data onto the array through the network. That's what I was in the process of doing when it locked up this time.
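
The post above does not say which logger was used. One way to keep a copy of the log on another machine, so the last lines survive a freeze, is sketched below. It assumes SSH access to the server and that the live log is at /var/log/syslog; the host name, local file name, and both function names are made up for illustration.

```python
import subprocess
from typing import IO, Iterable


def mirror_lines(source: Iterable[str], sink: IO[str]) -> int:
    """Copy lines from source to sink, flushing after every line.

    Flushing per line means whatever arrived before the server locked up
    is already in the local file. Returns the number of lines copied.
    """
    count = 0
    for line in source:
        sink.write(line)
        sink.flush()
        count += 1
    return count


def follow_remote_syslog(host: str, local_path: str) -> None:
    """Follow the remote syslog over SSH and append it to a local file.

    Assumption: the log lives at /var/log/syslog on the server.
    """
    cmd = ["ssh", host, "tail", "-f", "/var/log/syslog"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        with open(local_path, "a") as sink:
            mirror_lines(proc.stdout, sink)


# Hypothetical usage, run on the desktop machine rather than on the server:
# follow_remote_syslog("root@Tower", "tower-syslog.log")
```

This only captures what the server manages to send before it stops responding, so anything logged after the lockup is still lost.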

Edited by TheWooginator
Link to comment

So I went ahead and made sure that the system is on a static IP from the router, and I'm still getting errors where the system becomes unresponsive or just shuts me out entirely. It just did it again with no warning while I was moving some data from my iMac to the array via the network. This is getting really frustrating. It seems like it's working fine, running through a parity check, moving data off the cache drives, moving data onto the array via the network, and then it just takes a $h*t. Please help, guys. If I can't make this work before the trial period is over, I'm likely going to mothball this project and get a QNAP or something.

tower-diagnostics-20180122-2205.zip

Link to comment
13 hours ago, johnnie.black said:

You have file corruption on your cache drive. This would point to a hardware problem, like bad RAM or possibly a cable problem with one of your SSDs. If you haven't yet, run memtest, and also check the output of the command below for errors:

 


btrfs dev stats /mnt/cache

 

 

Thanks for that. I went ahead and updated the firmware on both SSDs and rebuilt the cache pool, so now we'll see if that helps.

Link to comment
3 minutes ago, johnnie.black said:

SSH into your server or use the console and type:

 


btrfs dev stats /mnt/cache

 

Post the output here.

Linux 4.14.13-unRAID.
root@Tower:~# btrfs dev stats /mnt/cache
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
[/dev/sde1].write_io_errs    0
[/dev/sde1].read_io_errs     0
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0
root@Tower:~#
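
For anyone checking this output repeatedly, the counters can be scanned with a few lines of Python. This is only a sketch: it assumes the output has the same `[device].counter value` layout as the paste above, and the function name is made up for illustration.

```python
import re

# Matches lines such as "[/dev/sdf1].write_io_errs    0"
STAT_LINE = re.compile(
    r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def nonzero_counters(stats_output: str) -> dict:
    """Return {(device, counter): value} for every counter that is not 0.

    Lines that do not look like a counter (prompts, banners) are ignored.
    """
    problems = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) != 0:
            problems[(match["device"], match["counter"])] = int(match["value"])
    return problems
```

An empty result only means none of these per-device counters has recorded an error. It is not a substitute for the memtest run suggested earlier in the thread.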

Link to comment

Well, it lasted longer this time than any other previous time, so... progress?

It bugged out shortly after I decided to experiment with a VM, but it never got past the point of enabling it in the settings. I was prepping to do a Win 10 VM when it crapped out. The really weird part that I still can't figure out is that when it crashes, it locks up my modem which is connected directly to it. Once I power down the array, the modem magically comes back and re-enables the wifi. I'm set up as static at both ends soooooo....

tower-diagnostics-20180124-0023.zip

Link to comment


Same as before.

root@Tower:~# btrfs dev stats /mnt/cache
[/dev/sdf1].write_io_errs    0
[/dev/sdf1].read_io_errs     0
[/dev/sdf1].flush_io_errs    0
[/dev/sdf1].corruption_errs  0
[/dev/sdf1].generation_errs  0
[/dev/sde1].write_io_errs    0
[/dev/sde1].read_io_errs     0
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0

Link to comment
  • TheWooginator changed the title to UnRaid becoming totally unresponsive - Call Traces

Chiming in with a similar (if not the exact same) problem.

 

My cache drive is very healthy. Ran the clean/clear plugin before assigning it as cache. No issues at all. Output of btrfs dev stats /mnt/cache is clean.

 

A Memtest run overnight, before setting up Unraid a couple of weeks ago, was clean.

 

What I am doing is an rsync of data from a USB external (unassigned device) over to the array (which is hitting cache drive first for now). 

 

While the rsync is running, parts of the Unraid GUI get very unresponsive. The Dashboard and Docker container tabs will often become unresponsive. Running dockers (Radarr and Sonarr) simply will not load while this copy is running. Cancel the copy and things go back to normal. Restart the copy and it goes to hell again.

 

Appdata for all dockers is on the cache drive, by the way. 

Edited by Chad Kunsman
Link to comment

Did it again. Moving files onto the array from my iMac through the network. Nothing special. I'm going to continue posting these until someone can give me a definitive answer on what the hell is going on. I'm not a Linux expert, so this reads like Swahili to me.

 

Jan 24 12:03:49 Tower root: move: file /mnt/cache/MyMedia/Movies/Power Rangers (2017) [1080p] [YTS.AG]/WWW.YTS.AG.jpg
Jan 24 12:03:49 Tower root: move_object: /mnt/cache/MyMedia/Movies: Directory not empty
Jan 24 12:03:49 Tower root: move_object: /mnt/cache/MyMedia: Directory not empty
Jan 24 12:03:49 Tower root: mover: finished
Jan 24 12:11:16 Tower emhttpd: req (3): shareMoverSchedule=0+0+*+*+*&shareMoverLogging=yes&changeMover=Apply&csrf_token=****************
Jan 24 12:11:16 Tower emhttpd: shcmd (229): /usr/local/sbin/update_cron
Jan 24 16:08:42 Tower shfs: error: shfs_rmdir, 1517: Directory not empty (39): rmdir: /mnt/cache/appdata/binhex-plexpass/Plex Media Server/Cache/Transcode/Sessions/plex-transcode-fab9f93d4063e672-com-plexapp-android-24983809-a3ee-4af3-bcf6-231800306c36
Jan 24 16:28:21 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
Jan 24 16:28:21 Tower kernel: IP: workingset_eviction+0x40/0x85
Jan 24 16:28:21 Tower kernel: PGD 0 P4D 0 
Jan 24 16:28:21 Tower kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jan 24 16:28:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod nct6775 hwmon_vid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd e1000e intel_cstate intel_uncore intel_rapl_perf i2c_i801 i2c_core ahci libahci ptp mxm_wmi wmi_bmof wmi pps_core button
Jan 24 16:28:21 Tower kernel: CPU: 9 PID: 864 Comm: kswapd0 Not tainted 4.14.13-unRAID #1
Jan 24 16:28:21 Tower kernel: Hardware name: System manufacturer System Product Name/SABERTOOTH X79, BIOS 4701 05/06/2014
Jan 24 16:28:21 Tower kernel: task: ffff8807fa3b3800 task.stack: ffffc9000364c000
Jan 24 16:28:21 Tower kernel: RIP: 0010:workingset_eviction+0x40/0x85
Jan 24 16:28:21 Tower kernel: RSP: 0018:ffffc9000364fb88 EFLAGS: 00010047
Jan 24 16:28:21 Tower kernel: RAX: 0000000000000000 RBX: ffffea0009172900 RCX: 0000000000000000
Jan 24 16:28:21 Tower kernel: RDX: 0000000000000000 RSI: ffff88081fff9000 RDI: ffff8807c03c7470
Jan 24 16:28:21 Tower kernel: RBP: ffff8807c03c7488 R08: 0000000000024b80 R09: ffffea0009172801
Jan 24 16:28:21 Tower kernel: R10: 0000000000000001 R11: ffffea0009172900 R12: 0000000000000286
Jan 24 16:28:21 Tower kernel: R13: 0000000000000001 R14: 0000000000000000 R15: ffff8807c03c7470
Jan 24 16:28:21 Tower kernel: FS: 0000000000000000(0000) GS:ffff8807ff440000(0000) knlGS:0000000000000000
Jan 24 16:28:21 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 16:28:21 Tower kernel: CR2: 0000000000000080 CR3: 0000000004c0a002 CR4: 00000000000606e0
Jan 24 16:28:21 Tower kernel: Call Trace:
Jan 24 16:28:21 Tower kernel: __remove_mapping+0x177/0x1bc
Jan 24 16:28:21 Tower kernel: shrink_page_list+0x8a5/0xa8f
Jan 24 16:28:21 Tower kernel: shrink_inactive_list+0x25f/0x3d5
Jan 24 16:28:21 Tower kernel: shrink_node_memcg+0x4c9/0x680
Jan 24 16:28:21 Tower kernel: ? shrink_node+0xce/0x29b
Jan 24 16:28:21 Tower kernel: shrink_node+0xce/0x29b
Jan 24 16:28:21 Tower kernel: kswapd+0x437/0x55a
Jan 24 16:28:21 Tower kernel: ? __switch_to+0xd4/0x2e8
Jan 24 16:28:21 Tower kernel: ? mem_cgroup_shrink_node+0x89/0x89
Jan 24 16:28:21 Tower kernel: kthread+0x10f/0x117
Jan 24 16:28:21 Tower kernel: ? kthread_create_on_node+0x3a/0x3a
Jan 24 16:28:21 Tower kernel: ret_from_fork+0x1f/0x30
Jan 24 16:28:21 Tower kernel: Code: 66 66 90 0f b7 91 b8 00 00 00 eb 02 31 d2 66 66 66 66 90 48 63 86 40 3a 00 00 48 8b 8c c1 b0 03 00 00 eb 07 48 8d 8e 20 3b 00 00 <48> 39 b1 80 00 00 00 74 07 48 89 b1 80 00 00 00 b8 01 00 00 00 
Jan 24 16:28:21 Tower kernel: RIP: workingset_eviction+0x40/0x85 RSP: ffffc9000364fb88
Jan 24 16:28:21 Tower kernel: CR2: 0000000000000080
Jan 24 16:28:21 Tower kernel: ---[ end trace 68f0b4a0a40ba233 ]---
Jan 24 16:28:21 Tower kernel: note: kswapd0[864] exited with preempt_count 1
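
For readers who find these dumps hard to read, most of each report is register and stack detail. The lines that identify a crash are the `BUG:` message, the `IP:` line naming the faulting function, and the `Comm:` field naming the task that was running. The sketch below pulls out just those three for every report in a pasted log. It assumes the line layout shown in this thread, and the function name is made up for illustration. It summarizes the log; it does not diagnose anything.

```python
import re

BUG_RE = re.compile(r"kernel: BUG: (?P<msg>.+)$")
IP_RE = re.compile(r"kernel: IP: (?P<symbol>[\w.]+)\+0x")
COMM_RE = re.compile(r"Comm: (?P<comm>\S+)")


def summarize_oopses(syslog_text: str) -> list:
    """Return one dict per 'BUG:' report found in the pasted syslog text.

    Each dict holds the timestamp, the BUG message, the faulting function
    from the 'IP:' line, and the task name from the first 'Comm:' field
    that follows the BUG line.
    """
    reports = []
    current = None
    for line in syslog_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            current = {
                "timestamp": line[:15],  # e.g. "Jan 24 16:28:21"
                "bug": bug["msg"].strip(),
                "function": None,
                "task": None,
            }
            reports.append(current)
            continue
        if current is None:
            continue
        ip = IP_RE.search(line)
        if ip and current["function"] is None:
            current["function"] = ip["symbol"]
        comm = COMM_RE.search(line)
        if comm and current["task"] is None:
            current["task"] = comm["comm"]
    return reports
```

Run over the two reports posted in this thread, it would list account_page_dirtied in a kworker task for the 11:02:28 crash and workingset_eviction in kswapd0 for the 16:28:21 crash, both reported as NULL pointer dereferences.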

Link to comment
55 minutes ago, trurl said:

If you want support you should start your own thread and leave this one for responses to the original poster.

 

Yes and no. I don't need anyone in this thread to specifically help me or answer my questions at this time. But I was simply stating "I think this same thing is happening to me and perhaps the OP's issue isn't a one-off limited to just them". However if this turns into nothing, then I'll post again under my own thread. 

Edited by Chad Kunsman
Link to comment

So no crashes overnight, other than when my power actually went out at the house briefly, which was odd. These are some errors that popped up multiple times in the middle of the night but didn't cause a meltdown:

 

Jan 25 04:49:15 Tower kernel: CPU: 1 PID: 5069 Comm: shfs Tainted: G D W 4.14.13-unRAID #1

Jan 25 04:47:34 Tower kernel: CPU: 4 PID: 350 Comm: khugepaged Tainted: G D W 4.14.13-unRAID #1

It seems that as long as I'm not moving chunks of data onto or around inside of the array, it stays happy. Baby steps.
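
The `Tainted: G D W` field in the two lines above is a set of kernel status flags. A small decoder for the letters that appear in this thread is sketched below. The table is deliberately partial (the kernel defines more letters than these), the function name is made up for illustration, and the parsing assumes the letters are separated by spaces as they are in the paste.

```python
# Partial table: only the letters relevant to this thread.
TAINT_FLAGS = {
    "G": "no proprietary module has been loaded",
    "P": "a proprietary module has been loaded",
    "D": "the kernel has died recently (an oops or BUG occurred)",
    "W": "the kernel has issued a warning",
}


def decode_taint(log_line: str) -> list:
    """Return (letter, meaning) pairs for the 'Tainted:' field of a log line.

    Returns an empty list for lines without that field, for example
    lines that say 'Not tainted'.
    """
    _, found, tail = log_line.partition("Tainted: ")
    if not found:
        return []
    decoded = []
    for token in tail.split():
        # Flag letters come first; stop at the kernel version string.
        if not (len(token) == 1 and token.isalpha()):
            break
        decoded.append((token, TAINT_FLAGS.get(token, "not in this partial table")))
    return decoded
```

For the two lines above this gives G, D and W. In other words, an oops and a warning had already been recorded at some point before those lines were logged, which would fit these messages not being fresh crashes by themselves.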

Edited by TheWooginator
Link to comment
  • TheWooginator changed the title to UnRaid Freezing During Data Transfer - Syslogs Posted
