v6b12 - Hard Crash

johnodon · December 21, 2014

I have been experiencing hard crashes fairly frequently, maybe every few days. I finally had the log window open when the last one occurred a few minutes ago. Here is what was captured. Does it offer any clue as to what the culprit is?

I am going to disable my nzbget, nzbdrone and xbmcserver Docker containers. I need to leave the mariadb one running. The only plugins I have are libvirt, virtman, dynamix and SNAP (which I just removed).

Dec 21 08:33:21 unRAID kernel: perf interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Dec 21 12:02:42 unRAID kernel: br0: port 2(eth1) neighbor 8000.00:25:90:64:a7:d8 lost
Dec 21 12:02:42 unRAID kernel: br0: port 2(eth1) entered listening state
Dec 21 12:02:57 unRAID kernel: br0: port 2(eth1) entered learning state
Dec 21 12:03:12 unRAID kernel: br0: topology change detected, propagating
Dec 21 12:03:12 unRAID kernel: br0: port 2(eth1) entered forwarding state
Dec 21 12:03:22 unRAID kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 11, t=6002 jiffies, g=306012, c=306011, q=13917)
Dec 21 12:03:22 unRAID kernel: Task dump for CPU 0:
Dec 21 12:03:22 unRAID kernel: swapper/0 R running task 0 0 0 0x00000008
Dec 21 12:03:22 unRAID kernel: ffffffff817ebed8 ffffffff815e64ef ffffffff817ebfd8 ffffffff817fe440
Dec 21 12:03:22 unRAID kernel: 0000000000012c80 ffff88087cd9d8b0 ffffffff817ebe40 ffffffff81085c44
Dec 21 12:03:22 unRAID kernel: 0000000000000005 ffffffff817ebe8c 0000000000000046 ffffffff817ebe48
Dec 21 12:03:22 unRAID kernel: Call Trace:
Dec 21 12:03:22 unRAID kernel: [] ? __schedule+0x43f/0x6a4
Dec 21 12:03:22 unRAID kernel: [] ? tick_broadcast_oneshot_control+0x13d/0x19e
Dec 21 12:03:22 unRAID kernel: [] ? pick_next_task_fair+0x38b/0x40f
Dec 21 12:03:22 unRAID kernel: [] ? cpuidle_enter_state+0x49/0x9d
Dec 21 12:03:22 unRAID kernel: [] ? cpuidle_enter+0x12/0x14
Dec 21 12:03:22 unRAID kernel: [] ? cpu_startup_entry+0x17a/0x22e
Dec 21 12:03:22 unRAID kernel: [] ? rest_init+0x72/0x74
Dec 21 12:03:22 unRAID kernel: [] ? start_kernel+0x400/0x40c
Dec 21 12:03:22 unRAID kernel: [] ? set_init_arg+0x53/0x53
Dec 21 12:03:22 unRAID kernel: [] ? early_idt_handlers+0x120/0x120
Dec 21 12:03:22 unRAID kernel: [] ? x86_64_start_reservations+0x2a/0x2c
Dec 21 12:03:22 unRAID kernel: [] ? x86_64_start_kernel+0xee/0xfb
Dec 21 12:06:22 unRAID kernel: INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 11, t=24007 jiffies, g=306012, c=306011, q=54217)
Dec 21 12:06:22 unRAID kernel: Task dump for CPU 0:
Dec 21 12:06:22 unRAID kernel: swapper/0 R running task 0 0 0 0x00000008
Dec 21 12:06:22 unRAID kernel: ffffffff817ebed8 ffffffff815e64ef ffffffff817ebfd8 ffffffff817fe440
Dec 21 12:06:22 unRAID kernel: 0000000000012c80 ffff88087cd9d8b0 ffffffff817ebe40 ffffffff81085c44
Dec 21 12:06:22 unRAID kernel: 0000000000000005 ffffffff817ebe8c 0000000000000046 ffffffff817ebe48
Dec 21 12:06:22 unRAID kernel: Call Trace:
Dec 21 12:06:22 unRAID kernel: [] ? __schedule+0x43f/0x6a4
Dec 21 12:06:22 unRAID kernel: [] ? tick_broadcast_oneshot_control+0x13d/0x19e
Dec 21 12:06:22 unRAID kernel: [] ? pick_next_task_fair+0x38b/0x40f
Dec 21 12:06:22 unRAID kernel: [] ? cpuidle_enter_state+0x49/0x9d
Dec 21 12:06:22 unRAID kernel: [] ? cpuidle_enter+0x12/0x14
Dec 21 12:06:22 unRAID kernel: [] ? cpu_startup_entry+0x17a/0x22e
Dec 21 12:06:22 unRAID kernel: [] ? rest_init+0x72/0x74
Dec 21 12:06:22 unRAID kernel: [] ? start_kernel+0x400/0x40c

johnodon · December 21, 2014

I just started reading about the "rcu_sched detected stalls on CPUs/tasks" error and have seen quite a few articles describing this issue in Ubuntu VMs. Here is one: http://www.gossamer-threads.com/lists/openstack/operators/35716

This makes me wonder if my XBMCBuntu VMs are causing the crashes.

EDIT: mhoober has similar (not identical) errors in his syslog and is experiencing crashes: http://lime-technology.com/forum/index.php?topic=37263.0

dgaschk · December 23, 2014

Post the entire log. Zip it if needed.

johnodon · December 26, 2014

I have not had a single crash since disabling the Docker containers and removing SNAP. I just re-enabled my NZBGet container to download a movie, and within 15 minutes my server experienced a hard crash. Nothing was written to the syslog just prior to the crash other than the entries showing the container being enabled.

I'm convinced at this point that Docker, or more specifically the NZBGet container, is the culprit.

John

SmallwoodDR82 · December 27, 2014

johnodon

You might want to take a look at these two threads. We are having similar issues.

http://lime-technology.com/forum/index.php?topic=35788.msg345039#msg345039

http://lime-technology.com/forum/index.php?topic=37311.0
