CPU Stall Issue


I have a persistent, intermittent issue whose cause I've been unable to trace.

After a random amount of time, my SMB shares, Pi-hole (running in Docker), the WebGUI, etc. all become unavailable.

The issue has persisted through several hardware swaps (all setups headless).
I've tried an Asus KCMA-D8 with 2x Opteron 4365 EE, a Supermicro H8SCM-F with an Opteron 4184, and now a Supermicro X9SRi-F with a Xeon E5-2620.  The NIC has been the same throughout: an HP NC560SFP+ (Intel X520-2).

 

Since the only fix so far has been a hard reset of the system, I set up a syslog server to at least catch what was going on.  Please see the attached CPU Stall syslog.txt file.  The diagnostics zip was taken after the restart.

Side note: when I remote in via IPMI to restart the server, the system just hangs at the point where it starts diagnostics collection.

This might just be a coincidence, but after stopping my Pi-hole Docker container (the only one running) I had 5 days of uptime (the longest in a while), and several hours after starting it again the issue reappeared.

 

Any help would be greatly appreciated.

aeryn-sun-diagnostics-20210514-1543.zip CPU Stall syslog.txt

I'm not 100% sure this applies, as I don't see any reference to macvlan, ip, ipv4 or even net in my call trace.

May 14 14:39:24 Aeryn-Sun kernel: rcu: INFO: rcu_sched self-detected stall on CPU
May 14 14:39:24 Aeryn-Sun kernel: rcu: #0116-....: (59998 ticks this GP) idle=f9a/1/0x4000000000000000 softirq=7599661/7599661 fqs=14993 
May 14 14:39:24 Aeryn-Sun kernel: #011(t=60000 jiffies g=37014061 q=66350)
May 14 14:39:24 Aeryn-Sun kernel: NMI backtrace for cpu 6
May 14 14:39:24 Aeryn-Sun kernel: CPU: 6 PID: 19374 Comm: kworker/u24:2 Tainted: G        W         5.10.28-Unraid #1
May 14 14:39:24 Aeryn-Sun kernel: Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3 05/29/2018
May 14 14:39:24 Aeryn-Sun kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: Call Trace:
May 14 14:39:24 Aeryn-Sun kernel: <IRQ>
May 14 14:39:24 Aeryn-Sun kernel: dump_stack+0x6b/0x83
May 14 14:39:24 Aeryn-Sun kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
May 14 14:39:24 Aeryn-Sun kernel: nmi_cpu_backtrace+0x7d/0x8f
May 14 14:39:24 Aeryn-Sun kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
May 14 14:39:24 Aeryn-Sun kernel: rcu_dump_cpu_stacks+0x9f/0xc6
May 14 14:39:24 Aeryn-Sun kernel: rcu_sched_clock_irq+0x1ec/0x543
May 14 14:39:24 Aeryn-Sun kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
May 14 14:39:24 Aeryn-Sun kernel: update_process_times+0x50/0x6e
May 14 14:39:24 Aeryn-Sun kernel: tick_sched_timer+0x36/0x64
May 14 14:39:24 Aeryn-Sun kernel: __hrtimer_run_queues+0xb7/0x10b
May 14 14:39:24 Aeryn-Sun kernel: ? tick_sched_do_timer+0x39/0x39
May 14 14:39:24 Aeryn-Sun kernel: hrtimer_interrupt+0x8d/0x15b
May 14 14:39:24 Aeryn-Sun kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
May 14 14:39:24 Aeryn-Sun kernel: asm_call_irq_on_stack+0x12/0x20
May 14 14:39:24 Aeryn-Sun kernel: </IRQ>
May 14 14:39:24 Aeryn-Sun kernel: sysvec_apic_timer_interrupt+0x71/0x95
May 14 14:39:24 Aeryn-Sun kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
May 14 14:39:24 Aeryn-Sun kernel: RIP: 0010:nf_ct_tuplehash_to_ctrack+0xd/0xe [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: Code: 75 04 48 89 50 08 c3 48 8b 06 48 89 77 08 48 89 07 a8 01 48 89 3e 75 04 48 89 78 08 c3 0f b6 47 37 48 6b c0 c8 48 8d 44 07 f0 <c3> 48 8b 87 b8 00 00 00 48 85 c0 74 12 40 0f b6 f6 0f b6 14 30 84
May 14 14:39:24 Aeryn-Sun kernel: RSP: 0018:ffffc90001737e40 EFLAGS: 00000282
May 14 14:39:24 Aeryn-Sun kernel: RAX: ffff888daa8b9b80 RBX: 0000000000000000 RCX: ffff88815b180000
May 14 14:39:24 Aeryn-Sun kernel: RDX: 000000011baf306e RSI: ffffc90001737e5c RDI: ffff888daa8b9bc8
May 14 14:39:24 Aeryn-Sun kernel: RBP: 0000000000005649 R08: 0000000000000000 R09: 0000746e65696369
May 14 14:39:24 Aeryn-Sun kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffffffffa01b95a0
May 14 14:39:24 Aeryn-Sun kernel: R13: 000000001e8d77fc R14: ffff888daa8b9bc8 R15: ffff888daa8b9b80
May 14 14:39:24 Aeryn-Sun kernel: gc_worker+0x9a/0x240 [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: process_one_work+0x13c/0x1d5
May 14 14:39:24 Aeryn-Sun kernel: worker_thread+0x18b/0x22f
May 14 14:39:24 Aeryn-Sun kernel: ? process_scheduled_works+0x27/0x27
May 14 14:39:24 Aeryn-Sun kernel: kthread+0xe5/0xea
May 14 14:39:24 Aeryn-Sun kernel: ? __kthread_bind_mask+0x57/0x57
May 14 14:39:24 Aeryn-Sun kernel: ret_from_fork+0x22/0x30
May 14 14:39:28 Aeryn-Sun kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 6-... } 61871 jiffies s: 1533 root: 0x40/.
May 14 14:39:28 Aeryn-Sun kernel: rcu: blocking rcu_node structures:
May 14 14:39:28 Aeryn-Sun kernel: Task dump for CPU 6:
May 14 14:39:28 Aeryn-Sun kernel: task:kworker/u24:2   state:R  running task     stack:    0 pid:19374 ppid:     2 flags:0x00004008
May 14 14:39:28 Aeryn-Sun kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
May 14 14:39:28 Aeryn-Sun kernel: Call Trace:
May 14 14:39:28 Aeryn-Sun kernel: ? process_one_work+0x13c/0x1d5
May 14 14:39:28 Aeryn-Sun kernel: ? worker_thread+0x18b/0x22f
May 14 14:39:28 Aeryn-Sun kernel: ? process_scheduled_works+0x27/0x27
May 14 14:39:28 Aeryn-Sun kernel: ? kthread+0xe5/0xea
May 14 14:39:28 Aeryn-Sun kernel: ? __kthread_bind_mask+0x57/0x57
May 14 14:39:28 Aeryn-Sun kernel: ? ret_from_fork+0x22/0x30
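
Rather than eyeballing the trace for those keywords, a small script can do the check. This is only a rough sketch: the function name scan_trace, the regular expressions, and the whole-token matching are my own assumptions about how to read these syslog lines, not anything provided by Unraid or the kernel. It matches keywords as whole underscore-separated parts of a symbol name, so that "ip" does not falsely match "RIP", and it also lists any kernel modules named in square brackets.

```python
import re
import sys

# Keywords from the post above, matched as whole underscore-separated tokens.
KEYWORDS = {"macvlan", "ip", "ipv4", "net"}

# Matches "symbol+0xoff/0xlen" after "kernel:", optionally prefixed by "? "
# or by "RIP: 0010:" (assumed line shapes, based on the trace in this thread).
SYMBOL_RE = re.compile(
    r"kernel:\s+(?:\?\s+|RIP:\s+[0-9a-f]+:)?"
    r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Matches a trailing "[module_name]" at the end of a line.
MODULE_RE = re.compile(r"\[([A-Za-z0-9_]+)\]\s*$")


def scan_trace(lines):
    """Return (symbols containing a keyword token, set of module names seen)."""
    hits = []
    modules = set()
    for line in lines:
        mod = MODULE_RE.search(line)
        if mod:
            modules.add(mod.group(1))
        sym = SYMBOL_RE.search(line)
        if not sym:
            continue
        symbol = sym.group(1)
        tokens = set(symbol.lower().split("_"))
        if tokens & KEYWORDS:
            hits.append(symbol)
    return hits, modules


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_trace.py "CPU Stall syslog.txt"
    with open(sys.argv[1], errors="replace") as fh:
        found, mods = scan_trace(fh)
    print("keyword symbols:", found or "none")
    print("modules in trace:", sorted(mods) or "none")
```

Run against the trace above, I would expect this to find no keyword symbols and to list nf_conntrack as the only module. nf_conntrack is the kernel's netfilter connection-tracking module, so the stalled worker is still in networking code even though none of those keywords appear.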

 

However, assuming this is the same issue, I'll keep the Unraid system running without Docker and see how it fares.

Setting up VLANs is not an option for me right now.
