CorvinusLucian Posted May 14, 2021

I have a persistent, intermittent issue whose cause I've been unable to trace. After a random amount of time, my SMB shares, PiHole (running in Docker), the WebGUI, etc. all become unavailable. The issue has persisted through several hardware swaps (all headless): I've tried an Asus KCMA-D8 with 2x Opteron 4365 EE, a Supermicro H8SCM-F with an Opteron 4184, and I'm now using a Supermicro X9SRi-F with a Xeon E5-2620. The NIC has been the same HP NC560SFP+ (Intel X520-2) throughout.

Since the only fix so far has been a hard reset of the system, I set up a syslog server to at least catch what was going on. Please see the attached "CPU Stall syslog.txt". The diagnostics zip was taken after the restart. Side note: when I remote in via IPMI to restart the server, the system just stops at "starting diagnostics collection".

This might just be a coincidence, but after stopping my PiHole Docker container (the only one running) I had 5 days of uptime, the longest in a while, and several hours after starting it again the issue reappeared. Any help would be greatly appreciated.

Attachments: aeryn-sun-diagnostics-20210514-1543.zip, CPU Stall syslog.txt
JorgeB Posted May 14, 2021

See if this applies to you: https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
CorvinusLucian Posted May 14, 2021

Thanks for the info. I'll take a deep dive into those later tonight.
CorvinusLucian Posted May 16, 2021

I'm not 100% sure this applies, as I don't see any reference to macvlan, ip, ipv4 or even net within my call trace.

May 14 14:39:24 Aeryn-Sun kernel: rcu: INFO: rcu_sched self-detected stall on CPU
May 14 14:39:24 Aeryn-Sun kernel: rcu: #0116-....: (59998 ticks this GP) idle=f9a/1/0x4000000000000000 softirq=7599661/7599661 fqs=14993
May 14 14:39:24 Aeryn-Sun kernel: #011(t=60000 jiffies g=37014061 q=66350)
May 14 14:39:24 Aeryn-Sun kernel: NMI backtrace for cpu 6
May 14 14:39:24 Aeryn-Sun kernel: CPU: 6 PID: 19374 Comm: kworker/u24:2 Tainted: G W 5.10.28-Unraid #1
May 14 14:39:24 Aeryn-Sun kernel: Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3 05/29/2018
May 14 14:39:24 Aeryn-Sun kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: Call Trace:
May 14 14:39:24 Aeryn-Sun kernel: <IRQ>
May 14 14:39:24 Aeryn-Sun kernel: dump_stack+0x6b/0x83
May 14 14:39:24 Aeryn-Sun kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
May 14 14:39:24 Aeryn-Sun kernel: nmi_cpu_backtrace+0x7d/0x8f
May 14 14:39:24 Aeryn-Sun kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
May 14 14:39:24 Aeryn-Sun kernel: rcu_dump_cpu_stacks+0x9f/0xc6
May 14 14:39:24 Aeryn-Sun kernel: rcu_sched_clock_irq+0x1ec/0x543
May 14 14:39:24 Aeryn-Sun kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
May 14 14:39:24 Aeryn-Sun kernel: update_process_times+0x50/0x6e
May 14 14:39:24 Aeryn-Sun kernel: tick_sched_timer+0x36/0x64
May 14 14:39:24 Aeryn-Sun kernel: __hrtimer_run_queues+0xb7/0x10b
May 14 14:39:24 Aeryn-Sun kernel: ? tick_sched_do_timer+0x39/0x39
May 14 14:39:24 Aeryn-Sun kernel: hrtimer_interrupt+0x8d/0x15b
May 14 14:39:24 Aeryn-Sun kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
May 14 14:39:24 Aeryn-Sun kernel: asm_call_irq_on_stack+0x12/0x20
May 14 14:39:24 Aeryn-Sun kernel: </IRQ>
May 14 14:39:24 Aeryn-Sun kernel: sysvec_apic_timer_interrupt+0x71/0x95
May 14 14:39:24 Aeryn-Sun kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
May 14 14:39:24 Aeryn-Sun kernel: RIP: 0010:nf_ct_tuplehash_to_ctrack+0xd/0xe [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: Code: 75 04 48 89 50 08 c3 48 8b 06 48 89 77 08 48 89 07 a8 01 48 89 3e 75 04 48 89 78 08 c3 0f b6 47 37 48 6b c0 c8 48 8d 44 07 f0 <c3> 48 8b 87 b8 00 00 00 48 85 c0 74 12 40 0f b6 f6 0f b6 14 30 84
May 14 14:39:24 Aeryn-Sun kernel: RSP: 0018:ffffc90001737e40 EFLAGS: 00000282
May 14 14:39:24 Aeryn-Sun kernel: RAX: ffff888daa8b9b80 RBX: 0000000000000000 RCX: ffff88815b180000
May 14 14:39:24 Aeryn-Sun kernel: RDX: 000000011baf306e RSI: ffffc90001737e5c RDI: ffff888daa8b9bc8
May 14 14:39:24 Aeryn-Sun kernel: RBP: 0000000000005649 R08: 0000000000000000 R09: 0000746e65696369
May 14 14:39:24 Aeryn-Sun kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffffffffa01b95a0
May 14 14:39:24 Aeryn-Sun kernel: R13: 000000001e8d77fc R14: ffff888daa8b9bc8 R15: ffff888daa8b9b80
May 14 14:39:24 Aeryn-Sun kernel: gc_worker+0x9a/0x240 [nf_conntrack]
May 14 14:39:24 Aeryn-Sun kernel: process_one_work+0x13c/0x1d5
May 14 14:39:24 Aeryn-Sun kernel: worker_thread+0x18b/0x22f
May 14 14:39:24 Aeryn-Sun kernel: ? process_scheduled_works+0x27/0x27
May 14 14:39:24 Aeryn-Sun kernel: kthread+0xe5/0xea
May 14 14:39:24 Aeryn-Sun kernel: ? __kthread_bind_mask+0x57/0x57
May 14 14:39:24 Aeryn-Sun kernel: ret_from_fork+0x22/0x30
May 14 14:39:28 Aeryn-Sun kernel: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 6-... } 61871 jiffies s: 1533 root: 0x40/.
May 14 14:39:28 Aeryn-Sun kernel: rcu: blocking rcu_node structures:
May 14 14:39:28 Aeryn-Sun kernel: Task dump for CPU 6:
May 14 14:39:28 Aeryn-Sun kernel: task:kworker/u24:2 state:R running task stack: 0 pid:19374 ppid: 2 flags:0x00004008
May 14 14:39:28 Aeryn-Sun kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
May 14 14:39:28 Aeryn-Sun kernel: Call Trace:
May 14 14:39:28 Aeryn-Sun kernel: ? process_one_work+0x13c/0x1d5
May 14 14:39:28 Aeryn-Sun kernel: ? worker_thread+0x18b/0x22f
May 14 14:39:28 Aeryn-Sun kernel: ? process_scheduled_works+0x27/0x27
May 14 14:39:28 Aeryn-Sun kernel: ? kthread+0xe5/0xea
May 14 14:39:28 Aeryn-Sun kernel: ? __kthread_bind_mask+0x57/0x57
May 14 14:39:28 Aeryn-Sun kernel: ? ret_from_fork+0x22/0x30

However, assuming this is the same issue, I'll keep the Unraid system running without Docker and see how it fares. Setting up VLANs is not an option for me right now.
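Since the trace itself doesn't mention macvlan, one way to judge whether the threads linked above could apply is to check whether any Docker network on the server uses the macvlan driver. The following is only a rough sketch under some assumptions: it assumes the `docker` CLI is on the PATH, the helper names `macvlan_networks` and `list_docker_networks` are made up for illustration, and it says nothing about how Unraid itself configures its networks.

```python
import subprocess


def macvlan_networks(listing: str) -> list[str]:
    """Return the names of networks whose driver is macvlan.

    `listing` is expected to hold one "name<TAB>driver" pair per line.
    """
    found = []
    for row in listing.splitlines():
        name, _, driver = row.partition("\t")
        if driver.strip() == "macvlan":
            found.append(name.strip())
    return found


def list_docker_networks() -> str:
    """Ask the docker CLI for a name/driver listing (hypothetical helper)."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}\t{{.Driver}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the server, `macvlan_networks(list_docker_networks())` would list any macvlan-backed networks. An empty list would suggest the macvlan reports are less likely to apply, though it would not rule out other causes.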
JorgeB Posted May 17, 2021

18 hours ago, CorvinusLucian said: "I don't see any reference to macvlan, ip, ipv4 or even net within my call trace."

Yes, but there is nf_conntrack, which is usually related.
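To see at a glance which kernel modules show up in a captured stall report like the one posted above, a small script can count the RCU stall messages and tally the bracketed module names (such as `[nf_conntrack]`) that the kernel appends to symbols from loadable modules. This is a minimal sketch, not a diagnostic tool: the function name `summarize_stalls` and both regular expressions are assumptions based only on the log lines in this thread.

```python
import re
import sys
from collections import Counter

# "[module]" suffix on kernel lines, e.g. "gc_worker+0x9a/0x240 [nf_conntrack]".
MODULE_RE = re.compile(r"\[([A-Za-z0-9_]+)\]")
# RCU stall headers as they appear in this thread's syslog excerpt.
STALL_RE = re.compile(
    r"rcu: INFO: \S+ (?:self-detected stall|detected (?:expedited )?stalls)"
)


def summarize_stalls(lines):
    """Return (number of stall messages, Counter of module names seen)."""
    stalls = 0
    modules = Counter()
    for line in lines:
        if "kernel:" not in line:
            continue  # only kernel messages are of interest here
        if STALL_RE.search(line):
            stalls += 1
        modules.update(MODULE_RE.findall(line))
    return stalls, modules


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_stalls.py "CPU Stall syslog.txt"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        count, seen = summarize_stalls(fh)
    print(f"stall messages: {count}")
    for name, hits in seen.most_common():
        print(f"{name}: {hits}")
```

If nf_conntrack dominates the tally, as it does in the excerpt posted here, that is consistent with the netfilter/macvlan reports linked earlier, but it is not proof of the same root cause.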
CorvinusLucian Posted May 17, 2021

Good to know, thanks.