kfpersson Posted November 3, 2021
Hi guys, I'm running Unraid 6.9.2 on Supermicro hardware. My Unraid server hits a kernel panic every other day. Hopefully you can find something in the logs that I have missed. I have tried updating the BIOS/BMC version. I have the "IPMI Support" plugin installed. I have also tried uninstalling plugins and Docker apps, with no luck.
Screenshot_01.png - Kernel panic
Screenshot_02.png - Temp, sensor info
Screenshot_03.png - Run-Time Critical Stop (there is no FAN6 installed; this is from Supermicro IPMI)
Screenshot_04.png - Run-Time Critical Stop (this is from the IPMI Support plugin)
Regards, kfpersson
freesuper-diagnostics-20211103-2155.zip
ChatNoir Posted November 3, 2021 I am not sure there is enough data here for a proper diagnosis; the beginning of the kernel panic and the call trace are not visible. You should set up a syslog server and attach the file after your next crash.
JonathanM Posted November 3, 2021 Do any of your containers have custom IPs?
kfpersson Posted November 4, 2021 Author
7 hours ago, ChatNoir said: I am not sure there is enough data here for a proper diagnosis; the beginning of the kernel panic and the call trace are not visible. You should set up a syslog server and attach the file after your next crash.
syslog-192.168.1.151.txt Here is one syslog output.
7 hours ago, JonathanM said: Do any of your containers have custom IPs?
I have 2 containers with custom IPs. Both of them are Plex.
kfpersson Posted November 26, 2021 Author
System crashed yesterday, this is from the syslogs
Nov 25 12:32:59 freesuper kernel: process_one_work+0x13c/0x1d5
Nov 25 12:32:59 freesuper kernel: worker_thread+0x18b/0x22f
Nov 25 12:32:59 freesuper kernel: ? process_scheduled_works+0x27/0x27
Nov 25 12:32:59 freesuper kernel: kthread+0xe5/0xea
Nov 25 12:32:59 freesuper kernel: ? __kthread_bind_mask+0x57/0x57
Nov 25 12:32:59 freesuper kernel: ret_from_fork+0x22/0x30
Nov 25 12:32:59 freesuper kernel: ---[ end trace 5e2210b63115bdbb ]---
Nov 25 13:07:31 freesuper kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Nov 25 13:07:31 freesuper kernel: rcu: #0118-....: (59999 ticks this GP) idle=c06/1/0x4000000000000000 softirq=15921732/15921732 fqs=14776
Nov 25 13:07:31 freesuper kernel: #011(t=60000 jiffies g=45901249 q=59294)
Nov 25 13:07:31 freesuper kernel: NMI backtrace for cpu 8
Nov 25 13:07:31 freesuper kernel: CPU: 8 PID: 24015 Comm: kworker/u66:6 Tainted: G W 5.10.28-Unraid #1
Nov 25 13:07:31 freesuper kernel: Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
Nov 25 13:07:31 freesuper kernel: Workqueue: events_power_efficient gc_worker [nf_conntrack]
Nov 25 13:07:31 freesuper kernel: Call Trace:
Nov 25 13:07:31 freesuper kernel: dump_stack+0x6b/0x83
Nov 25 13:07:31 freesuper kernel: ? lapic_can_unplug_cpu+0x8e/0x8e
Nov 25 13:07:31 freesuper kernel: nmi_cpu_backtrace+0x7d/0x8f
Nov 25 13:07:31 freesuper kernel: nmi_trigger_cpumask_backtrace+0x56/0xd3
Nov 25 13:07:31 freesuper kernel: rcu_dump_cpu_stacks+0x9f/0xc6
Nov 25 13:07:31 freesuper kernel: rcu_sched_clock_irq+0x1ec/0x543
Nov 25 13:07:31 freesuper kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
Nov 25 13:07:31 freesuper kernel: update_process_times+0x50/0x6e
Nov 25 13:07:31 freesuper kernel: tick_sched_timer+0x36/0x64
Nov 25 13:07:31 freesuper kernel: __hrtimer_run_queues+0xb7/0x10b
Nov 25 13:07:31 freesuper kernel: ? tick_sched_do_timer+0x39/0x39
Nov 25 13:07:31 freesuper kernel: hrtimer_interrupt+0x8d/0x15b
Nov 25 13:07:31 freesuper kernel: __sysvec_apic_timer_interrupt+0x5d/0x68
Nov 25 13:07:31 freesuper kernel: asm_call_irq_on_stack+0x12/0x20
Nov 25 13:07:31 freesuper kernel: </IRQ>
Nov 25 13:07:31 freesuper kernel: sysvec_apic_timer_interrupt+0x71/0x95
Nov 25 13:07:31 freesuper kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 25 13:07:31 freesuper kernel: RIP: 0010:nf_ct_tuplehash_to_ctrack+0xd/0xe [nf_conntrack]
Nov 25 13:07:31 freesuper kernel: Code: 75 04 48 89 50 08 c3 48 8b 06 48 89 77 08 48 89 07 a8 01 48 89 3e 75 04 48 89 78 08 c3 0f b6 47 37 48 6b c0 c8 48 8d 44 07 f0 <c3> 48 8b 87 b8 00 00 00 48 85 c0 74 12 40 0f b6 f6 0f b6 14 30 84
Nov 25 13:07:31 freesuper kernel: RSP: 0018:ffffc9000a81fe40 EFLAGS: 00000282
Nov 25 13:07:31 freesuper kernel: RAX: ffff8881492c5b80 RBX: 0000000000000000 RCX: ffff8888be180000
Nov 25 13:07:31 freesuper kernel: RDX: 000000010636587d RSI: ffffc9000a81fe5c RDI: ffff8881492c5bc8
Nov 25 13:07:31 freesuper kernel: RBP: 000000000000922c R08: 0000000000000000 R09: ffffffffa00e229a
Nov 25 13:07:31 freesuper kernel: R10: 8080808080808080 R11: ffff888147baa300 R12: ffffffffa00f85a0
Nov 25 13:07:31 freesuper kernel: R13: 000000007469b99e R14: ffff8881492c5bc8 R15: ffff8881492c5b80
Nov 25 13:07:31 freesuper kernel: ? nf_conntrack_free+0x2b/0x35 [nf_conntrack]
Nov 25 13:07:31 freesuper kernel: gc_worker+0x9a/0x240 [nf_conntrack]
Nov 25 13:07:31 freesuper kernel: process_one_work+0x13c/0x1d5
Nov 25 13:07:31 freesuper kernel: worker_thread+0x18b/0x22f
Nov 25 13:07:31 freesuper kernel: ? process_scheduled_works+0x27/0x27
Nov 25 13:07:31 freesuper kernel: kthread+0xe5/0xea
Nov 25 13:07:31 freesuper kernel: ? __kthread_bind_mask+0x57/0x57
Nov 25 13:07:31 freesuper kernel: ret_from_fork+0x22/0x30
Nov 25 13:07:31 freesuper kernel: <IRQ>
trurl Posted November 27, 2021
5 hours ago, kfpersson said: this is from the syslogs
Zip and attach it all.
kfpersson Posted November 27, 2021 Author @trurl attached syslog-192.168.1.151.log.zip
Solution trurl Posted November 27, 2021
Nov 17 13:31:57 freesuper kernel: ------------[ cut here ]------------
Nov 17 13:31:57 freesuper kernel: WARNING: CPU: 3 PID: 28384 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Nov 17 13:31:57 freesuper kernel: Modules linked in: macvlan xt_mark xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat nf_tables vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod iptable_mangle ipmi_devintf ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit dm_mod dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ipmi_ssif crypto_simd cryptd glue_helper rapl intel_cstate isci i2c_i801 i2c_smbus aacraid intel_uncore libsas nvme ahci i2c_core wmi acpi_ipmi input_leds nvme_core led_class scsi_transport_sas ipmi_si libahci button [last unloaded: i2c_algo_bit]
kfpersson Posted November 27, 2021 Author
5 hours ago, trurl said:
Nov 17 13:31:57 freesuper kernel: ------------[ cut here ]------------
Nov 17 13:31:57 freesuper kernel: WARNING: CPU: 3 PID: 28384 at net/netfilter/nf_conntrack_core.c:1120 __nf_conntrack_confirm+0x9b/0x1e6 [nf_conntrack]
Nov 17 13:31:57 freesuper kernel: Modules linked in: macvlan xt_mark xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat nf_tables vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod iptable_mangle ipmi_devintf ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit dm_mod dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ipmi_ssif crypto_simd cryptd glue_helper rapl intel_cstate isci i2c_i801 i2c_smbus aacraid intel_uncore libsas nvme ahci i2c_core wmi acpi_ipmi input_leds nvme_core led_class scsi_transport_sas ipmi_si libahci button [last unloaded: i2c_algo_bit]
@trurl Thanks, I will change that and see if it works.
kfpersson Posted December 6, 2021 Author
Dec 3 07:08:42 freesuper kernel: TCP: request_sock_TCP: Possible SYN flooding on port 31302. Sending cookies. Check SNMP counters.
Dec 5 00:00:01 freesuper root: mover: started
Dec 5 00:00:02 freesuper root: mover: finished
Dec 5 04:40:01 freesuper apcupsd[3558]: apcupsd exiting, signal 15
Dec 5 04:40:01 freesuper apcupsd[3558]: apcupsd shutdown succeeded
Dec 5 04:40:04 freesuper apcupsd[12396]: apcupsd 3.14.14 (31 May 2016) slackware startup succeeded
Dec 5 04:40:04 freesuper apcupsd[12396]: NIS server startup succeeded
Dec 6 19:25:37 freesuper kernel: aacraid: Host adapter abort request.
Dec 6 19:25:37 freesuper kernel: aacraid: Outstanding commands on (1,1,40,0):
Dec 6 19:26:07 freesuper kernel: aacraid: Host adapter abort request.
Dec 6 19:26:07 freesuper kernel: aacraid: Outstanding commands on (1,1,40,0):
Dec 6 19:26:07 freesuper kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Dec 6 19:26:07 freesuper kernel: #PF: supervisor read access in kernel mode
Dec 6 19:26:07 freesuper kernel: #PF: error_code(0x0000) - not-present page
Dec 6 19:26:07 freesuper kernel: PGD 0 P4D 0
Dec 6 19:26:07 freesuper kernel: Oops: 0000 [#1] SMP PTI
Dec 6 19:26:07 freesuper kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.10.28-Unraid #1
Dec 6 19:26:07 freesuper kernel: Hardware name: Supermicro PIO-617R-TLN4F+-ST031/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
Dec 6 19:26:07 freesuper kernel: RIP: 0010:intel_unmap_sg+0x14/0x68
Dec 6 19:26:07 freesuper kernel: Code: e2 4c 89 ee 5b 4c 89 f7 5d 41 5c 41 5d 41 5e 41 5f e9 ee a0 00 00 41 56 45 31 f6 41 55 41 89 d5 41 54 55 48 89 fd 48 89 f7 53 <4c> 8b 66 10 31 db 49 81 e4 00 f0 ff ff 45 39 ee 7d 28 48 8b 47 10
Dec 6 19:26:07 freesuper kernel: RSP: 0018:ffffc90006560e90 EFLAGS: 00010046
Dec 6 19:26:07 freesuper kernel: RAX: ffffffff8142e256 RBX: ffff8888a19743c0 RCX: 0000000000000002
Dec 6 19:26:07 freesuper kernel: RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000
Dec 6 19:26:07 freesuper kernel: RBP: ffff88888444d0b8 R08: 0000000000000000 R09: 0000000000000000
Dec 6 19:26:07 freesuper kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8888a0d1ca20
Dec 6 19:26:07 freesuper kernel: R13: 000000000000000c R14: 0000000000000000 R15: 00000000000002b6
Dec 6 19:26:07 freesuper kernel: FS: 0000000000000000(0000) GS:ffff88885fb40000(0000) knlGS:0000000000000000
Dec 6 19:26:07 freesuper kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 6 19:26:07 freesuper kernel: CR2: 0000000000000010 CR3: 000000000200a003 CR4: 00000000000606e0
Dec 6 19:26:07 freesuper kernel: Call Trace:
Dec 6 19:26:07 freesuper kernel: <IRQ>
Dec 6 19:26:07 freesuper kernel: aac_srb_callback+0x67/0x30d [aacraid]
Dec 6 19:26:07 freesuper kernel: aac_intr_normal+0x2dc/0x2ff [aacraid]
Dec 6 19:26:07 freesuper kernel: aac_src_intr_message+0x321/0x35d [aacraid]
Dec 6 19:26:07 freesuper kernel: __handle_irq_event_percpu+0x36/0xcb
Dec 6 19:26:07 freesuper kernel: handle_irq_event_percpu+0x2c/0x6f
Dec 6 19:26:07 freesuper kernel: handle_irq_event+0x34/0x51
Dec 6 19:26:07 freesuper kernel: handle_edge_irq+0xb0/0xd0
Dec 6 19:26:07 freesuper kernel: asm_call_irq_on_stack+0x12/0x20
Dec 6 19:26:07 freesuper kernel: </IRQ>
Dec 6 19:26:07 freesuper kernel: common_interrupt+0xa5/0x12e
Dec 6 19:26:07 freesuper kernel: asm_common_interrupt+0x1e/0x40
Dec 6 19:26:07 freesuper kernel: RIP: 0010:arch_local_irq_enable+0x4/0x8
Dec 6 19:26:07 freesuper kernel: Code: d4 39 18 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 66 66 90 66 90 c3 fa 66 66 90 66 66 90 c3 fb 66 66 90 <66> 66 90 c3 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31
Dec 6 19:26:07 freesuper kernel: RSP: 0018:ffffc900000dbea0 EFLAGS: 00000246
Dec 6 19:26:07 freesuper kernel: RAX: ffff88885fb62380 RBX: 0000000000000004 RCX: 000000000000001f
Dec 6 19:26:07 freesuper kernel: RDX: 0000000000000000 RSI: 000000003a2e8d5e RDI: 0000000000000000
Dec 6 19:26:07 freesuper kernel: RBP: ffffe8f7ff37fb00 R08: 0003065b6397d9f4 R09: 0003065679be8882
Dec 6 19:26:07 freesuper kernel: R10: 0000000000000772 R11: 071c71c71c71c71c R12: 0003065b6397d9f4
Dec 6 19:26:07 freesuper kernel: R13: ffffffff820c5dc0 R14: 0000000000000004 R15: 0000000000000000
Dec 6 19:26:07 freesuper kernel: cpuidle_enter_state+0x101/0x1c4
Dec 6 19:26:07 freesuper kernel: cpuidle_enter+0x25/0x31
Dec 6 19:26:07 freesuper kernel: do_idle+0x1a6/0x214
Dec 6 19:26:07 freesuper kernel: cpu_startup_entry+0x18/0x1a
Dec 6 19:26:07 freesuper kernel: secondary_startup_64_no_verify+0xb0/0xbb
Dec 6 19:26:07 freesuper kernel: Modules linked in: xt_mark xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat nf_tables vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod iptable_mangle ipmi_devintf ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding igb i2c_algo_bit sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl isci intel_cstate libsas nvme i2c_i801 aacraid i2c_smbus scsi_transport_sas wmi i2c_core intel_uncore input_leds nvme_core ahci acpi_ipmi led_class libahci ipmi_si button [last unloaded: i2c_algo_bit]
Dec 6 19:26:07 freesuper kernel: CR2: 0000000000000010
Dec 6 19:26:07 freesuper kernel: ---[ end trace ed88c8db1c16757e ]---
Just got another crash. It looks like the same error. Is it possible to see which Docker container causes this?
JorgeB Posted December 6, 2021
6 minutes ago, kfpersson said: [aacraid]
Not the same; this one was related to the Adaptec controller.
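The distinction made here rests on which kernel module the call-trace frames belong to: the name in square brackets after each frame. As a rough sketch only (the function name `modules_in_trace` and the regular expression are illustrative assumptions, not part of any Unraid tooling), a few lines of Python can tally those module tags in a saved syslog so that two crashes can be compared at a glance:

```python
import re
from collections import Counter

# Matches a call-trace frame such as "aac_srb_callback+0x67/0x30d [aacraid]"
# and captures the module name inside the brackets.
FRAME_MODULE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]")


def modules_in_trace(lines):
    """Count how many call-trace frames belong to each kernel module."""
    counts = Counter()
    for line in lines:
        for match in FRAME_MODULE.finditer(line):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Frames copied from the December 6 trace quoted in this thread.
    dec6 = [
        "Dec 6 19:26:07 freesuper kernel: aac_srb_callback+0x67/0x30d [aacraid]",
        "Dec 6 19:26:07 freesuper kernel: aac_intr_normal+0x2dc/0x2ff [aacraid]",
        "Dec 6 19:26:07 freesuper kernel: handle_irq_event+0x34/0x51",
    ]
    print(modules_in_trace(dec6).most_common())
```

Frames without a bracketed tag belong to the core kernel and are ignored. A trace dominated by `aacraid` frames points at the storage controller driver, while one dominated by `nf_conntrack` or `macvlan` frames points at the networking side, which is the difference noted in the reply above.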
netsrot303 Posted December 29, 2021 Hello, I also use a Supermicro board (X10SDV-TLN4F) with Unraid 6.9.2 and have also had this error three times. Do you have a solution? How's your troubleshooting going, @kfpersson? I read the error via the IPMI app (Event Log). I am now writing the log to an external syslog server to capture more precise errors.
kfpersson Posted December 29, 2021 Author
Hi @netsrot303, at the moment I have had 7 days without any problem. The problem I posted on December 6 was something else, not the "Critical Stop" issue. My solution so far for the "Critical Stop" issue was to:
Update the BMC/BIOS
Remove old plugins/docker
Follow this post (remove assigned IP addresses from containers)
Otherwise, share your syslog.
netsrot303 Posted December 31, 2021 So, I have had another "OS-critical-stop". I have attached the CSV file from the syslog server. 1442_os_critical-stop.csv
tiwing Posted December 17, 2022 Bringing this topic back from the dead. I just started getting the exact same error 3 weeks ago. I get about 3 days before having to hard stop and restart, which of course triggers a parity check. Not a great way to live. X9DR3-LN4+ in a 36 bay superstore. Did anyone figure this out? I threw a new motherboard at it, but after reading this it seems more likely to be an issue with my Unraid install?
trurl Posted December 18, 2022 Attach diagnostics to your NEXT post in this thread and set up a syslog server.
tiwing Posted December 19, 2022 (edited)
Hi, just experienced another one. Diagnostics and syslog attached. The server became completely non-responsive at 4:00:26 on Dec 19; I powered it off and back on at 6:45am on Dec 19. There's nothing in the syslog immediately prior. I did look at the diagnostics, but I don't know what to look for. It would be awesome if some of you smart folks could say what kinds of things you go to first, so I can help contribute back to the group rather than just use your brains!
kscs-fvm2-diagnostics-20221219-0735.zip syslog-192.168.13.50.log
Thanks much! tiwing
Edit: I just realized I set up the syslog server on the same box that keeps failing. Perhaps it going offline means lines were not recorded in the syslog properly. I'll spin up my backup box, leave it running, and repoint the syslog to it. Whoops!
Edited December 19, 2022 by tiwing (spelling, edit comment)
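The point of the edit above is that the syslog receiver has to live on a machine that stays up when the server crashes. Unraid has its own syslog server feature for this; purely as an illustration of what the receiving side does, here is a minimal UDP receiver sketch in Python. The port, file name, and helper names (`LISTEN_PORT`, `LOG_PATH`, `format_record`, `serve`) are assumptions made up for this example, and it assumes the crashing server can be pointed at a remote host and a non-default port.

```python
import socketserver
from datetime import datetime

# Hypothetical values for illustration; adjust to your own network.
LISTEN_HOST = "0.0.0.0"
LISTEN_PORT = 5514            # unprivileged port; classic syslog uses UDP 514
LOG_PATH = "remote-syslog.log"


def format_record(sender_ip, payload, received_at):
    """Turn one UDP datagram into a log line tagged with sender and receive time."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{received_at.isoformat(timespec='seconds')} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _socket = self.request
        line = format_record(self.client_address[0], payload, datetime.now())
        # Open, append, and close per message so each line reaches the file
        # even if this receiver is stopped abruptly.
        with open(LOG_PATH, "a", encoding="utf-8") as log_file:
            log_file.write(line + "\n")


def serve():
    """Block forever, appending every received datagram to LOG_PATH."""
    with socketserver.UDPServer((LISTEN_HOST, LISTEN_PORT), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the backup box starts the receiver. Stamping each line with the receiver's own clock also makes it obvious when the sender went silent, independent of the sender's timestamps.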
tiwing Posted January 3, 2023 (edited)
New syslog from the other server and diagnostics attached. The server is still going completely offline every few days, with the same logging from IPMI. The syslog shows a gap on Jan 3 between 5am and 8:30am, with nothing showing up in the syslog.
Edit: is there any way to put v6.9 back on the box for testing purposes? I don't remember having this issue with 6.9.x.
syslog-192.168.13.50-20230103_0928xx.log kscs-fvm2-diagnostics-20230103-0934.zip
Edited January 3, 2023 by tiwing (Q about rollback)
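Spotting a silent window like the one described here is easier with a small script than by scrolling. The following is a sketch under stated assumptions: the helper names `parse_stamp` and `find_gaps` are invented for this example, the year has to be supplied because classic syslog timestamps omit it, month names are parsed with `%b` (so an English/C locale is assumed), and logs that cross a year boundary are not handled.

```python
import re
from datetime import datetime, timedelta

# Leading "Mon D HH:MM:SS" stamp of a classic syslog line (the day may be space padded).
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has no valid stamp."""
    match = STAMP.match(line)
    if not match:
        return None
    month, day, clock = match.groups()
    try:
        return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, threshold=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs where the log was silent longer than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_stamp(line, year)
        if current is None:
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__":
    # Made-up lines shaped like the ones in this thread, for demonstration only.
    sample = [
        "Jan  3 04:58:10 kscs-fvm2 kernel: example line A",
        "Jan  3 04:59:02 kscs-fvm2 kernel: example line B",
        "Jan  3 08:31:47 kscs-fvm2 kernel: example line C",
    ]
    for last_seen, next_seen in find_gaps(sample, 2023):
        print(f"silent from {last_seen} to {next_seen}")
```

The last lines before each gap and the first lines after it are usually the ones worth reading closely.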
JorgeB Posted January 3, 2023
15 minutes ago, tiwing said: Nothing showing up in syslog.
Jan 3 09:10:25 kscs-fvm2 kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jan 3 09:10:25 kscs-fvm2 kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces are usually the result of having Docker containers with a custom IP address, and they will end up crashing the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right).
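To see which containers are attached to a macvlan network before or after making that change, the Docker CLI can be queried. This is a hedged sketch, not an Unraid feature: it assumes Python 3 and the `docker` command are available on the host, and the helper names (`macvlan_network_names`, `containers_on`, `report`) are invented for this example. It relies on `docker network ls` with a driver filter and on the JSON printed by `docker network inspect`.

```python
import json
import subprocess


def macvlan_network_names():
    """Ask the Docker CLI for the names of all networks using the macvlan driver."""
    result = subprocess.run(
        ["docker", "network", "ls", "--filter", "driver=macvlan", "--format", "{{.Name}}"],
        check=True, capture_output=True, text=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def containers_on(inspect_json):
    """Extract (network, container, IPv4 address) rows from `docker network inspect` output."""
    rows = []
    for network in json.loads(inspect_json):
        attached = network.get("Containers") or {}
        for info in attached.values():
            rows.append((network.get("Name"), info.get("Name"), info.get("IPv4Address")))
    return rows


def report():
    """Print every running container attached to a macvlan network."""
    for name in macvlan_network_names():
        result = subprocess.run(
            ["docker", "network", "inspect", name],
            check=True, capture_output=True, text=True,
        )
        for network, container, address in containers_on(result.stdout):
            print(f"{network}: {container} ({address})")
```

Running `report()` on the server lists the containers with their own addresses, which are the candidates referred to above. Note that `docker network inspect` only lists containers that are currently running, and this does not identify which container triggered a given trace.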