
Unraid randomly stalling/crashing


opticon
Solved by JorgeB


Over the last 2-3 months I've had issues with Unraid randomly stalling (the web UI sort of works but is very slow, and Docker containers stop responding) and crashing completely.

 

What I mean by stalling:

  • unraid mgmt web interface works but dashboard stats won't work
  • Restarting docker containers via web interface just spins around
  • SSH responds but hangs when launching htop
  • It hangs when initiating a clean reboot/shutdown

 

What I mean by crashes:

  • I've been seeing kernel panics show up in the syslog (but not every time it crashes)
  • See attached for an example

 

Once I reboot the server, sometimes it will crash again during startup; other times it will work fine for as little as an hour or as long as 2-3 days.

 

I've tried shutting down all containers and only starting them up 1 by 1 each day, but I haven't been able to figure out a cause.

 

Memtest seems to be ok

I've uploaded my diagnostics. Can anyone take a look and point me in the right direction?

nemesis-diagnostics-20231116-0840.zip kernel panic.txt

14 hours ago, JorgeB said:

The panic you posted is from a ZFS pool. Good idea to run memtest; also enable the syslog server to see if it catches more.

 

I swapped to ZFS pools about a month ago because BTRFS was giving me corruption issues as a result of the server crashes :(

 

I've had memtest freeze on me once (screenshot attached), but since I restarted it, it's completed 1 pass against all CPU cores with 0 errors. It's now just over halfway through the 2nd pass.

 

I swapped the 2x RAM sticks approx 2 months ago when it first started crashing, and it seemed to be fine for a month until it started again. I'm starting to think it's either the board or the CPU now (both are now 10 years old and have been running 24/7 for probably 8 years).

PXL_20231116_120931772.jpg


I rebuilt the 2 ZFS SSD cache pools and this is the latest kernel trace. This looks to be network hardware or driver related?

 

It seemed to happen 20 mins after I re-installed my Docker containers.

 

Nov 17 22:27:57 Nemesis kernel: ------------[ cut here ]------------
Nov 17 22:27:57 Nemesis kernel: WARNING: CPU: 2 PID: 20471 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: Modules linked in: bluetooth ecdh_generic ecc wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark veth xt_nat macvlan nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net tun vhost vhost_iotlb tap md_mod reiserfs xfs tcp_diag inet_diag af_packet it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q garp mrp bridge stp llc bonding tls e1000e alx mdio zfs(PO) zunicode(PO) zzstd(O) zlua(O) intel_rapl_msr zavl(PO) intel_rapl_common icp(PO) x86_pkg_temp_thermal intel_powerclamp i915 coretemp zcommon(PO) kvm_intel znvpair(PO) spl(O) kvm iosf_mbi drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul ttm crc32c_intel ghash_clmulni_intel sha512_ssse3
Nov 17 22:27:57 Nemesis kernel: mei_hdcp mei_pxp drm_display_helper drm_kms_helper drm aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore i2c_i801 intel_gtt i2c_smbus ahci libahci mei_me agpgart cdc_acm i2c_core input_leds mei led_class syscopyarea sysfillrect sysimgblt fb_sys_fops video fan thermal wmi backlight acpi_pad button acpi_cpufreq unix [last unloaded: md_mod]
Nov 17 22:27:57 Nemesis kernel: CPU: 2 PID: 20471 Comm: Plex Transcoder Tainted: P           O       6.1.49-Unraid #1
Nov 17 22:27:57 Nemesis kernel: Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
Nov 17 22:27:57 Nemesis kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Nov 17 22:27:57 Nemesis kernel: RSP: 0000:ffffc90003fc3808 EFLAGS: 00010202
Nov 17 22:27:57 Nemesis kernel: RAX: 0000000000000001 RBX: ffff8882782b0400 RCX: 07fce6acd7c5d903
Nov 17 22:27:57 Nemesis kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8882782b0400
Nov 17 22:27:57 Nemesis kernel: RBP: 0000000000000001 R08: 479ac0e25f22a2fa R09: aaba1118c5017bd7
Nov 17 22:27:57 Nemesis kernel: R10: 22a15cc915a5bd30 R11: ffffc90003fc37d0 R12: ffffffff82a11d00
Nov 17 22:27:57 Nemesis kernel: R13: 000000000001ff9d R14: ffff888298b8f000 R15: 0000000000000000
Nov 17 22:27:57 Nemesis kernel: FS:  0000154c5353c808(0000) GS:ffff88840eb00000(0000) knlGS:0000000000000000
Nov 17 22:27:57 Nemesis kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 22:27:57 Nemesis kernel: CR2: 0000154c50285000 CR3: 000000026772c005 CR4: 00000000001706e0
Nov 17 22:27:57 Nemesis kernel: Call Trace:
Nov 17 22:27:57 Nemesis kernel: <TASK>
Nov 17 22:27:57 Nemesis kernel: ? __warn+0xab/0x122
Nov 17 22:27:57 Nemesis kernel: ? report_bug+0x109/0x17e
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? handle_bug+0x41/0x6f
Nov 17 22:27:57 Nemesis kernel: ? exc_invalid_op+0x13/0x60
Nov 17 22:27:57 Nemesis kernel: ? asm_exc_invalid_op+0x16/0x20
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? nf_nat_inet_fn+0x126/0x1a8 [nf_nat]
Nov 17 22:27:57 Nemesis kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 17 22:27:57 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 17 22:27:57 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 17 22:27:57 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 17 22:27:57 Nemesis kernel: ip_sabotage_in+0x52/0x60 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 17 22:27:57 Nemesis kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Nov 17 22:27:57 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 17 22:27:57 Nemesis kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_one_core+0x77/0x9c
Nov 17 22:27:57 Nemesis kernel: netif_receive_skb+0xbf/0x127
Nov 17 22:27:57 Nemesis kernel: br_handle_frame_finish+0x438/0x472 [bridge]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: br_handle_frame+0x27a/0x2e0 [bridge]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_core.constprop.0+0x4fd/0x6e9
Nov 17 22:27:57 Nemesis kernel: ? __build_skb+0x20/0x4e
Nov 17 22:27:57 Nemesis kernel: ? kmem_cache_alloc+0x122/0x14d
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_list_core+0x8a/0x11e
Nov 17 22:27:57 Nemesis kernel: netif_receive_skb_list_internal+0x1d2/0x20b
Nov 17 22:27:57 Nemesis kernel: gro_normal_list+0x1d/0x3f
Nov 17 22:27:57 Nemesis kernel: napi_complete_done+0x7b/0x11a
Nov 17 22:27:57 Nemesis kernel: e1000e_poll+0x9e/0x23e [e1000e]
Nov 17 22:27:57 Nemesis kernel: __napi_poll.constprop.0+0x2b/0x124
Nov 17 22:27:57 Nemesis kernel: net_rx_action+0x159/0x24f
Nov 17 22:27:57 Nemesis kernel: __do_softirq+0x129/0x288
Nov 17 22:27:57 Nemesis kernel: __irq_exit_rcu+0x5e/0xb8
Nov 17 22:27:57 Nemesis kernel: common_interrupt+0x3b/0xc1
Nov 17 22:27:57 Nemesis kernel: asm_common_interrupt+0x22/0x40
Nov 17 22:27:57 Nemesis kernel: RIP: 0033:0x154c5284d156
Nov 17 22:27:57 Nemesis kernel: Code: ff 8b 45 00 01 c0 f7 d0 41 89 c7 41 c1 ff 1f 41 31 c7 45 89 fd 48 8b 4c 24 30 41 d3 fd 48 83 c5 04 41 8d 45 01 3d ff ff ff 7f <48> 89 6c 24 18 0f 85 8f 00 00 00 bd 1e 00 00 80 4c 8b 6c 24 10 eb
Nov 17 22:27:57 Nemesis kernel: RSP: 002b:00007ffd82b98140 EFLAGS: 00000293
Nov 17 22:27:57 Nemesis kernel: RAX: 0000000000000001 RBX: 0000000000041fb8 RCX: 0000000000000009
Nov 17 22:27:57 Nemesis kernel: RDX: 0000000000000009 RSI: 4b76378fbaadbe88 RDI: 0000154c4f39d308
Nov 17 22:27:57 Nemesis kernel: RBP: 0000154c4f840560 R08: 0000154c4f39d320 R09: 0000000000000030
Nov 17 22:27:57 Nemesis kernel: R10: 12ecea4610d7f930 R11: 0000154c4f7aeb6c R12: 0000154c4f39d310
Nov 17 22:27:57 Nemesis kernel: R13: 0000000000000000 R14: 0000000000000030 R15: 00000000000000fa
Nov 17 22:27:57 Nemesis kernel: </TASK>
Nov 17 22:27:57 Nemesis kernel: ---[ end trace 0000000000000000 ]---

nemesis-diagnostics-20231118-0753.zip


Just swapped the RAM out again for 2 sticks that I've never used before, and it crashes straight after booting.

 

Nov 18 08:40:38 Nemesis kernel: ------------[ cut here ]------------
Nov 18 08:40:38 Nemesis kernel: WARNING: CPU: 0 PID: 23353 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: Modules linked in: bluetooth ecdh_generic ecc wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag af_packet it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q garp mrp bridge stp llc bonding tls e1000e alx mdio zfs(PO) zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal zlua(O) intel_powerclamp i915 zavl(PO) coretemp icp(PO) kvm_intel zcommon(PO) znvpair(PO) kvm spl(O) iosf_mbi crct10dif_pclmul crc32_pclmul drm_buddy i2c_algo_bit crc32c_intel ttm ghash_clmulni_intel sha512_ssse3 aesni_intel
Nov 18 08:40:38 Nemesis kernel: drm_display_helper crypto_simd cryptd drm_kms_helper rapl intel_cstate mei_hdcp mei_pxp drm intel_uncore ahci intel_gtt i2c_i801 i2c_smbus libahci mei_me agpgart mei input_leds i2c_core syscopyarea led_class cdc_acm sysfillrect sysimgblt fb_sys_fops video thermal fan wmi backlight acpi_pad button acpi_cpufreq unix [last unloaded: e1000e]
Nov 18 08:40:38 Nemesis kernel: CPU: 0 PID: 23353 Comm: kworker/u8:0 Tainted: P           O       6.1.49-Unraid #1
Nov 18 08:40:38 Nemesis kernel: Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
Nov 18 08:40:38 Nemesis kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Nov 18 08:40:38 Nemesis kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Nov 18 08:40:38 Nemesis kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Nov 18 08:40:38 Nemesis kernel: RAX: 0000000000000001 RBX: ffff8881919d0500 RCX: 6c49c93b03265be6
Nov 18 08:40:38 Nemesis kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881919d0500
Nov 18 08:40:38 Nemesis kernel: RBP: 0000000000000001 R08: 7af53b18c069d652 R09: 4867bc0d02b3bdb8
Nov 18 08:40:38 Nemesis kernel: R10: 5e14506fc4f298cc R11: ffffc90000003d60 R12: ffffffff82a11d00
Nov 18 08:40:38 Nemesis kernel: R13: 00000000000393bb R14: ffff88818cec7f00 R15: 0000000000000000
Nov 18 08:40:38 Nemesis kernel: FS:  0000000000000000(0000) GS:ffff888216a00000(0000) knlGS:0000000000000000
Nov 18 08:40:38 Nemesis kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 18 08:40:38 Nemesis kernel: CR2: 000014cf98b99060 CR3: 00000001f3570005 CR4: 00000000001706f0
Nov 18 08:40:38 Nemesis kernel: Call Trace:
Nov 18 08:40:38 Nemesis kernel: <IRQ>
Nov 18 08:40:38 Nemesis kernel: ? __warn+0xab/0x122
Nov 18 08:40:38 Nemesis kernel: ? report_bug+0x109/0x17e
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? handle_bug+0x41/0x6f
Nov 18 08:40:38 Nemesis kernel: ? exc_invalid_op+0x13/0x60
Nov 18 08:40:38 Nemesis kernel: ? asm_exc_invalid_op+0x16/0x20
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Nov 18 08:40:38 Nemesis kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 18 08:40:38 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 18 08:40:38 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 18 08:40:38 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 18 08:40:38 Nemesis kernel: __netif_receive_skb_one_core+0x77/0x9c
Nov 18 08:40:38 Nemesis kernel: process_backlog+0x8c/0x116
Nov 18 08:40:38 Nemesis kernel: __napi_poll.constprop.0+0x2b/0x124
Nov 18 08:40:38 Nemesis kernel: net_rx_action+0x159/0x24f
Nov 18 08:40:38 Nemesis kernel: __do_softirq+0x129/0x288
Nov 18 08:40:38 Nemesis kernel: do_softirq+0x7f/0xab
Nov 18 08:40:38 Nemesis kernel: </IRQ>
Nov 18 08:40:38 Nemesis kernel: <TASK>
Nov 18 08:40:38 Nemesis kernel: __local_bh_enable_ip+0x4c/0x6b
Nov 18 08:40:38 Nemesis kernel: netif_rx+0x52/0x5a
Nov 18 08:40:38 Nemesis kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Nov 18 08:40:38 Nemesis kernel: ? _raw_spin_unlock+0x14/0x29
Nov 18 08:40:38 Nemesis kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Nov 18 08:40:38 Nemesis kernel: process_one_work+0x1ab/0x295
Nov 18 08:40:38 Nemesis kernel: worker_thread+0x18b/0x244
Nov 18 08:40:38 Nemesis kernel: ? rescuer_thread+0x281/0x281
Nov 18 08:40:38 Nemesis kernel: kthread+0xe7/0xef
Nov 18 08:40:38 Nemesis kernel: ? kthread_complete_and_exit+0x1b/0x1b
Nov 18 08:40:38 Nemesis kernel: ret_from_fork+0x22/0x30
Nov 18 08:40:38 Nemesis kernel: </TASK>
Nov 18 08:40:38 Nemesis kernel: ---[ end trace 0000000000000000 ]---
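
For anyone comparing traces like the two above: both hit the same WARNING at net/netfilter/nf_conntrack_core.c:1210, and the second one runs under the macvlan_process_broadcast workqueue. Below is a rough Python sketch for pulling that kind of summary out of a saved syslog. It only counts the bracketed module names on call-trace frames between the "cut here" and "end trace" markers. The function name `summarize_traces` and the regex are my own assumptions for illustration, not part of Unraid or any kernel tooling, and the output is a reading aid, not a diagnosis.

```python
import re
from collections import Counter

# Matches a call-trace frame that ends in a module tag, e.g.
# "macvlan_broadcast+0x10a/0x150 [macvlan]" -> captures "macvlan".
FRAME_MODULE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\s*$")


def summarize_traces(lines):
    """Return one summary dict per kernel trace found in the syslog lines.

    Hypothetical helper: each dict holds the text after "WARNING:" and a
    Counter of module names seen on frames after "Call Trace:".
    """
    traces, current, in_call_trace = [], None, False
    for line in lines:
        if "[ cut here ]" in line:
            # Start of a new trace block.
            current = {"warning": None, "modules": Counter()}
            in_call_trace = False
        elif current is None:
            continue  # ordinary syslog line outside any trace
        elif "[ end trace" in line:
            traces.append(current)
            current = None
        elif "WARNING:" in line and current["warning"] is None:
            current["warning"] = line.split("WARNING:", 1)[1].strip()
        elif "Call Trace:" in line:
            in_call_trace = True
        elif in_call_trace:
            match = FRAME_MODULE.search(line)
            if match:
                current["modules"][match.group(1)] += 1
    return traces
```

To use it, pass the lines of a saved syslog (for example `summarize_traces(open(path).read().splitlines())`, where `path` is wherever the syslog was saved) and print each trace's `warning` and `modules.most_common()`. Frames without a module tag, such as nf_hook_slow, are core kernel code and are deliberately not counted.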

