Unraid randomly stalling/crashing


opticon
Go to solution Solved by JorgeB,


Over the last 2-3 months I've had issues with Unraid randomly stalling (the web UI sort of works but is very slow, and Docker containers stop responding) and crashing completely.

 

What I mean by stalling:

  • The Unraid management web interface works, but the dashboard stats don't load
  • Restarting Docker containers via the web interface just spins indefinitely
  • SSH responds but hangs when launching htop
  • It hangs when initiating a clean reboot/shutdown

 

What I mean by crashes:

  • I've been seeing kernel panics show up in the syslog (but not every time it crashes)
  • See the attached file for an example

 

Once I reboot the server, sometimes it crashes again during startup, and sometimes it works fine for anywhere from only an hour to 2-3 days.

 

I've tried shutting down all containers and starting them up one by one each day, but I haven't been able to figure out a cause.

 

Memtest seems to be OK.

I've uploaded my diagnostics. Could anyone take a look and point me in the right direction?

nemesis-diagnostics-20231116-0840.zip kernel panic.txt

14 hours ago, JorgeB said:

The panic you posted is from a zfs pool. It's a good idea to run memtest; also enable the syslog server to see if it catches more.

 

I swapped to ZFS pools about a month ago because btrfs was giving me corruption issues caused by the server crashes :(

 

I've had memtest freeze on me once (screenshot attached), but since I restarted it, it has completed 1 pass across all CPU cores with 0 errors. It's now just over halfway through the 2nd pass.

 

I swapped the 2x RAM sticks approx. 2 months ago when it first started crashing, and it seemed to be fine for a month until it started again. I'm starting to think it's either the board or the CPU now (both are 10 years old and have been running 24/7 for probably 8 years).

PXL_20231116_120931772.jpg

  • Solution
10 hours ago, opticon said:

I swapped to ZFS pools about a month ago because btrfs was giving me corruption issues caused by the server crashes

If you are having issues with both btrfs and zfs, that basically confirms to me that you have a hardware problem. If it's not the RAM, it could be the CPU or the board.


I rebuilt the 2 ZFS SSD cache pools, and this is the latest kernel trace. Does this look to be network hardware or driver related?

 

It seemed to happen 20 mins after I re-installed my Docker containers.

 

Nov 17 22:27:57 Nemesis kernel: ------------[ cut here ]------------
Nov 17 22:27:57 Nemesis kernel: WARNING: CPU: 2 PID: 20471 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: Modules linked in: bluetooth ecdh_generic ecc wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark veth xt_nat macvlan nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net tun vhost vhost_iotlb tap md_mod reiserfs xfs tcp_diag inet_diag af_packet it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q garp mrp bridge stp llc bonding tls e1000e alx mdio zfs(PO) zunicode(PO) zzstd(O) zlua(O) intel_rapl_msr zavl(PO) intel_rapl_common icp(PO) x86_pkg_temp_thermal intel_powerclamp i915 coretemp zcommon(PO) kvm_intel znvpair(PO) spl(O) kvm iosf_mbi drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul ttm crc32c_intel ghash_clmulni_intel sha512_ssse3
Nov 17 22:27:57 Nemesis kernel: mei_hdcp mei_pxp drm_display_helper drm_kms_helper drm aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore i2c_i801 intel_gtt i2c_smbus ahci libahci mei_me agpgart cdc_acm i2c_core input_leds mei led_class syscopyarea sysfillrect sysimgblt fb_sys_fops video fan thermal wmi backlight acpi_pad button acpi_cpufreq unix [last unloaded: md_mod]
Nov 17 22:27:57 Nemesis kernel: CPU: 2 PID: 20471 Comm: Plex Transcoder Tainted: P           O       6.1.49-Unraid #1
Nov 17 22:27:57 Nemesis kernel: Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
Nov 17 22:27:57 Nemesis kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Nov 17 22:27:57 Nemesis kernel: RSP: 0000:ffffc90003fc3808 EFLAGS: 00010202
Nov 17 22:27:57 Nemesis kernel: RAX: 0000000000000001 RBX: ffff8882782b0400 RCX: 07fce6acd7c5d903
Nov 17 22:27:57 Nemesis kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8882782b0400
Nov 17 22:27:57 Nemesis kernel: RBP: 0000000000000001 R08: 479ac0e25f22a2fa R09: aaba1118c5017bd7
Nov 17 22:27:57 Nemesis kernel: R10: 22a15cc915a5bd30 R11: ffffc90003fc37d0 R12: ffffffff82a11d00
Nov 17 22:27:57 Nemesis kernel: R13: 000000000001ff9d R14: ffff888298b8f000 R15: 0000000000000000
Nov 17 22:27:57 Nemesis kernel: FS:  0000154c5353c808(0000) GS:ffff88840eb00000(0000) knlGS:0000000000000000
Nov 17 22:27:57 Nemesis kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 22:27:57 Nemesis kernel: CR2: 0000154c50285000 CR3: 000000026772c005 CR4: 00000000001706e0
Nov 17 22:27:57 Nemesis kernel: Call Trace:
Nov 17 22:27:57 Nemesis kernel: <TASK>
Nov 17 22:27:57 Nemesis kernel: ? __warn+0xab/0x122
Nov 17 22:27:57 Nemesis kernel: ? report_bug+0x109/0x17e
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? handle_bug+0x41/0x6f
Nov 17 22:27:57 Nemesis kernel: ? exc_invalid_op+0x13/0x60
Nov 17 22:27:57 Nemesis kernel: ? asm_exc_invalid_op+0x16/0x20
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: ? nf_nat_inet_fn+0x126/0x1a8 [nf_nat]
Nov 17 22:27:57 Nemesis kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Nov 17 22:27:57 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 17 22:27:57 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 17 22:27:57 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 17 22:27:57 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 17 22:27:57 Nemesis kernel: ip_sabotage_in+0x52/0x60 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 17 22:27:57 Nemesis kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Nov 17 22:27:57 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 17 22:27:57 Nemesis kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_one_core+0x77/0x9c
Nov 17 22:27:57 Nemesis kernel: netif_receive_skb+0xbf/0x127
Nov 17 22:27:57 Nemesis kernel: br_handle_frame_finish+0x438/0x472 [bridge]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Nov 17 22:27:57 Nemesis kernel: br_handle_frame+0x27a/0x2e0 [bridge]
Nov 17 22:27:57 Nemesis kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_core.constprop.0+0x4fd/0x6e9
Nov 17 22:27:57 Nemesis kernel: ? __build_skb+0x20/0x4e
Nov 17 22:27:57 Nemesis kernel: ? kmem_cache_alloc+0x122/0x14d
Nov 17 22:27:57 Nemesis kernel: __netif_receive_skb_list_core+0x8a/0x11e
Nov 17 22:27:57 Nemesis kernel: netif_receive_skb_list_internal+0x1d2/0x20b
Nov 17 22:27:57 Nemesis kernel: gro_normal_list+0x1d/0x3f
Nov 17 22:27:57 Nemesis kernel: napi_complete_done+0x7b/0x11a
Nov 17 22:27:57 Nemesis kernel: e1000e_poll+0x9e/0x23e [e1000e]
Nov 17 22:27:57 Nemesis kernel: __napi_poll.constprop.0+0x2b/0x124
Nov 17 22:27:57 Nemesis kernel: net_rx_action+0x159/0x24f
Nov 17 22:27:57 Nemesis kernel: __do_softirq+0x129/0x288
Nov 17 22:27:57 Nemesis kernel: __irq_exit_rcu+0x5e/0xb8
Nov 17 22:27:57 Nemesis kernel: common_interrupt+0x3b/0xc1
Nov 17 22:27:57 Nemesis kernel: asm_common_interrupt+0x22/0x40
Nov 17 22:27:57 Nemesis kernel: RIP: 0033:0x154c5284d156
Nov 17 22:27:57 Nemesis kernel: Code: ff 8b 45 00 01 c0 f7 d0 41 89 c7 41 c1 ff 1f 41 31 c7 45 89 fd 48 8b 4c 24 30 41 d3 fd 48 83 c5 04 41 8d 45 01 3d ff ff ff 7f <48> 89 6c 24 18 0f 85 8f 00 00 00 bd 1e 00 00 80 4c 8b 6c 24 10 eb
Nov 17 22:27:57 Nemesis kernel: RSP: 002b:00007ffd82b98140 EFLAGS: 00000293
Nov 17 22:27:57 Nemesis kernel: RAX: 0000000000000001 RBX: 0000000000041fb8 RCX: 0000000000000009
Nov 17 22:27:57 Nemesis kernel: RDX: 0000000000000009 RSI: 4b76378fbaadbe88 RDI: 0000154c4f39d308
Nov 17 22:27:57 Nemesis kernel: RBP: 0000154c4f840560 R08: 0000154c4f39d320 R09: 0000000000000030
Nov 17 22:27:57 Nemesis kernel: R10: 12ecea4610d7f930 R11: 0000154c4f7aeb6c R12: 0000154c4f39d310
Nov 17 22:27:57 Nemesis kernel: R13: 0000000000000000 R14: 0000000000000030 R15: 00000000000000fa
Nov 17 22:27:57 Nemesis kernel: </TASK>
Nov 17 22:27:57 Nemesis kernel: ---[ end trace 0000000000000000 ]---

nemesis-diagnostics-20231118-0753.zip


Just swapped the RAM out again for 2x sticks that I've never used before, and it crashes straight after booting.

 

Nov 18 08:40:38 Nemesis kernel: ------------[ cut here ]------------
Nov 18 08:40:38 Nemesis kernel: WARNING: CPU: 0 PID: 23353 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: Modules linked in: bluetooth ecdh_generic ecc wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod tcp_diag inet_diag af_packet it87 hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables 8021q garp mrp bridge stp llc bonding tls e1000e alx mdio zfs(PO) zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal zlua(O) intel_powerclamp i915 zavl(PO) coretemp icp(PO) kvm_intel zcommon(PO) znvpair(PO) kvm spl(O) iosf_mbi crct10dif_pclmul crc32_pclmul drm_buddy i2c_algo_bit crc32c_intel ttm ghash_clmulni_intel sha512_ssse3 aesni_intel
Nov 18 08:40:38 Nemesis kernel: drm_display_helper crypto_simd cryptd drm_kms_helper rapl intel_cstate mei_hdcp mei_pxp drm intel_uncore ahci intel_gtt i2c_i801 i2c_smbus libahci mei_me agpgart mei input_leds i2c_core syscopyarea led_class cdc_acm sysfillrect sysimgblt fb_sys_fops video thermal fan wmi backlight acpi_pad button acpi_cpufreq unix [last unloaded: e1000e]
Nov 18 08:40:38 Nemesis kernel: CPU: 0 PID: 23353 Comm: kworker/u8:0 Tainted: P           O       6.1.49-Unraid #1
Nov 18 08:40:38 Nemesis kernel: Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
Nov 18 08:40:38 Nemesis kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Nov 18 08:40:38 Nemesis kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Nov 18 08:40:38 Nemesis kernel: RSP: 0018:ffffc90000003d98 EFLAGS: 00010202
Nov 18 08:40:38 Nemesis kernel: RAX: 0000000000000001 RBX: ffff8881919d0500 RCX: 6c49c93b03265be6
Nov 18 08:40:38 Nemesis kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881919d0500
Nov 18 08:40:38 Nemesis kernel: RBP: 0000000000000001 R08: 7af53b18c069d652 R09: 4867bc0d02b3bdb8
Nov 18 08:40:38 Nemesis kernel: R10: 5e14506fc4f298cc R11: ffffc90000003d60 R12: ffffffff82a11d00
Nov 18 08:40:38 Nemesis kernel: R13: 00000000000393bb R14: ffff88818cec7f00 R15: 0000000000000000
Nov 18 08:40:38 Nemesis kernel: FS:  0000000000000000(0000) GS:ffff888216a00000(0000) knlGS:0000000000000000
Nov 18 08:40:38 Nemesis kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 18 08:40:38 Nemesis kernel: CR2: 000014cf98b99060 CR3: 00000001f3570005 CR4: 00000000001706f0
Nov 18 08:40:38 Nemesis kernel: Call Trace:
Nov 18 08:40:38 Nemesis kernel: <IRQ>
Nov 18 08:40:38 Nemesis kernel: ? __warn+0xab/0x122
Nov 18 08:40:38 Nemesis kernel: ? report_bug+0x109/0x17e
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? handle_bug+0x41/0x6f
Nov 18 08:40:38 Nemesis kernel: ? exc_invalid_op+0x13/0x60
Nov 18 08:40:38 Nemesis kernel: ? asm_exc_invalid_op+0x16/0x20
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Nov 18 08:40:38 Nemesis kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Nov 18 08:40:38 Nemesis kernel: nf_hook_slow+0x3d/0x96
Nov 18 08:40:38 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 18 08:40:38 Nemesis kernel: NF_HOOK.constprop.0+0x79/0xd9
Nov 18 08:40:38 Nemesis kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Nov 18 08:40:38 Nemesis kernel: __netif_receive_skb_one_core+0x77/0x9c
Nov 18 08:40:38 Nemesis kernel: process_backlog+0x8c/0x116
Nov 18 08:40:38 Nemesis kernel: __napi_poll.constprop.0+0x2b/0x124
Nov 18 08:40:38 Nemesis kernel: net_rx_action+0x159/0x24f
Nov 18 08:40:38 Nemesis kernel: __do_softirq+0x129/0x288
Nov 18 08:40:38 Nemesis kernel: do_softirq+0x7f/0xab
Nov 18 08:40:38 Nemesis kernel: </IRQ>
Nov 18 08:40:38 Nemesis kernel: <TASK>
Nov 18 08:40:38 Nemesis kernel: __local_bh_enable_ip+0x4c/0x6b
Nov 18 08:40:38 Nemesis kernel: netif_rx+0x52/0x5a
Nov 18 08:40:38 Nemesis kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Nov 18 08:40:38 Nemesis kernel: ? _raw_spin_unlock+0x14/0x29
Nov 18 08:40:38 Nemesis kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Nov 18 08:40:38 Nemesis kernel: process_one_work+0x1ab/0x295
Nov 18 08:40:38 Nemesis kernel: worker_thread+0x18b/0x244
Nov 18 08:40:38 Nemesis kernel: ? rescuer_thread+0x281/0x281
Nov 18 08:40:38 Nemesis kernel: kthread+0xe7/0xef
Nov 18 08:40:38 Nemesis kernel: ? kthread_complete_and_exit+0x1b/0x1b
Nov 18 08:40:38 Nemesis kernel: ret_from_fork+0x22/0x30
Nov 18 08:40:38 Nemesis kernel: </TASK>
Nov 18 08:40:38 Nemesis kernel: ---[ end trace 0000000000000000 ]---
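
When comparing several traces like the two posted in this thread, it can help to group them by where the warning fired rather than reading each dump in full. Below is a minimal Python sketch that does this; the function name `summarize_warnings` and the regular expression are my own illustration (not an Unraid or kernel tool), and it assumes the `WARNING: CPU: ... at <file>:<line> <function>+0x.../0x... [<module>]` layout seen in the log lines above.

```python
import re
from collections import Counter

# Assumed layout of the first line of a kernel warning, as seen in this thread:
#   "... kernel: WARNING: CPU: 2 PID: 20471 at <file>:<line> <func>+0x<off>/0x<size> [<module>]"
WARNING_RE = re.compile(
    r"kernel: WARNING: CPU: \d+ PID: \d+ at (?P<src>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def summarize_warnings(lines):
    """Count kernel warnings by (source file, line, function, module).

    `lines` is any iterable of syslog lines, e.g. an open file object.
    Lines that are not the first line of a warning are ignored.
    """
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            key = (match["src"], int(match["line"]), match["func"], match["module"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # The two WARNING lines from the traces in this thread.
    sample = [
        "Nov 17 22:27:57 Nemesis kernel: WARNING: CPU: 2 PID: 20471 at "
        "net/netfilter/nf_conntrack_core.c:1210 "
        "__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]",
        "Nov 18 08:40:38 Nemesis kernel: WARNING: CPU: 0 PID: 23353 at "
        "net/netfilter/nf_conntrack_core.c:1210 "
        "__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]",
    ]
    for (src, line, func, module), n in summarize_warnings(sample).most_common():
        print(f"{n}x {func} [{module}] at {src}:{line}")
```

To run it against a saved syslog instead of the inline sample, pass an open file object to `summarize_warnings`. Both traces in this thread fall into the same bucket (`__nf_conntrack_confirm` at `nf_conntrack_core.c:1210`), which only shows that the same warning fired twice; it does not by itself say whether the cause is hardware or software.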
