Airwu Posted January 31
My Unraid server has been rebooting unexpectedly over the past week. I am unsure what is causing the problem or how to fix it. Please help me.
tower-diagnostics-20240131-2256.zip
itimpi Posted January 31
Unexpected reboots are normally hardware-related (e.g. PSU or CPU overheating). The syslog in the diagnostics is the in-RAM version, which starts afresh every time the system boots. You should enable the syslog server (probably with the Mirror to Flash option set) to get a syslog that survives a reboot, so we can see what leads up to a crash. Mirror to Flash is the easiest option to set up, but if you are worried about excessive wear on the flash drive, you can put your server's address into the remote server field instead.
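Once a persistent syslog exists, the lines just before each reboot are the interesting ones. Below is a minimal sketch of how they could be pulled out, assuming the first kernel message of every boot contains "Linux version" (standard Linux behaviour) and that the saved syslog is plain text. The function name `tail_before_each_boot` and the default of 20 lines are illustrative choices, not anything provided by Unraid.

```python
from collections import deque
from typing import Deque, Iterable, List

# Assumption: the kernel's first message after a boot contains this text.
BOOT_MARKER = "kernel: Linux version"


def tail_before_each_boot(lines: Iterable[str], n: int = 20) -> List[List[str]]:
    """Return, for each boot marker found, the up-to-n lines logged just before it.

    A boot marker with nothing logged before it (the very start of the
    file) is skipped, because there is no earlier context to show.
    """
    recent: Deque[str] = deque(maxlen=n)  # rolling window of the last n lines
    tails: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line and recent:
            tails.append(list(recent))
        recent.append(line)
    return tails
```

Passing the open syslog file (wherever your syslog server setup stores it) to `tail_before_each_boot` gives one list of lines per reboot. If a list ends abruptly with no warning or panic in it, that would be consistent with a power or hardware problem rather than a software crash, though it does not prove one.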
Airwu Posted February 1 Author
I have set it up. I'll post an update after the next unexpected reboot. Thanks.
Airwu Posted February 5 Author
I found this in the syslog:
Feb 5 04:30:25 Tower kernel: ------------[ cut here ]------------
Feb 5 04:30:25 Tower kernel: WARNING: CPU: 3 PID: 0 at net/netfilter/nf_nat_core.c:594 nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Feb 5 04:30:25 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net tun vhost vhost_iotlb tap af_packet xt_nat xt_tcpudp macvlan xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype br_netfilter xfs ip6table_nat md_mod tcp_diag inet_diag iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp bridge stp llc i915 zfs(PO) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) iosf_mbi drm_buddy coretemp i2c_algo_bit ttm zzstd(O) kvm_intel drm_display_helper drm_kms_helper zlua(O) kvm zavl(PO) icp(PO) drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) intel_gtt crypto_simd spl(O)
Feb 5 04:30:25 Tower kernel: mei_hdcp mei_pxp wmi_bmof cryptd rapl intel_cstate intel_uncore tpm_crb ixgbe agpgart nvme i2c_i801 r8169 tpm_tis mei_me ahci syscopyarea i2c_smbus sysfillrect xfrm_algo video i2c_core realtek mdio tpm_tis_core sysimgblt libahci mei nvme_core vmd thermal fb_sys_fops fan wmi tpm backlight intel_pmc_core acpi_pad acpi_tad button unix
Feb 5 04:30:25 Tower kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: P W O 6.1.64-Unraid #1
Feb 5 04:30:25 Tower kernel: Hardware name: Default string Default string/MS-Terminator B660M, BIOS H3.41G 04/29/2022
Feb 5 04:30:25 Tower kernel: RIP: 0010:nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Feb 5 04:30:25 Tower kernel: Code: a8 80 75 26 48 8d 73 58 48 8d 7c 24 20 e8 18 cb 34 00 48 8d 43 0c 4c 8b bb 88 00 00 00 48 89 44 24 18 eb 54 0f ba e0 08 73 07 <0f> 0b e9 75 06 00 00 48 8d 73 58 48 8d 7c 24 20 e8 eb ca 34 00 48
Feb 5 04:30:25 Tower kernel: RSP: 0018:ffffc9000025c718 EFLAGS: 00010282
Feb 5 04:30:25 Tower kernel: RAX: 0000000000000180 RBX: ffff88853c2ac200 RCX: ffff888108562d40
Feb 5 04:30:25 Tower kernel: RDX: 0000000000000000 RSI: ffffc9000025c7fc RDI: ffff88853c2ac200
Feb 5 04:30:25 Tower kernel: RBP: ffffc9000025c7e0 R08: 000000000809080a R09: 0000000000000000
Feb 5 04:30:25 Tower kernel: R10: 0000000000000158 R11: 0000000000000000 R12: ffffc9000025c7fc
Feb 5 04:30:25 Tower kernel: R13: 0000000000000000 R14: ffffc9000025c8d8 R15: 0000000000000001
Feb 5 04:30:25 Tower kernel: FS: 0000000000000000(0000) GS:ffff888c4f4c0000(0000) knlGS:0000000000000000
Feb 5 04:30:25 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 5 04:30:25 Tower kernel: CR2: 000055e285ea6020 CR3: 000000000420a000 CR4: 0000000000752ee0
Feb 5 04:30:25 Tower kernel: PKRU: 55555554
Feb 5 04:30:25 Tower kernel: Call Trace:
Feb 5 04:30:25 Tower kernel: <IRQ>
Feb 5 04:30:25 Tower kernel: ? __warn+0xab/0x122
Feb 5 04:30:25 Tower kernel: ? report_bug+0x109/0x17e
Feb 5 04:30:25 Tower kernel: ? nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Feb 5 04:30:25 Tower kernel: ? handle_bug+0x41/0x6f
Feb 5 04:30:25 Tower kernel: ? exc_invalid_op+0x13/0x60
Feb 5 04:30:25 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 5 04:30:25 Tower kernel: ? nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Feb 5 04:30:25 Tower kernel: ? nf_nat_setup_info+0x44/0x7d1 [nf_nat]
Feb 5 04:30:25 Tower kernel: ? xt_write_recseq_end+0xf/0x1c [ip_tables]
Feb 5 04:30:25 Tower kernel: ? __local_bh_enable_ip+0x56/0x6b
Feb 5 04:30:25 Tower kernel: ? ipt_do_table+0x575/0x5ba [ip_tables]
Feb 5 04:30:25 Tower kernel: ? xt_write_recseq_end+0xf/0x1c [ip_tables]
Feb 5 04:30:25 Tower kernel: ? __local_bh_enable_ip+0x56/0x6b
Feb 5 04:30:25 Tower kernel: __nf_nat_alloc_null_binding+0x66/0x81 [nf_nat]
Feb 5 04:30:25 Tower kernel: nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
Feb 5 04:30:25 Tower kernel: nf_nat_ipv4_local_in+0x2a/0xaa [nf_nat]
Feb 5 04:30:25 Tower kernel: nf_hook_slow+0x3a/0x96
Feb 5 04:30:25 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Feb 5 04:30:25 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
Feb 5 04:30:25 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Feb 5 04:30:25 Tower kernel: ip_sabotage_in+0x4f/0x60 [br_netfilter]
Feb 5 04:30:25 Tower kernel: nf_hook_slow+0x3a/0x96
Feb 5 04:30:25 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Feb 5 04:30:25 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
Feb 5 04:30:25 Tower kernel: ? ip_rcv_finish_core.constprop.0+0x3e8/0x3e8
Feb 5 04:30:25 Tower kernel: __netif_receive_skb_one_core+0x77/0x9c
Feb 5 04:30:25 Tower kernel: netif_receive_skb+0xbf/0x127
Feb 5 04:30:25 Tower kernel: br_handle_frame_finish+0x43a/0x474 [bridge]
Feb 5 04:30:25 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Feb 5 04:30:25 Tower kernel: br_nf_hook_thresh+0xe2/0x109 [br_netfilter]
Feb 5 04:30:25 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Feb 5 04:30:25 Tower kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
Feb 5 04:30:25 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Feb 5 04:30:25 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Feb 5 04:30:25 Tower kernel: br_nf_pre_routing+0x236/0x24a [br_netfilter]
Feb 5 04:30:25 Tower kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
Feb 5 04:30:25 Tower kernel: br_handle_frame+0x277/0x2e0 [bridge]
Feb 5 04:30:25 Tower kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
Feb 5 04:30:25 Tower kernel: __netif_receive_skb_core.constprop.0+0x4fa/0x6e9
Feb 5 04:30:25 Tower kernel: __netif_receive_skb_list_core+0x8a/0x11e
Feb 5 04:30:25 Tower kernel: netif_receive_skb_list_internal+0x1d2/0x20b
Feb 5 04:30:25 Tower kernel: gro_normal_list+0x1d/0x3f
Feb 5 04:30:25 Tower kernel: napi_complete_done+0x7b/0x11a
Feb 5 04:30:25 Tower kernel: ixgbe_poll+0xdb6/0xe7d [ixgbe]
Feb 5 04:30:25 Tower kernel: __napi_poll.constprop.0+0x28/0x124
Feb 5 04:30:25 Tower kernel: net_rx_action+0x159/0x24f
Feb 5 04:30:25 Tower kernel: __do_softirq+0x126/0x288
Feb 5 04:30:25 Tower kernel: __irq_exit_rcu+0x5e/0xb8
Feb 5 04:30:25 Tower kernel: common_interrupt+0x9b/0xc1
Feb 5 04:30:25 Tower kernel: </IRQ>
Feb 5 04:30:25 Tower kernel: <TASK>
Feb 5 04:30:25 Tower kernel: asm_common_interrupt+0x22/0x40
Feb 5 04:30:25 Tower kernel: RIP: 0010:cpuidle_enter_state+0x11d/0x202
Feb 5 04:30:25 Tower kernel: Code: 2b ff 9f ff 45 84 ff 74 1b 9c 58 0f 1f 40 00 0f ba e0 09 73 08 0f 0b fa 0f 1f 44 00 00 31 ff e8 a3 c0 a4 ff fb 0f 1f 44 00 00 <45> 85 e4 0f 88 ba 00 00 00 48 8b 04 24 49 63 cc 48 6b d1 68 49 29
Feb 5 04:30:25 Tower kernel: RSP: 0018:ffffc90000177e98 EFLAGS: 00000246
Feb 5 04:30:25 Tower kernel: RAX: ffff888c4f4c0000 RBX: ffff888c4f4f6400 RCX: 0000000000000000
Feb 5 04:30:25 Tower kernel: RDX: 00014dbbc1f1c880 RSI: ffffffff820d7e01 RDI: ffffffff820d830a
Feb 5 04:30:25 Tower kernel: RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
Feb 5 04:30:25 Tower kernel: R10: 0000000000000020 R11: 000000000000135e R12: 0000000000000004
Feb 5 04:30:25 Tower kernel: R13: ffffffff82320640 R14: 00014dbbc1f1c880 R15: 0000000000000000
Feb 5 04:30:25 Tower kernel: ? cpuidle_enter_state+0xf7/0x202
Feb 5 04:30:25 Tower kernel: cpuidle_enter+0x2a/0x38
Feb 5 04:30:25 Tower kernel: do_idle+0x18d/0x1fb
Feb 5 04:30:25 Tower kernel: cpu_startup_entry+0x2a/0x2c
Feb 5 04:30:25 Tower kernel: start_secondary+0x101/0x101
Feb 5 04:30:25 Tower kernel: secondary_startup_64_no_verify+0xce/0xdb
Feb 5 04:30:25 Tower kernel: </TASK>
Feb 5 04:30:25 Tower kernel: ---[ end trace 0000000000000000 ]---
Can you help me? Thank you.
JorgeB Posted February 5
Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)), then reboot.
Airwu Posted February 6 Author
I have set it up. I'll post an update after the next unexpected reboot. Thanks.
Airwu Posted February 9 Author
I had another unexpected reboot 😂
Feb 9 00:12:47 Tower kernel: VERIFY3(mutex_owner(&buf->b_evict_lock) == NULL) failed (ffff888353ee9d90 == 0000000000000000)
Feb 9 00:12:47 Tower kernel: PANIC at arc.c:1228:buf_dest()
Feb 9 00:12:47 Tower kernel: Showing stack for process 953
Feb 9 00:12:47 Tower kernel: CPU: 0 PID: 953 Comm: dbuf_evict Tainted: P W O 6.1.64-Unraid #1
Feb 9 00:12:47 Tower kernel: Hardware name: Default string Default string/MS-Terminator B660M, BIOS H3.41G 04/29/2022
Feb 9 00:12:47 Tower kernel: Call Trace:
Feb 9 00:12:47 Tower kernel: <TASK>
Feb 9 00:12:47 Tower kernel: dump_stack_lvl+0x44/0x5c
Feb 9 00:12:47 Tower kernel: spl_panic+0xd0/0xe8 [spl]
Feb 9 00:12:47 Tower kernel: ? __slab_free+0x83/0x229
Feb 9 00:12:47 Tower kernel: ? spl_kmem_cache_destroy+0x14a/0x1b7 [spl]
Feb 9 00:12:47 Tower kernel: ? _raw_spin_lock+0x13/0x1c
Feb 9 00:12:47 Tower kernel: buf_dest+0x30/0x3f [zfs]
Feb 9 00:12:47 Tower kernel: spl_kmem_cache_free+0x25/0x1a5 [spl]
Feb 9 00:12:47 Tower kernel: arc_buf_destroy+0xb2/0xe2 [zfs]
Feb 9 00:12:47 Tower kernel: ? __thread_exit+0x13/0x13 [spl]
Feb 9 00:12:47 Tower kernel: dbuf_destroy+0x30/0x3b8 [zfs]
Feb 9 00:12:47 Tower kernel: ? dbuf_evict_one+0x11c/0x11c [zfs]
Feb 9 00:12:47 Tower kernel: ? __thread_exit+0x13/0x13 [spl]
Feb 9 00:12:47 Tower kernel: dbuf_evict_one+0xff/0x11c [zfs]
Feb 9 00:12:47 Tower kernel: dbuf_evict_thread+0xbb/0x119 [zfs]
Feb 9 00:12:47 Tower kernel: thread_generic_wrapper+0x57/0x65 [spl]
Feb 9 00:12:47 Tower kernel: kthread+0xe4/0xef
Feb 9 00:12:47 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
Feb 9 00:12:47 Tower kernel: ret_from_fork+0x1f/0x30
Feb 9 00:12:47 Tower kernel: </TASK>
Airwu Posted February 9 Author
6 minutes ago, trurl said: "Post new diagnostics"
I'm away from home and won't be back for 7 days, so I can't connect to my server. I'll post an update in 7 days. Thank you.
Airwu Posted February 9 Author
BTW, I can't log in to the web management page, but I can log in over SSH. Is there any way to run diagnostics over SSH?
trurl Posted February 9
Click the diagnostics link in our posts.
JorgeB Posted February 9
That call trace shows a ZFS filesystem crashing, likely due to filesystem corruption.
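One way to see why a trace points at ZFS is to count which kernel modules its frames belong to: each frame from a loadable module ends with the module name in square brackets. The sketch below does only that counting, assuming one log entry per line. The helper name `modules_in_trace` is made up for illustration, and the counts are a hint about where the crash happened, not a diagnosis.

```python
import re
from collections import Counter
from typing import Iterable

# A frame from a loadable module ends with "[module_name]".
# Frames from the core kernel carry no such tag and are ignored.
MODULE_TAG = re.compile(r"\[([A-Za-z0-9_]+)\]\s*$")


def modules_in_trace(lines: Iterable[str]) -> Counter:
    """Count how many call-trace lines end with each module tag."""
    counts: Counter = Counter()
    for line in lines:
        match = MODULE_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run over the February 9 trace, the only tags that appear are `zfs` and `spl` (the compatibility layer ZFS is built on), whereas the February 5 warning is dominated by `nf_nat`, `br_netfilter` and `bridge`, which is the networking path the ipvlan suggestion targets.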
Airwu Posted February 9 Author
3 hours ago, JorgeB said: "That call trace shows a zfs filesystem crashing, likely due to filesystem corruption."
Last check completed on Friday, 2024-02-09, 19:14 (today) No error
How can I check the filesystem?
JorgeB Posted February 9
ZFS doesn't have an fsck; you'd need to back up and recreate the pool. If it keeps happening, it could indicate an underlying hardware issue, like bad RAM.
trurl Posted February 9
3 hours ago, JorgeB said: "like bad RAM"
You should run memtest before doing anything else, just in case. Bad RAM is a very serious issue.
Airwu Posted February 10 Author
zpool status -v showed 3 errors. After I ran a scrub, 1 error still remains:
errors: Permanent errors have been detected in the following files:
/mnt/disk1/backup/ios/00008120-001135C021EB401E/7c/7cf081b7fe531b449dc5827f985bdddf11cd996a
This file can be deleted, so I'll try deleting it. When I get home, I'll run memtest.
Airwu Posted February 10 Author
After I deleted /mnt/disk1/backup/ios/00008120-001135C021EB401E/7c/7cf081b7fe531b449dc5827f985bdddf11cd996a, zpool still shows:
errors: Permanent errors have been detected in the following files:
disk1/backup:<0x22dc2>
😂
JorgeB Posted February 10
That can mean an error in a snapshot or in metadata. If it's metadata, you will need to recreate the pool. I recommend running memtest first.
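The two shapes of entry seen in the `zpool status -v` error list can be told apart mechanically. This is a rough sketch, assuming entries look like the two posted in this thread: either an absolute file path, or `dataset:<0xHEX>` where ZFS reports an object number instead of a path. The function name `classify_error_entry` is hypothetical, and the sketch cannot tell whether an object reference is snapshot data or metadata.

```python
import re
from typing import Tuple

# "dataset:<0xHEX>" form: ZFS reports an object number instead of a path.
OBJECT_REF = re.compile(r"^(?P<dataset>[^:\s]+):<(?P<obj>0x[0-9a-fA-F]+)>$")


def classify_error_entry(entry: str) -> Tuple[str, str]:
    """Classify one line from the 'Permanent errors' list.

    Returns ("file", path) for an ordinary path, ("object", description)
    for a dataset/object reference, and ("unknown", entry) otherwise.
    """
    entry = entry.strip()
    if entry.startswith("/"):
        return ("file", entry)
    match = OBJECT_REF.match(entry)
    if match:
        return ("object", f"{match.group('dataset')} object {match.group('obj')}")
    return ("unknown", entry)
```

For example, `classify_error_entry("disk1/backup:<0x22dc2>")` returns `("object", "disk1/backup object 0x22dc2")`. A "file" entry can be deleted or restored from backup, while an "object" entry is the case described above, where the damaged block no longer maps to a path in the live filesystem.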
Airwu Posted February 10 Author
Thank you, I will try it later.