shushi1010 Posted October 8, 2023 (edited October 10, 2023)
This problem started about two months ago and I have not been able to find a fix. The server crashes and freezes at random, can no longer be reached by IP, and only recovers after a forced power-off and reboot.
Attachments: syslog, overse-diagnostics-20231010-0839.zip
JackieWu Posted October 8, 2023 (edited October 8, 2023)
Your log contains a page fault error, which is related to memory:
Sep 27 05:51:16 Overse kernel: BUG: unable to handle page fault for address: ffffffff10d96206
Sep 27 05:51:16 Overse kernel: #PF: supervisor write access in kernel mode
Sep 27 05:51:16 Overse kernel: #PF: error_code(0x0002) - not-present page
Sep 27 05:51:16 Overse kernel: PGD 420e067 P4D 420e067 PUD 0
Sep 27 05:51:16 Overse kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Sep 27 05:51:16 Overse kernel: CPU: 6 PID: 160 Comm: kswapd0 Tainted: P U W O 6.1.49-Unraid #1
Sep 27 05:51:16 Overse kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C82/MAG B460M MORTAR WIFI (MS-7C82), BIOS 1.10 05/18/2020
Sep 27 05:51:16 Overse kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x152/0x1cf
Sep 27 05:51:16 Overse kernel: Code: b9 01 00 00 00 f0 0f b1 0b 74 76 eb cc c1 ee 12 83 e0 03 ff ce 48 c1 e0 05 48 63 f6 48 05 80 e1 02 00 48 03 04 f5 c0 ea 17 82 <48> 89 10 8b 42 08 85 c0 75 04 f3 90 eb f5 48 8b 32 48 85 f6 74 bc
Sep 27 05:51:16 Overse kernel: RSP: 0018:ffffc9000069faf0 EFLAGS: 00010286
Sep 27 05:51:16 Overse kernel: RAX: ffffffff10d96206 RBX: ffff888157b4aee8 RCX: 00000000001c0000
Sep 27 05:51:16 Overse kernel: RDX: ffff88901f3ae180 RSI: 0000000000003148 RDI: ffff888157b4aee8
Sep 27 05:51:16 Overse kernel: RBP: 0000000000000006 R08: 0000000000000000 R09: 000000000000018f
Sep 27 05:51:16 Overse kernel: R10: ffff88868948b800 R11: 0000000000000000 R12: ffff88901f3ae180
Sep 27 05:51:16 Overse kernel: R13: 0000000000000000 R14: ffff888157b4ae90 R15: 0000000000000000
Sep 27 05:51:16 Overse kernel: FS: 0000000000000000(0000) GS:ffff88901f380000(0000) knlGS:0000000000000000
Sep 27 05:51:16 Overse kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 27 05:51:16 Overse kernel: CR2: ffffffff10d96206 CR3: 0000000154ffe003 CR4: 00000000007706e0
Sep 27 05:51:16 Overse kernel: PKRU: 55555554
Sep 27 05:51:16 Overse kernel: Call Trace:
Sep 27 05:51:16 Overse kernel: <TASK>
Sep 27 05:51:16 Overse kernel: ? __die_body+0x1a/0x5c
Sep 27 05:51:16 Overse kernel: ? page_fault_oops+0x329/0x376
Sep 27 05:51:16 Overse kernel: ? fixup_exception+0x22/0x24b
Sep 27 05:51:16 Overse kernel: ? exc_page_fault+0xf4/0x11d
Sep 27 05:51:16 Overse kernel: ? asm_exc_page_fault+0x22/0x30
Sep 27 05:51:16 Overse kernel: ? native_queued_spin_lock_slowpath+0x152/0x1cf
Sep 27 05:51:16 Overse kernel: do_raw_spin_lock+0x14/0x1a
Sep 27 05:51:16 Overse kernel: shrink_lock_dentry+0xa1/0xea
Sep 27 05:51:16 Overse kernel: shrink_dentry_list+0x3d/0xba
Sep 27 05:51:16 Overse kernel: prune_dcache_sb+0x51/0x73
Sep 27 05:51:16 Overse kernel: super_cache_scan+0xf4/0x17c
Sep 27 05:51:16 Overse kernel: do_shrink_slab+0x188/0x2a1
Sep 27 05:51:16 Overse kernel: shrink_slab+0x1f9/0x267
Sep 27 05:51:16 Overse kernel: shrink_node+0x318/0x549
Sep 27 05:51:16 Overse kernel: balance_pgdat+0x4e9/0x6a2
Sep 27 05:51:16 Overse kernel: ? newidle_balance+0x289/0x30a
Sep 27 05:51:16 Overse kernel: kswapd+0x2f0/0x333
Sep 27 05:51:16 Overse kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Sep 27 05:51:16 Overse kernel: ? balance_pgdat+0x6a2/0x6a2
Sep 27 05:51:16 Overse kernel: kthread+0xe4/0xef
Sep 27 05:51:16 Overse kernel: ? kthread_complete_and_exit+0x1b/0x1b
Sep 27 05:51:16 Overse kernel: ret_from_fork+0x1f/0x30
Sep 27 05:51:16 Overse kernel: </TASK>
I suggest testing the memory. For the method, see:
shushi1010 (Author) Posted October 10, 2023 (edited October 10, 2023)
On October 8, 2023 at 3:54 PM, JackieWu said: Your log contains a page fault error, which is related to memory: I suggest testing the memory. For the method, see:
Thanks for the reply. I tested the memory the way you described, and all three runs came back PASS. Could anything else be causing this problem?
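JackieWu Posted October 10, 2023 (edited October 10, 2023)
5 hours ago, shushi1010 said: Thanks for the reply. I tested the memory the way you described, and all three runs came back PASS. Could anything else be causing this problem?
Is it still crashing at irregular intervals? If so, please upload the logs (remember to enable the syslog server so that the logs are saved).
As for the memory: I once ran into an unusual case where the crashes were caused by memory even though the memory test found nothing wrong at the time. So a passing memory test does not necessarily rule out a memory problem. That situation is fairly rare, though, so for now keep observing.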
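shushi1010 (Author) Posted October 10, 2023
24 minutes ago, JackieWu said: Is it still crashing at irregular intervals? If so, please upload the logs (remember to enable the syslog server so that the logs are saved). As for the memory: I once ran into an unusual case where the crashes were caused by memory even though the memory test found nothing wrong at the time. So a passing memory test does not necessarily rule out a memory problem. That situation is fairly rare, though, so for now keep observing.
Yes. It crashed again around the 5th. I only just booted it today, and I expect it to crash again within a few days. The log is attached; from a quick look, the error seems to be the same as in the earlier crash.
Attachment: syslog-127.0.0.1.log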
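JackieWu Posted October 10, 2023
12 minutes ago, shushi1010 said: Yes. It crashed again around the 5th. I only just booted it today, and I expect it to crash again within a few days. The log is attached; from a quick look, the error seems to be the same as in the earlier crash. syslog-127.0.0.1.log
A question: by "crash", do you mean that Unraid becomes completely unreachable (no ping, no SSH login), or that only the web UI is inaccessible while SSH still works? In the latter case Unraid has not actually crashed, and the troubleshooting has to go in a different direction.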
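The distinction asked about here (whole host down versus only the web UI down) can be checked from another machine on the LAN. The following is a minimal Python sketch, not something posted in the thread: `tcp_port_open` and `classify` are hypothetical helper names, and ports 22 (SSH) and 80 (web UI) are assumed defaults that may differ on a given server. Ping (ICMP) is not covered and would be run separately.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def classify(ssh_open: bool, web_open: bool) -> str:
    """Map the two probe results onto the cases discussed in the thread."""
    if ssh_open and web_open:
        return "reachable"
    if ssh_open:
        return "web UI down, host still up"
    if web_open:
        return "SSH down, web UI still up"
    return "neither port reachable (possible full crash)"


if __name__ == "__main__" and len(sys.argv) == 2:
    host = sys.argv[1]  # the server's LAN IP, supplied by the user
    # Ports 22 and 80 are assumptions, not values taken from the thread.
    print(classify(tcp_port_open(host, 22), tcp_port_open(host, 80)))
```

Run it with the server's address as the only argument. A result of "web UI down, host still up" would correspond to the second case described above, where the system itself has not crashed.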
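shushi1010 (Author) Posted October 10, 2023
1 hour ago, JackieWu said: A question: by "crash", do you mean that Unraid becomes completely unreachable (no ping, no SSH login), or that only the web UI is inaccessible while SSH still works? In the latter case Unraid has not actually crashed, and the troubleshooting has to go in a different direction.
It is completely unreachable, and the router no longer shows its IP either. The attached screen displays error messages, but I cannot log in or exit, so the only option is a forced power-off and reboot.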
JackieWu Posted October 10, 2023
3 hours ago, shushi1010 said: Yes. It crashed again around the 5th. I only just booted it today, and I expect it to crash again within a few days. The log is attached; from a quick look, the error seems to be the same as in the earlier crash. syslog-127.0.0.1.log
The log from the 5th contains a kernel error involving macvlan. Based on your description and on the loss-of-connectivity problem that is fairly common at the moment, my guess is that this error may be what crashes the system. You are running version 6.12.4, which introduced a change that lets you disable bridging to resolve the macvlan call trace problem (though that problem may turn out to be unrelated to yours). You could try that approach; for the exact steps, see my blog post 《6.12.4 关于失联问题的解决办法以及相关更新说明》 ("6.12.4: Fixes for the Loss-of-Connectivity Problem and Related Update Notes").
Oct 5 17:00:33 Overse kernel: ------------[ cut here ]------------
Oct 5 17:00:33 Overse kernel: WARNING: CPU: 8 PID: 13757 at net/netfilter/nf_nat_core.c:594 nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Oct 5 17:00:33 Overse kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter nct6683 xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iosf_mbi drm_buddy i2c_algo_bit kvm ttm drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_kms_helper aesni_intel crypto_simd drm cryptd btusb btrtl btbcm btintel mei_hdcp mei_pxp intel_gtt rapl bluetooth mpt3sas nvme intel_cstate i2c_i801 agpgart intel_wmi_thunderbolt
Oct 5 17:00:33 Overse kernel: i2c_smbus wmi_bmof mxm_wmi mei_me raid_class syscopyarea r8169 ahci intel_uncore ecdh_generic nvme_core i2c_core mei sysfillrect scsi_transport_sas joydev libahci ecc realtek sysimgblt thermal fb_sys_fops fan video wmi backlight intel_pmc_core acpi_pad acpi_tad button unix
Oct 5 17:00:33 Overse kernel: CPU: 8 PID: 13757 Comm: kworker/u24:0 Tainted: P U W O 6.1.49-Unraid #1
Oct 5 17:00:33 Overse kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C82/MAG B460M MORTAR WIFI (MS-7C82), BIOS 1.10 05/18/2020
Oct 5 17:00:33 Overse kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Oct 5 17:00:33 Overse kernel: RIP: 0010:nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Oct 5 17:00:33 Overse kernel: Code: a8 80 75 26 48 8d 73 58 48 8d 7c 24 20 e8 18 1b 3b 00 48 8d 43 0c 4c 8b bb 88 00 00 00 48 89 44 24 18 eb 54 0f ba e0 08 73 07 <0f> 0b e9 75 06 00 00 48 8d 73 58 48 8d 7c 24 20 e8 eb 1a 3b 00 48
Oct 5 17:00:33 Overse kernel: RSP: 0018:ffffc90000304c78 EFLAGS: 00010282
Oct 5 17:00:33 Overse kernel: RAX: 0000000000000180 RBX: ffff888a85b1b700 RCX: ffff88812ec82e00
Oct 5 17:00:33 Overse kernel: RDX: 0000000000000000 RSI: ffffc90000304d5c RDI: ffff888a85b1b700
Oct 5 17:00:33 Overse kernel: RBP: ffffc90000304d40 R08: 00000000d41fa8c0 R09: 0000000000000000
Oct 5 17:00:33 Overse kernel: R10: 0000000000000098 R11: 0000000000000000 R12: ffffc90000304d5c
Oct 5 17:00:33 Overse kernel: R13: 0000000000000000 R14: ffffc90000304e40 R15: 0000000000000001
Oct 5 17:00:33 Overse kernel: FS: 0000000000000000(0000) GS:ffff88901f400000(0000) knlGS:0000000000000000
Oct 5 17:00:33 Overse kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 5 17:00:33 Overse kernel: CR2: 00007f67dc05e000 CR3: 00000002f1452004 CR4: 00000000007706e0
Oct 5 17:00:33 Overse kernel: PKRU: 55555554
Oct 5 17:00:33 Overse kernel: Call Trace:
Oct 5 17:00:33 Overse kernel: <IRQ>
Oct 5 17:00:33 Overse kernel: ? __warn+0xab/0x122
Oct 5 17:00:33 Overse kernel: ? report_bug+0x109/0x17e
Oct 5 17:00:33 Overse kernel: ? nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Oct 5 17:00:33 Overse kernel: ? handle_bug+0x41/0x6f
Oct 5 17:00:33 Overse kernel: ? exc_invalid_op+0x13/0x60
Oct 5 17:00:33 Overse kernel: ? asm_exc_invalid_op+0x16/0x20
Oct 5 17:00:33 Overse kernel: ? nf_nat_setup_info+0x8c/0x7d1 [nf_nat]
Oct 5 17:00:33 Overse kernel: ? nf_nat_setup_info+0x44/0x7d1 [nf_nat]
Oct 5 17:00:33 Overse kernel: ? xt_write_recseq_end+0xf/0x1c [ip_tables]
Oct 5 17:00:33 Overse kernel: ? __local_bh_enable_ip+0x56/0x6b
Oct 5 17:00:33 Overse kernel: ? ipt_do_table+0x57a/0x5bf [ip_tables]
Oct 5 17:00:33 Overse kernel: ? xt_write_recseq_end+0xf/0x1c [ip_tables]
Oct 5 17:00:33 Overse kernel: __nf_nat_alloc_null_binding+0x66/0x81 [nf_nat]
Oct 5 17:00:33 Overse kernel: nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
Oct 5 17:00:33 Overse kernel: nf_nat_ipv4_local_in+0x2a/0xaa [nf_nat]
Oct 5 17:00:33 Overse kernel: nf_hook_slow+0x3a/0x96
Oct 5 17:00:33 Overse kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Oct 5 17:00:33 Overse kernel: NF_HOOK.constprop.0+0x79/0xd9
Oct 5 17:00:33 Overse kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Oct 5 17:00:33 Overse kernel: __netif_receive_skb_one_core+0x77/0x9c
Oct 5 17:00:33 Overse kernel: process_backlog+0x8c/0x116
Oct 5 17:00:33 Overse kernel: __napi_poll.constprop.0+0x28/0x124
Oct 5 17:00:33 Overse kernel: net_rx_action+0x159/0x24f
Oct 5 17:00:33 Overse kernel: __do_softirq+0x126/0x288
Oct 5 17:00:33 Overse kernel: do_softirq+0x7f/0xab
Oct 5 17:00:33 Overse kernel: </IRQ>
Oct 5 17:00:33 Overse kernel: <TASK>
Oct 5 17:00:33 Overse kernel: __local_bh_enable_ip+0x4c/0x6b
Oct 5 17:00:33 Overse kernel: netif_rx+0x52/0x5a
Oct 5 17:00:33 Overse kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Oct 5 17:00:33 Overse kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Oct 5 17:00:33 Overse kernel: process_one_work+0x1a8/0x295
Oct 5 17:00:33 Overse kernel: worker_thread+0x18b/0x244
Oct 5 17:00:33 Overse kernel: ? rescuer_thread+0x281/0x281
Oct 5 17:00:33 Overse kernel: kthread+0xe4/0xef
Oct 5 17:00:33 Overse kernel: ? kthread_complete_and_exit+0x1b/0x1b
Oct 5 17:00:33 Overse kernel: ret_from_fork+0x1f/0x30
Oct 5 17:00:33 Overse kernel: </TASK>
Oct 5 17:00:33 Overse kernel: ---[ end trace 0000000000000000 ]---
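Since the thread relies on spotting two kinds of kernel messages in a multi-megabyte syslog (the page fault oops and the macvlan-related warning), a small scanner can make that search repeatable. This is only an illustrative sketch: the signature patterns are derived from the log lines quoted in this thread, `scan_syslog` and `summarize` are hypothetical helper names, and a hit only flags a line for inspection without identifying a root cause.

```python
import re
import sys
from collections import Counter

# Signatures derived from the two traces quoted in this thread.
SIGNATURES = {
    "page_fault": re.compile(r"BUG: unable to handle page fault"),
    "oops": re.compile(r"Oops: [0-9a-f]{4} \[#\d+\]"),
    "warning": re.compile(r"WARNING: CPU: \d+ PID: \d+ at \S+"),
    "macvlan": re.compile(r"macvlan_process_broadcast"),
}


def scan_syslog(lines):
    """Yield (signature_name, line) for every line matching a signature."""
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                yield name, line.rstrip("\n")


def summarize(lines):
    """Count how many lines matched each signature."""
    return Counter(name for name, _ in scan_syslog(lines))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python scan.py syslog-127.0.0.1.log
    with open(sys.argv[1], errors="replace") as fh:
        for name, line in scan_syslog(fh):
            print(f"[{name}] {line}")
```

Each printed line keeps its original syslog timestamp, so the hits can be lined up against the times the server became unreachable.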
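anpple Posted October 10, 2023 (Solution)
The system dropping off the network completely, combined with your error messages, means this is a conflict between the RAM frequency and the CPU timings. Once an ES (engineering sample) CPU has been ruled out as the cause, the fix is as follows:
1 Disable XMP memory overclocking in the BIOS, or try lowering the memory frequency
2 Reseat the memory modules and swap their slots
3 Swap in other modules of the same brand and test again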