
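My Unraid server has been crashing repeatedly today. I have tried many fixes and none of them solved it. Could someone look at the log and tell me what is actually wrong?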



The log from just before the crash is below. Please help me analyze it. Thanks!

Apr  4 20:02:56 Tower autofan: Highest disk temp is 32C, adjusting fan speed from: 91 (35% @ 1394rpm) to: 74 (29% @ 1194rpm)
Apr  4 20:02:57 Tower autofan: Highest disk temp is 32C, adjusting fan speed from: 91 (35% @ 1407rpm) to: 74 (29% @ 1189rpm)
Apr  4 20:02:57 Tower autofan: Highest disk temp is 32C, adjusting fan speed from: 91 (35% @ 1394rpm) to: 74 (29% @ 1194rpm)
Apr  4 20:02:58 Tower root: Fix Common Problems Version 2024.03.29
### [PREVIOUS LINE REPEATED 8 TIMES] ###
Apr  4 20:03:24 Tower flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Apr  4 20:03:31 Tower root: Fix Common Problems Version 2024.03.29
Apr  4 20:04:00 Tower kernel: br0: port 3(vnet1) entered blocking state
Apr  4 20:04:00 Tower kernel: br0: port 3(vnet1) entered disabled state
Apr  4 20:04:00 Tower kernel: device vnet1 entered promiscuous mode
Apr  4 20:04:00 Tower kernel: br0: port 3(vnet1) entered blocking state
Apr  4 20:04:00 Tower kernel: br0: port 3(vnet1) entered forwarding state
Apr  4 20:04:01 Tower autofan: Highest disk temp is 33C, adjusting fan speed from: 74 (29% @ 1186rpm) to: 91 (35% @ 1401rpm)
Apr  4 20:04:02 Tower autofan: Highest disk temp is 33C, adjusting fan speed from: 74 (29% @ 1197rpm) to: 91 (35% @ 1403rpm)
Apr  4 20:04:02 Tower autofan: Highest disk temp is 33C, adjusting fan speed from: 74 (29% @ 1172rpm) to: 91 (35% @ 1407rpm)
Apr  4 20:15:27 Tower kernel: traps: zfs[20341] general protection fault ip:15114d172e10 sp:7ffeaaa15df0 error:0 in ld-2.37.so[15114d165000+27000]
Apr  4 20:15:50 Tower kernel: br0: port 3(vnet1) entered disabled state
Apr  4 20:15:50 Tower kernel: device vnet1 left promiscuous mode
Apr  4 20:15:50 Tower kernel: br0: port 3(vnet1) entered disabled state
Apr  4 20:24:12 Tower kernel: traps: notify[15755] general protection fault ip:8c013c sp:7fff7e543870 error:0 in php[600000+3b3000]
Apr  4 20:28:38 Tower kernel: BUG: unable to handle page fault for address: ffff8881c39a56a0
Apr  4 20:28:38 Tower kernel: #PF: supervisor read access in kernel mode
Apr  4 20:28:38 Tower kernel: #PF: error_code(0x0009) - reserved bit violation
Apr  4 20:28:38 Tower kernel: PGD 2c01067 P4D 2c01067 PUD 1c3940063 PMD 1c3941063 PTE 808b0001c39a5063
Apr  4 20:28:38 Tower kernel: Oops: 0009 [#1] PREEMPT SMP PTI
Apr  4 20:28:38 Tower kernel: CPU: 5 PID: 29455 Comm: smartctl_type Tainted: P           O       6.1.79-Unraid #1
Apr  4 20:28:38 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B47/Z370 TOMAHAWK (MS-7B47), BIOS 1.60 11/19/2018
Apr  4 20:28:38 Tower kernel: RIP: 0010:___slab_alloc+0x242/0x6fe
Apr  4 20:28:38 Tower kernel: Code: 89 c6 49 89 c7 fa 0f 1f 44 00 00 e8 0d 56 65 00 4c 8b 6b 10 4d 85 ed 75 ac 48 8b 44 24 30 48 89 43 10 e8 f6 55 65 00 8b 45 28 <49> 8b 04 04 48 81 43 08 00 01 00 00 48 89 03 e8 df 55 65 00 41 0f
Apr  4 20:28:38 Tower kernel: RSP: 0018:ffffc9002c7afc00 EFLAGS: 00010002
Apr  4 20:28:38 Tower kernel: RAX: 0000000000000020 RBX: ffff88902eb72e10 RCX: 0000000080400040
Apr  4 20:28:38 Tower kernel: RDX: ffff888366fb2308 RSI: ffffffff820d8b42 RDI: ffffffff820d904b
Apr  4 20:28:38 Tower kernel: RBP: ffff8881001ea400 R08: 0000000000000000 R09: 0000000080400040
Apr  4 20:28:38 Tower kernel: R10: ffff888366fb2308 R11: ffff888366fb230c R12: ffff8881c39a5680
Apr  4 20:28:38 Tower kernel: R13: 0000000000000200 R14: 00000000ffffffff R15: 0000000000000202
Apr  4 20:28:38 Tower kernel: FS:  000014ca66913640(0000) GS:ffff88902eb40000(0000) knlGS:0000000000000000
Apr  4 20:28:38 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  4 20:28:38 Tower kernel: CR2: ffff8881c39a56a0 CR3: 000000093ab42006 CR4: 00000000003726e0
Apr  4 20:28:38 Tower kernel: Call Trace:
Apr  4 20:28:38 Tower kernel: <TASK>
Apr  4 20:28:38 Tower kernel: ? __die_body+0x1a/0x5c
Apr  4 20:28:38 Tower kernel: ? page_fault_oops+0x329/0x376
Apr  4 20:28:38 Tower kernel: ? fixup_exception+0x22/0x24b
Apr  4 20:28:38 Tower kernel: ? exc_page_fault+0xf4/0x11d
Apr  4 20:28:38 Tower kernel: ? asm_exc_page_fault+0x22/0x30
Apr  4 20:28:38 Tower kernel: ? ___slab_alloc+0x242/0x6fe
Apr  4 20:28:38 Tower kernel: ? anon_vma_clone+0x4d/0x134
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Apr  4 20:28:38 Tower kernel: __slab_alloc.constprop.0+0x5a/0x85
Apr  4 20:28:38 Tower kernel: ? anon_vma_clone+0x4d/0x134
Apr  4 20:28:38 Tower kernel: kmem_cache_alloc+0x97/0x14d
Apr  4 20:28:38 Tower kernel: anon_vma_clone+0x4d/0x134
Apr  4 20:28:38 Tower kernel: __split_vma+0x95/0x16b
Apr  4 20:28:38 Tower kernel: mprotect_fixup+0x1c2/0x2bf
Apr  4 20:28:38 Tower kernel: do_mprotect_pkey+0x27b/0x348
Apr  4 20:28:38 Tower kernel: ? preempt_latency_start+0x1e/0x46
Apr  4 20:28:38 Tower kernel: ? up_read+0x47/0x5d
Apr  4 20:28:38 Tower kernel: ? do_user_addr_fault+0x35a/0x48d
Apr  4 20:28:38 Tower kernel: __x64_sys_mprotect+0x19/0x20
Apr  4 20:28:38 Tower kernel: do_syscall_64+0x68/0x81
Apr  4 20:28:38 Tower kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Apr  4 20:28:38 Tower kernel: RIP: 0033:0x14ca6a5f9e97
Apr  4 20:28:38 Tower kernel: Code: 00 00 00 b8 0b 00 00 00 0f 05 48 3d 01 f0 ff ff 73 01 c3 48 8d 0d f9 33 01 00 f7 d8 89 01 48 83 c8 ff c3 b8 0a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8d 0d d9 33 01 00 f7 d8 89 01 48 83
Apr  4 20:28:38 Tower kernel: RSP: 002b:00007ffcf0334058 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
Apr  4 20:28:38 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014ca6a5f9e97
Apr  4 20:28:38 Tower kernel: RDX: 0000000000000001 RSI: 0000000000001000 RDI: 000014ca6a528000
Apr  4 20:28:38 Tower kernel: RBP: 00007ffcf0334160 R08: 0000000000000000 R09: 000014ca69ad9860
Apr  4 20:28:38 Tower kernel: R10: 000014ca6a451240 R11: 0000000000000206 R12: 0000000000000000
Apr  4 20:28:38 Tower kernel: R13: 00007ffcf0334130 R14: 000014ca6a451258 R15: 000014ca6a5c9f30
Apr  4 20:28:38 Tower kernel: </TASK>
Apr  4 20:28:38 Tower kernel: Modules linked in: af_packet vhost_net tun vhost tap kvm_intel kvm bluetooth ecdh_generic ecc xt_nat veth nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype br_netfilter cmac cifs asn1_decoder cifs_arc4 cifs_md4 dns_resolver xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_iotlb xfs md_mod nfsd auth_rpcgss oid_registry lockd grace sunrpc tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls ixgbe xfrm_algo mdio r8169 realtek zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) intel_rapl_msr znvpair(PO) intel_rapl_common spl(O) x86_pkg_temp_thermal intel_powerclamp coretemp i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel mei_pxp mei_hdcp drm ghash_clmulni_intel
Apr  4 20:28:38 Tower kernel: sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel mxm_wmi crypto_simd cryptd nvme intel_gtt rapl mei_me i2c_i801 agpgart intel_cstate i2c_smbus syscopyarea ahci sysfillrect sysimgblt e1000e intel_uncore i2c_core nvme_core mei libahci fb_sys_fops thermal fan video wmi backlight intel_pmc_core acpi_pad button unix [last unloaded: kvm]
Apr  4 20:28:38 Tower kernel: CR2: ffff8881c39a56a0
Apr  4 20:28:38 Tower kernel: ---[ end trace 0000000000000000 ]---
Apr  4 20:28:38 Tower kernel: RIP: 0010:___slab_alloc+0x242/0x6fe
Apr  4 20:28:38 Tower kernel: Code: 89 c6 49 89 c7 fa 0f 1f 44 00 00 e8 0d 56 65 00 4c 8b 6b 10 4d 85 ed 75 ac 48 8b 44 24 30 48 89 43 10 e8 f6 55 65 00 8b 45 28 <49> 8b 04 04 48 81 43 08 00 01 00 00 48 89 03 e8 df 55 65 00 41 0f
Apr  4 20:28:38 Tower kernel: RSP: 0018:ffffc9002c7afc00 EFLAGS: 00010002
Apr  4 20:28:38 Tower kernel: RAX: 0000000000000020 RBX: ffff88902eb72e10 RCX: 0000000080400040
Apr  4 20:28:38 Tower kernel: RDX: ffff888366fb2308 RSI: ffffffff820d8b42 RDI: ffffffff820d904b
Apr  4 20:28:38 Tower kernel: RBP: ffff8881001ea400 R08: 0000000000000000 R09: 0000000080400040
Apr  4 20:28:38 Tower kernel: R10: ffff888366fb2308 R11: ffff888366fb230c R12: ffff8881c39a5680
Apr  4 20:28:38 Tower kernel: R13: 0000000000000200 R14: 00000000ffffffff R15: 0000000000000202
Apr  4 20:28:38 Tower kernel: FS:  000014ca66913640(0000) GS:ffff88902eb40000(0000) knlGS:0000000000000000
Apr  4 20:28:38 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  4 20:28:38 Tower kernel: CR2: ffff8881c39a56a0 CR3: 000000093ab42006 CR4: 00000000003726e0
Apr  4 20:28:38 Tower kernel: note: smartctl_type[29455] exited with irqs disabled
Apr  4 20:28:38 Tower kernel: note: smartctl_type[29455] exited with preempt_count 1
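
For readers following along, the `#PF: error_code(0x0009)` line in the log above can be decoded by hand. The following is a minimal sketch, assuming the standard x86 page-fault error-code layout (bit 0 = page present / protection fault, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch). The helper name `decode_pf_error_code` is made up for illustration and is not part of any tool.

```python
# Bit layout of the x86 page-fault error code (assumption: standard layout).
PF_BITS = {
    "present": 0x01,            # fault on a present page (protection fault)
    "write": 0x02,              # access was a write (clear means read)
    "user": 0x04,               # access came from user mode (clear means kernel)
    "reserved_bit": 0x08,       # a reserved bit was set in a page-table entry
    "instruction_fetch": 0x10,  # fault happened on an instruction fetch
}


def decode_pf_error_code(code: int) -> dict:
    """Return which error-code flags are set (hypothetical helper)."""
    return {name: bool(code & mask) for name, mask in PF_BITS.items()}


if __name__ == "__main__":
    # 0x0009 is the value reported in the log above.
    for name, is_set in decode_pf_error_code(0x0009).items():
        print(f"{name}: {is_set}")
```

For `0x0009` only `present` and `reserved_bit` come back set, which agrees with the kernel's own summary lines: a supervisor read in kernel mode that hit a reserved-bit violation. A page-table entry with reserved bits set is consistent with corrupted memory, although on its own it does not prove a hardware fault.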
 

Link to comment

Do you have the full log? If so, please upload the complete log file. For reference:

 

 

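The log you posted contains memory-related errors, so faulty memory is a possible cause. If you have spare time over the holiday, I suggest running a memory test by following the linked guide. Also use the information in the linked post to check whether any overclocking options are enabled in the BIOS, for both memory and CPU, and turn off any that are.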
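
To see which lines in a syslog point at this kind of problem, the kernel fault messages can be filtered out of the noise. This is only a sketch: the marker strings are copied from the log in this thread and are not an exhaustive list, and `find_kernel_faults` is a hypothetical helper name.

```python
import re

# Marker strings taken from the log posted in this thread (not exhaustive).
FAULT_PATTERN = re.compile(
    r"general protection fault|BUG: unable to handle page fault|Oops:"
)


def find_kernel_faults(lines):
    """Return the kernel log lines that contain one of the fault markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if " kernel: " in line and FAULT_PATTERN.search(line)
    ]


if __name__ == "__main__":
    sample = [
        "Apr  4 20:04:00 Tower kernel: device vnet1 entered promiscuous mode",
        "Apr  4 20:24:12 Tower kernel: traps: notify[15755] general protection "
        "fault ip:8c013c sp:7fff7e543870 error:0 in php[600000+3b3000]",
        "Apr  4 20:28:38 Tower kernel: Oops: 0009 [#1] PREEMPT SMP PTI",
    ]
    for hit in find_kernel_faults(sample):
        print(hit)
```

Faults in unrelated processes within a short window (here `zfs`, `php`, and then the kernel itself) are the pattern that makes memory a plausible suspect, which is why a memory test is the suggested next step.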

 

 

 

Link to comment
52 minutes ago, JackieWu said:

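Do you have the full log? If so, please upload the complete log file. For reference:

The log you posted contains memory-related errors, so faulty memory is a possible cause. If you have spare time over the holiday, I suggest running a memory test by following the linked guide. Also use the information in the linked post to check whether any overclocking options are enabled in the BIOS, for both memory and CPU, and turn off any that are.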

 

 

 

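Thanks for the reply. With the transmission container in Docker switched off, everything has run normally so far. Could that download container be the problem? I will test the memory a bit later. The full logs are attached; please take another look.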

tower-diagnostics-20240404-2050.zip

Link to comment
21 minutes ago, wander606 said:

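Thanks for the reply. With the transmission container in Docker switched off, everything has run normally so far. Could that download container be the problem? I will test the memory a bit later. The full logs are attached; please take another look.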

tower-diagnostics-20240404-2050.zip 160.62 kB · 0 downloads

 

Did you use the Unassigned Devices plugin to mount NFS / SMB shares from another server on Unraid, and then map those mounted shares into Transmission or Qbittorrent?

Link to comment
15 minutes ago, JackieWu said:

 

Did you use the Unassigned Devices plugin to mount NFS / SMB shares from another server on Unraid, and then map those mounted shares into Transmission or Qbittorrent?

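Yes. I also have a TrueNAS box, and I used the Unassigned Devices plugin to mount two SMB shares from it on Unraid. One is mapped to luckyBackup and the other to Qbittorrent; neither is mapped to transmission. Right now qbittorrent is running and transmission is not, and after 2 hours of uptime there has been no crash. (Before, the server would usually hang within ten-odd minutes and the webui became unreachable.)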

Link to comment
3 minutes ago, wander606 said:

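Yes. I also have a TrueNAS box, and I used the Unassigned Devices plugin to mount two SMB shares from it on Unraid. One is mapped to luckyBackup and the other to Qbittorrent; neither is mapped to transmission. Right now qbittorrent is running and transmission is not, and after 2 hours of uptime there has been no crash. (Before, the server would usually hang within ten-odd minutes and the webui became unreachable.)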

 

I recommend testing the memory when you have time.

 

Also, when you map a share mounted by Unassigned Devices into QB or any other Docker container, change the Access Mode to "Read/Write - Slave":

 

Snipaste_2024-04-04_23-38-33.thumb.png.d0a4631facc4152d05ee331471b0528d.png
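
For anyone who prefers to see what that setting means at the Docker level, here is a rough sketch. It assumes that the Unraid template's Access Mode labels correspond to the option string appended to a `docker run -v host:container:options` bind mount, with "Read/Write - Slave" becoming `rw,slave`. The function name and both paths are hypothetical examples, not values from this thread.

```python
# Assumed mapping from the Access Mode label in the Unraid Docker template
# to the options appended to a `docker run -v` bind mount.
ACCESS_MODES = {
    "Read/Write": "rw",
    "Read/Write - Slave": "rw,slave",
    "Read Only": "ro",
    "Read Only - Slave": "ro,slave",
}


def volume_flag(host_path: str, container_path: str, access_mode: str) -> str:
    """Build the -v argument for one path mapping (hypothetical helper)."""
    options = ACCESS_MODES[access_mode]
    return f"-v {host_path}:{container_path}:{options}"


if __name__ == "__main__":
    # Hypothetical paths for a remote share mounted on the host.
    print(volume_flag("/mnt/remotes/TRUENAS_media", "/downloads",
                      "Read/Write - Slave"))
```

With `slave` propagation, a mount that appears on the host under the mapped path after the container has started also becomes visible inside the container. That matters for shares that Unassigned Devices mounts or remounts while containers are already running.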

Link to comment
3 minutes ago, wander606 said:

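Each of the two memory sticks shows PASS when tested on its own, but with both installed the test reports errors. Is there any way to fix this?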

 

This is a compatibility problem. There is no good fix; you will have to replace the memory.

 

It could also be caused by the memory frequency. If your memory is running at a relatively high frequency, try lowering it in the BIOS.

Edited by JackieWu
Link to comment
2 minutes ago, JackieWu said:

 

This is a compatibility problem. There is no good fix; you will have to replace the memory.

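OK, I'll keep looking for a fix. The thing is, I bought these two sticks last May and they have run stably for almost a year, and then today this suddenly started. It's a real headache. Thanks!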

Link to comment
1 minute ago, wander606 said:

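OK, I'll keep looking for a fix. The thing is, I bought these two sticks last May and they have run stably for almost a year, and then today this suddenly started. It's a real headache. Thanks!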

 

It could also be caused by the memory frequency. If your memory is running at a relatively high frequency, try lowering it in the BIOS.

Link to comment
10 minutes ago, JackieWu said:

 

It could also be caused by the memory frequency. If your memory is running at a relatively high frequency, try lowering it in the BIOS.

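I'm now trying different slots so that the sticks do not run in dual channel, with the frequency left at the memory's default of 2666. The test is halfway through with no errors so far, so it looks promising. I'll check the result when I get up tomorrow and post a screenshot here. Hopefully it gives some ideas to anyone who runs into the same problem.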

Link to comment

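??? Where did my post go? I had just replied to you and edited in a follow-up. Mine was a memory problem too, and it took me a month from start to finish. If you have warranty coverage, I suggest replacing the memory outright once you have confirmed it is the cause. Without warranty you can keep experimenting, but I think the odds of it working are small. My sticks also passed memtest individually and failed as a pair, yet in real use even a single stick produced errors; it just took longer to show up. After I swapped in the memory from my main PC, every problem disappeared. JD.com has now replaced the sticks under warranty, and the server has run for 4 days without a single error. It really ate up a lot of time!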

Link to comment

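Yesterday I installed the memory in motherboard slots 2 and 3, which disables dual channel, and left the frequency at the default 2666. The test ran all night without errors. For now I have booted back into Unraid to check long-term stability. Anyone with the same problem can follow my results; I'll post again after a month and after a quarter!

101.thumb.jpg.bf9007873f0cdc90d8d9fd87ce5d4410.jpg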

Link to comment
6 hours ago, zwh1015 said:

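??? Where did my post go? I had just replied to you and edited in a follow-up. Mine was a memory problem too, and it took me a month from start to finish. If you have warranty coverage, I suggest replacing the memory outright once you have confirmed it is the cause. Without warranty you can keep experimenting, but I think the odds of it working are small. My sticks also passed memtest individually and failed as a pair, yet in real use even a single stick produced errors; it just took longer to show up. After I swapped in the memory from my main PC, every problem disappeared. JD.com has now replaced the sticks under warranty, and the server has run for 4 days without a single error. It really ate up a lot of time!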

Please take a look at my post above.

Link to comment
15 hours ago, JackieWu said:

 

I recommend testing the memory when you have time.

 

Also, when you map a share mounted by Unassigned Devices into QB or any other Docker container, change the Access Mode to "Read/Write - Slave":

 

Snipaste_2024-04-04_23-38-33.thumb.png.d0a4631facc4152d05ee331471b0528d.png

One more question: what is the difference between the Read/Write and Read/Write - Slave Access Modes? I have a Docker container that mounts the /mnt/ directory. Should I change it to Read/Write - Slave?

Link to comment
