Version 6.12.2 froze and could only be recovered with a forced reboot


Solved by JackieWu

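tower-diagnostics-20230828-1259.zip

The server froze after running for about half a month. The router showed Unraid as offline, and a monitor connected over HDMI showed nothing, so I had to force a reboot. I'm really worried about my hard drives 😭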

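The last stretch of the log before the machine went offline is below. Could someone experienced take a look and tell me what the problem is?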

Aug 27 13:17:08 tower webGUI: Successful login user root from 192.168.31.173
Aug 27 13:21:13 tower kernel: docker0: port 2(veth70c60bf) entered blocking state
Aug 27 13:21:13 tower kernel: docker0: port 2(veth70c60bf) entered disabled state
Aug 27 13:21:13 tower kernel: device veth70c60bf entered promiscuous mode
Aug 27 13:21:14 tower kernel: eth0: renamed from veth9b721fc
Aug 27 13:21:14 tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Aug 27 13:21:14 tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth70c60bf: link becomes ready
Aug 27 13:21:14 tower kernel: docker0: port 2(veth70c60bf) entered blocking state
Aug 27 13:21:14 tower kernel: docker0: port 2(veth70c60bf) entered forwarding state
Aug 27 13:22:46 tower webGUI: Successful login user root from 192.168.31.173
Aug 27 13:24:35 tower kernel: general protection fault, probably for non-canonical address 0xffff088204573860: 0000 [#1] PREEMPT SMP NOPTI
Aug 27 13:24:35 tower kernel: CPU: 7 PID: 9040 Comm: dockerd Tainted: P     U     O       6.1.36-Unraid #1
Aug 27 13:24:35 tower kernel: Hardware name: Maxsun MS-TZZ H610ITX 2.5G/MS-TZZ H610ITX 2.5G, BIOS 5.27 03/31/2023
Aug 27 13:24:35 tower kernel: RIP: 0010:evict+0x7d/0x150
Aug 27 13:24:35 tower kernel: Code: 48 8d ab 10 01 00 00 48 39 c5 74 43 48 8b 43 28 48 8d b8 40 05 00 00 e8 4c fe 61 00 48 8b 83 18 01 00 00 48 8b 93 10 01 00 00 <48> 89 42 08 48 89 10 48 8b 43 28 48 89 ab 10 01 00 00 48 89 ab 18
Aug 27 13:24:35 tower kernel: RSP: 0018:ffffc90000807c28 EFLAGS: 00010246
Aug 27 13:24:35 tower kernel: RAX: ffff888204573858 RBX: ffff888204573748 RCX: ffffffff81e41ca0
Aug 27 13:24:35 tower kernel: RDX: ffff088204573858 RSI: ffff8882045737c8 RDI: ffff88813e68e540
Aug 27 13:24:35 tower kernel: RBP: ffff888204573858 R08: ffff88815fec9b98 R09: ffffffff813b856c
Aug 27 13:24:35 tower kernel: R10: ffff888190548240 R11: 0000000000000005 R12: ffffffff81e41ca0
Aug 27 13:24:35 tower kernel: R13: ffff888117ef6718 R14: ffff888143f20000 R15: ffff8883f9bf2e68
Aug 27 13:24:35 tower kernel: FS:  00001540e1477700(0000) GS:ffff8884b09c0000(0000) knlGS:0000000000000000
Aug 27 13:24:35 tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 27 13:24:35 tower kernel: CR2: 000000c00069d360 CR3: 000000015f7a0000 CR4: 0000000000750ee0
Aug 27 13:24:35 tower kernel: PKRU: 55555554
Aug 27 13:24:35 tower kernel: Call Trace:
Aug 27 13:24:35 tower kernel: <TASK>
Aug 27 13:24:35 tower kernel: ? __die_body+0x1a/0x5c
Aug 27 13:24:35 tower kernel: ? die_addr+0x38/0x51
Aug 27 13:24:35 tower kernel: ? exc_general_protection+0x30f/0x345
Aug 27 13:24:35 tower kernel: ? asm_exc_general_protection+0x22/0x30
Aug 27 13:24:35 tower kernel: ? __clear_extent_bit+0x314/0x329
Aug 27 13:24:35 tower kernel: ? evict+0x7d/0x150
Aug 27 13:24:35 tower kernel: ? evict+0x6f/0x150
Aug 27 13:24:35 tower kernel: __dentry_kill+0xcb/0x131
Aug 27 13:24:35 tower kernel: shrink_dentry_list+0xaa/0xba
Aug 27 13:24:35 tower kernel: shrink_dcache_parent+0xf3/0x118
Aug 27 13:24:35 tower kernel: d_invalidate+0x74/0xdd
Aug 27 13:24:35 tower kernel: btrfs_delete_subvolume+0x409/0x528
Aug 27 13:24:35 tower kernel: btrfs_ioctl_snap_destroy+0x42a/0x50c
Aug 27 13:24:35 tower kernel: btrfs_ioctl+0x246/0x2883
Aug 27 13:24:35 tower kernel: ? __do_sys_newfstatat+0x35/0x5c
Aug 27 13:24:35 tower kernel: vfs_ioctl+0x1b/0x2f
Aug 27 13:24:35 tower kernel: __do_sys_ioctl+0x52/0x78
Aug 27 13:24:35 tower kernel: do_syscall_64+0x68/0x81
Aug 27 13:24:35 tower kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Aug 27 13:24:35 tower kernel: RIP: 0033:0x40468e
Aug 27 13:24:35 tower kernel: Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
Aug 27 13:24:35 tower kernel: RSP: 002b:000000c000f05a28 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
Aug 27 13:24:35 tower kernel: RAX: ffffffffffffffda RBX: 00000000000000da RCX: 000000000040468e
Aug 27 13:24:35 tower kernel: RDX: 000000c000f05bb0 RSI: 000000005000940f RDI: 00000000000000da
Aug 27 13:24:35 tower kernel: RBP: 000000c000f05a68 R08: 0000000000000000 R09: 0000000000000000
Aug 27 13:24:35 tower kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 000000c0014b4f10
Aug 27 13:24:35 tower kernel: R13: 0000000000000000 R14: 000000c0096a1ba0 R15: ffffffffffffffff
Aug 27 13:24:35 tower kernel: </TASK>
Aug 27 13:24:35 tower kernel: Modules linked in: xt_REDIRECT xt_mark ts_bm xt_string af_packet ccp xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat xt_addrtype br_netfilter bridge xfs xt_MASQUERADE ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 i2c_i801 btusb aesni_intel mei_hdcp mei_pxp i2c_smbus btrtl btbcm wmi_bmof crypto_simd
Aug 27 13:24:35 tower kernel: cryptd rapl intel_cstate intel_uncore btintel bluetooth i2c_core mei_me nvme r8169 tpm_crb nvme_core mei realtek tpm_tis tpm_tis_core video ahci input_leds ecdh_generic joydev led_class ecc libahci thermal wmi fan tpm backlight intel_pmc_core acpi_tad acpi_pad button unix
Aug 27 13:24:35 tower kernel: ---[ end trace 0000000000000000 ]---
Aug 27 13:24:35 tower kernel: RIP: 0010:evict+0x7d/0x150
Aug 27 13:24:35 tower kernel: Code: 48 8d ab 10 01 00 00 48 39 c5 74 43 48 8b 43 28 48 8d b8 40 05 00 00 e8 4c fe 61 00 48 8b 83 18 01 00 00 48 8b 93 10 01 00 00 <48> 89 42 08 48 89 10 48 8b 43 28 48 89 ab 10 01 00 00 48 89 ab 18
Aug 27 13:24:35 tower kernel: RSP: 0018:ffffc90000807c28 EFLAGS: 00010246
Aug 27 13:24:35 tower kernel: RAX: ffff888204573858 RBX: ffff888204573748 RCX: ffffffff81e41ca0
Aug 27 13:24:35 tower kernel: RDX: ffff088204573858 RSI: ffff8882045737c8 RDI: ffff88813e68e540
Aug 27 13:24:35 tower kernel: RBP: ffff888204573858 R08: ffff88815fec9b98 R09: ffffffff813b856c
Aug 27 13:24:35 tower kernel: R10: ffff888190548240 R11: 0000000000000005 R12: ffffffff81e41ca0
Aug 27 13:24:35 tower kernel: R13: ffff888117ef6718 R14: ffff888143f20000 R15: ffff8883f9bf2e68
Aug 27 13:24:35 tower kernel: FS:  00001540e1477700(0000) GS:ffff8884b09c0000(0000) knlGS:0000000000000000
Aug 27 13:24:35 tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 27 13:24:35 tower kernel: CR2: 000000c00069d360 CR3: 000000015f7a0000 CR4: 0000000000750ee0
Aug 27 13:24:35 tower kernel: PKRU: 55555554
Aug 27 13:24:35 tower kernel: note: dockerd[9040] exited with preempt_count 1

 

Edited by reknew
  • Solution

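This kernel error involves both the btrfs filesystem and memory, so it is hard to say for certain what caused it. I recommend testing your RAM first (Unraid ships with memtest86, or you can follow the testing method linked here). If the RAM checks out, try upgrading to 6.12.3.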
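One detail in the register dump supports testing the RAM first. The sketch below compares `RAX` and `RDX` from the fault report and checks whether each is a canonical x86-64 address. The register values and the fault address are copied from the log above; the helper names `flipped_bits` and `is_canonical` are hypothetical, and the check assumes 48-bit virtual addresses (4-level paging). Treat it as a rough sanity check, not a diagnosis.

```python
# Values copied from the "general protection fault" report in the log above.
RAX = 0xffff888204573858
RDX = 0xffff088204573858
FAULT_ADDR = 0xffff088204573860


def flipped_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(64) if (diff >> i) & 1]


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Check x86-64 canonical form: bits 63..(va_bits-1) must all be equal.

    va_bits=48 is an assumption (4-level paging), not something the log states.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


print("bits that differ between RAX and RDX:", flipped_bits(RAX, RDX))
print("fault address == RDX + 8:", RDX + 8 == FAULT_ADDR)
print("RAX canonical:", is_canonical(RAX))
print("RDX canonical:", is_canonical(RDX))
```

Under these assumptions the two pointers differ only in bit 47, which is exactly what makes `RDX` (and the fault address, `RDX + 8`) non-canonical. A pointer that is off by a single bit is consistent with a memory bit flip, although a software bug can corrupt pointers as well, so a memtest run is still what settles it.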

 

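The archive you uploaded does not contain the logs from before the machine went offline (Aug 27), only those from afterwards (Aug 28). It does, however, contain some filesystem errors for your cache pool: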

 

Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): First 128 bytes of corrupted metadata buffer:
Aug 28 07:42:27 tower kernel: 00000000: 49 4e 41 ff 03 01 00 00 00 00 03 e8 00 00 00 64  INA............d
Aug 28 07:42:27 tower kernel: 00000010: 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug 28 07:42:27 tower kernel: 00000020: 35 41 ec 29 aa d6 9b 7c 35 42 3c 55 9e 8b ef f6  5A.)...|5B<U....
Aug 28 07:42:27 tower kernel: 00000030: 35 44 42 d3 93 b1 c6 ed 00 00 00 00 00 00 00 5e  5DB............^
Aug 28 07:42:27 tower kernel: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug 28 07:42:27 tower kernel: 00000050: 00 00 25 01 00 00 00 00 00 00 00 00 b4 e0 ec 16  ..%.............
Aug 28 07:42:27 tower kernel: 00000060: ff ff ff ff 63 4a cd c2 00 00 00 00 00 00 00 0e  ....cJ..........
Aug 28 07:42:27 tower kernel: 00000070: 00 00 00 07 00 04 74 2f 00 00 00 00 00 00 00 08  ......t/........
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode

 

That said, this error is probably unrelated to the freeze. I still recommend repairing the cache pool's filesystem.
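To see at a glance which device and inode the XFS messages point at, the lines can be parsed with a few lines of Python. This is a minimal sketch: the sample text is copied from the log above, while the regular expression, the function name `corrupted_inodes`, and the `/dev/` prefix in the output are my own assumptions about how the device name maps to a path.

```python
import re

# Two of the XFS lines from the log above, used as sample input.
LOG = """\
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode
"""

# Matches the "Metadata corruption detected" message format seen in this log.
PATTERN = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Metadata corruption detected at \S+ \[xfs\], "
    r"inode (?P<ino>0x[0-9a-f]+)"
)


def corrupted_inodes(log_text: str) -> list[tuple[str, int]]:
    """Return the unique (device, inode number) pairs reported as corrupted."""
    found = set()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            found.add((match.group("dev"), int(match.group("ino"), 16)))
    return sorted(found)


for dev, ino in corrupted_inodes(LOG):
    # The /dev/ prefix is an assumption about where the device node lives.
    print(f"/dev/{dev}: inode {ino} ({ino:#x})")
```

Here both corruption messages refer to the same inode on `nvme0n1p1`, so there is a single damaged inode to deal with. The kernel's own advice in the log is to unmount and run `xfs_repair`; running it with `-n` first performs a check without modifying anything.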

Edited by JackieWu
  • 2 weeks later...
On 8/29/2023 at 12:34 AM, JackieWu said:

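This kernel error involves both the btrfs filesystem and memory, so it is hard to say for certain what caused it. I recommend testing your RAM first (Unraid ships with memtest86, or you can follow the testing method linked here). If the RAM checks out, try upgrading to 6.12.3.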

 

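The archive you uploaded does not contain the logs from before the machine went offline (Aug 27), only those from afterwards (Aug 28). It does, however, contain some filesystem errors for your cache pool: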

 

Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): First 128 bytes of corrupted metadata buffer:
Aug 28 07:42:27 tower kernel: 00000000: 49 4e 41 ff 03 01 00 00 00 00 03 e8 00 00 00 64  INA............d
Aug 28 07:42:27 tower kernel: 00000010: 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug 28 07:42:27 tower kernel: 00000020: 35 41 ec 29 aa d6 9b 7c 35 42 3c 55 9e 8b ef f6  5A.)...|5B<U....
Aug 28 07:42:27 tower kernel: 00000030: 35 44 42 d3 93 b1 c6 ed 00 00 00 00 00 00 00 5e  5DB............^
Aug 28 07:42:27 tower kernel: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Aug 28 07:42:27 tower kernel: 00000050: 00 00 25 01 00 00 00 00 00 00 00 00 b4 e0 ec 16  ..%.............
Aug 28 07:42:27 tower kernel: 00000060: ff ff ff ff 63 4a cd c2 00 00 00 00 00 00 00 0e  ....cJ..........
Aug 28 07:42:27 tower kernel: 00000070: 00 00 00 07 00 04 74 2f 00 00 00 00 00 00 00 08  ......t/........
Aug 28 07:42:27 tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x801eeefa dinode

 

That said, this error is probably unrelated to the freeze. I still recommend repairing the cache pool's filesystem.

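I ran memtest and the RAM really is faulty: even downclocked to 2133 it still throws sporadic errors. I have already requested a replacement. I also repaired the filesystem errors. Thank you!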
