
Would a failing Intel CPU show in the Unraid Log?


Solved by JorgeB


I’m wondering whether the crashes affecting 13th and 14th gen Intel CPUs would show up in the Unraid log as a kernel panic or some other error. I’ve been troubleshooting panic-type errors for months on my new 13th gen CPU, and I wonder whether my CPU could be experiencing these issues. Does anyone have experience here?


I'm not sure it will get logged. If you log into the management console while your system is locked up, you will see it dumped to stdout as a whql / hardware failure.

 

I read that you can go into your BIOS, disable all P-cores, and run only on E-cores. If your system is stable that way, your CPU needs RMAing: it's toast, destroyed by the defect where too much voltage is applied to the P-cores.

 

When this Intel catastrophe came to light, I stopped all encoding and pinned things down to minimal CPU usage to keep the system from being fully loaded. It would be great if we could get the microcode bundled with Unraid, as I'm not optimistic that motherboard manufacturers will release BIOS updates with any sense of urgency.
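
For anyone wanting to check which microcode revision is actually loaded right now, here is a minimal Python sketch. It assumes a Linux x86 system where `/proc/cpuinfo` has a `microcode` line for each logical CPU; the function name `microcode_revisions` is my own invention for illustration, not anything Unraid ships.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0x123"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    # Assumption: the standard procfs path; skipped quietly if it is missing.
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(sorted(microcode_revisions(path.read_text())))
```

Normally you would expect a single revision for all cores. Whether that revision includes Intel's fix is something you would still have to compare against Intel's or your board vendor's release notes.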

 

Edited by scs3jb
extra info

Here's an example of the call traces I am seeing in my logs.  I get these off and on every few days, and they don't always cause a crash immediately:  

 

Aug  9 05:00:43 Tower kernel: ------------[ cut here ]------------
Aug  9 05:00:43 Tower kernel: Can't encode file handler for inotify: 255
Aug  9 05:00:43 Tower kernel: WARNING: CPU: 4 PID: 22231 at fs/notify/fdinfo.c:55 show_mark_fhandle+0x73/0xe2
Aug  9 05:00:43 Tower kernel: Modules linked in: xt_mark veth wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha udp_diag xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat xt_nat xt_tcpudp iptable_mangle vhost_net tun vhost vhost_iotlb tap ipvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod tcp_diag inet_diag nct6683 ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc intel_rapl_msr intel_rapl_common i915 x86_pkg_temp_thermal intel_powerclamp zfs(PO) coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper zunicode(PO) drm_kms_helper zzstd(O) crct10dif_pclmul crc32_pclmul drm crc32c_intel ghash_clmulni_intel sha512_ssse3 zlua(O) sha256_ssse3 sha1_ssse3 aesni_intel sr_mod cdrom zavl(PO) mei_hdcp crypto_simd mei_pxp intel_gtt
Aug  9 05:00:43 Tower kernel: cryptd icp(PO) rapl zcommon(PO) znvpair(PO) spl(O) mxm_wmi wmi_bmof intel_cstate intel_uncore nvme i2c_i801 agpgart ahci mei_me i2c_smbus input_leds mpt3sas igc i2c_core mei nvme_core led_class joydev libahci syscopyarea sysfillrect raid_class sysimgblt scsi_transport_sas fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight acpi_pad intel_pmc_core acpi_tad button unix
Aug  9 05:00:43 Tower kernel: CPU: 4 PID: 22231 Comm: lsof Tainted: P           O       6.1.79-Unraid #1
Aug  9 05:00:43 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D30/MPG Z690 FORCE WIFI (MS-7D30), BIOS A.G0 01/19/2024
Aug  9 05:00:43 Tower kernel: RIP: 0010:show_mark_fhandle+0x73/0xe2
Aug  9 05:00:43 Tower kernel: Code: ff 00 00 00 89 c1 74 04 85 c0 79 22 80 3d 71 ec 10 01 00 75 5e 89 ce 48 c7 c7 4e 7c 0e 82 c6 05 5f ec 10 01 01 e8 f5 bb de ff <0f> 0b eb 45 89 44 24 0c 8b 44 24 04 48 89 ef 31 db 48 c7 c6 8c 7c
Aug  9 05:00:43 Tower kernel: RSP: 0018:ffffc9004c727c28 EFLAGS: 00010286
Aug  9 05:00:43 Tower kernel: RAX: 0000000000000000 RBX: ffff888d786d9c70 RCX: 0000000000000027
Aug  9 05:00:43 Tower kernel: RDX: 0000000000000002 RSI: ffffffff820d8b42 RDI: 00000000ffffffff
Aug  9 05:00:43 Tower kernel: RBP: ffff8881070acf78 R08: 0000000000000000 R09: ffffffff829533f0
Aug  9 05:00:43 Tower kernel: R10: 00003fffffffffff R11: ffff88987f7b1b42 R12: ffff8881070acf78
Aug  9 05:00:43 Tower kernel: R13: ffff8881070acf78 R14: ffffffff81281eba R15: ffff888108d7cc78
Aug  9 05:00:43 Tower kernel: FS:  00001496a286ae00(0000) GS:ffff88981f300000(0000) knlGS:0000000000000000
Aug  9 05:00:43 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  9 05:00:43 Tower kernel: CR2: 00000000004dd788 CR3: 0000000995492000 CR4: 0000000000752ee0
Aug  9 05:00:43 Tower kernel: PKRU: 55555554
Aug  9 05:00:43 Tower kernel: Call Trace:
Aug  9 05:00:43 Tower kernel: 
Aug  9 05:00:43 Tower kernel: ? __warn+0xab/0x122
Aug  9 05:00:43 Tower kernel: ? report_bug+0x109/0x17e
Aug  9 05:00:43 Tower kernel: ? show_mark_fhandle+0x73/0xe2
Aug  9 05:00:43 Tower kernel: ? handle_bug+0x41/0x6f
Aug  9 05:00:43 Tower kernel: ? exc_invalid_op+0x13/0x60
Aug  9 05:00:43 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
Aug  9 05:00:43 Tower kernel: ? fanotify_fdinfo+0xfd/0xfd
Aug  9 05:00:43 Tower kernel: ? show_mark_fhandle+0x73/0xe2
Aug  9 05:00:43 Tower kernel: ? show_mark_fhandle+0x73/0xe2
Aug  9 05:00:43 Tower kernel: ? fanotify_fdinfo+0xfd/0xfd
Aug  9 05:00:43 Tower kernel: ? seq_vprintf+0x33/0x49
Aug  9 05:00:43 Tower kernel: ? seq_printf+0x53/0x6e
Aug  9 05:00:43 Tower kernel: ? preempt_latency_start+0x2b/0x46
Aug  9 05:00:43 Tower kernel: inotify_fdinfo+0x83/0xaa
Aug  9 05:00:43 Tower kernel: show_fdinfo.isra.0+0x63/0xab
Aug  9 05:00:43 Tower kernel: seq_show+0x13f/0x15d
Aug  9 05:00:43 Tower kernel: seq_read_iter+0x169/0x346
Aug  9 05:00:43 Tower kernel: ? slab_post_alloc_hook+0x4d/0x15e
Aug  9 05:00:43 Tower kernel: seq_read+0x92/0xbc
Aug  9 05:00:43 Tower kernel: vfs_read+0xa4/0x19f
Aug  9 05:00:43 Tower kernel: ? __do_sys_newfstatat+0x35/0x5c
Aug  9 05:00:43 Tower kernel: ksys_read+0x76/0xc2
Aug  9 05:00:43 Tower kernel: do_syscall_64+0x68/0x81
Aug  9 05:00:43 Tower kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Aug  9 05:00:43 Tower kernel: RIP: 0033:0x1496a2af6afd
Aug  9 05:00:43 Tower kernel: Code: 31 c0 e9 e6 fe ff ff 50 48 8d 3d 36 a1 0a 00 e8 49 15 02 00 66 0f 1f 84 00 00 00 00 00 80 3d e1 ca 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
Aug  9 05:00:43 Tower kernel: RSP: 002b:00007fff152c5f18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Aug  9 05:00:43 Tower kernel: RAX: ffffffffffffffda RBX: 00000000004362c0 RCX: 00001496a2af6afd
Aug  9 05:00:43 Tower kernel: RDX: 0000000000000400 RSI: 000000000046d410 RDI: 0000000000000005
Aug  9 05:00:43 Tower kernel: RBP: 00001496a2bd8600 R08: 0000000000000001 R09: 0000000000000000
Aug  9 05:00:43 Tower kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 000000000000000a
Aug  9 05:00:43 Tower kernel: R13: 0000000000000a68 R14: 00001496a2bd7d00 R15: 0000000000000a68
Aug  9 05:00:43 Tower kernel: 
Aug  9 05:00:43 Tower kernel: ---[ end trace 0000000000000000 ]---

 

Memtest passed a 24-hour run (several months ago now), and the board was RMA'd when I got it new, so now I'm wondering if it's the CPU after all. I have a 13500, a non-K model, which is not even listed as at risk (for now), but I'm still wondering about the moderate instability.
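
To get a feel for how often these warnings occur and where they come from, here is a rough Python sketch that tallies `WARNING: CPU: ...` lines like the one in the trace above by CPU number, source location, and function. The regex is based only on the line format shown in this post, and both the `/var/log/syslog` path and the function name `tally_warnings` are assumptions on my part. The tally only shows whether the warnings keep hitting the same code location or are scattered; it does not diagnose the CPU.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines such as:
# "... kernel: WARNING: CPU: 4 PID: 22231 at fs/notify/fdinfo.c:55 show_mark_fhandle+0x73/0xe2"
WARNING_RE = re.compile(
    r"kernel: WARNING: CPU: (?P<cpu>\d+) PID: \d+ at (?P<where>\S+) (?P<func>[^\s+]+)"
)


def tally_warnings(log_text: str) -> Counter:
    """Count kernel WARNING lines, keyed by (cpu, source location, function)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            key = (int(match["cpu"]), match["where"], match["func"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the syslog lives at this path; adjust for your setup.
    log_path = Path("/var/log/syslog")
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for (cpu, where, func), n in tally_warnings(text).most_common():
            print(f"{n:4d}  CPU {cpu:2d}  {where}  {func}")
```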
