December 10, 2025

I have been running Unraid for many years now on a Supermicro server. I built a new system earlier this year and am having a recurring issue where the system locks up every 1-3 weeks and requires a hard reboot. When it locks up, apps slowly fail, the GUI becomes unresponsive, and I can't get an SSH session to load (it authenticates but times out opening a terminal window). I can't even get the screen to load on the IPMI card. The IPMI power functions still work, so I can hard reboot.

I have lots of PC troubleshooting experience and have built a number of systems over the years, so I can normally figure out hardware issues. I have checked everything I can think of for this issue without success, so I need some help to figure out where this error could be coming from.

System:
Asus WS W680-ACE IPMI
i7-14700K
64GB RAM (Crucial 64GB DDR5 RAM Kit (2x32GB), 4800MHz CL40)
LSI 9300-16i 16-Port 12Gb/s
Lenovo 00MM862 Intel X550-T2 2-Port 10GbE 10GBase-T PCIe 3.0 x4
3x SSD arrays (NVMe RAID 1 for apps/VMs, SATA SSD RAID 1 for download cache, 4x SATA SSD ZFS for Immich data)
1x primary HDD array (6 array drives + 2 parity drives)
850 watt PSU, so not lack of power to spin drives
Unraid 7.2.2
2 VMs and multiple dockers running

What I have done so far:
Set TDP limits in the BIOS for this CPU to prevent overheating. Temps now rarely hit 80C under heavy load and quickly recover. Idle is ~45-50C.
Ran a 24hr memtest to see if I got bad RAM, but it passed all tests.
Replaced the Unraid boot drive after an error that sounded like a bad flash drive. Still got a lockup a week later.
The server has always been plugged into a UPS to make sure it's not a power issue.
Disabled the C-states that forums said could be an issue. This is an always-on server, so there is no need to sleep.
Smacked head against wall...
Set up syslog to write to an array share to spare the flash drive and get a more persistent log when it locks up.

Snippet of the Syslog file.
Replaced tailscale IP with <TS_IP>

Dec 9 14:33:10 Atlantis nginx: 2025/12/09 14:33:10 [error] 19740#19740: *232323 open() "/usr/local/emhttp/plugins/dynamix.vm.manager/novnc/defaults.json" failed (2: No such file or directory) while sending to client, client: <TS_IP>, server: , request: "GET /plugins/dynamix.vm.manager/novnc/defaults.json HTTP/1.1", host: "<TS_IP>", referrer: "http://<TS_IP>/plugins/dynamix.vm.manager/vnc.html?v=1746576860&resize=scale&autoconnect=true&host=<TS_IP>&port=&path=/wsproxy/5700/"
Dec 9 14:33:10 Atlantis nginx: 2025/12/09 14:33:10 [error] 19740#19740: *232323 open() "/usr/local/emhttp/plugins/dynamix.vm.manager/novnc/mandatory.json" failed (2: No such file or directory) while sending to client, client: <TS_IP>, server: , request: "GET /plugins/dynamix.vm.manager/novnc/mandatory.json HTTP/1.1", host: "<TS_IP>", referrer: "http://<TS_IP>/plugins/dynamix.vm.manager/vnc.html?v=1746576860&resize=scale&autoconnect=true&host=<TS_IP>&port=&path=/wsproxy/5700/"
Dec 9 14:33:59 Atlantis kernel: br0: port 2(vnet5) entered disabled state
Dec 9 14:33:59 Atlantis kernel: vnet5 (unregistering): left allmulticast mode
Dec 9 14:33:59 Atlantis kernel: vnet5 (unregistering): left promiscuous mode
Dec 9 14:33:59 Atlantis kernel: br0: port 2(vnet5) entered disabled state
Dec 9 14:33:59 Atlantis kernel: br3: port 2(vnet6) entered disabled state
Dec 9 14:33:59 Atlantis kernel: vnet6 (unregistering): left allmulticast mode
Dec 9 14:33:59 Atlantis kernel: vnet6 (unregistering): left promiscuous mode
Dec 9 14:33:59 Atlantis kernel: br3: port 2(vnet6) entered disabled state
Dec 9 14:33:59 Atlantis virtiofsd[956558]: Client disconnected, shutting down
Dec 9 14:35:48 Atlantis kernel: Oops: general protection fault, probably for non-canonical address 0xbfff888101905000: 0000 [#1] PREEMPT SMP NOPTI
Dec 9 14:35:48 Atlantis kernel: CPU: 2 UID: 0 PID: 16361 Comm: node /usr/local Tainted: P O 6.12.54-Unraid #1
Dec 9 14:35:48 Atlantis kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Dec 9 14:35:48 Atlantis kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 4101 12/03/2024
Dec 9 14:35:48 Atlantis kernel: RIP: 0010:plist_add+0x39/0xd0
Dec 9 14:35:48 Atlantis kernel: Code: 18 49 39 d1 74 02 0f 0b 48 8b 50 08 4c 8d 58 08 49 39 d3 74 02 0f 0b 48 8b 16 48 39 d6 0f 84 83 00 00 00 48 8b 16 31 ed 8b 18 <4c> 8b 42 f8 48 8d 4a e8 48 89 ce 49 83 e8 08 4c 89 c2 3b 1e 7d 0c
Dec 9 14:35:48 Atlantis kernel: RSP: 0018:ffffc9000150fd28 EFLAGS: 00010246
Dec 9 14:35:48 Atlantis kernel: RAX: ffffc9000150fd88 RBX: 0000000000000064 RCX: ffff888102d8e480
Dec 9 14:35:48 Atlantis kernel: RDX: bfff888101905008 RSI: ffff888101905008 RDI: ffffc9000150fd88
Dec 9 14:35:48 Atlantis kernel: RBP: 0000000000000000 R08: ffffc9000150fd80 R09: ffffc9000150fda0
Dec 9 14:35:48 Atlantis kernel: R10: ffff888101905008 R11: ffffc9000150fd90 R12: 0000000000000000
Dec 9 14:35:48 Atlantis kernel: R13: ffff888101905000 R14: 0000000000000000 R15: 000000000729c568
Dec 9 14:35:48 Atlantis kernel: FS: 000014d5fa420880(0000) GS:ffff88903f280000(0000) knlGS:0000000000000000
Dec 9 14:35:48 Atlantis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 9 14:35:48 Atlantis kernel: CR2: 000000000051d000 CR3: 0000000109020001 CR4: 0000000000772ef0
Dec 9 14:35:48 Atlantis kernel: PKRU: 55555554
Dec 9 14:35:48 Atlantis kernel: Call Trace:
Dec 9 14:35:48 Atlantis kernel: <TASK>
Dec 9 14:35:48 Atlantis kernel: __futex_queue+0x45/0x50
Dec 9 14:35:48 Atlantis kernel: futex_wait_queue+0x32/0x80
Dec 9 14:35:48 Atlantis kernel: __futex_wait+0x7f/0xe0
Dec 9 14:35:48 Atlantis kernel: ? __pfx_futex_wake_mark+0x10/0x10
Dec 9 14:35:48 Atlantis kernel: futex_wait+0x64/0x100
Dec 9 14:35:48 Atlantis kernel: ? futex_wake+0x13a/0x170
Dec 9 14:35:48 Atlantis kernel: do_futex+0xdc/0x160
Dec 9 14:35:48 Atlantis kernel: __do_sys_futex+0x114/0x140
Dec 9 14:35:48 Atlantis kernel: ? fpregs_assert_state_consistent+0x1f/0x50
Dec 9 14:35:48 Atlantis kernel: do_syscall_64+0x68/0xe0
Dec 9 14:35:48 Atlantis kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Dec 9 14:35:48 Atlantis kernel: RIP: 0033:0x14d5fa4c93ae
Dec 9 14:35:48 Atlantis kernel: Code: 08 0f 85 e5 46 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 80 00 00 00 00 48 83 ec 08
Dec 9 14:35:48 Atlantis kernel: RSP: 002b:00007fffc8f7d5d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Dec 9 14:35:48 Atlantis kernel: RAX: ffffffffffffffda RBX: 000000000729c548 RCX: 000014d5fa4c93ae
Dec 9 14:35:48 Atlantis kernel: RDX: 0000000000000000 RSI: 0000000000000189 RDI: 000000000729c568
Dec 9 14:35:48 Atlantis kernel: RBP: 000000000729c564 R08: 0000000000000000 R09: 00000000ffffffff
Dec 9 14:35:48 Atlantis kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Dec 9 14:35:48 Atlantis kernel: R13: 000000000729c4f8 R14: 0000000000000000 R15: 0000000000000000
Dec 9 14:35:48 Atlantis kernel: </TASK>
Dec 9 14:35:48 Atlantis kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha udp_diag af_packet xt_set ip_set nft_chain_nat nft_compat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net vhost vhost_iotlb tap ipvlan br_netfilter xt_nat nf_conntrack_netlink veth xt_conntrack xfrm_user xt_addrtype xt_MASQUERADE xt_tcpudp xt_mark nfsd auth_rpcgss lockd grace sunrpc tun nf_tables nfnetlink ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 md_mod ntfs3 tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls ixgbe xfrm_algo mdio igc xe intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel drm_gpuvm drm_exec gpu_sched drm_ttm_helper drm_suballoc_helper zfs(PO) kvm i915 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
Dec 9 14:35:48 Atlantis kernel: sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl mei_pxp mei_hdcp intel_cstate iosf_mbi drm_buddy ast ttm drm_shmem_helper drm_display_helper i2c_algo_bit drm_kms_helper spl(O) intel_gtt mpt3sas i2c_i801 drm mei_me tpm_crb ipmi_ssif agpgart i2c_smbus raid_class intel_uncore sr_mod tpm_tis input_leds tpm_tis_core mei vmd scsi_transport_sas cdrom joydev wmi_bmof led_class acpi_ipmi i2c_core thermal fan video tpm ipmi_si libaescfb ecdh_generic wmi ecc backlight acpi_pad acpi_tad button [last unloaded: xfrm_algo]
Dec 9 14:35:48 Atlantis kernel: ---[ end trace 0000000000000000 ]---

Attachments:
atlantis-diagnostics-20251209-1553.zip
2025-12-09_syslog-Atlantis.log

Edited December 10, 2025 by PJPorch: Fixed CPU model number
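One detail in the Oops register dump can be checked mechanically. The faulting pointer in RDX (bfff888101905008) looks like the valid kernel pointer held in RSI and R10 (ffff888101905008) with one bit changed. The Python sketch below is only an illustration, not a diagnostic tool: it copies the register values from the log, assumes 48-bit virtual addresses (4-level paging, which is an assumption here), and uses hypothetical helper names `parse_registers`, `is_canonical`, and `differing_bits`.

```python
import re

# Register lines copied from the kernel Oops in the syslog snippet.
OOPS_LINES = """\
RAX: ffffc9000150fd88 RBX: 0000000000000064 RCX: ffff888102d8e480
RDX: bfff888101905008 RSI: ffff888101905008 RDI: ffffc9000150fd88
RBP: 0000000000000000 R08: ffffc9000150fd80 R09: ffffc9000150fda0
R10: ffff888101905008 R11: ffffc9000150fd90 R12: 0000000000000000
R13: ffff888101905000 R14: 0000000000000000 R15: 000000000729c568
"""

REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Map register name -> integer value for every 'Rxx: <16 hex digits>' pair."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) are all 0 or all 1 (x86-64 canonical form).

    va_bits=48 is an assumption (4-level paging).
    """
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(64) if (diff >> bit) & 1]


if __name__ == "__main__":
    regs = parse_registers(OOPS_LINES)
    for name, value in regs.items():
        if not is_canonical(value):
            print(f"{name} = {value:016x} is non-canonical")
            print("bits differing from RSI:", differing_bits(value, regs["RSI"]))
```

A pointer that is wrong by a single bit is often associated with a hardware fault somewhere in the RAM, CPU, or motherboard path, but on its own it cannot say which component is at fault, and software memory corruption is not ruled out either.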
December 10, 2025

7 hours ago, PJPorch said:
i7-14900K

This would be my main suspect, since it's one of the models most affected by the Intel 13th/14th gen issue. Look for a BIOS update; it may help if the CPU is not too far gone.

Also, because memtest is only definitive if it finds errors, and since you have multiple RAM sticks, try running the server with just one stick. If the problem persists, try with the other one. That will basically rule out bad RAM.
December 10, 2025 Author

Thanks for the reply. The CPU was one of the few items I didn't have a solid way of testing, but it has floated in the back of my mind after all the microcode news. I built this system around June, and the first thing I always do on a system build is update the BIOS and firmware of all components, so I have to assume it has the correct microcode for this CPU. This plus the manual TDP setting in the BIOS "should" have prevented this CPU from killing itself, but you never know.

Are you aware of any way to confirm if the CPU has been damaged? I don't know of a benchmark or test that can confirm it.

I will try running with 1 RAM stick at a time to see if that helps, and if it still fails I will swap CPUs. I have an i7-12700K in another system that I might have to borrow as a last resort, just to see if that fixes it.
December 10, 2025

47 minutes ago, PJPorch said:
Are you aware of any way to confirm if the CPU has been damaged?

I believe there's a Windows tool you could run from a Windows live flash drive, though IIRC there have been multiple confirmed cases of the tool not detecting issues, with the users confirming the problem was resolved after replacing the CPU.
December 10, 2025 Author

OK. I'll attack some of this testing and report back. Thanks for your suggestions.
December 18, 2025 Author

Swapped the CPU to an i7-12700K, confirmed I was on the latest BIOS, and figured out how to update the Intel ME firmware without using the Windows installer. If it makes it 3-4 weeks without a crash, I think that confirms the CPU was the culprit.
December 19, 2025 Author

After updating the Intel ME firmware and the BIOS and replacing the CPU with a brand new i7-12700K, the system worked for about 24 hours. It locked up again, and after a forced reboot the iGPU vanished. I noticed this because multiple dockers reference /dev/dri/renderD128. I don't think RAM failures could make a feature of a CPU vanish, so I can probably rule that out. The only conclusion I can draw is that the motherboard is the issue, unless I am missing something else.
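Since the missing iGPU only showed up when containers failed to find /dev/dri/renderD128, a quick check after each reboot could catch it earlier. This is a minimal Python sketch under stated assumptions: `list_render_nodes` is a hypothetical helper name, and the only path taken from the post is /dev/dri/renderD128. Whether the iGPU ends up as renderD128 on a given boot depends on the system.

```python
import os


def list_render_nodes(dri_dir="/dev/dri"):
    """Return sorted render node names (renderD*) in dri_dir, or [] if the directory is missing."""
    try:
        entries = os.listdir(dri_dir)
    except FileNotFoundError:
        return []
    return sorted(entry for entry in entries if entry.startswith("renderD"))


if __name__ == "__main__":
    nodes = list_render_nodes()
    if "renderD128" in nodes:
        print("render node present:", nodes)
    else:
        print("renderD128 missing; render nodes found:", nodes)
```

Running this from a boot-time script and logging the result would show whether the render node disappears right after a lockup or at some other point.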