Namarath Posted March 3, 2022 (edited)

My server has been running fine for some time now. Here are the specs:
- Unraid version: 6.9.2
- Asus Prime B350-PLUS
- Ryzen 7 1700 @ 3000 MHz
- 32 GB DDR4 across 4 DIMMs (2x 8GB @ 3000 MHz + 2x 8GB @ 3200 MHz) running at 3000 MHz

The CPU was previously overclocked to 3.7 GHz, because I ran my gaming setup as a VM on the server. Since moving to a dedicated gaming rig, I have restored all overclocking settings in the BIOS to stock values. After this, the server started to freeze randomly, usually daily. When this happens it is apparently still running (the case lights are on), but it is not accessible in any way, since the network stack just stops working. The only way to bring it back is a hard reset.

Since this behavior started, I'm getting the following error messages in the syslog:

Mar 3 08:39:49 Nexus kernel: mce: [Hardware Error]: Machine check events logged
Mar 3 08:39:49 Nexus kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108
Mar 3 08:39:49 Nexus kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff813c3054 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Mar 3 08:39:49 Nexus kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1646293169 SOCKET 0 APIC 6 microcode 8001138

After seeing this, I ran memtest overnight, and it did not turn up any errors.

I attached diagnostics. However, they are from a running system, i.e. NOT taken after a crash: as I said, when the server crashes, it crashes for good and I cannot access any logs.

The only changes between the perfectly running system and the one that now crashes often are reverting the CPU to stock settings and replacing the crappy PSU with a good one.

Maybe one more thing: before the new RAM arrived, I used two of the RAM sticks in my new gaming rig for a while. After that, the sticks went back into the server. That said, memtest did not detect any errors. I know this does not mean there are none, but still.
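To make the "Bank 5: bea0000000000108" value a bit less opaque, here is a minimal Python sketch that splits an x86 machine-check status value into its flag bits and its low 16-bit error code. It assumes the architectural MCA status layout for bits 57-63 and bits 15:0, and AMD's placement of TCC (bit 55) and SYNDV (bit 53) as defined in the Linux kernel's `asm/mce.h`. The function name `decode_mca_status` is hypothetical, and the sketch only labels the bits; it does not identify the root cause.

```python
# Minimal sketch: label the flag bits of an x86 MCA status register value.
# Bits 57-63 and the 16-bit error code are architectural; TCC (55) and
# SYNDV (53) assume AMD's layout as used by Linux's asm/mce.h.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register contents are valid
    58: "ADDRV",  # ADDR register contents are valid
    57: "PCC",    # processor context corrupt
    55: "TCC",    # task context corrupt (AMD)
    53: "SYNDV",  # syndrome register valid (AMD)
}


def decode_mca_status(status: int) -> dict:
    """Return the names of the set flags (high bit first) and the error code."""
    flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    # Status value copied from the "Bank 5" syslog line.
    result = decode_mca_status(0xBEA0000000000108)
    print(result["flags"])
    print(hex(result["error_code"]))
```

For this value the set flags are VAL, UC, EN, MISCV, ADDRV, PCC, TCC and SYNDV, with error code 0x108. In other words, the kernel logged an uncorrected error that the CPU flagged as having corrupted processor context.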
My ideas for further troubleshooting are:
- run the server with only 2 RAM sticks at a time to see if this changes anything
- reset the BIOS settings to defaults, in case I f***ed something up while reverting the overclock

Any further ideas? Especially about the error message, as I don't really get what it is trying to tell me.

Attachment: nexus-diagnostics-20220303-2019.zip

Edited March 3, 2022 by Namarath
JorgeB Posted March 4, 2022

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173