flallnatural Posted April 8, 2023 (edited)

Hello,

In the past month I have started getting random reboots and MCE hardware errors in my log. I'm running an Intel i9-10850K on an ASRock Z590 Pro4 with 64 GB of RAM. I have an HBA card and an Nvidia P2000 installed and nothing else.

The last time I posted about this I swapped out my HBA card, thinking it was the issue since drives were disappearing. I also ran a memtest at that time and it passed. Since then the errors have continued.

I have attached my diagnostics from after the latest reboot, which happened last night (Apr 8th around 1 AM, it seems). The server was nearing completion of a parity sync when the reboot occurred. I have seen this behavior at least 2 times before. I have also had the syslog server running for the past couple of weeks to see if I can catch anything.

I am starting to wonder if the motherboard has gone bad somehow. The HBA card issue happened on PCIe x16 slot 1, and after replacing the HBA card with a brand new one there have been no drive issues. However, since then, one of the hardware errors/reboots was preceded by a transcoder error with my Nvidia GPU on PCIe x16 slot 2. The latest reboot does not show either of the errors I have seen before.

Some help would be greatly appreciated.

Attachments: unraid-diagnostics-20230408-0907.zip, syslog-10.10.20.11.log

Edited April 8, 2023 by flallnatural
Squid Posted April 8, 2023

5 minutes ago, flallnatural said:
i9-10850k

The first thing to do is to ensure that your TDP is set correctly within the BIOS, and NOT left on Auto. In your case it should be set to 125W. "Auto" will usually run your CPU in an inherent overclock situation where it completely ignores the TDP limits of your processor.
flallnatural (Author) Posted April 8, 2023

4 minutes ago, Squid said:
The first thing to do is to ensure that your TDP is set correctly within the BIOS, and NOT left on Auto. In your case it should be set to 125W. "Auto" will usually run your CPU in an inherent overclock situation where it completely ignores the TDP limits of your processor.

Hm, OK. I've had it on Auto for well over a year now with no issues, with the CPU ratio limit on Auto too. I will change that first thing.

Any other recommendations? I'm not sure how else to test for these hardware errors after making changes, other than letting it run for a while and waiting for a reboot to happen. It always seems stable for a while, with everything working great, and then boom, reboot.
Squid Posted April 8, 2023

Stop overclocking your memory via the XMP profile. Run it instead at the speed you actually bought: 2133, not 3600 (i.e. run at SPD speed, not XMP). Also run memtest for a minimum of a pass or two.
flallnatural (Author) Posted April 8, 2023

38 minutes ago, Squid said:
Stop overclocking your memory via the XMP profile. Run it instead at the speed you actually bought: 2133, not 3600 (i.e. run at SPD speed, not XMP). Also run memtest for a minimum of a pass or two.

Alright, I updated the BIOS to the latest version, and starting from stock settings I set the power limit and left XMP off. No other changes in the BIOS. Running the memtest now, so let's see what happens.
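
For anyone following along who wants something more active than waiting for the next reboot: the remote syslog file can be checked for machine-check lines after each BIOS change. Below is a minimal sketch, not a tested tool. It assumes the kernel's MCE messages in the log contain either the text "[Hardware Error]" or an "mce:" prefix, which is how they commonly appear but should be confirmed against your own log. The function names `find_mce_lines` and `scan_file` are made up for this example.

```python
import re
from typing import Iterable, List

# Assumption: machine-check lines in the syslog contain "[Hardware Error]"
# or an "mce:" prefix. Adjust the pattern if your log looks different.
MCE_PATTERN = re.compile(r"\[Hardware Error\]|\bmce:", re.IGNORECASE)


def find_mce_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that matches the assumed MCE pattern."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


def scan_file(path: str) -> List[str]:
    """Read a syslog file and return its suspected MCE lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, "r", errors="replace") as fh:
        return find_mce_lines(fh)
```

Calling `scan_file("syslog-10.10.20.11.log")` on the attached syslog would list the suspect lines. Comparing the count before and after the TDP and XMP changes gives a rough signal, though an empty result does not prove the hardware is healthy.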