montag Posted August 25, 2020 (edited)

Hi, got this error for the first time, wondering if someone might be able to let me know if it's cause for concern or if there's a fix. As a sidenote, I'm new to unraid and this board has been an invaluable resource for me, thanks in advance if you can help.

Here's the error:

Aug 25 12:51:13 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 25 12:51:13 Tower kernel: [Hardware Error]: Corrected error, no action required.
Aug 25 12:51:13 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000
Aug 25 12:51:13 Tower kernel: [Hardware Error]: IPID: 0x000100ff03830400
Aug 25 12:51:13 Tower kernel: [Hardware Error]: Bank 25 is reserved.
Aug 25 12:51:13 Tower kernel: [Hardware Error]: cache level: RESV, tx: INSN
Aug 25 12:51:48 Tower ntpd[2385]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Aug 25 12:59:00 Tower root: Fix Common Problems Version 2020.08.02
Aug 25 12:59:07 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server

I found a similar thread here, which is a very similar set up to my own

--------------EDIT--------------

After a reboot the error seems to have gone away, I had to reboot several times prior to the error, and it only occurred once. Maybe a glitch? Regardless seems okay now after an extended test in Fix Common Problems. Will run memtest86 later tonight for good measure.

--------------EDIT 2--------------

Nope it's back, and here's an update from the logs, after I had installed mcelog as Fix Common Problems had suggested:

Aug 25 14:21:26 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 25 14:21:26 Tower kernel: [Hardware Error]: Corrected error, no action required.
Aug 25 14:21:26 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000
Aug 25 14:21:26 Tower kernel: [Hardware Error]: IPID: 0x000100ff03830400
Aug 25 14:21:26 Tower kernel: [Hardware Error]: Bank 25 is reserved.
Aug 25 14:21:26 Tower kernel: [Hardware Error]: cache level: RESV, tx: INSN
Aug 25 14:23:00 Tower root: Fix Common Problems Version 2020.08.02
Aug 25 14:23:06 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Aug 25 14:23:06 Tower root: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Aug 25 14:23:06 Tower root: CPU is unsupported

Edited August 25, 2020 by montag
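For anyone trying to read the MC25_STATUS value in the log above, here is a minimal Python sketch that unpacks it. The bit positions are an assumption based on the AMD MCA_STATUS layout as the Linux kernel's MCE decoder uses it; the function name `decode_mca_status` is made up for illustration and is not part of Unraid or mcelog. Treat it as a reading aid, not a diagnostic tool.

```python
# Assumed AMD MCA_STATUS bit positions (as used by the Linux MCE decoder).
# Verify against your CPU's documentation before relying on them.
FLAGS = {
    63: "Val",       # register contains a valid error
    62: "Overflow",  # an earlier error was overwritten
    61: "UC",        # uncorrected error
    60: "En",        # error reporting enabled
    59: "MiscV",     # MISC register is valid
    58: "AddrV",     # ADDR register is valid
    57: "PCC",       # processor context corrupt
    46: "CECC",      # corrected by ECC
    45: "UECC",      # uncorrectable ECC error
}


def decode_mca_status(status):
    """Return the set flags and the extended error code of a status value."""
    set_flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": set_flags,
        "corrected": "UC" not in set_flags,
        # Extended error code sits in bits 21:16 (assumed 6-bit field).
        "ext_error_code": (status >> 16) & 0x3F,
    }


if __name__ == "__main__":
    # Value copied from the syslog lines in this thread.
    print(decode_mca_status(0x98004000003E0000))
```

Under these assumptions the value from the log decodes to Val, En, MiscV and CECC with UC clear, which lines up with the kernel's own "Corrected error, no action required" message.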
montag Posted August 25, 2020
Just in case it helps: Asus TUF plus / Ryzen 3950X / 4x G.Skill 8GB 3600MHz
JorgeB Posted August 26, 2020
Don't overclock your RAM, it's a known issue with Ryzen, see here.
montag Posted August 26, 2020
Thanks for the response and yup, I think this is the issue. I dug more and actually found that thread last night, made the changes to the BIOS (just set it back to factory defaults), and it seems to have cleared it up. I was going to post in a few days if it stayed clear and mark this as solved. The strange thing is that I had been running the RAM @ 3600MHz for like 20+ days with no issue, but it just seemed to kick in after a reboot. I had mucked around with some PBO settings too somewhat recently to see if I could get a bit more performance out of the CPU... but I'll just play it safe. Thanks again.
sfisher_x Posted October 10, 2020 (edited)
Hi, in my case I was able to resolve the errors by making the following adjustments, and I've had no errors since: I manually set my RAM's voltage to 1.4V and manually set XMP (DOCP) to level 1.
edit: my RAM is running at 3200MHz
Edited October 11, 2020 by sfisher_x
Maddeen Posted December 13, 2020
Hi, I got that error today for the first time after months without problems, but I didn't have mcelog installed. I've installed it now to catch further errors. But for now - is there any chance to see what caused this error without mcelog? Or is it a 99% chance that in my case (Ryzen 7 with G.Skill RAM @ 3600MHz) it's the RAM too? 🙈 Thanks - have a nice Sunday.
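On the question of looking at the error without mcelog: the kernel already writes the decoded lines to the syslog, as the excerpts earlier in the thread show. A rough Python sketch for pulling just those lines out of a saved log could look like the following. The helper name `hardware_error_messages` is hypothetical, and where your syslog lives (for example a diagnostics zip or a mirrored copy) is something you would need to check on your own system.

```python
import re

# Matches the "[Hardware Error]: ..." part of a kernel syslog line.
HW_ERROR = re.compile(r"\[Hardware Error\]: (?P<msg>.*)")


def hardware_error_messages(log_text):
    """Return the message part of every [Hardware Error] line in log_text."""
    messages = []
    for line in log_text.splitlines():
        match = HW_ERROR.search(line)
        if match:
            messages.append(match.group("msg").strip())
    return messages


if __name__ == "__main__":
    # Sample lines copied from the first post in this thread.
    sample = (
        "Aug 25 12:51:13 Tower kernel: mce: [Hardware Error]: Machine check events logged\n"
        "Aug 25 12:51:13 Tower kernel: [Hardware Error]: Corrected error, no action required.\n"
        "Aug 25 12:51:48 Tower ntpd[2385]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized\n"
    )
    for message in hardware_error_messages(sample):
        print(message)
```

This only surfaces what the kernel logged; it does not tell you which component is at fault, so the RAM settings discussed above are still the first thing to check.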
Fastcompjason Posted March 31
I know that this seemed like an almost dead post (issue), but I have also been experiencing this issue, and I have the latest BIOS: "MSI PRESTIGE X570 CREATION (MS-7C36), BIOS 1.M0 10/14/2023". I have had this system running for a few YEARS now and only recently (last few months) have been having SEVERE system stability issues. I recently enabled the syslog mirror to USB and found that my last system crash (last week) had the following lines as the last entries in the log before it powered itself down.

Mar 24 08:32:12 Cherry-Pit kernel: mce: [Hardware Error]: Machine check events logged
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: Corrected error, no action required.
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|CECC|-|-|-]: 0x98004000003e0000
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: IPID: 0x000100ff03830400
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: Platform Security Processor Ext. Error Code: 62
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: cache level: RESV, tx: INSN

Does anyone have anything to help me get to a fix for this? Thanks in advance
Squid Posted March 31
2 hours ago, Fastcompjason said: "Corrected error, no action required"
Gex2501 Posted April 10
Running Unraid 6.12.10

Hey all, fellow Unraider here in need of some help! I recently upgraded my server with a new CPU, RAM and HBA card. I have started to see the same "kernel: mce: [Hardware Error]: Machine check events logged" errors in my syslog. Before I go unplugging things and constantly rebooting to find the cause, can someone please check out my diagnostics/logs and maybe narrow down the search for me? I'm thinking one of my new DIMMs is bad, but could it be one of the CPU cores? Additionally, I'm seeing high CPU_IOWAIT in glances. I figure my HDDs are just being slow, but if you have any input on that issue as well I'd appreciate it. I have attached my Unraid diagnostics as well as a snippet from the mcelog and the full mcelog. @Squid If you could take a look too I'd very much appreciate it! 🙏

Attachments: mcelog.zip, aorus-diagnostics-20240409-2040.zip
Gex2501 Posted April 10
Whelp, server has crashed. Can anyone make heads or tails of this output on my monitor? See attached screenshot...