
Machine Check Events detected on your server



Hi, I got this error for the first time and am wondering if someone could let me know whether it's cause for concern or if there's a fix. As a side note, I'm new to Unraid and this board has been an invaluable resource for me. Thanks in advance if you can help.

 

Here's the error:

 

Aug 25 12:51:13 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 25 12:51:13 Tower kernel: [Hardware Error]: Corrected error, no action required.
Aug 25 12:51:13 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000
Aug 25 12:51:13 Tower kernel: [Hardware Error]: IPID: 0x000100ff03830400
Aug 25 12:51:13 Tower kernel: [Hardware Error]: Bank 25 is reserved.
Aug 25 12:51:13 Tower kernel: [Hardware Error]: cache level: RESV, tx: INSN
Aug 25 12:51:48 Tower ntpd[2385]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Aug 25 12:59:00 Tower root: Fix Common Problems Version 2020.08.02
Aug 25 12:59:07 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
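
For anyone wondering what the MC25_STATUS value actually encodes, here is a minimal Python sketch that unpacks it. The bit positions are an assumption based on the layout the Linux kernel's AMD MCE decoder appears to use (flag bits in the top byte, CECC/UECC in bits 46 and 45, extended error code in bits 21:16). The names `decode_mca_status` and `FLAG_BITS` are made up for illustration and are not part of any tool mentioned in this thread.

```python
# Assumed bit layout of an AMD MCA status register (Scalable MCA style).
# These positions are an assumption, not taken from this thread.
FLAG_BITS = {
    "Val": 63,       # register holds a valid error
    "Overflow": 62,  # an earlier error was overwritten
    "UC": 61,        # uncorrected error
    "En": 60,        # error reporting enabled for this bank
    "MiscV": 59,     # MISC register holds valid data
    "AddrV": 58,     # ADDR register holds valid data
    "PCC": 57,       # processor context corrupt
    "CECC": 46,      # corrected by ECC
    "UECC": 45,      # ECC error that could not be corrected
}


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit MCA status value into named flags and code fields."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "corrected": not (status >> 61) & 1,           # UC bit clear
        "extended_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,                 # bits 15:0
    }


if __name__ == "__main__":
    # Status value copied from the log lines above.
    print(decode_mca_status(0x98004000003E0000))
```

For the value in the log above, this reports a valid, corrected ECC event with extended error code 62 (0x3e), which lines up with the kernel's own "Corrected error, no action required" message.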

 

I found a similar thread here, describing a setup very similar to my own.

 

--------------EDIT--------------

 

After a reboot the error seems to have gone away. I had to reboot several times prior to the error, and it only occurred once. Maybe a glitch? Regardless, it seems okay now after an extended test in Fix Common Problems. I will run memtest86 later tonight for good measure.

 

--------------EDIT 2--------------

 

Nope, it's back. Here's an update from the logs, taken after I installed mcelog as Fix Common Problems had suggested:

 

Aug 25 14:21:26 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 25 14:21:26 Tower kernel: [Hardware Error]: Corrected error, no action required.
Aug 25 14:21:26 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000
Aug 25 14:21:26 Tower kernel: [Hardware Error]: IPID: 0x000100ff03830400
Aug 25 14:21:26 Tower kernel: [Hardware Error]: Bank 25 is reserved.
Aug 25 14:21:26 Tower kernel: [Hardware Error]: cache level: RESV, tx: INSN
Aug 25 14:23:00 Tower root: Fix Common Problems Version 2020.08.02
Aug 25 14:23:06 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Aug 25 14:23:06 Tower root: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Aug 25 14:23:06 Tower root: CPU is unsupported

Edited by montag

Thanks for the response, and yup, I think this is the issue. I dug more and actually found that thread last night. I made the changes to the BIOS (just set it back to factory defaults) and that seems to have cleared it up. I was going to post in a few days if it stayed clear and mark this as solved.

 

The strange thing is that I had been running the RAM @ 3600MHz for 20+ days with no issue, and the error only seemed to kick in after a reboot. I had also mucked around with some PBO settings fairly recently to see if I could get a bit more performance out of the CPU... but I'll just play it safe. Thanks again.

  • 1 month later...

Hi,
In my case I was able to resolve the errors by making the following adjustments.

I've had no errors since doing the following:

I manually set my RAM's voltage to 1.4V and manually set XMP (DOCP) to level 1.

 

Edit: my RAM is running at 3200MHz.

 

Edited by sfisher_x
  • 2 months later...

Hi, 

 

I got that error today for the first time after months without problems, but I didn't have mcelog installed.

I have installed it now to catch any further errors.

 

But for now, is there any chance of seeing what causes this error without mcelog?

Or is it 99% likely that in my case (Ryzen 7 with GSkill RAM @ 3600MHz) it's the RAM too?  🙈

 

Thanks, and have a nice Sunday.
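
On the question of seeing these events without mcelog: the kernel already writes the decoded lines to the syslog, as the excerpts earlier in this thread show, so one option is simply to scan the log text for them. Below is a rough Python sketch under that assumption. The regular expression only targets the status line format quoted in this thread, and `summarize_mce` is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Matches status lines shaped like the ones quoted in this thread, e.g.
# "... [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[...]: 0x98004000003e0000"
STATUS_RE = re.compile(
    r"CPU:(\d+) \(.*?\) MC(\d+)_STATUS\[(.*?)\]: (0x[0-9a-fA-F]+)"
)


def summarize_mce(syslog_text: str) -> Counter:
    """Count machine check status lines per (cpu, bank, status value)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        if "[Hardware Error]" not in line:
            continue  # not a machine check line
        match = STATUS_RE.search(line)
        if match:
            cpu, bank, _flags, status = match.groups()
            counts[(int(cpu), int(bank), status.lower())] += 1
    return counts


if __name__ == "__main__":
    # Two sample lines copied from the first post in this thread.
    sample = (
        "Aug 25 12:51:13 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) "
        "MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000\n"
        "Aug 25 14:21:26 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) "
        "MC25_STATUS[-|CE|MiscV|-|-|-|-|-|-|CECC]: 0x98004000003e0000\n"
    )
    for key, count in summarize_mce(sample).items():
        print(key, count)
```

To use it on a real system you would pass in the contents of the system log file (its location depends on the setup). Seeing the same bank and status value repeat, as it does here, at least tells you the events come from one source rather than from random places. It does not by itself say whether RAM settings are the cause.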

  • 3 years later...

I know that this seemed like an almost dead post (issue), but I have also been experiencing this issue, and I have the latest BIOS: "MSI PRESTIGE X570 CREATION (MS-7C36), BIOS 1.M0 10/14/2023". I have had this system running for a few YEARS now and only recently (in the last few months) have been having SEVERE system stability issues. I recently enabled the syslog mirror to USB and found that my last system crash (last week) had the following lines as the last entries in the log before it powered itself down.

 

Mar 24 08:32:12 Cherry-Pit kernel: mce: [Hardware Error]: Machine check events logged
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: Corrected error, no action required.
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: CPU:0 (17:71:0) MC25_STATUS[-|CE|MiscV|-|-|-|-|CECC|-|-|-]: 0x98004000003e0000
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: IPID: 0x000100ff03830400
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: Platform Security Processor Ext. Error Code: 62
Mar 24 08:32:12 Cherry-Pit kernel: [Hardware Error]: cache level: RESV, tx: INSN

 

Does anyone have anything that could help me get to a fix for this? Thanks in advance.

  • 2 weeks later...

Running Unraid 6.12.10

 

Hey all, fellow Unraider here in need of some help! I recently upgraded my server with a new CPU, RAM and HBA card. I have started to see the same

 

kernel: mce: [Hardware Error]: Machine check events logged

 

errors in my syslog. Before I go unplugging things and constantly rebooting to find the cause, can someone please check out my diagnostics/logs and maybe narrow down the search for me? I'm thinking one of my new DIMMs is bad, but could it be one of the CPU cores?

 

Additionally, I'm seeing high CPU_IOWAIT in glances. I figure my HDDs are just being slow, but if you have any input on that issue as well I'd appreciate it.

 

I have attached my Unraid diagnostics as well as a snippet from the mcelog and the full mcelog.

 

@Squid If you could take a look too I'd very much appreciate it! 🙏

mcelog.zip aorus-diagnostics-20240409-2040.zip
