Machine Check Events detected on your server


Solved by smileybri


Fix Common Problems has reported an error but the solution is quite unclear to me. 

 

Quote

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the Unraid forums. The output of mcelog (if installed) has been logged  More Information

 

The "More Information" link discusses mcelog as part of NerdPack, but I don't have "NerdPack"; I have "NerdTools", and there is no "mcelog" toggle under NerdTools. Somewhere in my search I found a suggestion to use "edac_mce_amd" since I have an AMD CPU, but that isn't found under NerdTools either.

 

I doubt my diagnostics (attached) are much help without the log that Fix Common Problems asks for, but if someone could give me some direction, thank you in advance.

bkhunraid-diagnostics-20231220-1131.zip


Hi
Yes, we would need you to install the mcelog tool and then post your diagnostics. mcelog is a Linux tool (which you can install from the NerdPack plugin on CA) used for logging and interpreting MCEs (Machine Check Exceptions). These exceptions are hardware errors reported by the CPU. Modern CPUs have built-in error detection, and when they detect a problem they generate an MCE. These errors include problems with the CPU itself, memory errors, bus errors, cache errors, and so on.
They can be:

Temporary errors - these are often corrected automatically by the hardware or the OS.

Intermittent errors - these occur only every so often rather than all the time, so they are harder to diagnose.

Fatal errors - obviously more serious, because they can cause server crashes or even data corruption.

So when these errors are reported, it is good to find out what they are, as they can indicate potential hardware problems early, before they cause too much trouble.
However, not all errors mean something is bad. Some can be down to quirks of the CPU. If I remember correctly, certain AMD CPUs have been known to generate MCEs that can be considered harmless, because they are raised as part of the processor's normal way of working.

The way the CPU's firmware or microcode is designed can also lead to harmless MCEs being reported, as can various motherboard BIOS settings, especially those related to power management and overclocking. So you may want to see if there is a BIOS update for your motherboard.

But yes, without the MCE log it will not be possible to know what the errors are, as the tool is what makes them human readable.
I hope this helps
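When mcelog is not available, the raw kernel reports can still be pulled out of the syslog. Below is a minimal sketch, assuming the kernel tags its reports with "[Hardware Error]" (as it does in the syslog excerpts posted in this thread) and that Python is available on the server. The function names and the `/var/log/syslog` path in the usage note are illustrative assumptions, not part of any Unraid tool.

```python
# Assumption: kernel MCE reports carry this tag, as in the syslog
# excerpts quoted in this thread.
HW_TAG = "[Hardware Error]"


def hardware_error_lines(syslog_text):
    """Return every syslog line that carries the kernel's hardware-error tag."""
    return [line for line in syslog_text.splitlines() if HW_TAG in line]


def looks_corrected(lines):
    """True if any line contains the kernel's 'Corrected error' wording."""
    return any("Corrected error" in line for line in lines)


def report(path):
    """Print the tagged lines found in the syslog file at `path` (hypothetical helper)."""
    with open(path, errors="replace") as fh:
        hits = hardware_error_lines(fh.read())
    for line in hits:
        print(line)
    print("corrected error reported:", looks_corrected(hits))
```

On the server this would be called as `report("/var/log/syslog")`, assuming that is where the live syslog is kept. It only filters lines; it does not interpret them the way a decoder would.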

Quote

bash: cd: /var/log/mcelog/: No such file or directory

 

This is in Tools/Syslog (now I am just searching for "mcelog"):

Quote

 

Dec 16 04:35:31 BKHunraid kernel: mce: [Hardware Error]: Machine check events logged

Dec 16 04:35:31 BKHunraid kernel: [Hardware Error]: Corrected error, no action required.

Dec 16 04:35:31 BKHunraid kernel: [Hardware Error]: CPU:1 (19:21:2) MC18_STATUS[-|CE|-|-|-|-|-|-|-]: 0x80000006cfa001e3

Dec 16 04:35:31 BKHunraid kernel: [Hardware Error]: IPID: 0x0000000000000000

Dec 16 04:35:31 BKHunraid kernel: [Hardware Error]: Bank 18 is reserved.

Dec 16 04:35:31 BKHunraid kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN, mem-tx: Wrong R4!
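The MC18_STATUS value in the log above can also be unpacked by hand. This is a rough sketch, assuming the architectural x86 machine-check status layout (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, with the error code in the low 16 bits). The `decode_status` helper is hypothetical, and any vendor-specific fields are left undecoded.

```python
# Assumed layout: the architectural x86 machine-check status bits.
STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was NOT corrected
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit machine-check status value into named flags."""
    decoded = {
        name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()
    }
    decoded["error_code"] = status & 0xFFFF  # low 16 bits
    return decoded


# Value copied from the MC18_STATUS line in the syslog excerpt.
fields = decode_status(0x80000006CFA001E3)
for name, value in fields.items():
    print(name, hex(value) if name == "error_code" else value)
```

For this value only VAL is set and UC is clear, which agrees with the kernel's own "Corrected error, no action required." line.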

 

and this

Quote

 

Dec 17 04:30:01 BKHunraid root: Fix Common Problems Version 2023.10.08a

Dec 17 04:30:05 BKHunraid root: Fix Common Problems: Error: Machine Check Events detected on your server

Dec 17 04:30:05 BKHunraid root: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead.
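The mcelog message above points to the edac_mce_amd kernel module rather than to a plugin package, so it will not show up as a NerdTools toggle. Here is a small sketch, assuming Python is available on the server, that checks whether that module is currently loaded by reading `/proc/modules`. The helper names are made up for illustration. One caveat: a driver compiled directly into the kernel does not appear in `/proc/modules`, so a `False` result does not prove the decoder is absent.

```python
def module_loaded(name, modules_text):
    """Check a /proc/modules listing; the module name is the first field of each line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def edac_mce_amd_loaded(path="/proc/modules"):
    """Hypothetical convenience wrapper around module_loaded for this thread's module."""
    with open(path) as fh:
        return module_loaded("edac_mce_amd", fh.read())
```

Calling `edac_mce_amd_loaded()` on the server returns `True` only when the module is loaded as a loadable module.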

 

 

  • Solution

So, I want to give SpaceInvaderOne credit: he pointed out that because I have an AMD CPU, mcelog is not compatible, so there is no way to see what those errors are. He also helped me understand that the error is probably not something to worry about, based on the info that is available.
