Machine Check Events detected on your server


Maddeen


Hi there,

 

In November, for the first time in months, I got this error message from Fix Common Problems:

Machine Check Events detected on your server

I wanted to dig deeper, so I installed (as advised) mcelog-161-x86_64-1.txz via the Nerd Pack.

 

Last night I got this error again, but sadly I had to learn that mcelog only supports Intel CPUs. The system log now tells me:

Dec 20 04:32:55 v1ew-s0urce root: Fix Common Problems: Error: Machine Check Events detected on your server 
Dec 20 04:32:55 v1ew-s0urce root: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead. 
Dec 20 04:32:55 v1ew-s0urce root: CPU is unsupported

But the recommended module "edac_mce_amd" is not part of the Nerd Pack. 

 

Can anyone tell me how to install this module so that I can check what the problem with my server is?

 

Thanks for any help. Have a nice Sunday.


edac_mce_amd has been included in Unraid for the last couple of years.

 

The message is simply a reminder to everyone else in the world that the author(s) of mcelog have no idea how to properly word an informational sentence or they are not native English speakers and utilized a TI-99/4A to translate the actual message into English.

 

I.e., it's simply telling you that the mcelog default driver (Intel) doesn't support the chip. It's automatically using the AMD module instead.
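If you want to confirm this on your own box, here is a minimal Python sketch that looks for edac_mce_amd in /proc/modules. The function names `module_loaded` and `check` are made up for illustration. One caveat: if the driver is compiled directly into the kernel rather than built as a loadable module, it will not appear in /proc/modules at all, so a negative result is not proof that it is missing.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is listed in the given /proc/modules text.

    Each line of /proc/modules starts with the module name,
    followed by size, use count and other fields.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def check(name="edac_mce_amd", path="/proc/modules"):
    """Read /proc/modules and report whether `name` is loaded.

    Returns None if the file cannot be read (e.g. not on Linux).
    """
    try:
        with open(path) as fh:
            return module_loaded(name, fh.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("edac_mce_amd loaded:", check())
```

A result of `True` means the module is loaded; `False` only means it is not loaded as a module.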

Dec 14 14:48:32 v1ew-s0urce kernel: [Hardware Error]: Corrected error, no action required.
Dec 14 14:48:32 v1ew-s0urce kernel: [Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|SyndV|-|-|-]: 0x982000000002080b
Dec 14 14:48:32 v1ew-s0urce kernel: [Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001
Dec 14 14:48:32 v1ew-s0urce kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
Dec 14 14:48:32 v1ew-s0urce kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
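For anyone who wants to read the raw status value themselves, here is a rough Python sketch that decodes the MC27_STATUS word from the log above. The bit positions are my assumption, based on the flag definitions in the Linux kernel's MCE headers (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, and the AMD-specific SyndV at 53), with the extended error code taken from bits 21:16. Treat it as an illustration, not a replacement for the kernel's own decoding; `decode_status` is a made-up name.

```python
# Bit positions of selected MCA status flags (assumed from the Linux kernel headers).
STATUS_FLAGS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, "CE" in the log)
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register holds valid data
    "AddrV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # AMD: syndrome register holds valid data
}


def decode_status(status):
    """Split a 64-bit MCA status value into flags and error codes."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "corrected": flags["Val"] and not flags["UC"],
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }


if __name__ == "__main__":
    result = decode_status(0x982000000002080B)
    set_flags = [name for name, on in result["flags"].items() if on]
    print("flags set:", ", ".join(set_flags))
    print("corrected:", result["corrected"])
    print("extended error code:", result["ext_error_code"])
```

For the value in the log, UC is clear and MiscV and SyndV are set, which agrees with the `[-|CE|MiscV|-|-|-|SyndV|-|-|-]` field and the "Ext. Error Code: 2" line the kernel printed.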

IIRC, Ryzen has problems with overclocked memory in some circumstances. Run the memory at the SPD speed to see if that makes a difference.


Thanks - but can you say if this is something to worry about?

The server never became unresponsive or anything like that. There was just this message, with no noticeable impact. The server runs fine, Docker runs fine, and the VMs run fine.

And this message has only popped up twice since I changed to Ryzen on the 21st of October...
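To keep an eye on how often this happens, a small Python sketch like the following could count the corrected-error events per day in a saved syslog. It assumes the traditional syslog timestamp format seen in this thread ("Dec 14 14:48:32 ...") and matches on the kernel's "Corrected error" line. The function name and the /var/log/syslog path are assumptions for illustration.

```python
from collections import Counter

# Text the kernel prints once per corrected machine check event (taken from the log above).
MARKER = "[Hardware Error]: Corrected error"


def corrected_errors_per_day(lines):
    """Count corrected-error events, keyed by the 'Mon DD' part of the timestamp."""
    counts = Counter()
    for line in lines:
        if MARKER in line:
            parts = line.split()
            counts[" ".join(parts[:2])] += 1
    return counts


if __name__ == "__main__":
    # The log path is an assumption; adjust it to wherever your syslog is kept.
    try:
        with open("/var/log/syslog", errors="replace") as fh:
            for day, n in corrected_errors_per_day(fh).items():
                print(day, n)
    except OSError as exc:
        print("could not read syslog:", exc)
```

If the count stays at a handful of events over months, that is a different picture from a count that grows every day.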

 

I want to understand it because I use the server as a gaming rig as well, and running the RAM at stock speed will decrease the fps a lot, because stock speed is a funny 2133MHz. 🙈
I bought expensive RAM (G.Skill Trident) to reach 3600MHz out of the box because that's Ryzen's sweet spot.

 

So I want to know if I can live with this error and keep the better fps. Thank you again.
