Machine Check Events detected on your server


17 minutes ago, Nanobug said:

I tried using the "mcelog" command, but it returned "mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor.  Please use the edac_mce_amd module instead.
CPU is unsupported"

 

The CPU is an R9 5900X.

Don't worry about that "error"; it's really a warning. On AMD, edac_mce_amd is loaded correctly in Unraid, but you still get the mce "error". MCE events will still be logged correctly in the syslog. It's an old bug in mce.
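If you want to confirm both points yourself, here is a minimal Python sketch. It assumes the syslog is at /var/log/syslog and that loaded modules are listed in /proc/modules; both paths, and the helper names `mce_lines` and `module_loaded`, are my assumptions and not something Unraid ships. Note that a driver built into the kernel would not appear in /proc/modules at all.

```python
import re

# Kernel lines about machine checks carry the "[Hardware Error]" tag,
# as in the log excerpt posted in this thread.
HW_ERR = re.compile(r"\[Hardware Error\]")


def mce_lines(syslog_text):
    """Return the syslog lines that the kernel tagged as hardware errors."""
    return [line for line in syslog_text.splitlines() if HW_ERR.search(line)]


def module_loaded(name, proc_modules_text):
    """True if `name` is the first field of a line in /proc/modules content."""
    return any(
        line.split(" ", 1)[0] == name
        for line in proc_modules_text.splitlines()
        if line
    )
```

Typical use would be `mce_lines(open("/var/log/syslog").read())` and `module_loaded("edac_mce_amd", open("/proc/modules").read())`.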

 

May 24 23:27:19 NanoStorage kernel: mce: [Hardware Error]: Machine check events logged
May 24 23:27:19 NanoStorage kernel: [Hardware Error]: Corrected error, no action required.
May 24 23:27:19 NanoStorage kernel: [Hardware Error]: CPU:1 (19:21:0) MC9_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000271c31163
May 24 23:27:19 NanoStorage kernel: [Hardware Error]: IPID: 0x0000000000000000
May 24 23:27:19 NanoStorage kernel: [Hardware Error]: L3 Cache Ext. Error Code: 3, L3M Tag Multi-way-hit Error.
May 24 23:27:19 NanoStorage kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
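For anyone curious how the kernel gets from the raw MC9_STATUS value to those messages, here is a rough Python sketch. The bit positions and the two name tables are my understanding of the x86/AMD machine check status layout, not something taken from this thread, so treat them as assumptions and check the AMD documentation for your CPU family. `decode_status` is a made-up helper name.

```python
# Assumed MCi_STATUS flag bit positions (verify against the AMD PPR).
FLAG_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57,
}
# Assumed meaning of the low error-code fields.
LL_NAMES = ["RESV", "L1", "L2", "L3/GEN"]   # cache level, bits 1:0
TT_NAMES = ["INSN", "DATA", "GEN", "RESV"]  # transaction type, bits 3:2


def decode_status(status):
    """Split a 64-bit machine check status value into its main fields."""
    error_code = status & 0xFFFF
    return {
        "flags": {n: bool((status >> b) & 1) for n, b in FLAG_BITS.items()},
        "ext_error_code": (status >> 16) & 0x3F,  # assumed bits 21:16
        "error_code": error_code,
        "cache_level": LL_NAMES[error_code & 0x3],
        "tx": TT_NAMES[(error_code >> 2) & 0x3],
    }
```

Calling `decode_status(0x8000000271c31163)` gives an extended error code of 3, cache level "L3/GEN" and tx "INSN", with UC and PCC clear, which lines up with the "Corrected error" lines in the log above.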

Possible causes:

Temperature problem (thermal paste, insufficient cooling, dust in the fans)

CPU failing

Motherboard failing

Edited by ghost82
It doesn't show the CPU temperature at the moment, but the CPU is at less than 10% load. The room is somewhat hot (depending on how you define that): 23.6 °C right now.
When it was showing the temperatures, it wasn't above 50 °C at any point.
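When the dashboard is not showing the temperature, the sensor can often still be read directly from the Linux hwmon sysfs tree. A minimal Python sketch, assuming the AMD sensor is exposed by the k10temp driver under /sys/class/hwmon (an assumption about this system; `read_hwmon_temps` is a made-up helper name):

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon", driver="k10temp"):
    """Return {label: degrees C} for temp*_input files of the given driver."""
    temps = {}
    for dev in sorted(Path(base).glob("hwmon*")):
        name_file = dev / "name"
        # Skip hwmon devices that belong to a different driver.
        if not name_file.is_file() or name_file.read_text().strip() != driver:
            continue
        for sensor in sorted(dev.glob("temp*_input")):
            label_file = dev / sensor.name.replace("_input", "_label")
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = sensor.name
            # hwmon reports temperatures in millidegrees Celsius.
            temps[label] = int(sensor.read_text().strip()) / 1000.0
    return temps
```

An empty result means no hwmon device with that driver name was found, in which case the driver may simply not be loaded.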


Is there a way to check if the CPU or MB is failing, other than replacing it?
 
