petchav Posted November 21, 2021

Hello everyone,

The Fix Common Problems plugin found a hardware error on my server. I followed the instructions, installed mcelog via NerdPack, and generated a diagnostics file, which I have attached to this post.

I hope you can help me understand what is going on. I also noticed that one of my hard drives had an orange icon 👎 since

Thank you in advance.

unraid-diagnostics-20211121-1052.zip
Squid Posted November 21, 2021

Nov 18 20:36:43 Unraid kernel: mce: [Hardware Error]: Machine check events logged
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Corrected error, no action required.
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[Over|CE|-|AddrV|PCC|-|CECC|-|Poison|-]: 0xc6c7485541c35b5a
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Error Addr: 0x0000000000000000
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: IPID: 0x0000000000000000
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Bank 21 is reserved.
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: cache level: L2, tx: GEN

If I had to guess, it's the typical "AMD doesn't know how to properly design a chip" issue, and it is safe to ignore.
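For readers who want to check the bracketed flag list in the MC21_STATUS line against the raw hex value, here is a minimal Python sketch. The function name `decode_mca_status` is made up for illustration, and the bit positions are an assumption based on the `MCI_STATUS_*` masks in the Linux kernel headers, so verify them against the kernel source for your version before relying on the output. It only reproduces the flag string; it does not say anything about whether the hardware is healthy.

```python
# Bit positions below are assumptions taken from the Linux kernel's
# MCI_STATUS_* masks; check them against your kernel source.
OVER = 1 << 62      # an earlier error was overwritten
UC = 1 << 61        # uncorrected error
MISCV = 1 << 59     # MISC register holds valid data
ADDRV = 1 << 58     # ADDR register holds valid data
PCC = 1 << 57       # processor context corrupt
SYNDV = 1 << 53     # syndrome register holds valid data
DEFERRED = 1 << 44  # deferred error
POISON = 1 << 43    # poisoned data consumed
SCRUB = 1 << 40     # error found by a scrubber


def decode_mca_status(status: int) -> str:
    """Return a flag list in the style printed after MCxx_STATUS in the log."""
    flags = [
        "Over" if status & OVER else "-",
        # Second slot: UE if uncorrected, "-" if deferred, otherwise CE.
        "UE" if status & UC else ("-" if status & DEFERRED else "CE"),
        "MiscV" if status & MISCV else "-",
        "AddrV" if status & ADDRV else "-",
        "PCC" if status & PCC else "-",
        "SyndV" if status & SYNDV else "-",
    ]
    # Bits 46:45 are read together; the value 2 is shown as CECC.
    ecc = (status >> 45) & 0x3
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    flags.append("Deferred" if status & DEFERRED else "-")
    flags.append("Poison" if status & POISON else "-")
    flags.append("Scrub" if status & SCRUB else "-")
    return "|".join(flags)


if __name__ == "__main__":
    # Status value copied from the log line in this thread.
    print(decode_mca_status(0xC6C7485541C35B5A))
```

Run on the value from the log, this prints the same flag list the kernel reported, which is a quick way to confirm that a status value and its flags were copied correctly when posting logs.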
petchav Posted November 21, 2021

1 hour ago, Squid said:

Nov 18 20:36:43 Unraid kernel: mce: [Hardware Error]: Machine check events logged
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Corrected error, no action required.
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: CPU:1 (19:21:0) MC21_STATUS[Over|CE|-|AddrV|PCC|-|CECC|-|Poison|-]: 0xc6c7485541c35b5a
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Error Addr: 0x0000000000000000
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: IPID: 0x0000000000000000
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: Bank 21 is reserved.
Nov 18 20:36:43 Unraid kernel: [Hardware Error]: cache level: L2, tx: GEN

If I had to guess, it's the typical "AMD doesn't know how to properly design a chip" issue, and it is safe to ignore.

Thanks for the answer. So it would be a design flaw in my processor? I don't quite understand. Can I ignore this error, then, since it is nothing serious?
Squid Posted November 21, 2021

Yes, ignore it. Maybe I was flippant, but it seems that AMD always has trouble, whether that's with C-states, performance issues on Win 11, issuing MCEs out of the blue, etc.
petchav Posted November 21, 2021

Thanks for the explanations.
Geck0 Posted March 22, 2022

Hi,

I was about to post a new topic on this, but this thread seems to cover a similar issue.

I'm getting two errors. One is a parity failure, which I cannot find in the syslog. I changed the parity drive a couple of days ago after preclearing it twice, and I'm currently running a parity check.

The other is Machine Check Events, which I've never had before. I started a memtest, but nothing led me to believe memory was the issue, so I abandoned it after a couple of hours of testing.

Could somebody cast an eye over my syslogs to see if

Quote
Mar 22 16:47:01 Nexus root: Fix Common Problems Version 2022.03.18
Mar 22 16:47:07 Nexus root: Fix Common Problems: Error: Machine Check Events detected on your server
Mar 22 16:47:07 Nexus root: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Mar 22 16:47:07 Nexus root: CPU is unsupported

I looked in the Nerd Pack plugin and cannot find this module, although from what I could find it doesn't work properly anyway.

I also came across this:

Quote
Mar 21 22:29:27 Nexus nmbd[17837]:
Mar 21 22:29:27 Nexus nmbd[17837]: *****
Mar 21 22:37:35 Nexus kernel: mce: [Hardware Error]: Machine check events logged
Mar 21 22:37:35 Nexus kernel: [Hardware Error]: Deferred error, no action required.
Mar 21 22:37:35 Nexus kernel: [Hardware Error]: CPU:1 (19:21:0) MC20_STATUS[-|-|MiscV|-|-|-|CECC|Deferred|Poison|-]: 0x894858538b482850
Mar 21 22:37:35 Nexus kernel: [Hardware Error]: IPID: 0x0000000000000000
Mar 21 22:37:35 Nexus kernel: [Hardware Error]: Coherent Slave Ext. Error Code: 8, SDP read response had no match in the CS queue.
Mar 21 22:37:35 Nexus kernel: [Hardware Error]: cache level: RESV, tx: INSN

Is it possible I have a memory module error? I can see in the syslog that there are a few unrelated errors to do with an SMB share, which I will need to fix. However, could somebody please give this a peruse?
nexus-diagnostics-20220322-1743.zip
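When hunting through a long syslog for entries like the ones quoted above, it can help to pull out only the kernel's [Hardware Error] lines and group them by host and timestamp. The following is a rough Python sketch under stated assumptions: the function name `group_mce_events` is hypothetical, and the regular expression assumes the classic syslog line layout seen in this thread (month, day, time, hostname, then `kernel:`), so it may need adjusting for other log formats.

```python
import re
from collections import defaultdict

# Assumed line layout, based on the excerpts in this thread:
#   "Mar 21 22:37:35 Nexus kernel: [Hardware Error]: <message>"
# The first line of an event may carry an extra "mce: " prefix.
MCE_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) kernel: "
    r"(?:mce: )?\[Hardware Error\]: (.*)$"
)


def group_mce_events(lines):
    """Group [Hardware Error] messages by (hostname, timestamp)."""
    events = defaultdict(list)
    for line in lines:
        match = MCE_LINE.match(line.strip())
        if match:
            timestamp, host, message = match.groups()
            events[(host, timestamp)].append(message)
    return dict(events)


if __name__ == "__main__":
    sample = [
        "Mar 21 22:29:27 Nexus nmbd[17837]: *****",
        "Mar 21 22:37:35 Nexus kernel: mce: [Hardware Error]: Machine check events logged",
        "Mar 21 22:37:35 Nexus kernel: [Hardware Error]: Deferred error, no action required.",
    ]
    for (host, timestamp), messages in group_mce_events(sample).items():
        print(host, timestamp)
        for message in messages:
            print("   ", message)
```

To use it on a real log, pass an open file object for the syslog text file (its exact path inside a diagnostics archive is not given in this thread). Lines that are not hardware error reports, such as the nmbd entries, are skipped.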