aco262 (posted January 4, 2020):

Hello, I am getting an error from the Fix Common Problems plugin, and it told me to post my mcelog for more help. Is anyone able to tell me what any of this means? I'm pretty new to Linux.

syslog

Thanks!
Squid (posted January 4, 2020):

Bad memory

Jan 3 18:57:32 Linus kernel: mce: [Hardware Error]: Machine check events logged
Jan 3 18:57:32 Linus kernel: EDAC MC0: 1 CE read ECC error on CPU#0Channel#0_DIMM#0 (channel:0 slot:0 page:0x21e664 offset:0x4c0 grain:8 syndrome:0x20 - read error)
Jan 3 18:58:20 Linus kernel: mce: [Hardware Error]: Machine check events logged
Jan 3 18:58:20 Linus kernel: EDAC MC0: 1 CE read ECC error on CPU#0Channel#0_DIMM#0 (channel:0 slot:0 page:0x21e664 offset:0x4c0 grain:8 syndrome:0x20 - read error)
Jan 3 18:58:20 Linus kernel: EDAC i7core: New Corrected error(s): dimm0: +1, dimm1: +0, dimm2 +0
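For anyone who wants to tally these corrected errors (CE) from a saved syslog, here is a minimal Python sketch. The regular expression is written against the EDAC lines quoted in this thread only; other EDAC drivers or kernel versions may word the message differently, so treat the pattern as an assumption. The names `EDAC_CE` and `count_corrected_errors` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the EDAC lines quoted in this thread (assumption:
# other drivers or kernel versions may format the message differently).
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:(?P<page>0x[0-9a-fA-F]+)"
)


def count_corrected_errors(lines):
    """Sum corrected-error counts per (controller, channel, slot, page)."""
    totals = Counter()
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            key = (
                int(match["mc"]),
                int(match["channel"]),
                int(match["slot"]),
                match["page"],
            )
            totals[key] += int(match["count"])
    return totals


if __name__ == "__main__":
    # The two CE lines from the log quoted in this thread.
    sample = [
        "Jan 3 18:57:32 Linus kernel: EDAC MC0: 1 CE read ECC error on "
        "CPU#0Channel#0_DIMM#0 (channel:0 slot:0 page:0x21e664 "
        "offset:0x4c0 grain:8 syndrome:0x20 - read error)",
        "Jan 3 18:58:20 Linus kernel: EDAC MC0: 1 CE read ECC error on "
        "CPU#0Channel#0_DIMM#0 (channel:0 slot:0 page:0x21e664 "
        "offset:0x4c0 grain:8 syndrome:0x20 - read error)",
    ]
    for key, total in count_corrected_errors(sample).items():
        print(key, total)
```

Grouping by page is a deliberate choice: both errors in the quoted log land on channel 0, slot 0, page 0x21e664, so they collapse into a single entry with a count of 2.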
aco262 (Author, posted January 4, 2020):

4 minutes ago, Squid said: Bad memory

Thank you! Is this something I should worry about? I'll be honest, the error has been there for probably around 6 months.
JonathanM (posted January 5, 2020):

On 1/4/2020 at 11:02 AM, aco262 said: Is this something I should worry about?

Only if you care about the uptime on your server. If the memory errors are ever not correctable, it will lock solid or restart to attempt to avoid further corruption.

It's also possible the memory is fine, just being pushed past its limits. I'd play with slowing the memory down or maybe increasing the voltage just a tiny bit and see if the errors change.
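One way to see whether the errors change after a BIOS tweak is to note the kernel's EDAC counters before and after. The sketch below reads `ce_count` (corrected) and `ue_count` (uncorrected) for each memory controller under `/sys/devices/system/edac/mc`, which is where the Linux EDAC subsystem exposes them. It assumes an EDAC driver is loaded for the platform; `read_edac_counts` is an illustrative name, not an existing tool.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    Returns an empty dict when the EDAC sysfs tree is absent, for example
    when no EDAC driver is loaded (assumption: standard sysfs layout).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for controller in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((controller / name).read_text().strip())
            except (OSError, ValueError):
                # Counter file missing or unreadable on this system.
                entry[name] = None
        counts[controller.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

Running it once before changing memory speed or voltage and again after a few days of uptime gives two snapshots to compare. The counters normally start again from zero after a reboot, so compare snapshots taken within the same uptime.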
Squid (posted January 5, 2020):

And without digging into the diagnostics and googling a ton of stuff, it should be noted that any and all overclocks introduce instability into any system and any OS.

XMP / AMP memory profiles (which unfortunately tend to be used by default nowadays in the BIOS) are an overclock. Just because a DIMM could handle the XMP/AMP overclock at the time of manufacture does not mean that it still can a day, a week, or a year down the road. YMMV
Squid (posted January 5, 2020):

On 1/4/2020 at 11:02 AM, aco262 said: Thank you! Is this something I should worry about? I'll be honest, the error has been there for probably around 6 months.

Isn't the whole point of ECC memory that you should worry about it once the error happens? If you aren't going to replace the memory once errors begin to happen, then why bother with the extra expense of ECC in the first place?

Completely up to you though. The errors are being corrected for the time being.