strahd_zarovich Posted August 27, 2018
Can someone help me find out what this error is talking about?
Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged
tower-diagnostics-20180827-1810.zip
John_M Posted August 27, 2018
It looks like a memory fault:
Aug 21 02:13:19 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:27:46 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:50:25 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:50:25 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:06:07 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:06:07 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
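
For anyone wanting to tally these events from a saved syslog rather than read them by eye, here is a minimal Python sketch. It assumes the EDAC lines keep the shape shown in the log above; the function name `count_corrected_errors` and the regular expression are illustrative choices, not part of unRaid or mcelog.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the ones in the log above, e.g.
#   EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 ...)
# The pattern is an assumption based on that sample only.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*?on (?P<label>\S+) \("
)


def count_corrected_errors(lines):
    """Return a Counter mapping (controller number, DIMM label) to CE count."""
    totals = Counter()
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            key = (int(match["mc"]), match["label"])
            totals[key] += int(match["count"])
    return totals


if __name__ == "__main__":
    sample = [
        "Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged",
        "Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)",
        "Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)",
    ]
    for (mc, label), n in count_corrected_errors(sample).items():
        print(f"MC{mc} {label}: {n} corrected error(s)")
```

If every hit lands on the same label, as it does in this log, that points toward one module or slot rather than a general problem.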
strahd_zarovich Posted August 27, 2018 (Author)
Well damn. I did a memory test and everything passed. I guess I could try again.
John_M Posted August 27, 2018
Are you using ECC RAM? What does the BIOS log say?
strahd_zarovich Posted August 27, 2018 (Author)
Yes to ECC, but I didn't look at the BIOS log. I will check that when I get a chance.
pwm Posted August 28, 2018
13 hours ago, strahd_zarovich said: Yes to ECC, but I didn't look at the bios log. I will check that when I get a chance.
Memtest may stress-test the memory and provoke errors. But ECC will hide any single-bit error, so the Memtest program will not display them.
You need to look in the BIOS log to see the actual errors. Then you can figure out which memory module is causing the errors, or whether you have multiple memory slots with issues.
Of course, you can get memory errors without a bad module if you have overclocked the memory or the memory controller.
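
Besides the BIOS log, the Linux EDAC driver that produced the kernel messages earlier in the thread also keeps running counters in sysfs, so corrected errors that ECC hides from Memtest can still be read from the running system. The following is a rough Python sketch, assuming the usual EDAC layout of one `mcN` directory per memory controller under `/sys/devices/system/edac/mc`, each with `ce_count` and `ue_count` files. The function name `read_edac_counts` is made up for this example, and the files may be absent if no EDAC driver is loaded.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller name: (corrected, uncorrected)} from EDAC counters.

    Controllers whose counter files are missing or unreadable are skipped.
    """
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc_dir / "ce_count").read_text().strip())
            ue = int((mc_dir / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[mc_dir.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

A corrected count that keeps climbing on one controller while the uncorrected count stays at zero would be consistent with ECC quietly fixing single-bit errors.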
John_M Posted August 28, 2018
19 hours ago, John_M said: CPU#1Channel#0_DIMM#0
Does that pinpoint the faulty module? It would be interesting to see if it agrees with the BIOS log.
pwm Posted August 28, 2018
51 minutes ago, John_M said: Does that pinpoint the faulty module? It would be interesting to see if it agrees with the BIOS log.
The motherboard manufacturer might have decided on custom naming of the memory slots and have the BIOS present these custom names. The printout in the Linux kernel log is based on how the CPU addresses the problematic memory module. So there might be a difference between the BIOS log and the kernel log.
Computers like zero-indexed numbers, but lots of product managers prefer to start numbering from one. Not too many people are used to "the zeroth chair".
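
To make the off-by-one point concrete, here is a small Python sketch that shifts each index in a kernel-style label up by one. It is purely illustrative: it assumes all three indices in the label are zero-based, and the output format `CPU2 Channel1 DIMM1` is invented for the example. The real slot names on any given board have to come from the motherboard manual or the BIOS.

```python
import re

# Kernel-style label as it appears in the log earlier in the thread.
LABEL = re.compile(r"CPU#(\d+)Channel#(\d+)_DIMM#(\d+)")


def to_one_based(label):
    """Shift every index in a kernel-style label from zero-based to one-based.

    The returned naming scheme is hypothetical, not any vendor's convention.
    """
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    cpu, channel, dimm = (int(group) + 1 for group in match.groups())
    return f"CPU{cpu} Channel{channel} DIMM{dimm}"


if __name__ == "__main__":
    print(to_one_based("CPU#1Channel#0_DIMM#0"))
```

So a slot the kernel reports as DIMM#0 may well be printed as DIMM 1 on the board or in the BIOS.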
strahd_zarovich Posted August 28, 2018 (Author)
I just went through the event log in the BIOS and it is empty. There were no errors logged.
Patb Posted May 19, 2019
I seem to be having the same problem. Did you ever find a solution? I'm running on an IBM server and the integrated management is not showing any errors.