Posted August 27, 2018
Can someone help me figure out what this error is talking about?
"Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged"
tower-diagnostics-20180827-1810.zip
August 27, 2018
It looks like a memory fault:
Aug 21 02:13:19 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:27:46 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:50:25 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:50:25 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:06:07 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:06:07 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
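If you want to check whether every corrected error points at the same slot, you can tally the EDAC lines per DIMM label. This is a minimal sketch that assumes the log lines look exactly like the excerpt above ("EDAC MCn: k CE error on <label>"); other kernels or EDAC drivers may word them differently, and the function name `count_ce_by_dimm` is just a made-up helper, not part of any tool.

```python
import re
from collections import Counter

# Matches EDAC corrected-error (CE) lines in the format quoted above, e.g.
# "EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 ...)".
# Assumption: other kernel versions or EDAC drivers may word this differently.
EDAC_CE = re.compile(r"EDAC MC(\d+): (\d+) CE error on (\S+)")


def count_ce_by_dimm(lines):
    """Sum corrected-error counts per (memory controller, DIMM label)."""
    totals = Counter()
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            mc, count, label = match.groups()
            totals[(f"MC{mc}", label)] += int(count)
    return totals


# A few lines copied from the log excerpt; the "mce:" lines carry no DIMM info.
SAMPLE_LOG = """\
Aug 21 02:13:19 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
"""

if __name__ == "__main__":
    # On a live server you would read the syslog file instead (path is an assumption).
    for (mc, label), total in count_ce_by_dimm(SAMPLE_LOG.splitlines()).items():
        print(f"{mc} {label}: {total} corrected error(s)")
```

If all the counts land on a single label, as they do in this excerpt, that label is the first module to suspect, though it still has to be matched to a physical slot.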
August 27, 2018 Author
Well damn. I ran a memory test and everything passed. I guess I could try again.
August 27, 2018 Author
Yes to ECC, but I didn't look at the BIOS log. I will check that when I get a chance.
August 28, 2018
13 hours ago, strahd_zarovich said: Yes to ECC, but I didn't look at the BIOS log. I will check that when I get a chance.
Memtest may stress the memory enough to provoke errors, but ECC will correct, and so hide, any single-bit error, so Memtest will not display them. You need to look in the BIOS log to see the actual errors; then you can figure out which memory module is causing them, or whether multiple memory slots have issues. Of course, you can also get memory errors without a bad module if you have overclocked the memory or the memory controller.
August 28, 2018
19 hours ago, John_M said: CPU#1Channel#0_DIMM#0
Does that pinpoint the faulty module? It would be interesting to see whether it agrees with the BIOS log.
August 28, 2018
51 minutes ago, John_M said: Does that pinpoint the faulty module? It would be interesting to see if it agrees with the BIOS log.
The motherboard manufacturer might have chosen custom names for the memory slots and have the BIOS display those names. The entry in the Linux kernel log is based on how the CPU addresses the problematic memory module, so the BIOS log and the kernel log might well differ. Computers like zero-indexed numbers, but lots of product managers prefer to start numbering from one. Not many people are used to "the zeroth chair".
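To illustrate the zero-based versus one-based point, here is a small sketch that parses the kernel label format from the log above and rewrites it with one-based numbers. The output naming ("CPU2_DIMM_A1", with channels as letters) is purely hypothetical; real slot names and the channel-to-slot mapping vary by board, so the manual remains the authority. It also assumes, as suggested above, that the kernel counts from zero and the board counts from one.

```python
import re

# Kernel label format as it appears in the EDAC lines above.
KERNEL_LABEL = re.compile(r"CPU#(\d+)Channel#(\d+)_DIMM#(\d+)")


def to_one_based(label):
    """Hypothetical translation of a zero-indexed EDAC label into one-based names.

    Assumptions: CPU and DIMM numbers shift by one, and channels are shown
    as letters A, B, C, ... Real motherboards may use a different scheme.
    """
    match = KERNEL_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    cpu, channel, dimm = (int(group) for group in match.groups())
    return f"CPU{cpu + 1}_DIMM_{chr(ord('A') + channel)}{dimm + 1}"


if __name__ == "__main__":
    print(to_one_based("CPU#1Channel#0_DIMM#0"))
```

Under these assumptions "CPU#1" would be the second CPU socket, which is exactly the kind of off-by-one that makes the kernel log and the board silkscreen disagree.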
August 28, 2018 Author
I just went through the event log in the BIOS and it is empty. There were no errors logged.
May 19, 2019
I seem to be having the same problem. Did you ever find a solution? I'm running on an IBM server and the integrated management is not showing any errors.
Edited May 19, 2019 by Patb