
Error from Fix Common Problems

Can someone help me figure out what this error means?

 

 

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

tower-diagnostics-20180827-1810.zip

It looks like a memory fault:

Aug 21 02:13:19 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:17:53 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:27:46 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 02:50:25 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 02:50:25 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:06:07 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:06:07 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
Aug 21 04:55:49 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug 21 04:55:49 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
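If the syslog is long, the repeated EDAC lines above can be tallied per DIMM label instead of read by eye. Here is a minimal Python sketch, assuming the lines look exactly like the excerpt above. Other kernel versions word the message differently, so the regex is an assumption you may need to adjust. The names `tally_edac_errors` and `sample` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt in this thread:
#   "... kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 ...)"
# CE = corrected error, UE = uncorrected error.
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) error on (?P<label>\S+)"
)


def tally_edac_errors(lines):
    """Sum the reported error counts per (controller, kind, DIMM label)."""
    totals = Counter()
    for line in lines:
        match = EDAC_LINE.search(line)
        if match:
            key = (int(match.group("mc")), match.group("kind"), match.group("label"))
            totals[key] += int(match.group("count"))
    return totals


# A few lines copied from the log above.
sample = [
    "Aug 21 02:13:19 Tower kernel: mce: [Hardware Error]: Machine check events logged",
    "Aug 21 02:17:53 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)",
    "Aug 21 02:27:46 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)",
    "Aug 21 02:50:25 Tower kernel: EDAC MC1: 1 CE error on CPU#1Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)",
]

for (mc, kind, label), n in tally_edac_errors(sample).items():
    print(f"MC{mc} {kind} {label}: {n}")
# prints: MC1 CE CPU#1Channel#0_DIMM#0: 3
```

If every hit lands on the same label, that points at one slot rather than a general memory problem.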

 

  • Author

Well damn. I did a memory test and everything passed. I guess I could try again. 

Are you using ECC RAM? What does the BIOS log say?

  • Author

Yes to ECC, but I didn't look at the BIOS log. I will check that when I get a chance.

13 hours ago, strahd_zarovich said:

Yes to ECC, but I didn't look at the bios log. I will check that when I get a chance.

 

Memtest can stress the memory and provoke errors, but ECC transparently corrects any single-bit error, so Memtest will not display them. You need to look in the BIOS log to see the actual errors. Then you can figure out which memory module is causing them, or whether several memory slots have issues. Of course, you can also get memory errors without a bad module if you have overclocked the memory or the memory controller.
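To illustrate why a test program never sees a corrected error, here is a toy Python sketch using a Hamming(7,4) code. This is not what an ECC DIMM actually does (real modules typically use a wider SECDED code over 64 data bits, in hardware), and `encode` and `decode` are illustrative names. The principle is the same: the reader gets the right data back, and only a separate error report shows that a bit was flipped.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    return code[1:]


def decode(word):
    """Return (data bits, corrected position); position 0 means no error seen."""
    code = [0] + list(word)
    s1 = code[1] ^ code[3] ^ code[5] ^ code[7]
    s2 = code[2] ^ code[3] ^ code[6] ^ code[7]
    s4 = code[4] ^ code[5] ^ code[6] ^ code[7]
    syndrome = s1 + 2 * s2 + 4 * s4  # equals the position of a single flipped bit
    if syndrome:
        code[syndrome] ^= 1  # silently repair the bit
    return [code[3], code[5], code[6], code[7]], syndrome


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a single-bit fault at position 5
data, corrected_at = decode(stored)
print(f"data read back: {data}, corrected bit position: {corrected_at}")
# prints: data read back: [1, 0, 1, 1], corrected bit position: 5
```

The "program" only ever sees `data`, which is correct. The `corrected_at` value plays the role of the CE entries in the kernel or BIOS log.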

19 hours ago, John_M said:

CPU#1Channel#0_DIMM#0

 

Does that pinpoint the faulty module? It would be interesting to see if it agrees with the BIOS log.

The motherboard manufacturer might have chosen custom names for the memory slots and have the BIOS present those names. The label in the Linux kernel log is based on how the CPU addresses the problematic memory module, so the BIOS log and the kernel log might not match. Computers like zero-indexed numbers, but lots of product managers prefer to start numbering from one. Not too many people are used to "the zeroth chair".
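To see the kernel's own name and location for each slot, and compare them with the BIOS names or the silkscreen on the board, the EDAC counters under sysfs can be read directly. This is a rough Python sketch. It assumes the loaded EDAC driver exposes per-DIMM directories with `dimm_label`, `dimm_location` and `dimm_ce_count` files under `/sys/devices/system/edac/mc`. Some drivers and older kernels use a `csrow*` layout instead, in which case this finds nothing. The function name `read_dimm_counters` is made up for illustration.

```python
from pathlib import Path


def read_dimm_counters(edac_root="/sys/devices/system/edac/mc"):
    """List the kernel's label, location and corrected-error count per DIMM.

    Assumes a layout like <edac_root>/mc1/dimm0/{dimm_label,dimm_location,dimm_ce_count}.
    Missing files are reported as None rather than raising.
    """
    results = []
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue

        def read(name):
            path = dimm_dir / name
            return path.read_text().strip() if path.is_file() else None

        ce_count = read("dimm_ce_count")
        results.append({
            "controller": dimm_dir.parent.name,  # e.g. "mc1"
            "dimm": dimm_dir.name,               # e.g. "dimm0" (zero-indexed)
            "label": read("dimm_label"),
            "location": read("dimm_location"),
            "ce_count": int(ce_count) if ce_count is not None else None,
        })
    return results
```

Calling `read_dimm_counters()` on the server and printing each entry shows which kernel label has a non-zero `ce_count`. Mapping that label to a physical slot still requires the board manual, since the kernel counts from zero and the board may not.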

  • Author

I just went through the event log in the BIOS and it is empty. There were no errors logged.

  • 8 months later...

I seem to be having the same problem. Did you ever find a solution?

 

I'm running on an IBM server and the integrated management is not showing any errors.

Edited by Patb
