August 31, 2017

First popped up yesterday. Fix Common Problems said:

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

I didn't have mcelog installed at the time, but installed it after that. The error popped up again today, this time showing more detail in the syslog:

Aug 31 04:40:59 Brahms1 root: Fix Common Problems: Error: Machine Check Events detected on your server
Aug 31 04:40:59 Brahms1 root: Hardware event. This is not a software error.
Aug 31 04:40:59 Brahms1 root: MCE 0
Aug 31 04:40:59 Brahms1 root: CPU 0 BANK 8
Aug 31 04:40:59 Brahms1 root: TIME 1504014611 Tue Aug 29 09:50:11 2017
Aug 31 04:40:59 Brahms1 root: MCG status:
Aug 31 04:40:59 Brahms1 root: MCi status:
Aug 31 04:40:59 Brahms1 root: Corrected error
Aug 31 04:40:59 Brahms1 root: Error enabled
Aug 31 04:40:59 Brahms1 root: MCA: MEMORY CONTROLLER GEN_CHANNELunspecified_ERR
Aug 31 04:40:59 Brahms1 root: Transaction: Generic undefined request
Aug 31 04:40:59 Brahms1 root: STATUS 900000400009008f MCGSTATUS 0
Aug 31 04:40:59 Brahms1 root: MCGCAP 1000c18 APICID 0 SOCKETID 0
Aug 31 04:40:59 Brahms1 root: CPUID Vendor Intel Family 6 Model 47

If I'm reading this right, it appears to be a RAM issue on stick 8 on CPU 0, right? My assumption is that I should re-seat the RAM, then run a memory test overnight. Please advise.

Server is currently running 6.2.4. Cannot upgrade it to 6.3.x because of known issues with HP servers and that version of unRaid. Can't upgrade to 6.4RC because of loss of passthrough, detailed (and unacknowledged by Lime Tech) here:

Diagnostics attached. Thanks!

brahms1-diagnostics-20170831-0930.zip
August 31, 2017 (Community Expert)

42 minutes ago, 1812 said:
If I'm reading this right, it appears to be a ram issue on stick 8 on cpu 0, right?

Looks to me like an ECC corrected memory error.
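For anyone who wants to check that reading against the raw STATUS value in the log, here is a minimal sketch in Python. It assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM (VAL, OVER, UC, EN and so on in the top bits, the MCA error code in the low 16 bits) and the memory-controller compound code format `0000 0000 1MMM CCCC`. The function name `decode_mci_status` and the dictionary keys are made up for illustration and are not part of mcelog.

```python
# Transaction types for the MMM field of a memory-controller MCA error code
# (assumed from the Intel SDM compound error code table).
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC (0 = corrected)
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        # Bits 52:38, only meaningful on CPUs that report a corrected count.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors use the pattern 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        fields["kind"] = "memory controller"
        fields["transaction"] = MEM_TRANSACTIONS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        # 0xF means the channel was not specified.
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    # STATUS value copied from the syslog excerpt in the first post.
    for name, value in decode_mci_status(0x900000400009008F).items():
        print(f"{name}: {value}")
```

With the STATUS from the log, UC is clear and EN is set, the corrected count is 1, and the error code decodes to a memory controller error with the channel unspecified, which lines up with the "Corrected error", "Error enabled" and "GEN_CHANNELunspecified_ERR" lines mcelog printed. Note that BANK 8 in mcelog output is the machine-check bank that reported the event, not a DIMM slot number, so the log alone does not identify which stick is involved.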
August 31, 2017

Perhaps Xeon E7 C-state issue? Here and here and the referenced HP c03282091 advisory as well as c02847572 and c03356780.
August 31, 2017 (Author)

2 hours ago, johnnie.black said:
Looks to me like an ECC corrected memory error.

So maybe nothing more than the ram doing its job?

1 hour ago, unevent said:
Perhaps Xeon E7 C-state issue? Here and here and the referenced HP c03282091 advisory as well as c02847572 and c03356780.

Thanks for this, will look into it further.