February 1, 2021
2 hours ago, Squid said: Best place to start:
Thanks Squid! I'll do some research.
June 25, 2021
I got the dreaded error warning today after a hard reboot (the system was unresponsive). It's running now and a parity check is in progress. Does anyone have a clue whether this is telling me something? I've attached the full log, but these lines don't sound like good news:
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: Machine check events logged
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: fe00000000800400
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff8108843d MISC ffffffff8108843d
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1624618181 SOCKET 0 APIC 0 microcode e0
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: Machine check events logged
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: fe00000000800400
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: TSC 0 ADDR fffff8044651bb59 MISC fffff8044651bb59
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1624618181 SOCKET 0 APIC 0 microcode e0
syslog
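For anyone who wants to read the status word in lines like these, here is a minimal Python sketch that pulls out the top flag bits and the low 16-bit error code. It assumes the Intel IA32_MCi_STATUS bit layout (the `PROCESSOR 0:` prefix suggests an Intel CPU, but that is an inference from the log); the function names are made up for illustration. It only names the bits that are set and says nothing about whether the event is harmless.

```python
# Flag bits at the top of the 64-bit machine-check status word.
# Assumption: Intel IA32_MCi_STATUS layout.
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupted
}


def decode_status_flags(status_hex: str) -> list[str]:
    """Return the names of the flag bits set in a status word given as hex."""
    status = int(status_hex, 16)
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


def mca_error_code(status_hex: str) -> int:
    """Return the low 16 bits of the status word (the MCA error code)."""
    return int(status_hex, 16) & 0xFFFF


# Status word copied from the Bank 3 / Bank 4 lines in the post above.
print(decode_status_flags("fe00000000800400"))
print(hex(mca_error_code("fe00000000800400")))
```

The same helper can be pointed at any other status word in this thread to compare which flags differ between reports.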
August 2, 2021
Hi guys,
I hope someone can shed some light on my Ryzen 5950 Unraid system. I'm a first-time builder, so please be patient with me. I've been getting this after a few months of running, and twice this month. It might be the heat in the room.
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: Machine check events logged
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 0: bc00080001010135
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: TSC 0 ADDR fb8d39280 MISC d012000000000000 IPID 1000b000000000
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1627870430 SOCKET 0 APIC 8 microcode a201009
I'd much appreciate knowing whether I can safely ignore it. I am planning to upgrade my Gigabyte X570 BIOS Master and also upgrade UNRAID OS to the latest version. Just being extra careful.
mybongo-syslog-20210802-0321.zip
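One quick check is to compare the `TIME` field with the syslog timestamp to see whether the event was recorded around boot. The field is a Unix epoch in seconds, so a small Python sketch can convert it. The helper name is hypothetical, and the result is in UTC; the server's own timezone is not stated in the post, so the offset to the syslog time has to be worked out locally.

```python
from datetime import datetime, timezone


def mce_time_utc(epoch_seconds: int) -> datetime:
    """Convert the TIME field of an mce log line to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# TIME value copied from the PROCESSOR line in the post above.
print(mce_time_utc(1627870430).isoformat())  # 2021-08-02T02:13:50+00:00
```

If the converted time lands within seconds of the boot messages once the timezone is accounted for, that is consistent with an event logged during startup rather than hours into a workload.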
August 2, 2021
Some combinations of hardware will issue an mce during cpu initialization. This happened to you and can be safely ignored.
August 22, 2021
Hey all, I'm new to this. This morning, my Intel machine mysteriously rebooted (I guess the BIOS is set to reboot when encountering hardware problems? I know, I know, I should change this, and I will). When I logged in at around 2pm, I noticed it had started a parity check on reboot. MCE tells me that there's a hardware error. Here's my zip file. Can anyone tell me if this is true?
nas-diagnostics-20210822-1449.zip
August 22, 2021
The mce listed happened during core initialization. It isn't anything to worry about and happens on certain hardware combinations.
I would start with running a memtest for a pass or two.
August 22, 2021
11 minutes ago, Squid said: The mce listed happened during core initialization, and isn't anything to worry about and happens on certain hardware combinations I would start with running a memtest for a pass or two
So some hardware combos are just doomed to randomly reboot? That sucks camel caboose.
How do I go about running a memtest? I've never done one.
August 22, 2021
3 minutes ago, Corvus said: So some hardware combos are just doomed to randomly reboot?
I didn't say that. I said the mce happens on certain hardware combinations when initializing the cpu cores and is nothing to worry about.
3 minutes ago, Corvus said: How do I go about running a memtest? I've never done one.
It's on the boot menu. If you're booting via UEFI, then you'll have to temporarily switch to Legacy in order to run it (or download a new stick from https://www.memtest86.com/).
August 22, 2021
9 minutes ago, Squid said: I didn't say that. I said the mce happens on certain hardware combinations when initializing the cpu cores and is nothing to worry about. Its on the boot menu. If you're booting via UEFI, then you'll have to temporarily switch to Legacy in order to run it (or download a new stick from https://www.memtest86.com/)
OK, that's going to be a problem. You see, my particular motherboard has a known bug where, if the secondary m.2 is occupied, it sometimes refuses to output display via the GPU until the m.2 is reseated, and that's not possible because I'd have to dismantle the entire system to do it. So I have no direct display output capabilities whatsoever. Any alternative?
October 10, 2021
Hello all,
I've also received this error:
Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged.
I've uploaded the logs obtained by both methods below. I don't know what I should be looking for. Any help would be appreciated.
syslog
yianni-diagnostics-20211010-0708.zip
October 10, 2021
You can try running the memory at its rated speed of 2133 instead of overclocking it (XMP / AMP) to 2666.
October 22, 2021
Hey all! I have also received this email. I don't know why it was sent or what it could mean. Any help would be greatly appreciated.
yianni-diagnostics-20211021-1926.zip
October 22, 2021
Oct 13 00:44:23 Yianni kernel: mce: Uncorrected memory error in page 0x0 ignored
Oct 13 00:44:23 Yianni kernel: Rebuild kernel with CONFIG_MEMORY_FAILURE=y for smarter handling
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: Deferred error, no action required.
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: CPU:1 (19:21:0) MC24_STATUS[Over|-|-|AddrV|-|-|UECC|Deferred|-|-]: 0xd589f68949fd8949
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: Error Addr: 0x0000000000000000
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: IPID: 0x0000000000000000
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: System Management Unit Ext. Error Code: 61
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: cache level: L1, tx: GEN
Safe to ignore. It's just a known Ryzen issue where that happens on earlier kernels.
November 5, 2021
Hey everyone, I got this message as well. I built a new Unraid machine (basically a new CPU, new PSU, new mobo and new RAM) and just migrated the hard drives over.
thehans-diagnostics-20211105-1617.zip
November 6, 2021
It's all good. Certain combinations of hardware issue an mce during processor initialization; this is normal and to be expected.
December 19, 2021
Received the same message. Log file is attached. I believe that this is the relevant portion.
Dec 19 10:08:11 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Dec 19 10:08:11 Tower root: Hardware event. This is not a software error.
Dec 19 10:08:11 Tower root: MCE 0
Dec 19 10:08:11 Tower root: CPU 1 BANK 6 TSC e0507bb7fe8f4
Dec 19 10:08:11 Tower root: MISC a010414 ADDR bdbaeefc0
Dec 19 10:08:11 Tower root: TIME 1639771657 Fri Dec 17 14:07:37 2021
Dec 19 10:08:11 Tower root: MCG status:
Dec 19 10:08:11 Tower root: MCi status:
Dec 19 10:08:11 Tower root: Corrected error
Dec 19 10:08:11 Tower root: MCi_MISC register valid
Dec 19 10:08:11 Tower root: MCi_ADDR register valid
Dec 19 10:08:11 Tower root: Threshold based error status: green
Dec 19 10:08:11 Tower root: MCA: corrected filtering (some unreported errors in same region)
Dec 19 10:08:11 Tower root: Generic CACHE Level-2 Data-Write Error
Dec 19 10:08:11 Tower root: STATUS 8c2000400001114a MCGSTATUS 0
Dec 19 10:08:11 Tower root: MCGCAP 1c09 APICID 0 SOCKETID 0
Dec 19 10:08:11 Tower root: MICROCODE 1f
Dec 19 10:08:11 Tower root: CPUID Vendor Intel Family 6 Model 44
Dec 19 10:08:11 Tower root: mcelog: warning: 8 bytes ignored in each record
Dec 19 10:08:11 Tower root: mcelog: consider an update
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdm
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdj
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdh
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdn
Dec 19 10:08:35 Tower emhttpd: read SMART /dev/sdi
tower-diagnostics-20211219-1008.zip
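The text mcelog prints here ("Generic CACHE Level-2 Data-Write Error", "corrected filtering", "status: green") can be reproduced from the raw `STATUS` word. Below is a minimal Python sketch, assuming the Intel cache-hierarchy error code layout `000F 0001 RRRR TTLL` in the low 16 bits and the threshold status in bits 54:53 (the log reports an Intel CPU). The table and function names are invented for the example, and it handles only this one error family.

```python
# Field tables for Intel cache-hierarchy error codes (000F 0001 RRRR TTLL).
REQUEST = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD", 0b0100: "DWR",
    0b0101: "IRD", 0b0110: "PREFETCH", 0b0111: "EVICT", 0b1000: "SNOOP",
}
TRANSACTION = {0b00: "I", 0b01: "D", 0b10: "G"}   # instruction, data, generic
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
THRESHOLD = {0b00: "no tracking", 0b01: "green", 0b10: "yellow", 0b11: "reserved"}


def decode_cache_error(status_hex: str):
    """Decode a cache-hierarchy error, or return None for other error families."""
    status = int(status_hex, 16)
    code = status & 0xFFFF
    # Mask out the F bit (bit 12) and the RRRR/TTLL fields, then match 0x01xx.
    if code & 0xEF00 != 0x0100:
        return None
    return {
        "filtered": bool(code & 0x1000),                      # F bit
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"),  # RRRR
        "transaction": TRANSACTION.get((code >> 2) & 0x3, "unknown"),  # TT
        "level": LEVEL[code & 0x3],                           # LL
        "threshold": THRESHOLD[(status >> 53) & 0x3],         # bits 54:53
    }


# STATUS value copied from the mcelog output above.
print(decode_cache_error("8c2000400001114a"))
```

For this status word the sketch reports a filtered, generic, level-2 data write with a green threshold, which lines up with what mcelog printed.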
December 19, 2021
This is a Dell PowerEdge R510. The iDRAC system event log doesn't show anything that wasn't done by me (the drive removal was a mistake, and the power supply errors are from me unplugging and re-plugging the redundant supply).
Edited December 19, 2021 by TheScrantonStrangler (grammar)
January 11, 2022
My webui became "unresponsive". I was able to move around the interface. It told me the array had stopped, but my VMs and Docker containers were all responding and working fine. I tried shutting them down via the Unraid UI, but it seemed like no commands were executing, even though the UI seemed to confirm the commands were being sent.
This message was displayed when I clicked on the Apps tab. Luckily I took a picture, because it didn't show again after a page refresh.
I was able to shut down my VMs and then had to shut down the Unraid server via the physical power button. On restart everything seems fine, but I got the MCE error. Please see the attached logs. Any guidance would be appreciated.
Edited January 12, 2022 by stephack
January 11, 2022
I believe the mce is nothing to worry about; it is thrown on occasion by Ryzen CPUs (i.e. a bug in the CPU?).
The error in CA would be that /tmp (or all of your RAM) was completely filled...
February 5, 2022
Had this pop up today, any insight? Much appreciated!
Feb 5 08:30:59 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (family: 0x6, model: 0x3f, stepping: 0x2)
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 17: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f000000 MISC 4f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 18: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f100000 MISC 44f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 19: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f100080 MISC 84f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
tower-diagnostics-20220205-0854.zip
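When a syslog holds many of these lines, it can help to tally them by CPU, bank and status word before posting. The following Python sketch does that with a regular expression written against the kernel line format shown in this thread; the pattern and the `summarize` helper are illustrative assumptions, not part of Unraid or mcelog.

```python
import re
from collections import Counter

# Matches lines like:
#   mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 17: ee2000000004017a
MCE_LINE = re.compile(
    r"mce: \[Hardware Error\]: CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)"
)


def summarize(lines):
    """Count machine-check reports per (cpu, bank, status) tuple."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line)
        if match:
            cpu, bank, status = match.groups()
            counts[(int(cpu), int(bank), status)] += 1
    return counts


# Sample lines copied from the post above.
sample = [
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 17: ee2000000004017a",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 18: ee2000000004017a",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 19: ee2000000004017a",
]

for (cpu, bank, status), count in sorted(summarize(sample).items()):
    print(f"CPU {cpu} bank {bank} status {status}: {count}")
```

To run it on a real log, pass an open file object to `summarize` instead of the sample list. The same status word repeating across several banks with an identical timestamp, as here, is easy to spot in the tally.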
February 5, 2022
Standard mce that gets issued on occasion with certain hardware combinations upon processor initialization. Ignore it.