Squid Posted February 1, 2021
Best place to start:
spacer00ster Posted February 1, 2021
2 hours ago, Squid said: Best place to start:
Thanks Squid! I'll do some research.
trevisthomas Posted June 25, 2021
I got the dreaded error warning today after a hard reboot (the system was unresponsive). It's running now and a parity check is going. Does anyone have a clue whether this is telling me something? I attached the full log, but these don't sound like good news.
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: Machine check events logged
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: fe00000000800400
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffff8108843d MISC ffffffff8108843d
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1624618181 SOCKET 0 APIC 0 microcode e0
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: Machine check events logged
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: fe00000000800400
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: TSC 0 ADDR fffff8044651bb59 MISC fffff8044651bb59
Jun 25 05:50:04 TheBronze kernel: mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1624618181 SOCKET 0 APIC 0 microcode e0
syslog
Squid Posted June 25, 2021
Harmless mce that happened during cpu core initialization.
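For anyone who wants to see what the status word after `Bank 3:` actually contains, here is a minimal sketch that unpacks its flag bits. It assumes the standard x86 `MCi_STATUS` layout (flags in bits 63 to 57, MCA error code in bits 15 to 0); `decode_status` and `STATUS_FLAGS` are hypothetical names made up for this example, not part of any Unraid or mcelog tooling. The flags only show what the register held. They do not tell you whether the event is boot-time residue, so this does not replace the assessment above.

```python
# Flag bits in the upper byte of an x86 MCi_STATUS register
# (assumed layout; names follow the usual abbreviations).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the MISC register is valid
    58: "ADDRV",  # the ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}


def decode_status(hex_status: str) -> dict:
    """Split a hex MCi_STATUS value into flag names and error codes."""
    value = int(hex_status, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (value >> bit) & 1]
    return {
        "flags": flags,
        "mca_code": value & 0xFFFF,                     # bits 15..0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31..16
    }


# Status value copied from the "Bank 3" line in the post above.
result = decode_status("fe00000000800400")
print(result["flags"], hex(result["mca_code"]))
```

The same helper can be pointed at any other status value posted in this thread, since it only does bit arithmetic on the hex string.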
dhawk2k Posted August 2, 2021
Hi guys, I hope someone can shed some light on my Ryzen 5950 Unraid system. I'm a first-time builder, so please be patient with me. I started getting this after a few months, and twice this month; it might be the heat in the room.
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: Machine check events logged
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 0: bc00080001010135
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: TSC 0 ADDR fb8d39280 MISC d012000000000000 IPID 1000b000000000
Aug 1 19:14:09 MyBongo kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1627870430 SOCKET 0 APIC 8 microcode a201009
I'd much appreciate knowing whether I can safely ignore it. I am planning to upgrade my Gigabyte X570 BIOS Master and also upgrade UNRAID OS to the latest; I'm just being extra careful.
mybongo-syslog-20210802-0321.zip
Squid Posted August 2, 2021
Some combinations of hardware will issue an mce during cpu initialization. This happened to you and can be safely ignored.
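The `TIME` field in these kernel lines is a Unix timestamp in seconds, so it can be compared against the syslog line's own clock to see when the hardware recorded the event versus when the kernel reported it. A small sketch follows; `mce_time_utc` is a hypothetical helper name, and the result is in UTC, whereas the syslog prefix uses the server's local time zone (which is not stated in the thread).

```python
from datetime import datetime, timezone


def mce_time_utc(epoch_seconds: int) -> datetime:
    """Convert the TIME field of an MCE record to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# TIME value copied from the MyBongo log lines above.
print(mce_time_utc(1627870430).isoformat())
```

If the converted time is far earlier than the syslog line that reports it, the record was likely stored by the hardware before the current boot and only picked up later.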
Corvus Posted August 22, 2021
Hey all, I'm new to this. This morning, my Intel machine mysteriously rebooted (I guess the BIOS is set to reboot when encountering hardware problems? I know, I know, I should change this, and I will). When I logged in at around 2pm, I noticed it started a parity check on reboot. MCE tells me that there's a hardware error. Here's my zip file. Can anyone tell me if this is true?
nas-diagnostics-20210822-1449.zip
Squid Posted August 22, 2021
The mce listed happened during core initialization. It isn't anything to worry about and happens on certain hardware combinations.
I would start by running a memtest for a pass or two.
Corvus Posted August 22, 2021
11 minutes ago, Squid said: The mce listed happened during core initialization, and isn't anything to worry about and happens on certain hardware combinations. I would start with running a memtest for a pass or two.
So some hardware combos are just doomed to randomly reboot? That sucks camel caboose. How do I go about running a memtest? I've never done one.
Squid Posted August 22, 2021
3 minutes ago, Corvus said: So some hardware combos are just doomed to randomly reboot?
I didn't say that. I said the mce happens on certain hardware combinations when initializing the cpu cores and is nothing to worry about.
3 minutes ago, Corvus said: How do I go about running a memtest? I've never done one.
It's on the boot menu. If you're booting via UEFI, then you'll have to temporarily switch to Legacy in order to run it (or download a new stick from https://www.memtest86.com/).
Corvus Posted August 22, 2021
9 minutes ago, Squid said: I didn't say that. I said the mce happens on certain hardware combinations when initializing the cpu cores and is nothing to worry about. It's on the boot menu. If you're booting via UEFI, then you'll have to temporarily switch to Legacy in order to run it (or download a new stick from https://www.memtest86.com/)
Ok, that's gonna be a problem. You see, my particular motherboard has this known bug where, if the secondary m.2 is occupied, it sometimes refuses to output display via the GPU until the m.2 is reseated, and that's not possible because I'd have to dismantle the entire system to do that. Sooo anywho, I have no direct display output capabilities whatsoever. Any alternative?
drumking53 Posted October 10, 2021
Hello all, I've also received this error:
Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged.
I've uploaded both methods of obtaining logs below. I don't know what I would be looking for. Any help would be appreciated.
syslog
yianni-diagnostics-20211010-0708.zip
Squid Posted October 10, 2021
You can try running the memory at its rated speed of 2133 instead of overclocking it (XMP / AMP) to 2666.
drumking53 Posted October 22, 2021
Hey all! I have also received this email. I do not know why it was sent or what it could mean. Any help would be greatly appreciated.
yianni-diagnostics-20211021-1926.zip
Squid Posted October 22, 2021
Oct 13 00:44:23 Yianni kernel: mce: Uncorrected memory error in page 0x0 ignored
Oct 13 00:44:23 Yianni kernel: Rebuild kernel with CONFIG_MEMORY_FAILURE=y for smarter handling
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: Deferred error, no action required.
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: CPU:1 (19:21:0) MC24_STATUS[Over|-|-|AddrV|-|-|UECC|Deferred|-|-]: 0xd589f68949fd8949
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: Error Addr: 0x0000000000000000
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: IPID: 0x0000000000000000
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: System Management Unit Ext. Error Code: 61
Oct 13 00:44:23 Yianni kernel: [Hardware Error]: cache level: L1, tx: GEN
Safe to ignore. It's just a known Ryzen issue where that happens on earlier kernels.
Lawllipops Posted November 5, 2021
Hey everyone, I got this message as well. I built a new Unraid machine: basically a new CPU, new PSU, new mobo, and new RAM. I just migrated the hard drives over.
thehans-diagnostics-20211105-1617.zip
Squid Posted November 6, 2021
It's all good. Certain combinations of hardware issue an mce during processor initialization; this is normal and to be expected.
Lawllipops Posted November 6, 2021
Thanks Squid! I appreciate it!
TheScrantonStrangler Posted December 19, 2021
Received the same message. Log file is attached. I believe that this is the relevant portion.
Dec 19 10:08:11 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Dec 19 10:08:11 Tower root: Hardware event. This is not a software error.
Dec 19 10:08:11 Tower root: MCE 0
Dec 19 10:08:11 Tower root: CPU 1 BANK 6 TSC e0507bb7fe8f4
Dec 19 10:08:11 Tower root: MISC a010414 ADDR bdbaeefc0
Dec 19 10:08:11 Tower root: TIME 1639771657 Fri Dec 17 14:07:37 2021
Dec 19 10:08:11 Tower root: MCG status:
Dec 19 10:08:11 Tower root: MCi status:
Dec 19 10:08:11 Tower root: Corrected error
Dec 19 10:08:11 Tower root: MCi_MISC register valid
Dec 19 10:08:11 Tower root: MCi_ADDR register valid
Dec 19 10:08:11 Tower root: Threshold based error status: green
Dec 19 10:08:11 Tower root: MCA: corrected filtering (some unreported errors in same region)
Dec 19 10:08:11 Tower root: Generic CACHE Level-2 Data-Write Error
Dec 19 10:08:11 Tower root: STATUS 8c2000400001114a MCGSTATUS 0
Dec 19 10:08:11 Tower root: MCGCAP 1c09 APICID 0 SOCKETID 0
Dec 19 10:08:11 Tower root: MICROCODE 1f
Dec 19 10:08:11 Tower root: CPUID Vendor Intel Family 6 Model 44
Dec 19 10:08:11 Tower root: mcelog: warning: 8 bytes ignored in each record
Dec 19 10:08:11 Tower root: mcelog: consider an update
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdm
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdj
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdh
Dec 19 10:08:21 Tower emhttpd: read SMART /dev/sdn
Dec 19 10:08:35 Tower emhttpd: read SMART /dev/sdi
tower-diagnostics-20211219-1008.zip
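The mcelog text in this post ("Corrected error", "corrected filtering", "Generic CACHE Level-2 Data-Write Error") can be cross-checked against the raw `STATUS 8c2000400001114a` value. The sketch below assumes the Intel compound error code layout for cache hierarchy errors (`000F 0001 RRRR TTLL` in the low 16 bits, with `F` marking correction-report filtering) and treats bit 61 as the uncorrected flag. `decode_cache_error` and the label tables are hypothetical names for this example, and the label wording is mine rather than mcelog's exact output.

```python
# Field encodings for an Intel cache-hierarchy MCA error code
# (assumed layout: 000F 0001 RRRR TTLL in bits 15..0 of the status word).
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic"}
REQUEST = {
    0: "Generic", 1: "Read", 2: "Write", 3: "Data-Read", 4: "Data-Write",
    5: "Instruction-Fetch", 6: "Prefetch", 7: "Eviction", 8: "Snoop",
}


def decode_cache_error(status_hex: str):
    """Return the decoded fields, or None if this is not a cache error code."""
    status = int(status_hex, 16)
    code = status & 0xFFFF
    # Ignore the filtering bit (bit 12) when matching the 0000 0001 prefix.
    if (code & 0xEF00) != 0x0100:
        return None
    return {
        "uncorrected": bool((status >> 61) & 1),
        "filtered": bool(code & 0x1000),
        "transaction": TRANSACTION.get((code >> 2) & 0x3, "Reserved"),
        "level": LEVEL[code & 0x3],
        "request": REQUEST.get((code >> 4) & 0xF, "Reserved"),
    }


# STATUS value copied from the mcelog output above.
info = decode_cache_error("8c2000400001114a")
print(info)
```

For this value the decoded fields line up with what mcelog printed: a corrected, filtered, generic level-2 data-write cache error.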
Squid Posted December 19, 2021
See if the System Event Log in the BIOS shows any info.
TheScrantonStrangler Posted December 19, 2021
This is a Dell PowerEdge R510. The iDRAC system event log doesn't show anything that wasn't done by me (the drive being removed was a mistake, and the power supply errors are me unplugging the redundant supply and re-plugging it).
stephack Posted January 11, 2022
My webui became "unresponsive". I was able to move around the interface. It told me the array had stopped, but my VMs and Docker containers were all responding and working fine. I tried shutting them down via the Unraid UI, but it seemed like no commands were executing, even though the UI seemed to confirm the commands were being sent.
This message was displayed when I clicked on the Apps tab. Luckily I took a picture, because it didn't show again after a page refresh.
I was able to shut down my VMs and then had to shut down the Unraid server via the physical power button. On restart everything seems fine, but I got the MCE error. Please see attached logs. Any guidance would be appreciated.
Squid Posted January 11, 2022
The mce, I believe, is nothing to worry about and is thrown on occasion by Ryzen CPUs (i.e., a bug in the CPU?).
The error in CA would be that /tmp (or all of your RAM) was completely filled...
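A quick way to check whether `/tmp` is close to full before it causes that kind of error is to query the filesystem usage from Python's standard library. This is only a sketch: `usage_percent` is a hypothetical helper, and it reports the filesystem that holds the given path, which says nothing about what is consuming the space.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # some pseudo-filesystems report no capacity
        return 0.0
    return 100.0 * usage.used / usage.total


# Report the root filesystem and /tmp, skipping paths that do not exist.
for path in ("/", "/tmp"):
    if os.path.isdir(path):
        print(f"{path} is {usage_percent(path):.1f}% full")
```

A value near 100% for `/tmp` would be consistent with the explanation above; finding the offending files is a separate step.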
Wyllic Posted February 5, 2022
Had this pop up today, any insight? Much appreciated!
Feb 5 08:30:59 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (family: 0x6, model: 0x3f, stepping: 0x2)
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 17: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f000000 MISC 4f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 18: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f100000 MISC 44f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 19: ee2000000004017a
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 5f100080 MISC 84f00031e0000086
Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1644067837 SOCKET 0 APIC 0 microcode 44
tower-diagnostics-20220205-0854.zip
Squid Posted February 5, 2022
Standard mce that gets issued on occasion with certain hardware combinations upon processor initialization. Ignore it.
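Before posting, it can help to condense a long syslog into one entry per CPU, bank, and status value, since a burst that appears once right at boot reads very differently from the same record repeating over days. The sketch below parses kernel lines in the format quoted throughout this thread; `summarize_mce` and `BANK_LINE` are hypothetical names, and the sample lines are copied from the post above.

```python
import re
from collections import Counter

# Matches the kernel's "CPU n: Machine Check: n Bank n: <status>" lines.
BANK_LINE = re.compile(
    r"mce: \[Hardware Error\]: CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)"
)


def summarize_mce(lines):
    """Count MCE records per (cpu, bank, status) across syslog lines."""
    counts = Counter()
    for line in lines:
        match = BANK_LINE.search(line)
        if match:
            cpu, bank, status = int(match[1]), int(match[2]), match[3]
            counts[(cpu, bank, status)] += 1
    return counts


sample = [
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: Machine check events logged",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 17: ee2000000004017a",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 18: ee2000000004017a",
    "Feb 5 08:30:59 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 19: ee2000000004017a",
]

summary = summarize_mce(sample)
for (cpu, bank, status), count in sorted(summary.items()):
    print(f"CPU {cpu} bank {bank} status {status}: {count} record(s)")
```

To run it against a real log, pass an open file instead of `sample`, for example `summarize_mce(open("/var/log/syslog"))`; that path is an assumption about where the server keeps its syslog.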