unRAID server rebooted on its own after about 160 days of uptime (MCE errors), diagnostics attached


Good evening all,

I just got notifications on my phone that a parity check is running and that Fix Common Problems found errors. The server is pretty new (I replaced the old hardware about 10 months ago) but has been ROCK solid since going live. Fix Common Problems suggests installing 'mcelog' via NerdPack, which I *just* did, so if this happens again I'll have additional logging. The server has been back up for about 2 hours now, all green dots, and the parity check hasn't found any errors yet. My signature has my current build specs.

That said, I do see the following in the syslog, but it appears to have been logged *at* boot, so maybe it isn't the cause? I assume the logs were wiped at reboot, which is what mcelog will protect against if this happens again?

Mar 15 19:01:13 NAS1 kernel: ACPI: Early table checksum verification disabled
Mar 15 19:01:13 NAS1 kernel: IOAPIC[0]: apic_id 25, version 33, address 0xfec00000, GSI 0-23
Mar 15 19:01:13 NAS1 kernel: IOAPIC[1]: apic_id 26, version 33, address 0xfec01000, GSI 24-55
Mar 15 19:01:13 NAS1 kernel: Kernel command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Mar 15 19:01:13 NAS1 kernel: Memory: 65630376K/67017592K available (10242K kernel code, 1183K rwdata, 2348K rodata, 1120K init, 1596K bss, 1386960K reserved, 0K cma-reserved)
Mar 15 19:01:13 NAS1 kernel: Console: colour VGA+ 80x25
Mar 15 19:01:13 NAS1 kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 7585.67 BogoMIPS (lpj=3792837)
Mar 15 19:01:13 NAS1 kernel: smpboot: CPU0: AMD Ryzen 9 3900X 12-Core Processor (family: 0x17, model: 0x71, stepping: 0x0)
Mar 15 19:01:13 NAS1 kernel: mce: [Hardware Error]: Machine check events logged
Mar 15 19:01:13 NAS1 kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 7: bea020000004017b
Mar 15 19:01:13 NAS1 kernel: mce: [Hardware Error]: TSC 0 ADDR 100a05f20 MISC d012000500000000 SYND 9e3f1d470707 IPID 700b020350000 
Mar 15 19:01:13 NAS1 kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1647385258 SOCKET 0 APIC 0 microcode 8701021
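For anyone reading the raw lines above: the `Bank 7` status word and the `TIME` field can be unpacked by hand. Below is a minimal Python sketch, assuming the architectural MCi_STATUS layout (flag bits 63 down to 57 are shared by Intel and AMD) and, as a further assumption, a 6-bit AMD extended error code in bits 21:16. `decode_status` and `mce_time_utc` are hypothetical helpers written for illustration, not part of mcelog or any Unraid tool. The sketch does not say which CPU unit bank 7 or the `IPID` value refers to; that still needs a real decoder such as mcelog or the Linux kernel's EDAC AMD MCE decoding.

```python
from datetime import datetime, timezone

# Architectural MCi_STATUS flag bits (same positions on Intel and AMD).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context corrupt
}


def decode_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into its flag bits and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        # Low 16 bits: architectural MCA error code.
        "mca_error_code": status & 0xFFFF,
        # Bits 21:16: AMD extended error code (XEC). The 6-bit width is an
        # assumption based on how Linux decodes Zen (SMCA) banks.
        "extended_error_code": (status >> 16) & 0x3F,
    }


def mce_time_utc(epoch_seconds: int) -> datetime:
    """Convert the TIME field of an mce log line (Unix epoch seconds) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the "Bank 7" and "TIME" lines of the syslog above.
    decoded = decode_status(0xBEA020000004017B)
    print("flags:", ", ".join(decoded["flags"]))
    print("MCA error code: 0x%04x" % decoded["mca_error_code"])
    print("extended error code:", decoded["extended_error_code"])
    print("logged at:", mce_time_utc(1647385258).isoformat())
```

Run against the values above, it reports VAL, UC, EN, MISCV, ADDRV and PCC set, MCA error code 0x017b, extended error code 4, and a TIME of 2022-03-15 23:00:58 UTC. UC together with PCC means the bank recorded an uncorrected error that the CPU marked as corrupting processor context. The TIME field appears to be when the kernel recorded the event during this boot, so on its own it does not tell you when the original fault happened.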

Any help or advice would be greatly appreciated.

nas1-diagnostics-20220315-2049.zip

Edited by Dmtalon (reason: clarity)