delaney Posted July 10, 2020 (edited)

Hi UNRAIDers,

As always, thanks heaps in advance for any insight you can share on troubleshooting my MCEs. I am also posting this in case the information is helpful for the great work occurring in the 6.9 release space.

I've attached the diagnostics from my primary UNRAID server, which has rebooted once and locked up once this week with MCEs. I don't have the details from the first event. I put it down to "that's weird, oh well, stuff goes sideways sometimes if running for long enough" and kept going. So the attached diagnostics are from the second event.

Key points that I *think* are helpful:

- Earlier in the week (~7 days ago) I upgraded both my primary and secondary UNRAID servers to 6.9 Beta 22.
- To date there have been no issues with the secondary server (running different hardware), but it is also under a different load / usage profile.
- Both MCE events occurred after the move to 6.9 Beta 22.
- Both machines have been running UNRAID in different configurations for years, undergoing incremental hardware upgrades along the way.
- This morning I ran a memtest for ~4 3/4 hours on the affected machine with no errors detected.
- I do have some services that are cyclical/self-referencing (my syslog server and pfSense, e.g.), so there are some errors in the startup sequence about being unable to contact these services. These are "expected", or at least have been to date.
- I do have some BIOS updates available for the mobo running in this install, which I intend to apply. I note some references to CPU bugs being detected in the logs. I will post a follow-up here should the BIOS updates be effective, but that will take at least a few days. (Thanks to - Is it fixed? or has it just not occurred again yet?)

I've inserted a snippet from the syslog that I think is most pertinent. The full log is contained in the attached diagnostics. Not that I think it matters, but all times are UTC+10.

Kind regards,
Del

... ... ...
Jul 10 12:04:21 vision kernel: virbr0: port 1(virbr0-nic) entered disabled state
Jul 10 12:04:21 vision kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
... ... ...
Jul 10 12:25:49 vision root: Fix Common Problems: Error: Machine Check Events detected on your server
Jul 10 12:25:49 vision root: Hardware event. This is not a software error.
Jul 10 12:25:49 vision root: MCE 0
Jul 10 12:25:49 vision root: CPU 3 BANK 0 TSC 33255d618d8
Jul 10 12:25:49 vision root: ADDR 1ffff815fbcc4
Jul 10 12:25:49 vision root: TIME 1594346856 Fri Jul 10 12:07:36 2020
Jul 10 12:25:49 vision root: MCG status:
Jul 10 12:25:49 vision root: MCi status:
Jul 10 12:25:49 vision root: Corrected error
Jul 10 12:25:49 vision root: Error enabled
Jul 10 12:25:49 vision root: MCi_ADDR register valid
Jul 10 12:25:49 vision root: MCA: Instruction CACHE Level-0 Instruction-Fetch Error
Jul 10 12:25:49 vision root: STATUS 9400004000040150 MCGSTATUS 0
Jul 10 12:25:49 vision root: MCGCAP c0c APICID 6 SOCKETID 0
Jul 10 12:25:49 vision root: MICROCODE d6
Jul 10 12:25:49 vision root: CPUID Vendor Intel Family 6 Model 158
... ... ...

vision-diagnostics-20200710-1236.zip

Edited July 20, 2020 by delaney
Update to reflect issue no longer occurring - solved I believe by BIOS update
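For anyone who wants to sanity-check the decoded lines in the snippet above, here is a minimal Python sketch that unpacks the raw STATUS value (9400004000040150) and the TIME epoch (1594346856) by hand. It is not how mcelog or Fix Common Problems does it; the function names `decode_status` and `event_time` are made up for illustration. The bit positions are taken from the machine-check architecture chapter of the Intel SDM, and the UTC+10 offset comes from the post.

```python
from datetime import datetime, timedelta, timezone

# Flag bits of IA32_MCi_STATUS (Intel SDM, machine-check architecture).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of the cache-hierarchy compound error code: 0000 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "Generic"}


def decode_status(status: int) -> dict:
    """Hypothetical helper: split a raw MCi_STATUS value into its fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    result = {
        "flags": flags,
        "corrected": "UC" not in flags,  # UC clear means the error was corrected
        "mca_code": code,
        # Bits 52:38 hold a corrected-error count when MCG_CAP reports CMCI support.
        "corrected_count": (status >> 38) & 0x7FFF,
        "cache_error": None,
    }
    if (code & 0xFF00) == 0x0100:  # matches the cache-hierarchy pattern
        result["cache_error"] = (
            REQUEST.get((code >> 4) & 0xF, "unknown"),
            TRANSACTION.get((code >> 2) & 0x3, "reserved"),
            LEVEL[code & 0x3],
        )
    return result


def event_time(epoch: int, utc_offset_hours: int = 10) -> datetime:
    """Hypothetical helper: convert the TIME field to local time (UTC+10 assumed)."""
    return datetime.fromtimestamp(epoch, timezone(timedelta(hours=utc_offset_hours)))


if __name__ == "__main__":
    print(decode_status(0x9400004000040150))
    print(event_time(1594346856).isoformat())
```

The flags that come out (VAL, EN, ADDRV, with UC clear) line up with the "Corrected error", "Error enabled" and "MCi_ADDR register valid" lines in the log, and the low 16 bits (0x0150) decode to an instruction fetch on the level-0 instruction cache, matching the MCA line. The timestamp also shows the event was recorded at 12:07:36 local time, about 18 minutes before Fix Common Problems reported it.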
delaney (Author) Posted July 11, 2020

Logging an update in case internet searches bring anyone else with the same problem here.

My mobo / CPU combination is:

Mobo: ASUSTeK COMPUTER INC. - ROG MAXIMUS XI HERO
CPU: Intel® Core™ i7-8700K CPU @ 3.70GHz

I was running an older BIOS. The board shipped with American Megatrends Inc. Version 1502. Dated: 02/21/2020.

This morning I upgraded the BIOS to: American Megatrends Inc. Version 1502. Dated: 02/21/2020

So far uptime is approaching 6 hours. Given that the crashes occurred a couple of days apart, I will continue to monitor. If I see no similar crashes or lock-ups after a week, I will report back here.
delaney (Author) Posted July 20, 2020 (edited)

Hi UNRAIDers,

I've not had a recurrence of the symptoms described in the original post since the BIOS update documented in the second post, so I am calling this one solved.

Edited July 20, 2020 by delaney