matty2k Posted November 19, 2021
Dear team, last night my server suddenly rebooted/crashed. Both the log and Fix Common Problems report "Machine Check Events detected on your server". There is also a warning: mcelog: failed to prefill DIMM database from DMI data. I have attached the diagnostics. Is there a serious hardware error I need to worry about, and how can I get more details? The server had been running for nearly 2 weeks on 6.10RC2. Kind regards
Attachment: nasa-diagnostics-20211119-0754.zip
matty2k (Author) Posted November 24, 2021
I had another unexpected reboot/crash, 3 days later on November 23. Strangely, it happened at nearly the same time, 02:07; the previous one was on November 19 at 02:03. How can I figure out what is causing it? Regards
matty2k (Author) Posted November 26, 2021 (edited November 26, 2021)
Again, today (November 26) I have a new entry in the log, with a very similar timestamp: 02:00:14 NASA kernel: mce: [Hardware Error]: Machine check events logged. How can I find out the details? I have already installed mcelog and activated the syslog server, but I still have no clue.
Attachment: nasa-diagnostics-20211126-1050.zip
Squid Posted November 26, 2021
Nov 24 15:44:09 NASA mcelog: failed to prefill DIMM database from DMI data
Nov 24 15:44:09 NASA mcelog: Kernel does not support page offline interface
I've never seen this before, but it looks to me like an FYI rather than anything else (although you might look for BIOS updates). I doubt it has anything to do with your crashing. Run memtest from the boot menu for at least a pass or two.
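To pull only the machine-check entries out of a saved syslog (for example one collected by the syslog server mentioned above), a small filter can help. This is a minimal sketch, assuming the classic `Mon DD HH:MM:SS host tag: message` layout seen in the quoted lines; the function name `mce_entries` is hypothetical, and it only selects lines, it does not decode the machine check itself.

```python
import re

# Assumed classic syslog layout: "Nov 24 15:44:09 NASA mcelog: message"
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)


def mce_entries(lines):
    """Return (timestamp, tag, message) tuples for machine-check related lines."""
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        tag = m.group("tag").split("[")[0]  # drop an optional "[pid]" suffix
        msg = m.group("msg")
        # mcelog lines carry the "mcelog" tag; kernel MCE lines start with "mce:"
        if tag == "mcelog" or msg.startswith("mce:"):
            stamp = f"{m.group('month')} {m.group('day')} {m.group('time')}"
            hits.append((stamp, tag, msg))
    return hits
```

Reading the syslog file with `open(path).readlines()` and passing the result to `mce_entries` lists the MCE entries with their timestamps, which makes it easier to line them up with crash times.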
matty2k (Author) Posted November 27, 2021
OK. I ran the latest memtest86 (version 9.3) twice, 4 passes each, covering all 13 RAM tests, with 0 failures. I also checked for a BIOS update, but I am already on the latest version. Again, I would like to point out that the problem occurred with 6.10RC2; on RC1 the system was stable for more than 14 days. Best regards.
matty2k (Author) Posted December 29, 2021
Meanwhile I tried to investigate a little more. The time of the crashes (see above) matches when Jellyfin runs its library scan and cover extraction (both scheduled at 02:00), so perhaps the crash / MCE error is linked to this activity. Furthermore, mce seems to be outdated; I now get a notice to update to a newer version. I am currently on RC2. The server has now been running for 14 days without a crash/reboot. Kind regards
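Three events within a few minutes of 02:00 is suggestive but not proof. One quick way to check the coincidence against the 02:00 Jellyfin schedule is to compute each event's offset from that start time. This is a minimal sketch: the year (2021), the 15-minute window, and the helper names `minutes_after_schedule` and `all_within_window` are assumptions for illustration, and the times are the crash/MCE times reported in this thread.

```python
from datetime import datetime, timedelta

# Crash / MCE times reported in this thread (year assumed to be 2021).
EVENTS = [
    datetime(2021, 11, 19, 2, 3),
    datetime(2021, 11, 23, 2, 7),
    datetime(2021, 11, 26, 2, 0, 14),
]


def minutes_after_schedule(event, hour=2, minute=0):
    """Minutes elapsed since the scheduled start time on the same day."""
    start = event.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return (event - start) / timedelta(minutes=1)


def all_within_window(events, window_minutes=15):
    """True if every event falls within `window_minutes` after the schedule."""
    return all(0 <= minutes_after_schedule(e) <= window_minutes for e in events)


offsets = [round(minutes_after_schedule(e), 1) for e in EVENTS]
```

Here `offsets` evaluates to `[3.0, 7.0, 0.2]`, so all three events land within minutes of the scheduled tasks. Temporarily moving the Jellyfin tasks to another time and checking whether the crashes move with them would be a more direct test.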