springsman Posted March 22, 2020 (edited)
This morning I woke up to messages about errors on my server. Nothing has been changed on the server recently, but today there is a slew of errors in the system log. I have attached as much information as I could. I am on Unraid 6.8.3. On the Dashboard and Main pages of Unraid everything is green and healthy. Fix Common Problems states: "Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged." Any help would be greatly appreciated.
Attachments: homenet-unraid-diagnostics-20200322-1055.zip, homenet-unraid-syslog-20200322-1457.zip, PCI Devices and IOMMU Groups.txt, Server Logging.rtf, Hardware setup.rtf
Edited March 22, 2020 by springsman
Squid Posted March 22, 2020
Mar 22 10:52:08 Homenet-Unraid mcelog: CPU 1 on socket 0 received Bus and Interconnect Errors in Other-transaction
Mar 22 10:52:08 Homenet-Unraid mcelog: CPU 0 on socket 0 received Bus and Interconnect Errors in Other-transaction
Mar 22 10:52:08 Homenet-Unraid mcelog: Location: CPU 1 on socket 0
Mar 22 10:52:08 Homenet-Unraid mcelog: Location: CPU 0 on socket 0
You would have to google the bus and interconnect... to find out what the problem is. But there are many, many of these hardware errors being logged (so many that mcelog has more or less thrown up its hands in frustration).
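With this many entries, it can help to tally them per CPU and socket before searching for a cause. The following is a minimal sketch in Python, assuming the syslog lines follow the exact "CPU N on socket M received ..." wording quoted in the post above. The function name `tally_mce_events` and the regular expression are illustrative and not part of mcelog or Unraid; other mcelog message types would need their own patterns.

```python
import re
from collections import Counter

# Assumption: matches only the "received ..." message format quoted in the
# post above. "Location:" lines are deliberately not matched.
MCE_LINE = re.compile(
    r"mcelog: CPU (?P<cpu>\d+) on socket (?P<socket>\d+) received (?P<error>.+)$"
)


def tally_mce_events(lines):
    """Count mcelog 'received' events per (socket, cpu, error text)."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line.strip())
        if match:
            key = (int(match["socket"]), int(match["cpu"]), match["error"])
            counts[key] += 1
    return counts


# The four lines quoted in the post, used here as sample input.
sample = [
    "Mar 22 10:52:08 Homenet-Unraid mcelog: CPU 1 on socket 0 received "
    "Bus and Interconnect Errors in Other-transaction",
    "Mar 22 10:52:08 Homenet-Unraid mcelog: CPU 0 on socket 0 received "
    "Bus and Interconnect Errors in Other-transaction",
    "Mar 22 10:52:08 Homenet-Unraid mcelog: Location: CPU 1 on socket 0",
    "Mar 22 10:52:08 Homenet-Unraid mcelog: Location: CPU 0 on socket 0",
]

for (socket, cpu, error), count in sorted(tally_mce_events(sample).items()):
    print(f"socket {socket} cpu {cpu}: {count} x {error}")
```

To run it against a real log, pass an open file object (for example the syslog extracted from the diagnostics zip) to `tally_mce_events` in place of `sample`. If the counts cluster on one socket, that may point at the memory or interconnect attached to that processor, though the tally alone does not identify the failing part.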
springsman (Author) Posted March 22, 2020
I could not find much help via Google on this. I did move my RAM modules around on the board. I have 4 modules of 32GB each, from the same manufacturer and with identical specifications. However, 2 of them looked slightly different from the other two (slightly thinner). I placed each look-alike pair on the same processor bus (I have two processors) and then restarted my server. It looks like the errors have stopped for now. I will let it soak for a day or two and see whether the errors come back. It is strange for this to suddenly be an issue, since these DIMMs have been in place for a long time. Maybe they just need to be reset? Not sure... Thanks for the help.