LazyNinja22 Posted April 23, 2020
Hey everyone, I've been using my Unraid server for a couple of weeks now. I haven't encountered any instability issues, but my system log is flooded with hardware errors on the CPU 3/19 pair. It's running a Ryzen 3950X.
Apr 22 20:22:16 UnraidTower kernel: mce: [Hardware Error]: Machine check events logged
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Corrected error, no action required.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: CPU:19 (17:71:0) MC0_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000100015
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: IPID: 0x000000b000000000, Syndrome: 0x000000003a035c0e
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Extended Error Code: 16
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Error: L2 TLB parity.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: cache level: L1, tx: DATA
I attached the diagnostics log. Any ideas? Thanks guys.
Attachment: unraidtower-diagnostics-20200423-1012.zip
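For readers who want to see how the raw MC0_STATUS value relates to the decoded lines in the log, here is a minimal Python sketch. It assumes the bit positions used by the Linux kernel's machine-check code (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, SyndV 53, extended error code in bits 21:16, TLB error code of the form 0x001x). The function name `decode_status` and the table names are made up for this illustration; treat it as a sketch, not a substitute for the kernel's decoder.

```python
# Sketch: decode the MC0_STATUS value quoted in the log above.
# Bit positions are an assumption based on the Linux kernel's MCI_STATUS_*
# masks and AMD's scalable MCA layout; names here are illustrative only.

STATUS = 0xD820000000100015  # value copied from the log line

FLAG_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # overflow: an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, "CE")
    "En": 60,     # error reporting enabled
    "MiscV": 59,  # MISC register valid
    "AddrV": 58,  # ADDR register valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register valid
}

# Transaction type and cache level tables for TLB-style error codes
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]


def decode_status(status):
    """Return the set flags and error-code fields of an MCA status value."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF
    decoded = {
        "flags": flags,
        "corrected": "UC" not in flags,
        "extended_error_code": (status >> 16) & 0x3F,
        "error_code": error_code,
    }
    # TLB errors have the form 0000 0000 0001 TTLL
    if (error_code & 0xFFF0) == 0x0010:
        decoded["tx"] = TX_TYPE[(error_code >> 2) & 0x3]
        decoded["cache_level"] = CACHE_LEVEL[error_code & 0x3]
    return decoded


if __name__ == "__main__":
    for key, value in decode_status(STATUS).items():
        print(f"{key}: {value}")
```

Under those assumptions the fields line up with what the kernel printed: Over, MiscV and SyndV set, UC clear (a corrected error), extended error code 16, and a TLB-style error code with tx DATA at level L1.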
Squid Posted April 23, 2020
Try the system without the overclock on the RAM (you're running an XMP/AMP profile on the RAM to pump it to 3600 when it's rated for 2133, and the max supported by the CPU is either 2666 or 2933 depending on whether it's dual or single rank): https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173
After that, I'd suggest upgrading to 6.9.0-beta 1, as it has a later kernel version with Ryzen fixes in it.
LazyNinja22 Posted April 23, 2020
Hi, I previously tried disabling the XMP profile but still had the same issue. I will look into upgrading the Unraid version and let you know how it goes. Thanks.
LazyNinja22 Posted April 24, 2020
Upgraded to 6.9.0-beta 1, but unfortunately the error messages still occur.
itimpi Posted April 25, 2020
7 hours ago, LazyNinja22 said: "Upgraded to 6.9.0-beta 1 but unfortunately the error messages still occur."
Not surprising, as they indicate a problem with the hardware.
cpshoemake Posted April 25, 2020
Looks like parity errors in your L2 TLB. Just to clarify, you ran the system at the stock 2133 speed and still got these errors? Did you try resetting the BIOS to default values? If you have another system that would allow you to swap the MB and/or CPU to validate, I'd try that. If none of that works, you may have to RMA your CPU.
LazyNinja22 Posted April 25, 2020
Yes, I tried resetting the BIOS but it still happens. Unfortunately I don't have another suitable MB or CPU to test with. I'll look into getting the CPU RMA'd once the world goes back to normal. I don't really want to be down a computer in the meantime. For the time being I have no instability or really noticeable issues, other than a wall of error messages... Thank you very much for everyone's help.
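To check whether the wall of messages really is confined to the CPU 3/19 pair before deciding on an RMA, the events can be tallied per logical CPU. The sketch below assumes the syslog lines look like the ones quoted in the first post; the function name `count_mce_by_cpu` and the sample list are made up for this example, and the log file location on a given system is not something this thread specifies.

```python
# Sketch: count machine-check events per logical CPU from syslog text.
# Assumes the "[Hardware Error]: CPU:<n> (...)" line format shown in the
# first post; the names below are illustrative only.
import re
from collections import Counter

CPU_LINE = re.compile(r"\[Hardware Error\]: CPU:(\d+) ")


def count_mce_by_cpu(lines):
    """Return a Counter mapping logical CPU number to MCE event count."""
    counts = Counter()
    for line in lines:
        match = CPU_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Two lines copied from the log in the first post
SAMPLE = [
    "Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Corrected error, no action required.",
    "Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: CPU:19 (17:71:0) "
    "MC0_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000100015",
]

if __name__ == "__main__":
    for cpu, n in sorted(count_mce_by_cpu(SAMPLE).items()):
        print(f"CPU {cpu}: {n} event(s)")
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since the function accepts any iterable of lines.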
Squid Posted April 25, 2020
You could also try this
LazyNinja22 Posted April 25, 2020
Alright, I'll give that a read. If it is a kernel issue, I'd rather not waste my time submitting a fruitless RMA claim. I booted into my bare-metal Windows installation to see if any events get logged. That said, I'm not sure whether Windows logs machine check events that are recoverable.