April 23, 2020 Hey everyone, I've been using my Unraid server for a couple of weeks now. I haven't encountered any instability, but my system log is flooded with hardware errors on the CPU 3/19 pair. It's running a Ryzen 3950X.
Apr 22 20:22:16 UnraidTower kernel: mce: [Hardware Error]: Machine check events logged
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Corrected error, no action required.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: CPU:19 (17:71:0) MC0_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000100015
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: IPID: 0x000000b000000000, Syndrome: 0x000000003a035c0e
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Extended Error Code: 16
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Error: L2 TLB parity.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: cache level: L1, tx: DATA
I attached the diagnostics log. Any ideas? Thanks guys. unraidtower-diagnostics-20200423-1012.zip
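To see where the decoded text in that log comes from, here is a minimal Python sketch that pulls apart the raw MC0_STATUS and IPID values. It assumes the standard MCA_STATUS bit positions (Val, Over, UC, En, MiscV, AddrV, PCC) plus AMD's SyndV bit at 53, the extended error code in bits 21:16, and the TLB error-code pattern 0000 0000 0001 TTLL in the low 16 bits. The TT/LL label strings mirror what the Linux EDAC driver prints, and `decode_status` / `decode_ipid` are hypothetical helper names, not a real tool.

```python
# Sketch only: bit layout assumed from the AMD SMCA MCA_STATUS / MCA_IPID registers.
STATUS_FLAGS = {
    63: "Val",    # register holds a valid error
    62: "Over",   # overflow: another error arrived before this one was read
    61: "UC",     # uncorrected error (clear means the hardware corrected it)
    60: "En",     # error reporting enabled for this bank
    59: "MiscV",  # MCA_MISC holds extra information
    58: "AddrV",  # MCA_ADDR holds the faulting address
    57: "PCC",    # processor context corrupt
    53: "SyndV",  # MCA_SYND holds a valid syndrome
}

# Transaction type (TT) and level (LL) labels, as the kernel prints them.
TT = {0: "INSN", 1: "DATA", 2: "GEN", 3: "RESV"}
LL = {0: "RESV", 1: "L1", 2: "L2", 3: "L3/GEN"}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCA_STATUS value into its flag bits and error codes."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF                # MCA error code, bits 15:0
    result = {
        "flags": flags,
        "corrected": ((status >> 61) & 1) == 0,
        "extended_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": code,
    }
    # TLB errors use the pattern 0000 0000 0001 TTLL in the low 16 bits.
    if (code & 0xFFF0) == 0x0010:
        result["class"] = "TLB"
        result["tx"] = TT[(code >> 2) & 0x3]
        result["level"] = LL[code & 0x3]
    return result


def decode_ipid(ipid: int) -> dict:
    """Split MCA_IPID into McaType, HardwareID and InstanceId fields."""
    return {
        "mca_type": (ipid >> 48) & 0xFFFF,
        "hardware_id": (ipid >> 32) & 0xFFF,
        "instance_id": ipid & 0xFFFFFFFF,
    }
```

Run against the values above, `decode_status(0xd820000000100015)` gives the flags Val, Over, En, MiscV and SyndV, extended error code 16, and a data-side L1 TLB error code. That matches the kernel's "L2 TLB parity" and "cache level: L1, tx: DATA" lines, and the Over flag means more errors arrived than the bank could hold. `decode_ipid(0x000000b000000000)` gives hardware ID 0xB0, which the kernel reported as the Load Store Unit.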
April 23, 2020 Try the system without the overclock on the RAM. You're running an XMP/AMP profile on the RAM to pump it to 3600 when it's rated for 2133, and the max supported by the CPU is either 2666 or 2933 depending on whether it's dual or single rank: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173 After that, I'd suggest that you upgrade to 6.9.0-beta 1, as it has a later kernel version with Ryzen fixes in it.
April 23, 2020 Author Hi, I previously tried disabling the XMP profile but still had the same issue. I will look into upgrading the Unraid version and let you know how it goes. Thanks.
April 25, 2020 Community Expert 7 hours ago, LazyNinja22 said: "Upgraded to 6.9.0-beta 1 but unfortunately the error messages still occur." Not surprising, as they indicate a problem with the hardware.
April 25, 2020 Looks like parity errors in your L2 TLB (the log reports "L2 TLB parity" in the Load Store Unit). Just to clarify, you ran the system at the stock 2133 speed and still got these errors? Did you try resetting the BIOS to default values? If you have another system that would let you swap the MB and/or CPU to isolate the faulty part, I'd try that. If none of that works, you may have to RMA your CPU.
April 25, 2020 Author Yes, I tried resetting the BIOS but it still happens. Unfortunately I don't have another suitable MB or CPU to test with. I'll look into getting the CPU RMA'd once the world goes back to normal; I don't really want to be down a computer in the meantime. For the time being I have no instability or any real noticeable issues, other than a wall of error messages. Thank you all very much for your help.
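While living with the errors, it can help to track whether that wall of messages stays confined to the same CPUs or starts spreading. Below is a minimal Python sketch that counts the kernel's per-CPU "[Hardware Error]: CPU:N" header lines in a log. `count_mce_by_cpu` is a hypothetical helper, and the line format is assumed from the excerpt in the first post.

```python
import re
from collections import Counter

# Matches the per-event header line the kernel prints, e.g.
# "[Hardware Error]: CPU:19 (17:71:0) MC0_STATUS[...]: 0x...".
CPU_RE = re.compile(r"\[Hardware Error\]: CPU:(\d+) ")


def count_mce_by_cpu(lines) -> Counter:
    """Count decoded machine check events per logical CPU number."""
    counts = Counter()
    for line in lines:
        match = CPU_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Usage, assuming the log lives at /var/log/syslog: `with open("/var/log/syslog", errors="replace") as f: print(count_mce_by_cpu(f))`. Only the CPU header line is counted, so each event is tallied once, not once per decoded detail line.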
April 25, 2020 Author Alright, I'll give that a read. If it is a kernel issue, I'd rather not waste my time submitting a fruitless RMA claim. I booted into my bare-metal Windows installation to see whether any events get logged there. That said, I'm not sure whether Windows logs machine check events that are recoverable.