
6.8.3 Machine Check Events & Hardware Errors



 Hey everyone,

 

I've been using my Unraid server for a couple of weeks now. I haven't encountered any instability, but my system log is flooded with hardware errors on the CPU 3/19 pair. It's running a Ryzen 3950X.

 

Apr 22 20:22:16 UnraidTower kernel: mce: [Hardware Error]: Machine check events logged
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Corrected error, no action required.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: CPU:19 (17:71:0) MC0_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000100015
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: IPID: 0x000000b000000000, Syndrome: 0x000000003a035c0e
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Extended Error Code: 16
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: Load Store Unit Error: L2 TLB parity.
Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: cache level: L1, tx: DATA
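To check which cores are actually reporting, log lines like the ones above can be tallied per CPU. This is only a rough sketch: the function name `count_mce_by_cpu` is made up for illustration, and the regular expression assumes the `CPU:<n>` format shown in this log.

```python
import re
from collections import Counter

# Matches the "CPU:19 " part of a "[Hardware Error]: CPU:19 (17:71:0) ..." line.
# The pattern is an assumption based on the log format shown in this thread.
CPU_LINE = re.compile(r"\[Hardware Error\]: CPU:(\d+) ")


def count_mce_by_cpu(lines):
    """Return a Counter mapping logical CPU number to number of MCE status lines."""
    counts = Counter()
    for line in lines:
        match = CPU_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 22 20:22:16 UnraidTower kernel: mce: [Hardware Error]: Machine check events logged",
        "Apr 22 20:22:16 UnraidTower kernel: [Hardware Error]: CPU:19 (17:71:0) "
        "MC0_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd820000000100015",
    ]
    for cpu, hits in sorted(count_mce_by_cpu(sample).items()):
        print(f"CPU {cpu}: {hits} event(s)")
```

To run it against a real log, pass an open file object instead of the sample list. The location of the syslog file depends on the system.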

 

I attached the diagnostics log. Any ideas?

 

Thanks guys.

unraidtower-diagnostics-20200423-1012.zip

Link to comment

Try the system without the RAM overclock. You're running an XMP/AMP profile that pushes the RAM to 3600 when it's rated for 2133, and the maximum the CPU supports is either 2666 or 2933, depending on whether the memory is dual or single rank: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173

 

 

After that, I'd suggest that you upgrade to 6.9.0-beta 1 as it has a later kernel version with Ryzen fixes in it.

Link to comment

Looks like parity errors in the L2 TLB, going by the Load Store Unit line in your log. Just to clarify, you ran the system at the stock 2133 speed and still got these errors? Did you try resetting the BIOS to default values? If you have another system that would let you swap the MB and/or CPU to validate, I'd try that. If none of that works, you may have to RMA your CPU.
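For anyone who wants to cross-check the kernel's decoding, the MC0_STATUS value from the log can be unpacked by hand. This is a minimal sketch, not an authoritative decoder: the bit positions and the TLB error-code layout follow my reading of the Linux kernel's MCE definitions and should be treated as assumptions, and `decode_mc_status` is a name invented for this example.

```python
# Bit positions are assumptions taken from my reading of the Linux kernel's
# MCE status definitions; verify them against your kernel source or AMD docs.
STATUS_BITS = {
    "Val": 63,    # register holds a valid error
    "Over": 62,   # overflow: an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "En": 60,     # error reporting enabled
    "MiscV": 59,  # MISC register valid
    "AddrV": 58,  # ADDR register valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register valid
}

# Name tables for the transaction-type and cache-level fields of a TLB error code.
TT_NAMES = ["INSN", "DATA", "GEN", "RESV"]
LL_NAMES = ["RESV", "L1", "L2", "L3/GEN"]


def decode_mc_status(status: int) -> dict:
    """Split a 64-bit MCA status value into flags and error-code fields."""
    error_code = status & 0xFFFF
    result = {
        "flags": [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1],
        "extended_error_code": (status >> 16) & 0x3F,
        "error_code": error_code,
    }
    # TLB errors are assumed to use the pattern 0000 0000 0001 TTLL.
    if (error_code & 0xFFF0) == 0x0010:
        result["kind"] = "TLB"
        result["tx"] = TT_NAMES[(error_code >> 2) & 0x3]
        result["cache_level"] = LL_NAMES[error_code & 0x3]
    return result


if __name__ == "__main__":
    for key, value in decode_mc_status(0xD820000000100015).items():
        print(f"{key}: {value}")
```

With the value from the log, the sketch reports the Over, MiscV and SyndV flags with UC clear, extended error code 16, and a TLB error with tx DATA at level L1. That matches the kernel's own output.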

Link to comment

Yes, I tried resetting the BIOS but it still happens. Unfortunately I don't have another suitable MB or CPU to test with. I'll look into getting the CPU RMA'd once the world goes back to normal. I don't really want to be down a computer in the meantime. For the time being I have no instability or real noticeable issues, other than a wall of error messages... Thank you very much for everyone's help.

Link to comment

Alright, I'll give that a read. If it is a kernel issue, I'd rather not waste my time submitting a fruitless RMA claim. I booted into my bare-metal Windows installation to see if any events get logged there. That said, I'm not sure whether Windows logs machine check events that are recoverable.

Link to comment
