[mce hardware error] on a brand new system running 3900x


ec911


Hi! I am new in this community and this is my first post.

 

I've been running Unraid on a trial license on my old i7-4790K system without any issues.

I recently upgraded (and bought a Plus license) to a Ryzen 3900X on a Gigabyte Aorus Elite X570 with 32 GB of RAM (Kingston Fury, 3200 MHz).

 

Everything seemed to be OK. I referred to a post on this forum for tips and tricks on passing a GPU through to VMs on this particular motherboard.

 

Two days ago, I woke up to find the server frozen. I rebooted everything and started getting machine check events reported by the Fix Common Problems plugin.

Everything seems to be running fine, but the error keeps coming back. Unless I'm missing something, can anyone help me out with the error? Is this common with newer Ryzen CPUs running Unraid?

 

I'm running version 6.8.1.

 

tower-diagnostics-20200126-1555.zip


No, that isn't normal. What all have you changed in the bios? If you've messed with overclocking or any advanced settings, I would highly recommend resetting to factory defaults and making ONLY these changes:

Quote

 

UEFI / BIOS Settings:

Tweaker -> Advanced CPU Settings -> SVM Mode -> Enable

Settings -> Miscellaneous -> IOMMU -> Enable

Settings -> AMD CBS -> ACS Enable -> Enable

Settings -> AMD CBS -> Enable AER Cap -> Enable
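
After saving those settings, one way to check from the Unraid console that the IOMMU is actually active is to list the IOMMU groups the kernel created. This is only a minimal sketch, assuming a Linux host that exposes `/sys/kernel/iommu_groups`; the helper name `list_iommu_groups` is made up for illustration and is not part of Unraid.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        # No groups directory: the IOMMU is likely disabled or unsupported.
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    groups = list_iommu_groups()
    if not groups:
        print("No IOMMU groups found; check the IOMMU setting in the BIOS.")
    for number in sorted(groups):
        print(f"group {number}: {' '.join(groups[number])}")
```

An empty result suggests the IOMMU setting did not take effect. A GPU that shares a group with unrelated devices is usually harder to pass through cleanly.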

 

My initial guess is that it is related to Global C-state Control. The factory default settings should be very stable (zero crashes). If you still get crashes with the defaults plus the settings above, my next guess would be a hardware failure.

 

What power supply are you using?

2 hours ago, Skitals said:

What all have you changed in the bios?

The only change I've made, besides the ones you mentioned, was to turn on XMP profile 1 to clock the RAM at 3200 MHz. I put it back to 2400 MHz afterwards, but the error happened again. I didn't overclock the CPU.

 

3 hours ago, Skitals said:

What power supply are you using?

I'm running a Corsair RM850w with a couple of months of use. I was using it on my other system without any issues.

 

 

If the error comes back with the default clock settings in the BIOS, I will try replacing the RAM.

 

Here's the log that I was getting:

Jan 26 15:34:10 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 26 15:34:10 Tower kernel: #13
Jan 26 15:34:10 Tower kernel: x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
Jan 26 15:34:10 Tower kernel: mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 0: baa0000000010145
Jan 26 15:34:10 Tower kernel: #14
Jan 26 15:34:10 Tower kernel: x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
Jan 26 15:34:10 Tower kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d00002e IPID b000000000 
Jan 26 15:34:10 Tower kernel: #15
Jan 26 15:34:10 Tower kernel: x86/cpu: Activated the Intel User Mode Instruction Prevention (UMIP) CPU feature
Jan 26 15:34:10 Tower kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1580070829 SOCKET 0 APIC 1 microcode 8701013
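
For anyone trying to read the `Bank 0` status value, the top bits of an x86 machine check status register are architectural flags, so they can be split out by hand. The following is a rough sketch, not a full decoder: `decode_status` is a made-up helper, the low 16 bits are the generic MCA error code, and treating bits 21:16 as an extended error code is an AMD-specific assumption. It does not say which component is at fault.

```python
# Architectural x86 MCi_STATUS flag bits (bit position -> short name).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit MCA status value into flag names and error codes."""
    flags = [name
             for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,            # bits 15:0
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16 (AMD assumption)
    }


if __name__ == "__main__":
    result = decode_status(0xBAA0000000010145)  # value from the log above
    print("flags:", " ".join(result["flags"]))
    print("error code: 0x%04x" % result["error_code"])
    print("extended error code: 0x%02x" % result["ext_error_code"])
```

For the value in the log, this reports VAL, UC, EN, MISCV and PCC as set and ADDRV as clear, with error code 0x0145. In other words, it was logged as an uncorrected error with no valid address recorded.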

 

I also noticed this, but I don't know what it is:

Jan 26 15:34:10 Tower kernel: tsc: Fast TSC calibration failed

 
