limetech

AMD Ryzen update

41 posts in this topic


Here's a follow up on this topic:

 

We did receive a reply from the engineer, relayed via the customer care rep:

 

Quote

We have been investigating the issue where systems are reportedly locking up when idling or running small workloads.

 

This issue is related to the power supply.  Most PC power supplies (PSUs) are designed to handle a wide range of power consumption from your PC components, but not all PSUs are created equal.

 

Because of this, there are some rare conditions where the power draw of an efficient PC does not meet the minimum power consumption requirements for one or more circuits inside some PSUs.

 

This scenario (called “minimal loading supply”) can cause such PSUs to output poor quality power, or shut off entirely.

 

To prevent this issue from happening, it is important to ensure that the power supply supports 0A minimum load on the +12V circuit. These PSUs became commonplace starting in 2013 for the Intel “Haswell” platform.

 

This specification can be found printed on the sticker affixed to most PSUs, or it may be available on the manufacturer’s website.

 

However, AMD understands that not everyone is in a position to replace their PSU with a contemporary 0A-rated unit. To help with that, AMD is also developing a firmware workaround for these power supplies, and will make it available through motherboard partners as a BIOS update in the future.

 

Seems plausible, though in many server builds using single-rail PSUs it's hard to imagine the +12V rail going close to 0A. Interesting that a BIOS "firmware workaround" is possible. This seems to imply an issue beyond just the PSU, or maybe the workaround is to prevent the C6 state, which would not be good.


Wonder if the monitoring circuitry of the PSU might just be on the 12V 'wire' to the socket that goes to the motherboard?


Thanks for adding a specific topic for this.

 

I've experienced one crash so far on Ryzen (Threadripper) due to this, but that's on 6.3.5.

 

I haven't been able to move to the RCs because my HBA isn't detected. Is there a possibility this can be looked at? I'm willing to pay for a troubleshooting session to facilitate data gathering, testing, etc. if need be.

 

Relevant reading: https://forums.lime-technology.com/topic/61500-64rc101112-it-flashed-lsi-9211-8i-drives-not-detected-on-x399/

 

 


Thanks for the added information, but can you expand a bit more?  Power supply issue, but a BIOS update might fix it?  With C-State enabled, my Ryzen system becomes unresponsive; however, the fans and HDs are still spinning, so I'm a little confused.  

1 hour ago, luisv said:

Thanks for the added information, but can you expand a bit more?  Power supply issue, but a BIOS update might fix it?  With C-State enabled, my Ryzen system becomes unresponsive; however, the fans and HDs are still spinning, so I'm a little confused.  

 

AMD Ryzen has a defect that causes a mostly idle CPU to randomly freeze with C-states enabled. Some have reported freezing even with C-states disabled (though this is rare).

https://bugzilla.kernel.org/show_bug.cgi?id=196683


When Haswell came out, it was the first to utilize C6/C7 states (at least on the consumer side).
At the time it was made known that certain PSUs wouldn't support these deeper C-states and therefore weren't compatible with Haswell.

 

Looks like we're a few years down the track and it's still an issue.

Edited by tjb_altf4


That's a pure crock-of-shit fob-off answer if I ever saw one. First rule of tech support - blame someone else and hope some other tech support person gets the angry callback.


Received an update from AMD:

 

Quote

Some of our partners (Gigabyte and ASRock) have started releasing BIOS updates for some of their AM4 motherboards.

 

The new BIOS provides a Power Supply Idle Control option which addresses the PSU problem causing the small workload/idle lockup issue. 

 

I expect that it won’t be too long until the BIOS is made available for all AM4 boards through our motherboard partners.

 

For users that do not have the updated BIOS, and are experiencing the issue due to the latest kernel, a known workaround is to disable C6 or Global C-state Control in the BIOS.

 

C6 or Global C-state Control can be re-enabled after updating to a BIOS that supports PSU idle control.
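As a side note for anyone toggling these BIOS options: it can help to confirm from the OS side which idle states the kernel is actually exposing before and after the change. Below is a minimal Python sketch of that check. It assumes the standard Linux cpuidle sysfs layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with `name` and `disable` files); the function name `list_idle_states` is made up for illustration, and whether a given unRAID build exposes these files on a given board is not something this thread confirms.

```python
from pathlib import Path


def list_idle_states(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return (state_dir, name, disabled) tuples for one CPU's idle states.

    Assumes the Linux cpuidle sysfs layout. Returns an empty list when the
    cpuidle directory is missing (for example when no idle driver is loaded).
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []

    states = []
    # Sort numerically so that state10 comes after state9.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # A non-zero value in "disable" means the state is turned off at runtime.
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state_dir.name, name, disabled))
    return states
```

Calling `list_idle_states()` with the defaults and printing the result on the server itself should show whether the deeper states disappear once C6 or Global C-state Control is turned off in the BIOS; the `sysfs_root` parameter only exists so the function can be pointed at a copy of the tree.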

 


Tom -

 

I haven't kept up that well with the Ryzen / TR saga, but recall there were two issues - C-states and the NPT. I thought there was a fix / workaround for both, but then there was a second C-state / stability issue.

 

Do you believe that this low-power C-states issue COULD be the "3rd one's the charm" final fix to make the platform stable? Or are there other symptoms (e.g., with pass through) that aren't consistent with this problem?


The C state issue is all one issue. I turned off C states myself as the system config fix only worked for so long. This is a good step for finally getting it fixed. 


I have been contemplating upgrading my home server to a ryzen 1920 since I have a buyer for my xeon 2670, but then I came across this thread.

 

Since I use unRAID exclusively as my primary "OS", I have to ask, how prevalent are these issues? Can the lock ups be 100% avoided with Cstates disabled? Would those of you currently running unRAID on ryzen systems recommend holding off on the upgrade? Or are the issues infrequent enough to pull the trigger on it?

 

The xeon setup is what I would call 99.99% stable at the moment, with extremely rare gpu/WIN10 induced VM lock ups. I would like to keep/have as much stability as possible. 

1 hour ago, ryoko227 said:

 The xeon setup is what I would call 99.99% stable at the moment, with extremely rare gpu/WIN10 induced VM lock ups. I would like to keep/have as much stability as possible. 

There are some very good people doing very good work with Ryzen, but the level of chatter indicates we are not yet in a stable state. It may be real soon, but keep your Xeon 2670 a little bit longer if you need stability.

 

My definition of stable is at least a few months with no issues reported.  Ryzen on unRaid has incredible potential, but it's still beta.... 

 

I'm buying another 2670 system while I wait.

Edited by tr0910

10 hours ago, ryoko227 said:

I have been contemplating upgrading my home server to a ryzen 1920 since I have a buyer for my xeon 2670, but then I came across this thread.

 

Since I use unRAID exclusively as my primary "OS", I have to ask, how prevalent are these issues? Can the lock ups be 100% avoided with Cstates disabled? Would those of you currently running unRAID on ryzen systems recommend holding off on the upgrade? Or are the issues infrequent enough to pull the trigger on it?

 

The xeon setup is what I would call 99.99% stable at the moment, with extremely rare gpu/WIN10 induced VM lock ups. I would like to keep/have as much stability as possible. 

 

The majority of my issues with Ryzen have been related to C-State.   I had one issue related to memory a few weeks back, but it has not happened again. 

 

 

When I leave C-State disabled, the longest up-time I've recorded was 22 days. I built the system back in August and since then have applied 4 - 5 BIOS updates as well as various unRAID RCs, so these updates have contributed to the short up-time. I typically enable C-State after Asus indicates that the BIOS update includes AGESA updates; however, as you can tell by now, the C-State issue persists. Frustrating, yes, but once C-State is disabled, the system is stable.

 

I do not perform GPU pass through, but have read that others have been successful; however, you should let them chime in to understand their pain points.

 

Not sure if you have read through this post yet, but it might help you with your decision.  Due to the performance / price point that Ryzen initially provided, I took the chance with Ryzen and have been rewarded yet frustrated at the same time.  My main use cases are around media streaming, file share, backup and lab type testing with various Windows and Linux VMs.  I have a separate PC as a daily driver.   

 

 

 

 


I have disabled and enabled C-state and rcu_nocbs, run the memory overclocked and at stock speeds, and turned CPU power consumption on and off. All possible combinations have been tried; the server just won't be stable.

I can't pin down what makes it freeze, as it freezes both while idling and under load in a VM.

I was using RC15e, then rolled back to RC14.

 

the server has the following:

mobo: Asus x370 pro bios updated yesterday

CPU: Ryzen 1700x stock speeds

GPU: GT 710

Ram: Corsair LPX 3200 <- not on the compatibility list.

PSU: 1000W Cooler Master 80 Plus Gold.

Array: three sacrificial Seagate 2TB drives, with a 256GB M.2 as cache.


Mine would freeze no matter what when I tried any OC using the BIOS. I OC using the zenstates script now; you can even use the script to turn C-states on/off from the command line.


Last night my system froze on me.

I think there must be something wrong with the rc15 update.

I have rolled back to rc14 with rcu_nocbs=0-15, and so far my system is stable.

 

Asus Prime x370 pro, bios 3401

ryzen 1700x

Edited by Handl3vogn


So far so good for me... latest Asus BIOS v3401, C-State Disabled with rcu_nocbs=0-15 set.    I updated the BIOS yesterday, so right now the system has been up for 22hrs and 12mins.   
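For anyone who wants to double-check that the `rcu_nocbs=0-15` setting mentioned above really made it onto the kernel command line and covers the CPUs they expect, here is a small Python sketch. It only handles plain comma-separated numbers and `a-b` ranges; the kernel's CPU-list syntax has other forms that this deliberately ignores. The helper names `expand_cpu_list` and `rcu_nocbs_cpus` are made up for illustration.

```python
def expand_cpu_list(spec):
    """Expand a simple CPU list such as "0-3,8" into [0, 1, 2, 3, 8].

    Only bare numbers and inclusive "a-b" ranges are handled.
    """
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def rcu_nocbs_cpus(cmdline):
    """Return the CPUs named by rcu_nocbs= in a kernel command line, or None if absent."""
    for token in cmdline.split():
        if token.startswith("rcu_nocbs="):
            return expand_cpu_list(token.split("=", 1)[1])
    return None
```

On a running Linux system the command line can be read from `/proc/cmdline`, so `rcu_nocbs_cpus(open("/proc/cmdline").read())` would be the natural call. For `0-15` it should come back with 16 entries, CPUs 0 through 15.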


I also updated the ASUS Prime X370 BIOS from 3203 to 3401 20 minutes ago. I noticed some changes in the BIOS: (1) SVM is now disabled by default, and (2) the BIOS fan control feature now extends to the AIO/pump fan.

 

As for C-State control, there has been a big difference starting from 3203.

 

If you enable it: (1) on the BIOS page, the CPU runs at 1.3V; (2) the unRAID boot log shows some firmware bug messages related to C-State.

If you disable it: (1) on the BIOS page, the CPU runs at 1.08V, and the CPU temperature also stays in a low range; (2) the unRAID boot log doesn't have those C-State errors.

 

So I don't think disabling C-State really "disables" it.

Edited by Benson

2 minutes ago, Benson said:

If you enable it: (1) on the BIOS page, the CPU runs at 1.3V; (2) the unRAID boot log shows some firmware bug messages related to C-State.

 

Please post diags.


Sorry, I currently have C-State disabled, so I'm attaching a previous error log for your reference (I've had the error below on BIOS 3203, and since version 3xxx).

 

Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Nov 22 18:20:06 X370 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
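To compare boots with C-State enabled and disabled without eyeballing the whole log, a short Python sketch like the one below can count these messages. The regular expression is written against the exact wording of the lines above. The function name `count_mwait_bugs` is made up for illustration, and the `/var/log/syslog` path in the usage note is an assumption about where the log lives.

```python
import re
from collections import Counter

# Matches the kernel message shown above and captures its two hex values.
MWAIT_BUG = re.compile(
    r"\[Firmware Bug\]: ACPI MWAIT C-state (0x[0-9a-fA-F]+) "
    r"not supported by HW \((0x[0-9a-fA-F]+)\)"
)


def count_mwait_bugs(lines):
    """Count ACPI MWAIT firmware-bug messages, keyed by their two hex values."""
    counts = Counter()
    for line in lines:
        match = MWAIT_BUG.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Something like `count_mwait_bugs(open("/var/log/syslog"))` would be the typical call. The excerpt above contains 16 of these lines, all with the `0x0` / `0x0` pair, so a boot that reports zero would be a quick sign that the BIOS setting changed what the firmware advertises.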

 

Edited by Benson

1 hour ago, Benson said:

I also updated the ASUS Prime X370 BIOS from 3203 to 3401 20 minutes ago. I noticed some changes in the BIOS: (1) SVM is now disabled by default, and (2) the BIOS fan control feature now extends to the AIO/pump fan.

 

As for C-State control, there has been a big difference starting from 3203.

 

If you enable it: (1) on the BIOS page, the CPU runs at 1.3V; (2) the unRAID boot log shows some firmware bug messages related to C-State.

If you disable it: (1) on the BIOS page, the CPU runs at 1.08V, and the CPU temperature also stays in a low range; (2) the unRAID boot log doesn't have those C-State errors.

 

So I don't think disabling C-State really "disables" it.

 

I didn't notice those additional options yesterday, so I'll check later today and will report back.  

1 hour ago, luisv said:

 

I didn't notice those additional options yesterday, so I'll check later today and will report back.  

 

If I am correct, IOMMU and SVM may affect the above findings; anyway, I disabled both. (The CPU running at 1.08V on the BIOS page has only been seen since BIOS version 3xxx.)

So there is something odd with the new AGESA code.

Edited by Benson


Thank you all very much for all of your information and insight! With the power of your experiences combined.... :P I should be able to make a much more informed decision. Will let you guys know what I decide once I consult the wallet again ^_^v


Just a note: I switched from the Asus X370 Prime to the MSI X370 Gaming Pro, and with C-state disabled I managed 24+ hours without a freeze. It seems to be a memory compatibility issue with the Asus board.

 

Now testing the uptime without rcu_nocbs; my current uptime is 7 hours and counting.

 

 

Edited by PSYCHOPATHiO
