[SOLVED][UPDATED] - [6.8.1] Persistent Ryzen Headaches, Need Help...



=======================================<UPDATE: 30 Jun 2020>=======================================

These issues started randomly cropping up again about a month after this final post.  Their frequency seemed to be increasing as well.  It really seemed like bad memory to me, but it was a brand new system and a 12-hour memtest showed no errors.  I decided to let the memtest keep running, just to see what happened.  Shortly after the 24-hour mark, memtest started finding errors.  A few hours later it had found thousands of errors.  I ended up replacing the RAM and everything has been smooth sailing ever since.  I suspect that may have always been the underlying issue and the fixes below were just performance improvements that obscured the issue.  So...check your RAM thoroughly, folks!

=======================================</UPDATE: 30 Jun 2020>=======================================

 

 

=======================================<SOLVED>=======================================

I believe this issue is finally solved (for me)!  The following is the combination of settings that ended up working:

  1. Disable C States in the BIOS.
  2. Add the disable C6 state command in the 'go' file.
  3. Add Flash -> Syslinux Config -> label unRAID OS (and GUI Mode) -> append 'rcu_nocbs=0-7' ...
  4. Upgrade to the most recent BIOS firmware.
  5. Uninstall all Apps, stop docker service, remove/delete docker image, delete the appdata for
    each previously installed app (except Plex), remake docker image, reinstall apps.

 

So far everything seems to be working as expected.  I'm still getting the mysterious crash after
a parity check, but I believe that to be unrelated to this original post.
=======================================</SOLVED>=======================================

 

 

Hello all,

 

After months of attempted troubleshooting and much struggle, I'm finally reaching out for help with my server.  About a year ago, a very kind and generous friend gifted me a new barebones server to replace the laughable Intel NUC and pile of external USB drives that was serving as my Plex server.  His super generous gift was the following:

Mobo: MSI B450 Tomahawk
CPU: Ryzen 5 2400G with Radeon RX Vega 11 Graphics
RAM: G.Skill Flare X (for AMD) DDR4 2400 - 16 GB (2 x 8 GB)
PSU: Seasonic Focus Plus 550 Gold (SSR-550FX)
LSI: SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0

A solid start to an UnRAID server that I could fill with hard drives as my college student budget allowed.  After a year of saving money, I finally had enough for an UnRAID license and 3x 10 TB hard drives.  So I got the system set up, following all of the glorious wisdom of @SpaceInvader One, and began the arduous task of moving all of my content over to the new system, and then got Plex set up and running.  The new system seemed amazing!

 

HOWEVER, I almost immediately began to have stability issues.  The system would just randomly "crash"/lock up.  It would essentially just fall off of the network, and if I happened to be logged into the admin console from my laptop, the whole interface would just stop responding.  To this day I have not had more than 20 hours of uptime, and it's frequently more like 1-6 hours.  That basically renders it useless as a Plex server, much less for all of the other functionality I'd like to use on it.

 

I quickly realized that these were the common Ryzen stability issues that people were complaining about.  So I went through all of the troubleshooting steps I could find online.

 

So far I have taken the following steps:

  1. 'SVM Mode = Enabled' in the BIOS. (I know this is just for VMs, which I'm not doing, but it's been included in many of the guides.)
  2. 'IOMMU = Enabled' in the BIOS. (Again, I know this is just for VMs, which I'm not doing, but it's been included in many of the guides.)
  3. Disable C States in the BIOS.
  4. Add the disable C6 state command in the 'go' file.
  5. Add Flash -> Syslinux Config -> label unRAID OS (and GUI Mode) -> append 'rcu_nocbs=0-7' ...
  6. Add 'iommu=soft' to the append line in /boot/syslinux/syslinux.cfg
  7. BIOS UPDATES: I have tried every possible permutation of these 6 settings with each of the FIVE most recent firmware versions.
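
For anyone making the syslinux change by hand-editing the file instead of using the Flash -> Syslinux Config page, here is a minimal Python sketch of what the edit amounts to: adding a kernel parameter such as 'rcu_nocbs=0-7' to each "append" line.  The sample config text and the add_kernel_param helper are made up for illustration (they are not part of UnRAID), and a real syslinux.cfg may look different, so treat this as a sketch and back up the flash drive first.

```python
# Illustrative sample only; a real /boot/syslinux/syslinux.cfg may differ.
SAMPLE_CFG = """\
default menu.c32
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""


def add_kernel_param(cfg_text: str, param: str) -> str:
    """Return cfg_text with param added to every 'append' line that lacks it.

    Hypothetical helper: it only rewrites text and never touches the flash drive.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("append "):
            keyword, *tokens = stripped.split()
            if param not in tokens:
                # Keep the original indentation and place the new parameter
                # ahead of the existing ones (e.g. initrd=/bzroot).
                indent = line[: len(line) - len(line.lstrip())]
                line = f"{indent}{keyword} {param} {' '.join(tokens)}"
        out.append(line)
    trailing_newline = "\n" if cfg_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Show the edited config text.
print(add_kernel_param(SAMPLE_CFG, "rcu_nocbs=0-7"))
```

Running the helper a second time on its own output changes nothing, so the parameter is not duplicated if it is already there.  The '0-7' range matches the 8 threads of the 2400G; a CPU with a different thread count would presumably need a different range.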

 

At this point I'm basically ready to order an Intel i7-9700K and ASUS Prime Z390-A motherboard on my credit card and throw my AMD CPU/Mobo/RAM in the dumpster.  Any advice anyone has would be extremely welcome!

 

I can provide logs of the next crash, but I can tell you that thus far there hasn't been anything meaningful in them.  It's like they stop writing to the log as soon as the crash begins, so there seems to be no evidence that anything went wrong.
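
Since the log just stops, one way to at least pin down when each lockup happens is a small heartbeat script that appends a timestamp to a file on persistent storage and forces it to disk every time.  The following is only a sketch under my own assumptions: the function names are made up, the file path is whatever you choose (something on the flash drive or the array, not RAM), and frequent writes will add some wear to a flash drive.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path: str, now: datetime) -> None:
    """Append one ISO-8601 timestamp and force it onto the disk."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(now.isoformat() + "\n")
        fh.flush()
        # fsync so the line is not left sitting in a cache when the box locks up
        os.fsync(fh.fileno())


def last_heartbeat(path: str) -> datetime:
    """Return the most recent timestamp recorded in the heartbeat file."""
    with open(path, encoding="utf-8") as fh:
        lines = [ln.strip() for ln in fh if ln.strip()]
    return datetime.fromisoformat(lines[-1])


def run_forever(path: str, interval_s: float = 60.0) -> None:
    """Write a heartbeat every interval_s seconds until the system dies."""
    while True:
        write_heartbeat(path, datetime.now(timezone.utc))
        time.sleep(interval_s)
```

After a crash and reboot, last_heartbeat(path) gives the last time the system was known to be alive, to within one interval.  That can then be lined up against things like the parity check schedule or the nightly backup job.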

Edited by EgyptianSnakeLegs
Updated to include [Solved] status report.
1 hour ago, RedReddington said:

Set PSU to: Typical Current Idle

@EgyptianSnakeLegs ^^^This will likely make your system much more stable since you have already disabled C states. 

 

The first generation Ryzens had lots of issues with Linux.  Even though your CPU/APU is labeled as a 2400G, leading many to believe it is a second generation Ryzen (in which most of these issues had been resolved), it is really a first generation Ryzen CPU with an integrated GPU.  By the same token, the 3400G is a second generation Ryzen CPU with an integrated GPU.


Thank you all for the responses!  @RedReddington & @Hoopster I just rebooted and changed the idle current setting.  With any luck that will be the final piece of the puzzle.  It had been in "Auto" and many of my crashes seemed to happen at night, so that seems like a logical cause.

 

@John_M Here is the diagnostics file, as requested.  Thank you in advance for taking a look at it!  I also have a diagnostics file and syslog from November 24th, which is when this issue started getting especially frequent.  I can post those as well, if you're at all interested in them.

ratasum-diagnostics-20200124-1358.zip


Well...I woke up to find the UnRAID login page still accessible, which is a first.  But when I entered my credentials it took about 60-80 seconds to load the main page.  I clicked over to the dashboard and it promptly crashed again.  I also see that the Duplicati backup job, which I've been frantically trying to get completed so I have a solid backup, hung after only 10 GB of transfer sometime after I went to bed last night.  I saw another post by a person who deleted and remade their docker container image.  I think I might try that next, before I spend 24 hours running a memtest.

 

Any thoughts?


At the advice of another topic, I uninstalled all of my docker apps, deleted the docker image, deleted most of the corresponding appdata folders (except Plex), and then remade the docker image and reinstalled the apps.  My primary issues seem to be gone.  I'm still experiencing the mysterious crash after a parity check completes, but all other crashes appear to be gone, and system performance seems dramatically faster and more responsive.  I'll update the original post to reflect this.
