EgyptianSnakeLegs Posted January 24, 2020 (edited)

=======================================<UPDATE: 30 Jun 2020>=======================================
These issues started randomly cropping up again about a month after this final post. Their frequency seemed to be increasing as well. It really seemed like bad memory to me, but it was a brand new system and a 12-hour memtest showed no errors. I decided to let the test keep running, just to see what happened. Shortly after the 24-hour mark, memtest started finding errors. A few hours later it had found thousands of errors. I ended up replacing the RAM and everything has been smooth sailing ever since. I suspect that may have always been the underlying issue and the fixes below were just performance improvements that obscured it. So...check your RAM thoroughly, folks!
=======================================</UPDATE: 30 Jun 2020>=======================================

=======================================<SOLVED>=======================================
I believe this issue is finally solved (for me)! The following is the combination of settings that ended up working:
Disable C States in the BIOS.
Add the disable C6 state command in the 'go' file.
In Flash -> Syslinux Config -> label unRAID OS (and GUI Mode), append 'rcu_nocbs=0-7' ...
Upgrade to the most recent BIOS firmware.
Uninstall all Apps, stop the docker service, delete the docker image, delete the appdata for each previously installed app (except Plex), remake the docker image, reinstall the apps.
So far everything seems to be working as expected. I'm still getting the mysterious crash after a parity check, but I believe that to be unrelated to this original post.
=======================================</SOLVED>=======================================

Hello all,

After months of attempted troubleshooting and much struggle, I'm finally reaching out for help with my server.

About a year ago, a very kind and generous friend gifted me a new barebones server to replace the laughable Intel NUC and pile of external USB drives that was serving as my Plex server. His super generous gift was the following:
Mobo: MSI B450 Tomahawk
CPU: Ryzen 5 2400G with Radeon RX Vega 11 Graphics
RAM: G.Skill Flare X (for AMD) DDR4 2400 - 16 GB (2 x 8 GB)
PSU: Seasonic Focus Plus 550 Gold (SSR-550FX)
LSI: SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0

A solid start to an UnRAID server that I could fill with hard drives as my college student budget allowed. After a year of saving money, I finally had enough for an UnRAID license and 3x 10 TB hard drives. So I got the system set up, following all of the glorious wisdom of @SpaceInvader One, began the arduous task of moving all of my content over to the new system, and then got Plex set up and running. The new system seemed amazing!

HOWEVER, I almost immediately began to have stability issues. The system would just randomly "crash"/lock up. It would essentially just fall off of the network, and if I happened to be logged into the admin console from my laptop, the whole interface would just stop responding. To this day I have not had more than 20 hours of uptime, and it's frequently more like 1-6 hours. That basically renders a Plex server useless, much less all of the other functionality I'd like to use on it.

I quickly realized that these were the common Ryzen stability issues that people were complaining about, so I went through all of the troubleshooting steps I could find online. So far I have taken the following steps:
'SVM Mode = Enabled' in the BIOS. (I know this is just for VMs, which I'm not doing, but it's been included in many of the guides.)
'IOMMU = Enabled' in the BIOS. (Again, I know this is just for VMs, which I'm not doing, but it's been included in many of the guides.)
Disable C States in the BIOS.
Add the disable C6 state command in the 'go' file.
In Flash -> Syslinux Config -> label unRAID OS (and GUI Mode), append 'rcu_nocbs=0-7' ...
Add 'iommu=soft' to /boot/syslinux/syslinux.cfg.
BIOS UPDATES: I have tried every possible permutation of these 6 settings with each of the five most recent firmware versions.

At this point I'm basically ready to order an Intel i7-9700K and ASUS Prime Z390-A motherboard on my credit card and throw my AMD CPU/Mobo/RAM in the dumpster.

Any advice anyone has would be extremely welcome! I can provide logs of the next crash, but I can tell you that thus far there hasn't been anything meaningful in them. It's as if the system stops writing to the log as soon as the crash begins, so there is no evidence that anything went wrong.

Edited June 30, 2020 by EgyptianSnakeLegs
Updated to include [Solved] status report.
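For anyone repeating the syslinux step from the list above: the argument has to end up on the append line of every boot label in use, which is easy to miss for GUI Mode. Below is a minimal sketch in Python for checking that, assuming a syslinux.cfg laid out as label / kernel / append stanzas. The sample file contents and both function names (rcu_nocbs_arg, labels_missing_arg) are made up for illustration and are not taken from the poster's system.

```python
def rcu_nocbs_arg(logical_cpus):
    """Build the kernel argument covering CPUs 0..logical_cpus-1."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus == 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{logical_cpus - 1}"


def labels_missing_arg(cfg_text, arg):
    """Return the boot labels whose 'append' line lacks the given argument."""
    missing = []
    label = None      # name of the stanza currently being read
    has_arg = False   # whether that stanza's append line contains arg
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.lower().startswith("label "):
            # A new stanza starts: close out the previous one first.
            if label is not None and not has_arg:
                missing.append(label)
            label = line[len("label "):].strip()
            has_arg = False
        elif line.lower().startswith("append ") and label is not None:
            # Compare whole whitespace-separated tokens, not substrings.
            has_arg = arg in line.split()
    if label is not None and not has_arg:
        missing.append(label)
    return missing


# Hypothetical file contents, for illustration only.
SAMPLE_CFG = """\
default menu.c32
label unRAID OS
  menu default
  kernel /bzimage
  append rcu_nocbs=0-7 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""

if __name__ == "__main__":
    arg = rcu_nocbs_arg(8)
    print(arg, "missing from:", labels_missing_arg(SAMPLE_CFG, arg))
```

With the sample above, the check flags only the GUI Mode label, because only the first stanza carries the argument. The count of 8 matches the 0-7 range in the post: the Ryzen 5 2400G exposes 8 hardware threads.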
John_M Posted January 24, 2020

First, post your diagnostics (one diagnostic file is worth hundreds of words of description): go to Tools -> Diagnostics and post the resulting zip file.
Second, reboot, choose MemTest from the boot loader, and let it run for at least 24 hours.
RedReddington Posted January 24, 2020

Set PSU to: Typical Current Idle
That setting alone stopped the crashes on my Ryzen system - besides other issues.
Hoopster Posted January 24, 2020

1 hour ago, RedReddington said:
Set PSU to: Typical Current Idle

@EgyptianSnakeLegs ^^^This will likely make your system much more stable since you have already disabled C states.

The first generation Ryzens had lots of issues with Linux. Even though your CPU/APU is labeled as a 2400G, leading many to believe it is a second generation Ryzen in which most of these issues had been resolved, it is really a first generation Ryzen CPU with an integrated GPU. By the same token, the 3400G is a second generation Ryzen CPU with an integrated GPU.
EgyptianSnakeLegs Posted January 24, 2020

Thank you all for the responses!

@RedReddington & @Hoopster I just rebooted and changed the idle current setting. With any luck that will be the final piece of the puzzle. It had been set to "Auto" and many of my crashes seemed to happen at night, so that seems like a logical cause.

@John_M Here is the diagnostics file, as requested. Thank you in advance for taking a look at it! I also have a diagnostics file and syslog from November 24th, which is when this issue started getting especially frequent. I can post those as well, if you're at all interested in them.

ratasum-diagnostics-20200124-1358.zip
EgyptianSnakeLegs Posted January 25, 2020

Well...I woke up to find the UnRAID login page still accessible, which is a first. But when I entered my credentials it took about 60-80 seconds to load the main page. I clicked over to the dashboard and it promptly crashed again. I also see that the Duplicati backup job, which I've been frantically trying to get completed so I have a solid backup, hung after only 10 GB of transfer once I went to bed last night.

I saw another post by a person who deleted and remade their docker image. I think I might try that next, before I spend 24 hours running a memtest. Any thoughts?
EgyptianSnakeLegs Posted January 27, 2020

On the advice of another topic, I uninstalled all of my docker apps, deleted the docker image, deleted most of the corresponding appdata folders (except Plex), and then remade the docker image and reinstalled the apps. My primary issues seem to be gone. I'm still experiencing the mysterious crash after a parity check completes, but all other crashes appear to be gone, and the system seems dramatically faster and more responsive. I'll update the original post to reflect this.