Random Shutdowns on X670E and Ryzen 7950X


I made some recent hardware upgrades. Long story short: an ASUS X670E ProArt and an AMD 7950X. However, my system hasn't seen 6+ hours of continuous uptime in almost 2 days now. The power supply is 1200W and brand new, C-states are normal, but Power Supply Idle Control is set to Typical Current Idle. Any help is greatly appreciated. I believe the attached syslog has at least 2 reboots captured. One happened just before turning syslog on, while the Docker and VM services were disabled.

syslog3 infinity-diagnostics-20221006-0454.zip

Link to comment
3 hours ago, JorgeB said:

Possibly unrelated but I'm seeing some call traces that should be gone if you switch to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

I just made the change to ipvlan. Thanks for the tip. Memory is also running at stock, non-XMP speeds.

Link to comment
4 hours ago, JorgeB said:

It's the first Ryzen 7000 I've seen with Unraid, so I have no idea if it's stable or not with the current kernel. Boot in safe mode and leave the Docker/VM services disabled for now; if it still crashes like that, there might be issues with the hardware or kernel support.

Kind of thinking the same thing about Ryzen 7000. The system froze up entirely in safe mode earlier. Newest syslogs attached.

syslog

Link to comment

It could be hardware, I guess, with a 1-week-old system: brand-new processor plus new BIOS, chipset, and 4 sticks of DDR5. I'm going to tinker and change a few things. I'll disable C-states just in case, but there is one issue I've noticed for Unraid on my motherboard: it has no USB 2.0 ports. Figuring USB 3.0 devices run hotter, I bought a USB 3.0 flash drive and plugged it into a 2.0 port on my old motherboard. The logic was that it should last longer if it was designed for the likely higher heat output of USB 3.0, since a USB 2.0 port makes it run slower and cooler in the long run. My motherboard is an ASUS X670E Creator ProArt (chosen for its 4x NVMe, 3x PCIe slots, and 4x SATA ports). I doubt this flash drive is failing given it's less than 9 months old.

Early adopters beware. I might spin up a test server and hope the Ryzen 7000 issues get ironed out of Unraid if that's the case, or that I can figure out what to do for stability. I managed to get it to run for close to 24 hours with Docker disabled. Trying to figure out more right now.

Link to comment

Currently going on about 20 hours of runtime with Docker enabled. I disabled C-states in the BIOS and disabled all forms of overclocking other than PBO Curve Optimizer. I previously had memory set to default non-XMP/EXPO, but realized there are multiple auto-overclock settings enabled by default that mention CPU and memory. Ryzen 7000 integrates a lot of power-saving features from the mobile Ryzen 6000, so I'm going to try to narrow down which of these changes made the difference.

Link to comment
  • 2 weeks later...

I guess that makes at least 2 of us on 7000 series on unRAID. I was also having reboot issues but I think it had something to do with my GPU (1070). The only major issues in the log were saying that the PCI device had fallen off the bus. I removed it from the server and deleted the Nvidia driver yesterday evening and it hasn't rebooted since then, where previously it barely made it a few hours without crashing.

 

I have no evidence of this, but I'm wondering if maybe there's some weirdness somewhere having to do with a PCI-e 3.0 device running on a 5.0 platform, plus unRAID which probably hasn't received much in the way of Ryzen 7000 and/or PCI-e 5.0 optimizations. I may try adjusting PCI-e setting in the BIOS.

 

Not sure if this has anything to do with your issue, but since this was the only post from a Ryzen 7000 owner I could find on the forums, I figured I'd chime in since there seem to be very few of us at this point in time. Hopefully it's helpful to anyone else who finds their way in here.
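
For anyone who wants to check a saved syslog for the same kind of message, here is a minimal Python sketch. Only the "fallen off the bus" wording comes from this post; the other two patterns and the function name `find_suspect_lines` are assumptions for illustration, so adjust them to whatever your own log actually shows.

```python
import re
import sys

# Hypothetical pattern list: only "fallen off the bus" is taken from this thread.
# The other entries are assumed examples of lines worth a second look.
SUSPECT_PATTERNS = [
    re.compile(r"fallen off the bus", re.IGNORECASE),
    re.compile(r"call trace", re.IGNORECASE),
    re.compile(r"machine check", re.IGNORECASE),
]


def find_suspect_lines(lines):
    """Return (line_number, text) for every line matching a suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py /path/to/saved/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_suspect_lines(handle):
            print(f"{number}: {text}")
```

A hit only tells you where to start reading; the lines just before it usually matter more than the match itself.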

Link to comment
I could see this being related to the issue. My reboots have stopped, and I didn't make other changes. Today I re-enabled C-states; fingers crossed it remains stable. But I am running dual GPUs: one for ML and heavier tasks, and an old GTX 960 for lighter tasks.

Link to comment

Updated from BIOS 0611 to 0705 last night. Others here might find this interesting: there is no longer a BIOS option for "Power Supply Idle Control", whereas on the previous BIOS revision there was. I also tried searching for "idle" and the ASUS BIOS reports no matching settings, so it would seem that, at least on this motherboard and BIOS combo, there's no longer a Power Supply Idle Control setting. I left C-states at default this time, and my current plan, if there is still instability to be found, is to change one setting at a time for the benefit of the Unraid community. Going on 15 hours of uptime right now on the new BIOS and default settings. Fingers are also crossed that the BIOS improved stability and I'll never have to experience an unexpected reboot again 😅

Link to comment

How's uptime looking now on your end?

My system unexpectedly, inexplicably, and without any changes made on my part decided it was finished with the daily crashing nonsense. So, when last I posted, I thought things were stable after removing the GPU, but it went from multiple daily crashes to once per day. I believe it was Jorge who suggested changing power supply idle control settings and seeing if that helped. I didn't get a chance to do that right away because I was working 12+ hour days. Well after that, and again with no changes made to anything, it stopped crashing. I've been up 4 days and 3 hours now without a sign of trouble. 

 

When it was doing the daily crash, it seemed like both times it happened right around 2pm, so I'm wondering if there was some scheduled task that was causing it, but it wouldn't have been anything that I scheduled.
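
One rough way to test the "always around 2pm" hunch is to count boots by hour of day in a saved syslog. This is only a sketch under two assumptions: the syslog is kept across reboots (for example, mirrored somewhere persistent), and each boot logs a kernel banner line containing "Linux version". The function name `boots_by_hour` is made up for illustration. Note that it shows when the machine came back up, which is only close to the crash time if the box rebooted by itself.

```python
import re
import sys
from collections import Counter

# Assumption: classic syslog timestamps such as "Oct  6 14:02:11 host kernel: ...".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s")
# Assumption: every boot writes a kernel banner line containing this text.
BOOT_MARKER = "Linux version"


def boots_by_hour(lines):
    """Count boot banner lines per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        if BOOT_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python boots_by_hour.py /path/to/saved/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for hour, count in sorted(boots_by_hour(handle).items()):
            print(f"{hour:02d}:00  {count} boot(s)")
```

If most boots land in the same hour, that points toward something scheduled; an even spread points back toward hardware or power.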

 

In my experience, I can't think of a single computer I've built that didn't have bizarre stability issues at the very beginning, and they have always resolved themselves without any intervention on my part. I've always theorized that it's just cables/connections settling in or something like that. Maybe heat cycling the plastic insulation from normal use removes any tension they were under? I don't know, but it sounds smart :)

Link to comment

Unsolved mystery but in the end good news :)

I can only say that a BIOS setting disappearing after a BIOS update has happened to others before, including me.

 

I'd give the fTPM section some attention as well. For me, it took some serious investigation time.

Link to comment
  • 2 months later...

Uptime has gotten a lot better. I'm still running the 0705 BIOS and I'm tempted to upgrade to 0805. Logs are filling at a rapid rate, sometimes faster than others, which forces a reboot before they fill up RAM. The fastest I've seen the logs fill is probably around 72 hours, but on the high end I've probably gone 14 days before needing to reboot.
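
To see what is actually filling the log, it can help to count which messages repeat the most. The sketch below assumes classic syslog lines ("Jan  3 15:14:02 host program[pid]: message"); it strips the timestamp and hostname and replaces numbers with "#" so near-identical lines group together. The function name `top_messages` is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: lines look like "Jan  3 15:14:02 host program[pid]: message".
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
NUMBERS = re.compile(r"\d+")


def top_messages(lines, limit=10):
    """Return the most frequent messages, ignoring timestamps, host, and numbers."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line.strip())
        if message:
            counts[NUMBERS.sub("#", message)] += 1
    return counts.most_common(limit)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_messages.py /path/to/saved/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for message, count in top_messages(handle):
            print(f"{count:8d}  {message}")
```

Whatever sits at the top of that list is the first thing to chase down, whether it is a plugin, a container, or a driver.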

infinity-diagnostics-20230103-1514.zip infinity-syslog-20230103-2021.zip

Link to comment

Hi Reggie. I hope you're still having good uptime. Just wanted to update here since I was able to "fix" whatever was causing the shutdowns. My last post here in October said that things had gotten better but they hadn't. The crashes kept happening after a short stint of not crashing. I continued adjusting settings both in the BIOS and in Unraid but it kept crashing.

 

So, after that last post I got so sick of all of the issues that I decided to get rid of Unraid entirely, thinking that it was an Unraid issue (that didn't last long and I'm now happily back on Unraid). I moved all of my storage back to my old hardware, installed TrueNAS Core, got 10Gb NICs and a 10Gb switch, and decided to use the new hardware as a Proxmox host and access storage over iSCSI. That all seemed to be stable in terms of not crashing, but I wrestled with share/file permissions on TrueNAS for an entire ******* week and could not get a working configuration. That was one of the most miserable weeks of my life in terms of tech stuff. I was ready to sell everything, genuinely. I felt so completely stupid for not being able to figure out permissions when probably millions of other people before me were able to figure it out. At that point I missed the simplicity of Unraid, to a painful degree.

 

I ripped all of my storage out of my old hardware again and put it back into the new system. By this point all of my drives had been wiped many times over, so I couldn't just pick back up where I left off on Unraid. I had to start completely from scratch, which it turns out was the best thing I could have done. I started from a fresh install of Unraid on a fresh USB flash drive, copied my data back over from my backup server, set up all my Docker containers, spun up a critical VM or two, and basically just waited at that point to see what would happen. About a week went by with no issues whatsoever. Since then, I have installed a 3090 and set up a gaming VM, which has been so reliable that I've daily driven it for close to a month now. I have about 50 or 60 hours of game time on this VM with no issues. Star Citizen, Destiny 2, Splitgate, Fallout 76... all running beautifully at 4K... well, except Star Citizen, but that's not the VM's fault :) . My server uptime is now at 26 days 13 hours. It would be longer, but I had to shut down to install an NVMe drive 26 days ago.

 

If you're still having issues and you've exhausted every other option available to you, I can say that reinstalling everything is worth a shot. I ultimately don't know what was wrong with my previous installation of Unraid but the fact that it works fine on a fresh install tells me that it must've been something that I did on the old installation to make it crash on different hardware.

Link to comment
  • 2 months later...

Did anyone ever find a solution to this besides a complete reinstall? If not, can I get confirmation that the fresh install works, or that you have better uptime now?

 

I just swapped my old Xeon rig for a 7950X/X670E and am now having random shutdowns/reboots and uptime issues. I can attach my syslog and diagnostics as well if that helps. Should I scrub either file beforehand, or are they both safe to share?

Link to comment
  • 4 weeks later...

Posting for anyone who is also having problems. I haven't had any random crashes since updating to 6.12-rc1. I believe this is due to the updated kernel which now fully supports the new 7000 processors. Fingers crossed.

 

Update: I should have kept my mouth shut. My system just crashed after ~30 days of uptime. Seems much more stable, but still not 100% resolved. I will be upgrading to the most recent rc2 version, turning on eco-mode, and double checking my power settings per this thread:

 

Edited by huquad
Link to comment
  • 7 months later...

I can confirm that I ran into a lot of these problems as well.

I originally built the 7950X on an ASUS X670E Extreme. I ended up doing RMAs and other things that left my build in a messy state. While buying and swapping hardware around, I ultimately ended up building a secondary system. The other system, the 7950X3D and X670E Gene, seems to be a lot more stable. I will say that I moved away from Unraid due to the instability. Now both of those systems are on the 7950X3D, and I notice that there are lots of limitations with Unraid and the X3D processors as well, such as having to send data through a specific core, and crashes after reserving specific cores for VMs.

I'll be playing around with all the builds again on Unraid soon. I'm sure I'll still run into the same issues, but seeing as there have been a lot of BIOS updates and kernel updates with Unraid, perhaps there is some stability now? We will see.

Link to comment
  • 2 months later...
On 10/6/2022 at 5:03 PM, reggienaz said:

I made some recent hardware upgrades. Long story short: an ASUS X670E ProArt and an AMD 7950X. However, my system hasn't seen 6+ hours of continuous uptime in almost 2 days now. The power supply is 1200W and brand new, C-states are normal, but Power Supply Idle Control is set to Typical Current Idle. Any help is greatly appreciated. I believe the attached syslog has at least 2 reboots captured. One happened just before turning syslog on, while the Docker and VM services were disabled.

syslog3 infinity-diagnostics-20221006-0454.zip

Hi there, I'd like to check how stable your system is now with Unraid 6.12.6, which should fully support Ryzen 7000.

 

I was thinking of getting a Threadripper 7960X to replace my 1920X, but I think I'm going to need to go with just the normal Ryzen, considering the limited motherboard options and general availability of parts for Threadripper this time around.

Link to comment
