Frequent crashes under load



TLDR: PSU owned since 2013, no issues prior to this build. Won't start for 5 minutes post-crash. Have tried swapping the graphics card. Crashes under load after 30 minutes of prime95, on both smallest FFTs and blend. Not overheating, wattage well under PSU capacity (300w after 5 minutes of prime95), 1 pass of memtest fine, vm.dirty_background_ratio & vm.dirty_ratio changed to 1% & 2% respectively.

 

Hi there, looking for some advice.

 

I’ve been having some persistent issues with frequent crashes. I’ve had the PSU since probably ~2013 and have had no issues with it in its lifetime, including running dual ATI 7990s (admittedly with an i5-3330s rather than the below type of build). After a shutdown, it is impossible to boot the system from either the power button or the button on the motherboard for ~5 minutes, at which point it boots itself without being touched.

 

The PSU should have more than enough wattage - techpowerup and pcpartpicker both suggest power draws of 730w or less (my power monitor arrived whilst typing this - prime95 is peaking at <280w 5 minutes in) and I’ve tried with the RX 580 removed, getting the same issue.

 

I’ve had 2x shutdowns after 10 minutes of gameplay of Battlefield V in a VM. I will fairly consistently also have the server crash after 30 minutes of prime95 blend mode in an ubuntu VM.

 

Smallest FFTs on prime95 delays a crash by about 5 minutes, but it still crashes, which points away from a memory issue. I’ve also done 1 pass of memtest, which completed with no errors.

 

It’s not related to CPU temp - the highest temp the CPU die is hitting is ~67-68 degrees. I’ve tried changing vm.dirty_background_ratio and vm.dirty_ratio to 1% and 2% in the Tips and Tweaks plugin. I’ve also tried disabling docker.
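
To confirm the plugin change actually took effect, the live values can be read back from the kernel. This is only a minimal sketch: it assumes Python 3 is available on the server (it may need to be installed separately on Unraid), and the function name read_vm_dirty_settings is made up for illustration. The two files it reads are the standard Linux sysctl entries under /proc/sys/vm.

```python
from pathlib import Path


def read_vm_dirty_settings(proc_root="/proc/sys/vm"):
    """Return the current writeback thresholds as integers.

    Both values are percentages of available memory. proc_root is a
    parameter only so the function can be pointed at a test directory.
    """
    root = Path(proc_root)
    names = ("dirty_background_ratio", "dirty_ratio")
    # Each file holds a single integer followed by a newline.
    return {name: int((root / name).read_text().strip()) for name in names}
```

Calling read_vm_dirty_settings() with no argument reads the running system. With the settings described above, it should return 1 for dirty_background_ratio and 2 for dirty_ratio.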

 

There’s not a lot I can see in the tail log - it tends to do some trimming about ten minutes before a crash, but beyond that, nothing. The motherboard was from eBay and the seller was a bit shady, so perhaps that? But hard to say.

 

My specs are below, and diagnostics are attached.

 

Motherboard: Asus Prime X399-A
CPU: Threadripper 1900X stock clocks, with the Noctua threadripper cooler
RAM: 4*8GB HyperX DDR4
Graphics card 1: Nvidia Quadro P2000
Graphics card 2: ATI RX 580 (as above, have tried with this removed)
PSU: Coolermaster 1200W Silent Pro Gold
HDDs: 4x 8-12TB WD red/white label
SSD: 1TB Sabrent NVMe
OS: Unraid 6.9.0 RC2

 

E2A: I've checked power consumption over the course of another run - max load 279W.
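
For reference, the arithmetic behind "well under PSU capacity" can be sketched as below. The helper name psu_load_fraction is hypothetical. The 279 W figure is the wall-meter reading above and 1200 W is the label rating of the PSU. A wall meter measures AC input, which includes the PSU's own conversion losses, so the DC load on the PSU is somewhat lower than the fraction computed here.

```python
def psu_load_fraction(measured_watts, psu_rated_watts):
    """Fraction of the PSU's rated output represented by a measured draw."""
    if psu_rated_watts <= 0:
        raise ValueError("psu_rated_watts must be positive")
    return measured_watts / psu_rated_watts


peak_watts = 279    # max wall-meter reading during the prime95 run
rated_watts = 1200  # label rating of the PSU
fraction = psu_load_fraction(peak_watts, rated_watts)
print(f"Peak draw is {fraction:.0%} of rated capacity")
```

That works out to roughly 23% of the rating. It rules out a simple overload, but not an ageing or faulty unit.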

tower-diagnostics-20210228-1431.zip

Edited by JoeBloggs
2 hours ago, JoeBloggs said:

After a shutdown, it will be impossible to boot the system from either the power button or the button on the motherboard for ~5 minutes, at which point it will boot itself without having to be touched.

If you mean an abnormal self-shutdown, then the symptoms look like a temperature-related issue. This can happen in any component, not just the CPU. I would suggest changing the PSU, even if you only have a lower-rated one on hand; Prime95 only loads the CPU, so power usage won't be at its maximum.

41 minutes ago, Vr2Io said:

If you mean an abnormal self-shutdown, then the symptoms look like a temperature-related issue. This can happen in any component, not just the CPU. I would suggest changing the PSU, even if you only have a lower-rated one on hand; Prime95 only loads the CPU, so power usage won't be at its maximum.

Hi, yes, I considered that, though it's not feeling overly hot. I've got both side panels off as well. I don't have a spare PSU lying around, so it will involve buying a new one - is there anything at all that can be done to confirm? I never had any issues with the PSU prior to this build. Also, even the motherboard-based power switch doesn't work for a reboot.

 

One thing I have just noticed - the chipset fan located over where the IO is seems to not be spinning, even under load. Hadn't previously noticed it, as it's buried within a heatsink. Does that seem like a likely cause? Is there any way to confirm?
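
One way to check for a hot component other than the CPU would be to log every temperature sensor the kernel exposes, and force each line to disk so the last reading before a hard power-off survives. The following is a sketch under assumptions: Python 3 is available on the server, the board's sensors show up under the standard Linux hwmon sysfs tree (/sys/class/hwmon), and the relevant sensor driver is loaded. The function names are made up for illustration, and the log path is left to the caller; it should point at persistent storage rather than RAM.

```python
import os
import time
from pathlib import Path


def read_hwmon_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"<hwmonN>/<chip>/<sensor>": degrees} for each readable sensor."""
    readings = {}
    for chip_dir in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else "unknown"
        for sensor in sorted(chip_dir.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor is listed but not readable; skip it
            key = f"{chip_dir.name}/{chip}/{sensor.name}"
            readings[key] = millidegrees / 1000.0
    return readings


def log_temps_forever(log_path, interval_s=5.0):
    """Append one timestamped line of readings per interval until killed."""
    with open(log_path, "a") as log:
        while True:
            temps = read_hwmon_temps()
            line = " ".join(f"{k}={v:.1f}" for k, v in temps.items())
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {line}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before sleeping
            time.sleep(interval_s)
```

Running log_temps_forever with a chosen file path during a prime95 run and reading the last few lines after the next crash would show whether any sensor was climbing beforehand. A sensor that is not exposed through hwmon will not appear, so an empty or short log does not rule out overheating.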

40 minutes ago, JoeBloggs said:

is there anything at all that can be done

Please try disconnecting all unnecessary components: control the machine through VNC or remote desktop, and disconnect power from most of the disks and GPUs.

 

40 minutes ago, JoeBloggs said:

Also, even the motherboard-based power switch doesn't work for a reboot.

But can you power on/off through the external switch?

 

40 minutes ago, JoeBloggs said:

the chipset fan located over where the IO

This fan is for the power module; it may only spin under extreme overclocking. You can touch it to feel whether it is hot or not.

Edited by Vr2Io