Temp issues on TR 2950X and ASRock X399D8A-2T



I am putting together what has been the build from hell. Through a bizarre set of circumstances I've been through 2 CPUs and 3 mobos. I finally got my replacement mobo in and put it all together, and I am seeing some crazy temps reported and am not sure which readings to trust.

 

My system resides in a Norco 4224 case with the fan wall modified with three Noctua NF-F12 industrialPPC-3000 120mm fans, along with two Cooler Master Blade Master 80mm fans on the rear of the case. I am also running a Noctua NH-U9 TR4-SP3 cooler. All fans are set to full in the BIOS.

 

IPMI shows the CPU at 62C with Unraid idling (VMs and containers running but no real activity; CPU at ~6%). If I start transcoding a 4K video in Emby, CPU usage goes to about 50% and IPMI reports the CPU temp jumping to 93C in about 20 seconds. This freaked me out, so I shut things down and double-checked everything. I removed the cooler and reapplied the thermal grease. A weird thing about this cooler on this mobo is that, when installed, it is oriented so the fans point towards the PSU/PCIe cards rather than the front/back of the case. I chose to have it blowing towards the intake fan on the PSU rather than the opposite direction, if that matters.

 

When I go to the console and run sensors, I get:

 

root@Tower:~# sensors
nct6779-isa-0290
Adapter: ISA adapter
SYSTIN:       +28.5°C    sensor = thermistor
CPUTIN:       +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:      +15.0°C    sensor = thermistor
AUXTIN1:      +36.0°C    sensor = thermistor
AUXTIN2:      -62.0°C    sensor = thermistor
AUXTIN3:      +15.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor

 

CPUTIN from sensors reads 42C when the IPMI CPU temp reads 93C. But I get critical notifications in Unraid like the one in the attached pic. So what the heck is going on, and how do I achieve a "normal running" system? Is the CPUTIN in sensors the accurate temp? Why does IPMI read so high?

 

 

Untitled2.jpg

11 hours ago, RockDawg said:

the orientation is such that the fans are pointing towards the PSU/PCIE cards rather than front/back of the case. 

Normal for this mainboard layout.

 

11 hours ago, RockDawg said:

IPMI CPU temp reads 93C.

It seems to refer to coretemp. Please make Unraid also show coretemp and check it.

 

BTW, I think something is wrong between the heatsink and the CPU. Could you try a light load and check the IPMI temperature again?

Edited by Vr2Io

I just ran Prime95 small FFT on 4 cores. It yielded an average overall load of 15-20%, and the IPMI CPU temp reached 95C in about 10 minutes and I started receiving critical notifications in Unraid. I played a 1080p video that Emby had to transcode, and it did the same in about 7 minutes with an average overall load of 40%. I double-checked that I didn't leave the protective film on the cooler or anything stupid like that. This build has been a total nightmare from the get-go. I can't understand why this thing gets so hot so easily.

Edited by RockDawg

I would replace your cooler with one designed to move air from front to back (a true server one). It will also benefit from having the fan wall pushing a good amount of air in the same direction.

 

Any server grade Dynatron or Supermicro SP3 cooler should work. The Dynatron A35 looks like it should handle the load really well.

 

I would also contact ASRock Rack because it could be an issue with their BIOS/IPMI/BMC.

 

If you do all that and are still having issues, you might have a faulty motherboard.

Edited by ramblinreck47
8 minutes ago, RockDawg said:

What case are you using?

 

Fractal Design S with a Gigabyte X399 AORUS PRO.

 

My new build is an X299 with a 9800X (TDP 165W) and a Noctua NH-D9L in a 3U case, also with no issues, just a bit hot at full load.

 

Room temp 25C, system idle. (The 8982 rpm fan is the onboard power module fan; I added a Noctua low-noise adapter to it, which reduced the noise a lot.)

 

[Screenshots: 123.PNG, 456.PNG]

Edited by Vr2Io

I know tons of people use Noctua coolers as have I in the past.  I am wondering though if this issue is because my case is a rack mounted case and the PSU mounts directly above the mobo.  Your PSU is below the mobo, right?  And the way the socket is situated on my mobo the CPU is right below the PSU and the cooler blows right into it rather than out the back of the case.  Doesn't yours blow out the back of the case?  I've never had that orientation in any of my systems before.  And I can't think of anything else causing my issues.


For reference, my previous system was a dual-CPU Xeon E5-2640 with 2 Noctua CPU coolers (I don't remember the model), and they both blew to the back of the case and I never had any temp issues at all. And that was with regular old brown Noctua fans in the fan wall and the rear of the case, not the 3k RPM industrials I am using now.


I have updated my previous post.

 

17 minutes ago, RockDawg said:

rack mounted case

Cooling is really a headache.

 

17 minutes ago, RockDawg said:

Doesn't yours blow out the back of the case?

The opposite: the CPU fan and the storage fans blow in the same direction, out the front. The disks can get hot (sometimes over temp, I expect because all the fans are the silent type), but the other parts are in really good condition. (I don't have any rear fan because I prefer it silent; those changes were made when my HBA burned out due to high temp.)

Edited by Vr2Io
14 minutes ago, RockDawg said:

I know tons of people use Noctua coolers as have I in the past.  I am wondering though if this issue is because my case is a rack mounted case and the PSU mounts directly above the mobo.  Your PSU is below the mobo, right?  And the way the socket is situated on my mobo the CPU is right below the PSU and the cooler blows right into it rather than out the back of the case.  Doesn't yours blow out the back of the case?  I've never had that orientation in any of my systems before.  And I can't think of anything else causing my issues.

The socket on this motherboard is rotated perpendicular to the usual orientation on Threadripper motherboards. It's done exactly the same way on the EPYC motherboards. The Noctua cooler is meant for normal (non-server) motherboards; that's why it's pointed in an unusual direction.

 

Overall, with the Noctua cooler you're definitely not getting optimal cooling with the way it is oriented, but you still shouldn't be seeing the CPU spikes you are seeing. It's more than likely an IPMI/BMC issue in their software. The E3C246D4U had something similar going on.

 

ASRock Rack is renowned for trying new things like Threadripper server boards, but they often struggle with their BIOS/BMC updates. Their motherboards are very cool, but most have some sort of quirks that take time to work themselves out. Are you on the latest stable BIOS or the beta BIOS?

Edited by ramblinreck47

I updated my BIOS to 1.37 beta and now get additional results in sensors.

 

root@Tower:~# sensors
nct6779-isa-0290
Adapter: ISA adapter
Vcore:       392.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:           1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:          3.34 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:         3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:           1.86 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:         880.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:           1.21 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:          3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:          3.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
in10:        768.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:          2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:          1.69 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:        912.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:          1.21 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
SYSTIN:       +15.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
CPUTIN:       +34.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:      +15.5°C    sensor = thermistor
AUXTIN1:      +39.0°C    sensor = thermistor
AUXTIN2:      -62.0°C    sensor = thermistor
AUXTIN3:      +15.0°C    sensor = thermistor
intrusion0:  ALARM
intrusion1:  ALARM
beep_enable: disabled

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +34.1°C  (high = +70.0°C)
Tctl:         +61.1°C  

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +34.0°C  (high = +70.0°C)
Tctl:         +61.0°C  

 

At the very moment I ran this, IPMI was reporting 61C for the CPU, so it must be reading Tctl. This is with all containers and VMs shut down and Unraid showing 2% overall CPU usage. According to what I've read, that is a normal idle temp for a 2950X.

 

From what I've read, that makes sense, since Tdie is supposedly the actual CPU temp and Tctl includes the 27 degree offset. The critical notifications arrive when the IPMI temp approaches 95C, and 95 - 27 = 68C, which is the max according to AMD. So everything seems to add up, except why my temps get so high when doing anything.
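To check the Tctl/Tdie relationship described above without eyeballing the console, a small script can pull the k10temp readings out of the sensors text output and compute the offset per chip. This is a minimal Python sketch, assuming the plain-text layout shown in this thread (a "k10temp-..." header line followed by "Tdie:" and "Tctl:" lines, with a blank line between chips); parse_k10temp and tctl_offsets are hypothetical helper names, not part of lm-sensors.

```python
from __future__ import annotations

import re

# Matches e.g. "Tdie:         +34.1°C  (high = +70.0°C)" and captures label and value.
READING = re.compile(r"(Tdie|Tctl):\s*([+-]?\d+(?:\.\d+)?)")


def parse_k10temp(sensors_output: str) -> dict[str, dict[str, float]]:
    """Collect Tdie/Tctl readings (degrees C) per k10temp chip from sensors text."""
    chips: dict[str, dict[str, float]] = {}
    current = None
    for raw in sensors_output.splitlines():
        line = raw.strip()
        if line.startswith("k10temp-"):
            current = line  # e.g. "k10temp-pci-00c3"
            chips[current] = {}
        elif not line:
            current = None  # a blank line ends the current chip block
        elif current is not None:
            match = READING.match(line)
            if match:
                chips[current][match.group(1)] = float(match.group(2))
    return chips


def tctl_offsets(chips: dict[str, dict[str, float]]) -> dict[str, float]:
    """Tctl minus Tdie for each chip that reported both values, rounded to 0.1 C."""
    return {
        name: round(r["Tctl"] - r["Tdie"], 1)
        for name, r in chips.items()
        if "Tctl" in r and "Tdie" in r
    }
```

Pasting in the idle output posted above, both chips should come out at a 27.0C offset; the nct6779 lines are ignored because they sit outside any k10temp block.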

 

I restarted all my containers and VMs and let things settle in. The CPU usage bounces around between 5-10%, and my CPU temp goes to:

 

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +42.5°C  (high = +70.0°C)
Tctl:         +69.5°C  

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +42.4°C  (high = +70.0°C)
Tctl:         +69.4°C  

 


From your description, at least the IPMI reading matches Tctl, so it is not likely a hardware sensor-reading problem.

You shouldn't only focus on CPU % load; you also need to check whether the BIOS and the CPU frequency/voltage scaling driver (governor) are set to an extreme performance mode. You may need to install the "Tips and Tweaks" plugin to tune that and do more tests. Please also note the CPU voltage (Vcore) at different frequencies and whether any of it is abnormally high.
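Before changing anything, the active governor and current clock of each CPU can be read straight from the Linux cpufreq sysfs interface. A read-only Python sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq layout; cpufreq_snapshot is a hypothetical helper name, and it changes nothing. Note that scaling_cur_freq is reported in kHz.

```python
from __future__ import annotations

from pathlib import Path


def cpufreq_snapshot(sysfs_cpu: str = "/sys/devices/system/cpu") -> dict[str, dict[str, object]]:
    """Map each cpuN to its scaling governor and current frequency in MHz (None if absent)."""
    snapshot: dict[str, dict[str, object]] = {}
    # "cpu[0-9]*" skips sibling entries such as the top-level "cpufreq" directory.
    for cpufreq in sorted(Path(sysfs_cpu).glob("cpu[0-9]*/cpufreq")):
        cpu = cpufreq.parent.name
        governor = (cpufreq / "scaling_governor").read_text().strip()
        freq_file = cpufreq / "scaling_cur_freq"
        # sysfs reports frequencies in kHz; convert to MHz for readability.
        cur_mhz = int(freq_file.read_text()) / 1000 if freq_file.exists() else None
        snapshot[cpu] = {"governor": governor, "cur_mhz": cur_mhz}
    return snapshot
```

Running it a few times during a transcode versus during Prime95 would show whether the cores sit at higher clocks in one case than the other, which ties in with the Vcore observation above.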

 

But you really need to make sure the heatsink makes full, good contact with the CPU; this is quite hard.

 

Any misalignment will cause this result.

Edited by Vr2Io

Now you're getting a bit over my head. "Tips and Tweaks" shows the CPU Scaling Governor is set to On Demand. I'm a little nervous going in and tweaking things I'm not really familiar with. Should that even be necessary? I mean, I'm not overclocking or anything.

 

Likewise, I'm not sure how to check the cooler's contact with the CPU. When I get home I'll try putting it under load and see if I can feel the pipes or fins get hot. Beyond that I'm not really sure. I've built many systems over the years and I have never had temp issues before.

32 minutes ago, RockDawg said:

Should that even be necessary.  I mean I'm not overclocking or anything.

It isn't necessary, but it is one method for troubleshooting the problem.

 

I have always found the defaults don't perform well. For example, some light jobs make the CPU run in a high frequency and voltage state, but tests show it can do them in a somewhat power-saving state without changing the end result. Or, if I want a job to complete with the best performance, I just set it not to clock down automatically.

 

It's just a few clicks in the plugin; it will do it for you.

Edited by Vr2Io

Understood.  My concern would be making a change that is masking an underlying problem.  It seems to me that with the loads I've been talking about on a non-overclocked CPU, I shouldn't be having a temperature issue.  I didn't get a chance last night to go and check how warm the cooler is under load, but I will.  Installing the cooler is very straightforward and simple so I wouldn't think there is anything there.

 

In the meantime I have ordered the Dynatron A35 cooler in hopes that it improves things.  


Still having issues here. I got a new cooler, the Dynatron A35, and installed it yesterday. Idle temps dropped to around 35C, but I still get overtemp alarms in certain conditions. The first thing I did was run Prime95 with small FFTs on all 32 threads as a test. It maxed out all threads at 100% and I ran it for an hour. Temps quickly climbed to 67.8/94 in about 20 seconds and never budged an inch higher, staying right below the 95 critical point. I wasn't thrilled that it got so close to throttling, but I figured Prime95 was worse than anything I would ever do, so I decided I was okay. Later in the evening, someone started streaming a movie from my Emby server that required transcoding, and I got bombarded by IPMI overtemp events. I just tried it again, and 10 minutes of Prime95 yields the same results: 67.8/94 temps, but never higher and no critical notifications. I start a movie that needs to be transcoded, and within about 2-3 minutes the temps are 97/50.2 and I'm getting notifications.

 

Thinking about Vr2Io's voltage comments, I watched the VCPU voltage reported by IPMI in Unraid during the tests and noticed that it goes up to 1.12V during Prime95, but I saw 1.33V during Emby transcoding. Prime95 resulted in 100% overall CPU usage, and Emby transcoding was averaging about 35-45%. But I've also seen it reading 1.25V with hardly any CPU usage. I don't understand the voltage aspect at all, so maybe all of this means nothing or is normal. I'm just trying to provide info in hopes someone can help diagnose my issue.

 

I'm not sure what that all means and what I should do next.  I am lost and frustrated at this point.  I've built many systems over the years and never had issues like this.  I would really appreciate any advice on where to go from here.

Edited by RockDawg
6 hours ago, RockDawg said:

within about 2-3 minutes the temps are 97/50.2 and I'm getting notifications.

I think 50.2 is a typo; I'd expect it to be 97/124.

 

6 hours ago, RockDawg said:

1.12v during Prime95

 

6 hours ago, RockDawg said:

1.33v during Emby transcoding

 

Those are normal figures. My 1920X behaves similarly: at idle, or with only 1 or 2 cores loaded, the voltage is at its highest, ~1.4V (frequency 4.1-4.2GHz), but with all cores/threads loaded the voltage holds steady at ~1.125V (frequency also steady at 3.7GHz).

 

Below are coretemp figures under different loads. I also compared the coretemp power-usage estimate with the reading on my UPS, and they match.

In summary, the system throttles automatically to keep the CPU from exceeding 68C; during testing I never exceeded that, even using different burn-in tools.

 

The problem seems to be that your system sometimes doesn't throttle, but I'm really not sure whether that is handled by the mainboard or the CPU itself. You may need to decide whether to buy a CPU or a mainboard to rule out the problem, or try setting "CPU voltage fixed at 1.125V and core frequency fixed at 37x". That means you will lose the boost performance feature and use a bit more power on average.

 

 

[Screenshot: TR.png]

 

 

Edited by Vr2Io

I did mean 97/50.2: 97 Tctl and 50.2 Tdie. I believe it is throttling. My understanding is that "CPU_PROCHOT State Asserted" means it is throttling due to being too hot. If so, then it is throttling. Here is my IPMI event log:

 

 ID  |       TimeStamp      |    Sensor Name   |             Sensor Type            |                          Description                           
======|======================|==================|====================================|================================================================
 88   | 11/14/2020, 12:04:06 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 87   | 11/14/2020, 12:04:04 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 94 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 86   | 11/14/2020, 12:04:03 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 95 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 85   | 11/14/2020, 12:04:02 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 84   | 11/14/2020, 12:03:42 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 83   | 11/14/2020, 12:03:38 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 82   | 11/14/2020, 04:40:14 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 81   | 11/14/2020, 04:40:12 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 93 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 80   | 11/14/2020, 04:40:11 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 95 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 79   | 11/14/2020, 04:40:10 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 78   | 11/14/2020, 04:40:05 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 77   | 11/14/2020, 04:40:03 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 94 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 76   | 11/14/2020, 04:40:02 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 95 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 75   | 11/14/2020, 04:40:01 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 74   | 11/14/2020, 04:39:42 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 73   | 11/14/2020, 04:39:40 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 93 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 72   | 11/14/2020, 04:39:39 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 95 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 71   | 11/14/2020, 04:39:38 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 70   | 11/14/2020, 04:39:16 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 69   | 11/14/2020, 04:39:11 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 68   | 11/14/2020, 04:38:42 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 67   | 11/14/2020, 04:38:41 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 66   | 11/14/2020, 04:38:20 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 65   | 11/14/2020, 04:38:19 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 64   | 11/14/2020, 04:38:13 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 63   | 11/14/2020, 04:38:11 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 93 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 62   | 11/14/2020, 04:38:10 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 95 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 61   | 11/14/2020, 04:38:09 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 60   | 11/14/2020, 04:38:01 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 59   | 11/14/2020, 04:38:00 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 58   | 11/14/2020, 04:37:49 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 57   | 11/14/2020, 04:37:48 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 56   | 11/14/2020, 04:37:38 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 55   | 11/14/2020, 04:37:35 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Deasserted (Reading 94 °C < Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 54   | 11/14/2020, 04:37:33 | CPU1 Temp        | Temperature                        | Upper Non-critical - going high - Asserted (Reading 97 °C >= Threshold 95 °C)
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 53   | 11/14/2020, 04:37:31 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 52   | 11/14/2020, 04:37:15 | CPU_PROCHOT      | Processor                          | State Asserted - Deasserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
 51   | 11/14/2020, 04:37:13 | CPU_PROCHOT      | Processor                          | State Asserted - Asserted
------|----------------------|------------------|------------------------------------|----------------------------------------------------------------
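To see how often and for how long PROCHOT was held, the event log above can be replayed oldest-first and each Asserted/Deasserted pair turned into an interval. A Python sketch, assuming the log has been copied out as the pipe-separated text table shown above (ID, TimeStamp, Sensor Name, Sensor Type, Description) and that each PROCHOT assertion is closed by a later deassertion; prochot_intervals is a hypothetical name. It only measures the intervals; it says nothing about what caused them.

```python
from __future__ import annotations

from datetime import datetime


def prochot_intervals(sel_text: str) -> list[tuple[datetime, datetime]]:
    """Return (asserted, deasserted) timestamp pairs for CPU_PROCHOT events."""
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header, ruler/separator, or blank line
        when = datetime.strptime(fields[1], "%m/%d/%Y, %H:%M:%S")
        events.append((int(fields[0]), when, fields[2], fields[4]))
    events.sort()  # the log lists newest first; replay by ascending event ID
    intervals = []
    start = None
    for _, when, sensor, description in events:
        if sensor != "CPU_PROCHOT":
            continue
        if description.endswith("- Asserted"):
            start = when
        elif description.endswith("- Deasserted") and start is not None:
            intervals.append((start, when))
            start = None
    return intervals
```

On entries 51 to 56 above, this yields two intervals of 2 and 7 seconds, which fits the picture of short, repeated PROCHOT bursts rather than one sustained event.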

 

My question is: why should it even need to throttle? I've never had cooling issues with other systems in this case, and that was before I added beefier case fans and a server-level CPU cooler.

26 minutes ago, RockDawg said:

I did mean 97/50.2.  97 Tctl and 50.2 Tdie. 

Then there's no reason for it: if Tdie is 50 but Tctl is 97, that is well outside the 27C offset, and it triggers the alert. Do you observe that the offset is always 27C?
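Whether the offset is always 27C can be checked mechanically against Tctl/Tdie pairs read at the same moment. A minimal Python sketch, assuming (Tctl, Tdie) pairs in degrees C and a 1.5C tolerance chosen here purely for illustration (not an AMD figure); offset_anomalies is a hypothetical helper.

```python
from __future__ import annotations

EXPECTED_OFFSET_C = 27.0  # the Tctl offset discussed in this thread


def offset_anomalies(
    samples: list[tuple[float, float]],
    expected: float = EXPECTED_OFFSET_C,
    tolerance: float = 1.5,  # illustrative assumption
) -> list[tuple[float, float, float]]:
    """Return (tctl, tdie, offset) for pairs whose offset strays from the expected value."""
    flagged = []
    for tctl, tdie in samples:
        offset = tctl - tdie
        if abs(offset - expected) > tolerance:
            flagged.append((tctl, tdie, round(offset, 1)))
    return flagged
```

Fed the pairs posted in this thread (61.1/34.1, 69.5/42.5, 94/67.8 and 97/50.2), only the transcoding reading stands out, with a 46.8C gap.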

 

26 minutes ago, RockDawg said:

My question is: why should it even need to throttle? I've never had cooling issues with other systems in this case, and that was before I added beefier case fans and a server-level CPU cooler.

That's why I said you need to perform more tests; the problem may not come from the cooling system. It could be some hardware not working normally.

Edited by Vr2Io