lukeoslavia

Weird Threadripper temp issue


I was doing some testing with Hyper-V on my 1950X and noticed that while Hyper-V was running, my processor temps could not be read properly and my liquid cooler (Kraken X62) was unable to scale its fan and pump speeds. I'm still unsure what exactly the issue was, but I'm curious whether anyone has noticed this in Unraid.

My processor was reaching 67°C while the fans and pump were still running silent (940 rpm fan speed). This also caused the processor to lower its clock speed quite a bit to try to keep cool.

So I guess my question is: is this also a KVM issue, or just something to do with Hyper-V?


So, when I started building my TR build, I went through the usual paces and noticed that my thermals were insanely high. I started freaking out and exchanged my water cooler, but the temps were still through the roof. I did some research and found that there is a 27 degree offset for that CPU, which means it reports a 27 degree higher temp on specific sensors. Once I chose the sensors I know report without the offset, everything looks great: I run about 50C at the high end and idle around 30C. I never get thermal throttling, so I'm not sure what is going on there, but I did set my pump to 1400 rpm. 900 sounds way slow, and unless you have a loud pump, you should get it to at least 1200.

Edited by jordanmw
27... not 21
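
A minimal sketch of the offset arithmetic described in this post, assuming the offset sensor reads exactly 27 degrees above the real die temperature. The function and constant names are made up for illustration, and the 27 °C figure is simply the value quoted here for this CPU:

```python
# Assumption: the offset sensor reports 27 °C more than the die temperature,
# as stated in this thread for the 1950X. Names below are illustrative only.
OFFSET_C = 27.0


def die_temp_from_offset_reading(reported_c: float, offset_c: float = OFFSET_C) -> float:
    """Subtract the fixed offset from a reading taken on the offset sensor."""
    return reported_c - offset_c


if __name__ == "__main__":
    # A reading of 77 °C on the offset sensor corresponds to 50 °C on the die.
    print(die_temp_from_offset_reading(77.0))
```

If a monitoring tool already reads the sensor without the offset, no correction is needed.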


Well, I can say for sure that I am reading the averaged temp from Tdie. NZXT CAM, HWMonitor, and the AMD Ryzen Master utility all read from that. I am pretty sure Unraid 6.7 rc3 wasn't, as it was showing about double the temps I'm used to seeing. I applied a better coat of thermal paste, but even with that, if I have Hyper-V running in Windows, my temps don't read out correctly and the pump and fan speeds won't change at all as load and temp go up. I used Prime95 to peg the processor and the fan and pump speeds stayed the same.

With Hyper-V not running, the fan speeds go up. I am also idling at 27°C now, but with Hyper-V on it idles around 40, as best I can tell.

As for Unraid, I never noticed my fan speeds increasing, but I'm not sure whether they do or not. I might just have to install it again and test.

18 minutes ago, jordanmw said:

I never get thermal throttling, so not sure what is going on there, but I did set my pump to 1400rpm- 900 sounds way slow- and unless you have a loud pump, you should get it at least to 1200.

I am curious, do you know of any way to control cooler fan and pump speeds in Unraid? I haven't been able to find any info on that.


No, I can see temps but not control fan/pump speed. If you have a USB header for the pump, you can pass that through to a Windows machine and install the control software, if you have some. Otherwise, I just did all the tweaking of fan/pump speed in the BIOS; that is really the best way.


The sensors can't be read inside a VM, at least not that I know of. Monitoring tools like HWiNFO or Ryzen Master don't have access to them. The best solution would be to set up your pump speeds and fan curves directly in the BIOS.

15 hours ago, lukeoslavia said:

I was doing some testing with Hyper-V on my 1950X and noticed that while Hyper-V was running, my processor temps could not be read properly and my liquid cooler (Kraken X62) was unable to scale its fan and pump speeds. I'm still unsure what exactly the issue was, but I'm curious whether anyone has noticed this in Unraid.

My processor was reaching 67°C while the fans and pump were still running silent (940 rpm fan speed). This also caused the processor to lower its clock speed quite a bit to try to keep cool.

So I guess my question is: is this also a KVM issue, or just something to do with Hyper-V?

Interesting... I have similar issues (high temperatures) with a Kraken X62 and a 1950X (MB: Asus ROG Zenith Extreme).

I always thought it was an issue with the cooler type and that it wasn't cooling the CPU properly. I also reapplied the thermal paste.

I was already thinking of replacing the Kraken, as other users seem to get much lower temperatures even with air coolers, but I have been too lazy to do so yet.

To check the temperatures on the Kraken, I passed the USB connector through to the VM and used the utility, but the temperatures also go too high in the BIOS (at idle and under high frequency).

Edited by Symon

8 hours ago, bastl said:

The sensors can't be read inside a VM, at least not that I know of. Monitoring tools like HWiNFO or Ryzen Master don't have access to them. The best solution would be to set up your pump speeds and fan curves directly in the BIOS.

Just to be clear @bastl, when I check temps I am referring to those utilities on bare-metal Windows with Hyper-V on top. As for the Unraid temps, I was using the Dynamix (I think) temp plugin and reading them from the dashboard.

@Symon I did pass mine through to an Unraid VM as well, but the temp didn't read out properly for me. I would look into the Dynamix plugin and install Perl with the NerdPack, as the temperature readings seem to work well there. Just be aware that if you select Tdie as the sensor, that's an actual average of the processor sensors, not the one that's +20(ish) degrees.

19 hours ago, bastl said:

@Symon What counts as high temps for you, and at what clock speeds are you running?

Before, I was running the CPU at 3.875 GHz @ 1.25 V, but the system wasn't stable (Unraid crashed) when I ran a stress test on 12 vCPUs in my VM. The temperature would go up to 70 °C within a minute before the whole host crashed (my guess is that the CPU shut down due to the high temperature). This wasn't really an issue in daily usage, as I would never have such a load on my gaming VM, so I ran it like this for a while.

Yesterday I did a BIOS update and at the same time removed the OC settings and activated Intel Turbo Boost in the Tips and Tweaks plugin. Now a single core will go up to 3.9 GHz, and with 12 cores under a stress test they run at around 3.5 GHz. The system runs stable now.

Unraid without Dockers and VMs running: 40 - 43 °C (single cores go up to 3.9 GHz)

Unraid with multiple VMs running: 45 - 50 °C (single cores go up to 3.9 GHz)

Additionally running a stress test on the gaming VM: 69 °C (all cores running at 3.5 GHz)

I don't know if I should switch back to overclocking the CPU or just leave it as it is. I'm a bit worried about toasting my CPU :D

15 hours ago, lukeoslavia said:

@Symon I did pass mine through to an Unraid VM as well, but the temp didn't read out properly for me. I would look into the Dynamix plugin and install Perl with the NerdPack, as the temperature readings seem to work well there. Just be aware that if you select Tdie as the sensor, that's an actual average of the processor sensors, not the one that's +20(ish) degrees.

You are right, it will only show 50° all the time...

If you run "watch sensors" within Unraid, you should see the temperature of the CPU.

With that command I can somehow see the temperature of each die of the CPU. Is this the same for you? What motherboard are you using?

I'm currently testing configuring the CPU fan speed in the BIOS settings.
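
For anyone who wants to use the "watch sensors" readings from a script rather than by eye, here is a rough sketch that pulls temperatures out of the text that `sensors` prints. The sample text and the `Tdie` / `Tctl` labels are assumptions for illustration; the real output depends on the motherboard, kernel and driver, so the labels on your system may differ:

```python
import re
import subprocess

# Hypothetical sample only; real `sensors` output varies by board and driver.
SAMPLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +40.5°C  (high = +70.0°C)
Tctl:         +67.5°C
"""

# Matches lines of the form "<label>:   +<number>°C" at the start of a line.
_TEMP_LINE = re.compile(
    r"^(?P<label>[^:\n]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)\s*°C",
    re.MULTILINE,
)


def parse_temps(text):
    """Return {label: temperature in °C} for every temperature line found."""
    return {m.group("label").strip(): float(m.group("temp"))
            for m in _TEMP_LINE.finditer(text)}


def read_sensors():
    """Run `sensors` (lm-sensors must be installed) and parse its output."""
    result = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)


if __name__ == "__main__":
    print(parse_temps(SAMPLE))
```

Boards that expose several dies will simply produce more entries in the returned dictionary.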

 


@Symon My 1950X runs stable at 4 GHz @ 1.275 V on air (NH-U14S). Idle temps are somewhere between 37 and 47 °C with all my Dockers running and my main VM browsing the web. Running a stress test inside the gaming VM (half the cores), it goes up to 70 with some short spikes up to 76, but I never had any crashes. Usual temps while gaming are around 55.

When I was testing for a stable OC on a physical Windows install, temps on stock settings could spike really quickly because the board regulates the vcore automatically. I saw spikes above 1.35 V, sometimes between 1.45 and 1.5 V. I tested a lot to get a stable system, and with my current settings the server has been running stable for almost 1 year now. It will even run 4 GHz at only 1.25 V, stable in every game and usual use case I tried; I only had 1 single crash in Cinebench. With 1.275 V, no crashes at all.

How the sensors are presented by the BIOS depends on the motherboard. On my ASRock Fatal1ty X399, "watch sensors" only shows me one die. I remember from back when I researched the TR4 platform and its cooling solutions that, except for the Enermax Liqtech 240 and 360, none of the standard AIOs on the market performed significantly better than the Noctua U14S air cooler. Most of them showed worse results because they don't cover the complete IHS.

https://www.gamersnexus.net/news-pc/3008-threadripper-cooler-and-thermalpaste-coverage-vs-die-ihs

On stock settings that might be OK-ish, but under heavy load or with an OC applied, none of the AIOs could handle the heat produced.

1 hour ago, bastl said:

@Symon My 1950X runs stable at 4 GHz @ 1.275 V on air (NH-U14S). Idle temps are somewhere between 37 and 47 °C with all my Dockers running and my main VM browsing the web. Running a stress test inside the gaming VM (half the cores), it goes up to 70 with some short spikes up to 76, but I never had any crashes. Usual temps while gaming are around 55.

Do you know whether the CPU will throttle itself when the temperature gets too high, or will it just kill itself? :)

Maybe the system also crashed because the CPU voltage wasn't high enough for the higher temperatures...

 

 

 


@Symon I'm really lucky that this chip runs stable at such a low voltage. Most people had to put in 1.3 V or more to get it stable at 4 GHz. Above 68°C the CPU starts throttling. I first had a first-gen Enermax Liqtech 360, the one with quality issues where it starts corroding and clogs up the radiator. After 2 months I noticed the performance of the whole system decreasing more and more. I checked the clock speeds and the temps: the system was idling around 50°C, instantly spiked up to 80-90 on small loads, and throttled below 2 GHz even at default clocks. But it never crashed from the high temps. I think there is a threshold where the CPU will shut down, 105°C I guess, but don't quote me on that.

On 2/28/2019 at 12:03 PM, bastl said:

@Symon I'm really lucky that this chip runs stable at such a low voltage. Most people had to put in 1.3 V or more to get it stable at 4 GHz. Above 68°C the CPU starts throttling. I first had a first-gen Enermax Liqtech 360, the one with quality issues where it starts corroding and clogs up the radiator. After 2 months I noticed the performance of the whole system decreasing more and more. I checked the clock speeds and the temps: the system was idling around 50°C, instantly spiked up to 80-90 on small loads, and throttled below 2 GHz even at default clocks. But it never crashed from the high temps. I think there is a threshold where the CPU will shut down, 105°C I guess, but don't quote me on that.

Crazy what a difference the quality of a chip can make :)

I might try to overclock the system again with a higher CPU voltage and see if I can get it to run stable.

I also checked whether I could switch to a Noctua U14S, but unfortunately it would cover PCI and RAM slots I'm using right now on my MB.

 


@Symon The RAM slots shouldn't be an issue to populate. At least for me, it looks like there is space under the fan for another Trident Z. Luckily you can adjust the fan, and the cooler itself can also be shifted a couple of mm to get more distance from the GPU. This has to be done when it's mounted. The thick-ass Strix 1080 Ti fits in the upper slot with a 2 mm gap to the cooler. I also tested with the card in a lower slot; the temperature difference at max load is 2-3°C.

 

Sry for the dirty side panel 😂

 

[Attached images: 02.jpg, 01.jpg]

On 2/26/2019 at 9:26 PM, lukeoslavia said:

I am curious, do you know of any way to control cooler fan and pump speeds in Unraid? I haven't been able to find any info on that.

Did you find anything? I only realized now that the pump was never running faster than its base speed, and that's also the reason why I got such bad cooling results... I feel kind of dumb now :D

I think the CAM software isn't a very good solution anyway, as a VM needs to be running for it to work.

BIOS control doesn't seem to be possible with the Kraken, or at least I haven't found anything about it.

There are some solutions for Linux described here:

https://medium.com/@leinardi/how-to-control-a-nzxt-kraken-from-linux-with-a-gui-93367113f2f5

Maybe something like this could be run on Unraid itself?

 

 

 


Tried to get this running with user scripts:

https://github.com/leaty/camctl

 

#!/usr/bin/env python3

# usb.core / usb.util are provided by the PyUSB package (pip3 install pyusb),
# which also needs a libusb backend installed on the system.
import argparse
import sys

import usb.core
import usb.util

class CAM:
	def __init__(self, vid, pid):
		self.vendor = vid
		self.product = pid
		self._find()

	def _find(self):
		# Use the IDs passed to the constructor, not the module-level globals
		self.device = usb.core.find(idVendor=self.vendor, idProduct=self.product)
		if self.device is None:
			raise RuntimeError('No USB device {:04x}:{:04x} found'.format(self.vendor, self.product))

	def claim(self):
		# Take interface 0 away from the kernel driver so we can write to it
		if self.device.is_kernel_driver_active(0):
			print('Detaching kernel driver..')
			self.device.detach_kernel_driver(0)

	def declaim(self):
		# Hand interface 0 back to the kernel driver when we are done
		if not self.device.is_kernel_driver_active(0):
			print('Reattaching kernel driver..')
			usb.util.dispose_resources(self.device)
			self.device.attach_kernel_driver(0)

	def fan(self, speed):
		print('Setting fan speed to {}..'.format(speed))
		self.device.write(1, [2, 77, 0, 0, speed, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

	def pump(self, speed):
		print('Setting pump speed to {}..'.format(speed))
		self.device.write(1, [2, 77, 64, 0, speed, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Change these if they differ from yours
# See lsusb (e.g. Bus 001 Device 004: ID 1e71:170e NZXT)
vid = 0x1e71
pid = 0x170e

parser = argparse.ArgumentParser()
parser.add_argument('-f', '--fanspeed', dest='fanspeed', type=int, default=None, help="Fan speed between 10 - 100")
parser.add_argument('-p', '--pumpspeed', dest='pumpspeed', type=int, default=None, help="Pump speed between 10 - 100")
args = parser.parse_args()

# Compare against None so that an explicit 0 is rejected instead of silently ignored
if args.fanspeed is not None and not 10 <= args.fanspeed <= 100:
	print('Fan speed must be between 10 - 100')
	sys.exit(1)
if args.pumpspeed is not None and not 10 <= args.pumpspeed <= 100:
	print('Pump speed must be between 10 - 100')
	sys.exit(1)

cam = CAM(vid, pid)
cam.claim()
try:
	if args.fanspeed is not None:
		cam.fan(args.fanspeed)
	if args.pumpspeed is not None:
		cam.pump(args.pumpspeed)
finally:
	# Always reattach the kernel driver, even if a write fails
	cam.declaim()

 

I was able to install Python 3 through Nerd Tools but am stuck at importing usb.core.

It would probably be possible to run a script like this periodically and adjust the fan / pump speed according to the CPU temperature.

But this is not really my area, so I'm kind of lost at this point :)
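
The idea of adjusting the speed according to the CPU temperature could look roughly like the sketch below. It only does the temperature-to-speed mapping by linear interpolation between curve points. The curve values and the names `CURVE` and `duty_for_temp` are made up for illustration and would need tuning for a real cooler; the result stays inside the 10 - 100 range that the script above accepts for `--fanspeed` / `--pumpspeed`:

```python
# Hypothetical curve as (temperature in °C, speed in %) points, sorted by temperature.
# These numbers are placeholders, not recommendations for any specific cooler.
CURVE = [(30, 25), (50, 50), (65, 80), (75, 100)]


def duty_for_temp(temp_c, curve=CURVE):
    """Linearly interpolate a speed percentage for the given temperature."""
    # Below the first point: hold the minimum speed.
    if temp_c <= curve[0][0]:
        return curve[0][1]
    # Between two points: interpolate along the segment.
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            fraction = (temp_c - t0) / (t1 - t0)
            return round(d0 + fraction * (d1 - d0))
    # Above the last point: hold the maximum speed.
    return curve[-1][1]


if __name__ == "__main__":
    for temp in (25, 50, 60, 90):
        print(temp, duty_for_temp(temp))
```

A periodic job could read the CPU temperature, call `duty_for_temp`, and pass the result to the script's `--fanspeed` and `--pumpspeed` options.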

 

 

 

Edited by Symon

2 hours ago, bastl said:

The thick ass Strix 1080ti fits in the upper slot with a 2mm gap to the cooler

Did you plan that before buying, or was it just luck? :D

Other users mentioned that the first PCI slot on my MB would be covered by it, but I might still try it if I can't get the current cooler to work properly with Unraid :)


@Symon I had read that it should fit on most boards, and that depending on the thickness of the backplate, other cards might touch the cooler. Btw, with a card that has a backplate you're kind of safe and won't short anything on the card if it touches the cooler; the plate is mostly there for design reasons anyway. I never saw any temperature tests where the backplate had any effect on the cooling of the card.

