Show CPU temps on Dashboard with AM5 X670 and Ryzen 7000 series



Hi all, 

 

After many days of searching for a solution and reading all the known forums, I saw many threads with the same issue I had. So I'm sharing the steps I took, after much trial and error, to get the CPU temps shown on the dashboard. Hope it helps you out too.

 

My setup is an ASUS TUF X670E-PLUS WiFi edition with a Ryzen 9 7900X.

image.png.c41c5472c11b44e2026777b7bc54c3d7.png

 

First of all, I'm using a custom kernel for Intel ARC compatibility, and I'm on 6.12.7-rc2.

I'm using the latest version of the Thor kernel (more info: https://github.com/thor2002ro/unraid_kernel).

Use it at your own risk, and note that it will break the NVIDIA drivers if you are using them!

 

All right, let's begin.

 

First, open a terminal window or SSH into your box.

Type sensors and your output should look something like this:

 

nct6799-isa-0290
Adapter: ISA adapter
in0:                            1.22 V  (min =  +0.00 V, max =  +1.74 V)
in1:                          1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                            3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                          1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                            1.03 V  (min =  +0.00 V, max =  +0.00 V)
in6:                          408.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                            3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                            3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                           1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                           1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                           1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                         208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                           2.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in15:                         944.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in16:                           3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in17:                           1.27 V  (min =  +0.00 V, max =  +0.00 V)
fan1:                          590 RPM  (min =    0 RPM)
fan2:                          845 RPM  (min =    0 RPM)
fan3:                          624 RPM  (min =    0 RPM)
fan4:                          604 RPM  (min =    0 RPM)
fan5:                          760 RPM  (min =    0 RPM)
fan6:                            0 RPM  (min =    0 RPM)
fan7:                            0 RPM  (min =    0 RPM)
SYSTIN:                        +29.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
CPUTIN:                        +34.5°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
AUXTIN0:                       +66.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
AUXTIN1:                       +12.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
AUXTIN2:                       +19.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +125.0°C)  sensor = thermistor
AUXTIN3:                       -61.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +100.0°C)  sensor = thermistor
AUXTIN4:                       +24.0°C  (high = +80.0°C, hyst = +75.0°C)
                                        (crit = +100.0°C)
PECI/TSI Agent 0 Calibration:  +34.0°C  (high = +80.0°C, hyst = +75.0°C)
AUXTIN5:                       +12.0°C  
PCH_CHIP_CPU_MAX_TEMP:          +0.0°C  
PCH_CHIP_TEMP:                  +0.0°C  
PCH_CPU_TEMP:                   +0.0°C  
TSI0_TEMP:                     +45.0°C  
intrusion0:                   ALARM
intrusion1:                   OK
beep_enable:                  disabled

amdgpu-pci-1400
Adapter: PCI adapter
vddgfx:        1.46 V  
vddnb:         1.24 V  
edge:         +42.0°C  
PPT:           3.19 W  

nvme-pci-0e00
Adapter: PCI adapter
Composite:    +28.9°C  (low  =  -5.2°C, high = +79.8°C)
                       (crit = +84.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
CPU Temp:     +45.1°C  
Tccd1:        +36.0°C  
Tccd2:        +35.1°C  

i915-pci-0300
Adapter: PCI adapter
in0:           0.00 V  
power1:           N/A  (max =  55.00 W)
energy1:      12.39 kJ

nvme-pci-1300
Adapter: PCI adapter
Composite:    +24.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +89.8°C)

nvme-pci-0500
Adapter: PCI adapter
Composite:    +35.9°C  (low  = -273.1°C, high = +89.8°C)
                       (crit = +94.8°C)
MB Temp:      +35.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
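
With this much output it can help to keep only the temperature lines. Below is a minimal sketch, not part of the plugin: the function name list_temps is made up for illustration, and it assumes that only temperature readings start with a signed value, which holds for the output above but may not hold on every board.

```shell
#!/bin/sh
# Print "chip<TAB>label<TAB>value" for every reading in `sensors` output
# whose first value is signed (in the output above, only temperatures are).
# Reads the text from stdin, so it also works on a saved copy of the output.
list_temps() {
    awk '
        # A chip header is a non-indented line without a colon, e.g. "k10temp-pci-00c3".
        /^[^ \t:]+$/ { chip = $0; next }
        # A reading line looks like "Tccd1:        +36.0" followed by the unit.
        /^[^ \t][^:]*:[ \t]+[+-][0-9]+\.[0-9]+/ {
            label = $0; sub(/:.*/, "", label)
            value = $0; sub(/^[^:]*:[ \t]+/, "", value)
            sub(/[^0-9.+-].*$/, "", value)   # drop the unit and any limits
            printf "%s\t%s\t%s\n", chip, label, value
        }
    '
}
```

On the server this would be run as `sensors | list_temps`. Voltage, fan, and power lines are skipped because their first value carries no sign.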

 

If you see something similar to this, continue to the next step.

We're going to create a drivers.conf file in the plugin directory to tell the plugin which sensors to use for readings.

(It normally gets created after the sensors are detected, but for some reason the sensors don't get recognized correctly by the plugin on the X670E, the Ryzen 7000 series, or both. I don't know which; I am not a coder :) just a guy trying to fix broken stuff.)

In the same terminal window type: 

touch /boot/config/plugins/dynamix.system.temp/drivers.conf

and after that 

nano /boot/config/plugins/dynamix.system.temp/drivers.conf

add the following lines:

it87
nct6775
nct6799
k10temp

Press Ctrl+O, then Enter, then Ctrl+X to save the file and exit nano.
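
If you prefer not to use an editor, the same file can be written in one go. This is only a sketch: write_drivers_conf is a made-up helper name, and it assumes the plugin expects one module name per line, as in the steps above.

```shell
#!/bin/sh
# Write one kernel module name per line to a drivers file.
# $1 is the target file; the remaining arguments are the module names.
write_drivers_conf() {
    conf=$1
    shift
    # Create the parent directory if it does not exist yet.
    mkdir -p "$(dirname "$conf")" || return 1
    printf '%s\n' "$@" > "$conf"
}

# Usage on the server (path and module names taken from the steps above):
# write_drivers_conf /boot/config/plugins/dynamix.system.temp/drivers.conf \
#     it87 nct6775 nct6799 k10temp
```

Note that this overwrites any existing file at that path, so copy the old one first if the plugin already created it.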

 

After this, go to the plugin page and you should see the 4 drivers listed under available drivers:

image.thumb.png.29bb930777e7f7edee4acb7a228f0df6.png

 

From here you can go ahead and choose the CPU, mainboard, and fan speed sensors as usual:

image.thumb.png.e9de04533cedfc151ecae2eafa94bc9c.png

 

 

After choosing the correct sensors (mine look like this), don't forget to press the Apply button.

image.thumb.png.9a20f1e67c3103776bd6a1211e277f24.png

 

I tested this on both the official kernel and the Thor kernel, and both work. On the official kernel, though, the nct6799 driver doesn't get loaded and the choices are more limited. It will look like this:

image.thumb.png.acc4c85397c2c8c303f887650995a44d.png

As you can see, only the NVMe and CPU temps show up then.
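
To check from the terminal which sensor drivers the running kernel actually exposes, you can look at the hwmon entries in sysfs. A rough sketch, with made-up function names; /sys/class/hwmon is the usual Linux location, and the names you will see (for example k10temp or nct6799) depend on your kernel and board.

```shell
#!/bin/sh
# List the driver name behind every hwmon device, one "hwmonN name" pair per line.
# $1 is the hwmon root; it defaults to the standard sysfs location.
list_hwmon_names() {
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        # Skip entries without a readable name file (or an unmatched glob).
        [ -r "$dir/name" ] || continue
        printf '%s %s\n' "$(basename "$dir")" "$(cat "$dir/name")"
    done
}

# Succeed only if a hwmon device with the given name exists.
# $1 is the name to look for, $2 an optional hwmon root.
has_hwmon_driver() {
    list_hwmon_names "$2" | awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}
```

For example, `has_hwmon_driver nct6799 || echo "Super I/O sensor not available on this kernel"` would print the message when only the NVMe and CPU sensors are present.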

 

Happy temp watching on your dashboards :)

image.png

  • 3 weeks later...

I have the same problem. However, I see many warnings that ZFS is not available on the newer kernels, so I'm afraid I'm stuck on the default 6.1.74 kernel with Unraid 6.12.8.

Or do you have other experiences?


Hi there, 
 

As far as I know, I have no issues with ZFS, although I only use it on cache pools.
As of writing this I am also running 6.12.8 with the following kernel: 6.8.0-rc4-thor-Unraid+

I haven't had time yet to upgrade to the newer rc6, which has newer Intel libraries baked in.

The queues in my media apps still have 1 week to go, so I can test it out next week.

 

Back up the 4 existing files on your flash drive before overwriting them; you can revert by copying them back.

Nothing is permanent. If you use any VFIO bindings, remove them before applying and reapply them afterwards, just in case.
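
A small sketch of such a backup, under stated assumptions: backup_boot_files is a made-up helper, the flash drive is assumed to be mounted at /boot, and the file names are left as arguments because they depend on what the kernel package replaces.

```shell
#!/bin/sh
# Copy the named files from a source directory into a timestamped backup directory
# and print the path of that directory.
# $1 = source dir (e.g. the flash drive), $2 = backup parent dir, rest = file names.
backup_boot_files() {
    src=$1
    dest=$2/kernel-backup-$(date +%Y%m%d-%H%M%S)
    shift 2
    mkdir -p "$dest" || return 1
    for name in "$@"; do
        # Stop at the first file that cannot be copied.
        cp "$src/$name" "$dest/" || return 1
    done
    printf '%s\n' "$dest"
}

# Usage (the file names are placeholders, not the real list):
# backup_boot_files /boot /boot/backup file1 file2 file3 file4
```

Reverting is then a matter of copying the files from the printed directory back to the flash drive and rebooting.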

 

Have a backup of whatever is important on those ZFS pools.

 

*** IMPORTANT ***
You will lose NVIDIA support.

 

I don't care about that, as I use the ARC now.

 

Edited by DeNiX
addition
  • 1 month later...

I'm currently suffering from major stability issues with my X670 7950X build: call traces (not macvlan related), USB devices dropping, CPU cores not responding, and the network connection dropping (though sometimes the server can still access the internet).

I'm wondering if this kernel would help. On Unraid, I can't get 6 to 8 hours of uptime. On Windows Server, I'm at 1 day and 2 hours.
 

Are there any disadvantages to running this kernel? I don't use ZFS and never will, and I don't plan on getting an NVIDIA GPU for ages, especially if I can utilize the iGPU on my CPU (using the RadeonTop plugin with Plex causes crashes).

Edited by MightyRufo

You could try. I've never had issues like the ones you describe, on either the official or the Thor kernel. Hearing this, though, I would rather check the PSU or your BIOS settings. Make sure the RAM is EXPO compatible, not XMP; it makes a difference. Also check the temps. I run mine with a 420mm AIO, because I couldn't find any air cooler that would tame this beast of a chip. I peak at 71 °C at max load with an ambient room temperature of 23-24 °C.

1 hour ago, DeNiX said:

You could try. I've never had issues like the ones you describe, on either the official or the Thor kernel. Hearing this, though, I would rather check the PSU or your BIOS settings. Make sure the RAM is EXPO compatible, not XMP; it makes a difference. Also check the temps. I run mine with a 420mm AIO, because I couldn't find any air cooler that would tame this beast of a chip. I peak at 71 °C at max load with an ambient room temperature of 23-24 °C.

I certainly came to that conclusion myself as well. I'm trying to exhaust every option before I'm forced to start switching out hardware, as I will have to eat the cost or wait for a warranty replacement. My motherboard is the Gigabyte X670 AORUS Elite AX, but I see a fair number of people running the ASUS TUF X670E (such as yourself) without issues. I'm considering getting that board. I say this because the system seems 100% stable on Windows, and I wonder if it's just a major stability issue with this board.

I have checked temps and whatnot. I am running a 420 AIO as well. All checks out. The RAM is indeed EXPO, but I am not running it at its advertised speed; I am running it at the stock 4800. I haven't messed with C-states or typical idle control. The BIOS is at stock.

Also, interestingly enough, I copied the files for this kernel onto my flash drive, booted up Unraid, opened a terminal, and typed sensors, and I don't see any info about my CPU and whatnot. I attached a photo of what I see. I feel like that's odd given this is an X670 board.

msedge_H3p9oQRWLW.png

Edited by MightyRufo

Try running the system with EXPO enabled first, and disable the aggressive PBO in the BIOS so the chip itself decides when and how much to boost. I know some board manufacturers like to pump the chip full of voltage; maybe the silicon doesn't like that. Also disable every power management setting in the BIOS. Did you check for BIOS updates as well?

 

Strange that you don't see the sensors in the CLI. It could be that the sensor chip being used is not supported out of the box, but I'll have to dive into that and can get back to you later in the day.

 

Also, are you sure all of the disks are OK?

I had similar issues on my old server because of a dying SSD cache disk, which resulted in random reboots, hangs, and everything on the server feeling slow.


Wouldn't running the system with EXPO enabled effectively enable the overclock on the RAM? I also never considered the PBO thing; I will try this if it crashes again. Yesterday I went through and reseated all of my cables for everything. I did check for BIOS updates, and I'm running the latest one as of checking today. I will say the system seems "more" stable with C-states disabled and typical current idle enabled. But my crashing is so random that it's hard to know for sure.

 

Yeah, it's weird that I don't get any readings. I do get "more" readings on the Thor kernel. I'm guessing ASUS uses a more popular method for the OS to interact with temps.

All disks are okay, all disks were used in a previous setup. No slow downs or errors in disk logs. I know what you're talking about though -- I've had issues where an ssd would cause the system to completely hang on my desktop.

I will say, I am going 11 hours of up time now on stock unraid kernel. Let's see what happens. Sometimes it can take a while. 6 to 8 hours of uptime was more of a recent estimate. The most I've gotten was 13 days so.. it's a waiting game now. Thanks for replying!

Edited by MightyRufo

Yes, EXPO means the RAM will run faster, and thus the memory controller too; it could be that it wants that, though. Have you also tried running the server in safe mode after it hangs? Try that as well. If it works better and is stable, you could try with one service turned on, like VMs, and see how that goes. If that is stable, turn on the Docker service, but don't start all containers; start them one by one with a few days in between. If it works, keep adding containers until it locks up, then try the same again but leave that container or service off and carry on. Maybe a misconfigured container or VM is causing the issues. Also turn on syslog to flash, so if it hangs you still have the log to inspect further.

Otherwise it will be time for the parts cannon, I'm afraid. In that case I would start with the power supply. I have a titanium-rated PSU, by the way; maybe overkill, but it's one of the most important components to have. Without clean power that mobo and chip will suffer; they are really power hungry :) If you are able to, connect the second 12V CPU power plug on the mobo too.

 

And yes, ASUS uses the nct6799d chip.

The newer kernels support it out of the box.


I have not tried safe mode, though that is a good idea. Does safe mode just disable plugins, or does it disable VMs and Docker as well?
I was doing some research and came to the conclusion that it could also be bad memory, especially since the issues are so widespread. Obviously the last part I want to replace is the motherboard, so I will do everything I can to avoid it lol. Let's hope reseating the cables was enough to fix it.

Server's been up for 1 day and 23 hours. Currently running without Docker and VMs. *fingers crossed*. You gave me some good troubleshooting tips and for that, I thank you. This is my first server set up so I'm learning.

Edited by MightyRufo

No problem, you're welcome. In the past 3 years or so I learned so much, because I thought Unraid would run on anything but never knew about the underlying details. I spent countless days on the forums, where most of the time nobody took the time to answer, combining different posts to put together something that would work.
Well, if it works without Docker and VMs, you could try enabling one of the services, start a container or VM, and see what happens next.
Safe mode should indeed disable the plugins. I don't know if Docker and VMs are also automatically disabled; I'll leave that for somebody else to answer.
Coming back to the C-states: I had a reboot for a kernel update and checked my settings. I have C-states enabled. The only difference I can think of is that I use the ASUS-provided CPU/PBO settings instead of the AMD-provided ones, e.g. boost duration, boost voltage, boost frequency, etc. Check those settings too; maybe your board has something similar built in that you could switch between AMD- and Gigabyte-provided settings.
If there is a BIOS setting for the OS in use, set it to "other" instead of Windows.
Also note that backwards compatibility of RAM speed does not mean the RAM will run stable at those frequencies; the RAM is tested at a specified speed and with specific settings.
Try enabling the RAM profile and test that as well. It could be just a small setting that's causing this headache.

  • 2 weeks later...

Guess what? I think I -might- have figured out what the issue was. My case has 4 USB ports on top. One of them didn't work from the beginning and I just assumed it was dead. Well, I thought about it a couple of days ago and I was like "I wonder if reseating all of my cables fixed that too?" And sure enough, it did!

So, interestingly enough, I am going on 14 days of uptime. The system has been running as just a NAS for that duration, with no VMs or containers. No issues whatsoever. No USB issues or Ethernet issues. I will eventually turn VMs and Docker on, but I wanted to kill two birds with one stone. It seems very strange to me, but I wonder if the USB front panel connector (USB 3.0) was somehow the root cause. It's just funny how my uptime is much better and that's fixed too.

 

Before I reseated all of my cables, I had the system in the bios as I was changing some things inside the case and noticed it rebooted. That's what made me think to reseat my cables. I wonder if somehow the cable was seated in a way that caused it to mess with the motherboard. I really have no idea.

Edited by MightyRufo

Glad to hear it has been working well for such a long period when it didn't before.
Cabling can be an issue. Maybe one cable wasn't all the way in, maybe it was a weird USB thingy; nobody can tell us :)
Now try starting one of the two services and let it run for a week or so. Put some load on the host and see if it stays alive.
If so, enable the second service and see how that goes. Apply some load to it, and when it stays stable, hurray, you solved the issue

and saved yourself the parts cannon :D

Edited by DeNiX

2 days ago I turned everything back on. I figured I could just start with Docker if it crashed, since I felt like 15 days was enough to convince me it was stable as just a NAS. So far so good! No issues!!

Also, I managed to get temps and fan readings without using the Thor kernel by adding the drivers manually to the drivers file in the plugin folder, like your post mentions, and installing the it87 plugin (for the it8689 sensor) from ich777. I believe I figured out which one was the CPU temp by just running a transcode in Plex and waiting for that temp to go up to what I know to be normal for my CPU. As far as the motherboard temp goes, though, I am lost. This is the last piece of the puzzle lol.

EDIT: nevermind.. the readings are all wrong. CPU temp doesn't match what I see in corefreq and what I see in corefreq makes a lot more sense.
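
The "put load on it and see which temp rises" approach can be made less of a guessing game by comparing two saved sensors outputs. This is only a sketch: biggest_rise is a made-up name, it assumes that readings starting with a signed value are temperatures, and the largest rise is a heuristic that does not prove a reading is the CPU temperature or that it is accurate.

```shell
#!/bin/sh
# Compare two saved `sensors` outputs (idle first, then under load) and print
# the reading that rose the most, as "chip<TAB>label<TAB>rise".
biggest_rise() {
    awk '
        FNR == 1 { chip = "" }                 # reset at the start of each file
        /^[^ \t:]+$/ { chip = $0; next }       # chip header line
        /^[^ \t][^:]*:[ \t]+[+-][0-9]+\.[0-9]+/ {
            label = $0; sub(/:.*/, "", label)
            value = $0; sub(/^[^:]*:[ \t]+/, "", value)
            sub(/[^0-9.+-].*$/, "", value)     # drop the unit and any limits
            key = chip "\t" label
            if (FNR == NR) {                   # still reading the first (idle) file
                idle[key] = value + 0
            } else if (key in idle) {
                rise = value - idle[key]
                if (!seen || rise > best) { best = rise; bestkey = key; seen = 1 }
            }
        }
        END { if (seen) printf "%s\t%.1f\n", bestkey, best }
    ' "$1" "$2"
}

# Usage:
# sensors > idle.txt      # while the server is idle
# sensors > load.txt      # while a transcode or other load is running
# biggest_rise idle.txt load.txt
```

If the reading it points to still disagrees with another tool such as corefreq, the label is probably mapped to the wrong input on that board.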

image.png.eecae53dba4604aad974d445f2740163.png

 

Edited by MightyRufo