Critical temperature reached, Unraid is shutting down

TIE Fighter · June 12

hi all

My Unraid server, running two Windows 10 VMs, force shuts down after about 40 minutes of gameplay testing, and the syslog shows the warning "critical temperature reached, shutting down".

Cinebench testing on each VM runs without problems.

The server was updated recently, but I can't remember having this problem before.

I'm not sure what the correct way to find the device associated with the thermal_zone0 name is.

Is it something in the /sys/class/thermal/thermal_zone0 folder?

sensors att

I have searched for why this is happening, but to no avail.

Any help would be much appreciated.

syslog att

syslog-previous

Kilrah · June 12

What were the drive temps? I'm only aware of drive-temp-related auto-shutdown on Unraid.

TIE Fighter · June 12

No high-temp warnings from any of the drives; they are in a well-ventilated gaming case.

JorgeB · June 13

HDD temps should not cause shutdown; usually only CPU overheating does, and that is controlled by the kernel and firmware, not Unraid.

Kilrah · June 13

16 minutes ago, JorgeB said:

HDD temps should not cause shutdown

The Parity Check Tuning plugin has a feature to shut down based on drive temps, although it's not clear whether it's only active during a parity op or all the time.

OP doesn't have it installed though so yeah, not that.

JorgeB · June 13

Yep, and that should be logged as coming from the kernel.

itimpi · June 13

3 hours ago, Kilrah said:

temps, although it's not clear whether it's only active during a parity op or all the time.

I would have to check the code, but I think it is only active during a parity check, although it would be easy to adjust it to always be active. However, if it is triggered that way, you end up with messages in the syslog from the plugin and notifications (assuming you get a chance to see them), so it would be obvious what triggered it.

Mainfrezzer · June 13

17 hours ago, TIE Fighter said:

I'm not sure what the correct way to find the device associated with the thermal_zone0 name is.

something in /sys/class/thermal/thermal_zone0 folder?

"sensors -u" should show you the devices listed under thermal_zone0

Edit:

Although, it seems like this might be the culprit

17 hours ago, TIE Fighter said:

That seems like an oddly low critical temp, for anything really. You could try to start with "thermal.nocrt=1" to disable the automatic shutdown feature and redo what you did when it originally triggered, while keeping an eye on the sensors to see which one hits critical.

Edited June 13 by Mainfrezzer
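
On the thermal_zone0 question: each zone directory under /sys/class/thermal normally contains a `type` file naming the zone, a `temp` file in millidegrees Celsius, and `trip_point_N_type` / `trip_point_N_temp` pairs for the trip points. Below is a minimal sketch that lists them, assuming that standard Linux sysfs layout. The function name `read_thermal_zones` and the helpers are illustrative only, not part of any Unraid tool.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def _millideg_to_c(raw):
    """sysfs reports millidegrees Celsius; return degrees Celsius or None."""
    try:
        return int(raw) / 1000
    except (TypeError, ValueError):
        return None


def read_thermal_zones(base="/sys/class/thermal"):
    """List every thermal zone under `base` with its type, temp and trip points."""
    root = Path(base)
    if not root.is_dir():
        return []
    zones = []
    for zone in sorted(root.glob("thermal_zone*")):
        trips = []
        # Each trip point has a matching pair of *_type and *_temp files.
        for type_file in sorted(zone.glob("trip_point_*_type")):
            temp_file = zone / type_file.name.replace("_type", "_temp")
            trips.append((_read(type_file), _millideg_to_c(_read(temp_file))))
        zones.append({
            "zone": zone.name,
            "type": _read(zone / "type"),
            "temp_c": _millideg_to_c(_read(zone / "temp")),
            "trips": trips,
        })
    return zones


if __name__ == "__main__":
    for z in read_thermal_zones():
        print(z["zone"], z["type"], z["temp_c"], z["trips"])
```

Running it while the VMs are under load and comparing `temp_c` against the trip point whose type is `critical` should show which zone is getting close to the shutdown threshold.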

TIE Fighter · June 13

23 hours ago, itimpi said:

I would have to check the code, but I think it is only active during a parity check, although it would be easy to adjust it to always be active. However, if it is triggered that way, you end up with messages in the syslog from the plugin and notifications (assuming you get a chance to see them), so it would be obvious what triggered it.

No "parity check tuning" plugin is installed, as I do not have a parity drive in the array yet.

23 hours ago, Mainfrezzer said:

"sensors -u" should show you the devices listed under thermal_zone0

Edit:

Although, it seems like this might be the culprit

That seems like an oddly low critical temp, for anything really. You could try to start with "thermal.nocrt=1" to disable the automatic shutdown feature and redo what you did when it originally triggered, while keeping an eye on the sensors to see which one hits critical.

I disabled the "corefreq" plugin daemon and uninstalled the plugin.

Gamed on both VMs for about one hour with no forced shutdown yet.

However, one VM was crashing with "vfio-pci 0000:4a:00.0: vfio_bar_restore: reset recovery - restoring BARs" in the syslog.

I did some more reading in the forums and added "pcie_aspm=off" to the syslinux config file on the flash drive, and that seems to have solved it.

I'll return if the critical temp issue persists after more testing.

Edited June 14 by TIE Fighter

TIE Fighter · June 14

On 6/13/2024 at 1:55 PM, Mainfrezzer said:

"sensors -u" should show you the devices listed under thermal_zone0

Edit:

Although, it seems like this might be the culprit

That seems like an oddly low critical temp, for anything really. You could try to start with "thermal.nocrt=1" to disable the automatic shutdown feature and redo what you did when it originally triggered, while keeping an eye on the sensors to see which one hits critical.

Did some more test gaming, and yet again the server shut down automatically because the critical temp was reached; this was again after about 40 minutes of gameplay.

Where do you suggest putting "thermal.nocrt=1"? In the terminal I get "command not found".

Edited June 14 by TIE Fighter
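
For context on the question above: "thermal.nocrt=1" is a Linux kernel boot parameter, not a shell command, which is why the terminal reports "command not found". It would presumably go on the same `append` line in the flash drive's syslinux config where "pcie_aspm=off" was added earlier in the thread. A sketch of what that entry might look like, assuming a stock Unraid boot entry (the `label`, `kernel` and `initrd` values here are assumptions and may differ on a given install):

```
label Unraid OS
  menu default
  kernel /bzimage
  append pcie_aspm=off thermal.nocrt=1 initrd=/bzroot
```

The parameter only takes effect after a reboot. With it set, the kernel no longer acts on ACPI critical trip points, so it is best treated as a temporary diagnostic setting while watching the sensors.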
