
Critical temperature reached, Unraid is shutting down



Hi all,

My Unraid server, with two Windows 10 VMs running, is force shutting down after about 40 minutes of gameplay testing, and the syslog shows the warning "critical temperature reached, shutting down".

Cinebench testing on each VM runs without problems.

The server was updated recently, but I can't remember having this problem before.

Screenshot_20240612_104123_Chrome.thumb.jpg.b124eed375382298bdcf093204602fc1.jpg

I'm not sure what the correct way to find the device associated with the thermal_zone0 name is. 

Is it something in the /sys/class/thermal/thermal_zone0 folder?

Sensors output attached:

sensors1.thumb.png.3d810b3e7027b2080f2d042eb9f9b7c7.png

 

sensors2.thumb.png.d8298b3ea9ea2d831a42d14d02fbbf1c.png

I have searched for why this is happening, but to no avail.

Any help would be much appreciated.

Syslog attached:

syslog-previous

Link to comment
16 minutes ago, JorgeB said:

HDD temps should not cause shutdown

The Parity Check Tuning plugin has a feature to shut down based on drive temps, although it's not clear whether it's only active during a parity op or all the time.

OP doesn't have it installed though, so yeah, not that.

Link to comment
3 hours ago, Kilrah said:

temps, although it's not clear whether it's only active during a parity op or all the time.

I would have to check the code, but I think it is only active during a parity check, although it would be easy to adjust it to always be active. However, if it is triggered that way, you end up with messages in the syslog from the plugin and notifications (assuming you get a chance to see them), so it would be obvious what triggered it.

Link to comment
17 hours ago, TIE Fighter said:

 

 

I'm not sure what the correct way to find the device associated with the thermal_zone0 name is. 

Is it something in the /sys/class/thermal/thermal_zone0 folder?

"sensors -u" should show you the devices listed under thermal_zone0.


Edit:

Although, it seems like this might be the culprit 

 

  

17 hours ago, TIE Fighter said:

 

sensors2.png.d2fd6047d73db8e11332a78f3869cdc0.png.f4ff2be7328de0b96326ee3ee6f8c3da.png

 

 

That seems like an oddly low critical temp, for anything really. You could try starting with "thermal.nocrt=1" to disable the automatic shutdown feature, then redo what you did when it originally triggered while keeping an eye on the sensors to see which one hits critical.
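
On the thermal_zone0 question: the sysfs folder does hold the answer. Each zone directory has a `type` file naming the driver behind it, a `temp` file, and `trip_point_N_type` / `trip_point_N_temp` pairs. Below is a rough sketch that lists them so the zone with the low critical value can be spotted. The function name `list_thermal_zones` and its optional directory argument are made up for illustration; values are in millidegrees Celsius.

```shell
#!/bin/bash
# List each thermal zone with its type, current temperature, and any
# trip points of type "critical". Values are millidegrees Celsius.
# The optional argument (hypothetical, for illustration) lets the function
# be pointed at a copy of the tree instead of the live /sys/class/thermal.
list_thermal_zones() {
    local base="${1:-/sys/class/thermal}"
    local zone trip
    for zone in "$base"/thermal_zone*; do
        [ -d "$zone" ] || continue
        printf '%s type=%s temp=%s\n' \
            "$(basename "$zone")" \
            "$(cat "$zone/type" 2>/dev/null)" \
            "$(cat "$zone/temp" 2>/dev/null)"
        for trip in "$zone"/trip_point_*_type; do
            [ -f "$trip" ] || continue
            if [ "$(cat "$trip")" = "critical" ]; then
                # trip_point_N_type -> trip_point_N_temp
                printf '  critical trip=%s\n' "$(cat "${trip%_type}_temp")"
            fi
        done
    done
}

list_thermal_zones "$@"
```

Running it under `watch` during a gaming session would show which zone creeps toward its critical trip, assuming the zone that triggers the shutdown is exposed here at all.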

Edited by Mainfrezzer
Link to comment
Posted (edited)
23 hours ago, itimpi said:

I would have to check the code, but I think it is only active during a parity check, although it would be easy to adjust it to always be active. However, if it is triggered that way, you end up with messages in the syslog from the plugin and notifications (assuming you get a chance to see them), so it would be obvious what triggered it.

No "Parity Check Tuning" plugin is installed, as I do not have a parity drive in the array yet.

23 hours ago, Mainfrezzer said:

"sensors -u" should show you the devices listed under thermal_zone0.


Edit:

Although, it seems like this might be the culprit 

 

  

That seems like an oddly low critical temp, for anything really. You could try starting with "thermal.nocrt=1" to disable the automatic shutdown feature, then redo what you did when it originally triggered while keeping an eye on the sensors to see which one hits critical.

 

I disabled the "corefreq" plugin daemon and uninstalled the plugin.

I gamed on both VMs for about one hour, and no forced shutdown yet.

However, one VM was crashing with "vfio-pci 0000:4a:00.0: vfio_bar_restore: reset recovery - restoring BARs" in the syslog.

I did some more reading in the forums and added "pcie_aspm=off" to the syslinux config file on the flash drive, and that seems to have solved it.

I'll return if the critical temp issue persists after more testing.

 

Edited by TIE Fighter
Link to comment
Posted (edited)
On 6/13/2024 at 1:55 PM, Mainfrezzer said:

"sensors -u" should show you the devices listed under thermal_zone0.


Edit:

Although, it seems like this might be the culprit 

 

  

That seems like an oddly low critical temp, for anything really. You could try starting with "thermal.nocrt=1" to disable the automatic shutdown feature, then redo what you did when it originally triggered while keeping an eye on the sensors to see which one hits critical.

I did some more test gaming, and yet again the server shut down automatically because the critical temp was reached. This was again after about 40 minutes of gameplay.

Where do you suggest I put "thermal.nocrt=1"? In the terminal I just get "command not found".
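
For anyone landing here with the same question: "thermal.nocrt=1" is a kernel boot parameter rather than a shell command, which is why the terminal reports "command not found". It belongs on the `append` line of the syslinux config on the flash drive, the same place "pcie_aspm=off" was added, and takes effect after a reboot. Below is a rough sketch for checking afterwards that the parameter actually reached the kernel. The helper name `has_boot_param` is made up for illustration, and the example `append` line in the comment assumes a stock Unraid boot entry, so keep whatever your own line already contains.

```shell
#!/bin/bash
# Example append line (assumed stock entry; yours may differ):
#   append pcie_aspm=off thermal.nocrt=1 initrd=/bzroot

# has_boot_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the override is only there so the
# check can be tried against a saved copy.
has_boot_param() {
    local param="$1"
    local file="${2:-/proc/cmdline}"
    # One word per line, then match the whole line as a fixed string.
    tr -s ' ' '\n' < "$file" | grep -qxF -- "$param"
}

if has_boot_param "thermal.nocrt=1"; then
    echo "thermal.nocrt=1 is on the kernel command line"
else
    echo "thermal.nocrt=1 is not on the kernel command line"
fi
```

With the parameter active, the kernel no longer acts on ACPI critical trip points, so it is best left in place only for the duration of the test.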

 

Edited by TIE Fighter
Link to comment
