
6.12.10 - GTX 1660 SUPER Throttling - 0% Fan Use



Hi, I'm running into issues with my GPU thermal throttling under load and could use some help/advice.

 

Here are my relevant server specs:

Dell PowerEdge R720

CPUs - 2x Intel Xeon CPU E5-2697 v2 - 12 cores / 24 HT per

GPU - NVIDIA GTX 1660 SUPER

Power - 2x 750W Power Supplies

 

Additional info:

I've got 5 cores / 10 threads pinned to Plex, and 6 cores / 12 threads pinned to a modded Minecraft server (mentioning it because the higher load could be interfering). I did follow best practices and left CPU 0 / HT 24 unpinned for system performance. The system also has 3 cores / 6 HT unpinned at the end, with all other cores pinned to specific containers.

I know pinning isn't great, but I have a bunch of containers running and I want to be able to support a bunch of streams at once. Currently I can support 2-3 streams before things get risky.
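
For anyone wanting to reproduce the pinning layout outside the Unraid UI, here is a minimal sketch that builds a cpuset string covering a core range plus its hyperthread siblings. It assumes the sibling of core N is N+24, which matches the "CPU 0 / HT 24" pairing above but should be checked on your own box (for example with `lscpu -e`). The function name `cpuset_with_siblings` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "first-last,first+offset-last+offset".
# The default offset of 24 is an assumption taken from the CPU 0 / HT 24
# pairing in this post; pass a third argument to override it.
cpuset_with_siblings() {
    first="$1"
    last="$2"
    offset="${3:-24}"
    printf '%s-%s,%s-%s\n' \
        "$first" "$last" "$((first + offset))" "$((last + offset))"
}
```

Something like `docker run --cpuset-cpus="$(cpuset_with_siblings 1 5)" ...` would then pin a container to cores 1-5 and threads 25-29, which lines up with a 5 core / 10 thread allocation.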

 

Here's my testing scenario:

I ran 4 Plex streams locally, all of them transcoding to a resolution other than the original.

 

Right from the start, only one stream played back normally; the other three did not.

 

Additionally, the entire time Plex is running/buffering, the GPU temperature climbs from the mid-50s to 90 °C, which leads to thermal throttling.

 

Relevant info for test:

The single successful transcode was running at 11.3 Mbps with a transcode speed of 3.6; the unsuccessful ones were trying to transcode at 9.5 Mbps, 679 kbps, and 11.4 Mbps.

 

GPU info as a snapshot when it started to throttle (the readings were comparable before the thermal throttling began):

Load - Memory:  6% - 9%

Encoder - Decoder: 14% - 17%

GPU - Memory (MHz): 1530 - 6801

Fan - Power: 0% - 56W

Power State - Throttling: P2 - Yes(sw_thermal_slowdown)

Active Apps: Plex
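
A snapshot like the one above can also be logged continuously from the command line. This is a rough sketch, assuming `nvidia-smi` is available on the host and that your driver version accepts these query field names (`nvidia-smi --help-query-gpu` lists the ones it supports). The helper names and the 85 °C warning threshold are my own assumptions, not anything from the driver.

```shell
#!/bin/sh
# Hypothetical helper: classify one CSV line of the form
# "temperature, fan, pstate, sw_thermal_slowdown".
# The default limit of 85 is an arbitrary warning threshold (assumption).
gpu_status() {
    line="$1"
    limit="${2:-85}"
    # First CSV field is the temperature; strip surrounding spaces.
    temp=$(printf '%s' "$line" | cut -d, -f1 | tr -d ' ')
    case "$temp" in
        ''|*[!0-9]*) echo "UNKNOWN"; return 0 ;;
    esac
    if [ "$temp" -ge "$limit" ]; then
        echo "HOT"
    else
        echo "OK"
    fi
}

# Poll the GPU every 5 seconds and prefix each reading with its status.
watch_gpu() {
    nvidia-smi \
        --query-gpu=temperature.gpu,fan.speed,pstate,clocks_throttle_reasons.sw_thermal_slowdown \
        --format=csv,noheader,nounits -l 5 |
    while IFS= read -r line; do
        printf '%s  %s\n' "$(gpu_status "$line")" "$line"
    done
}
```

Running `watch_gpu` in a terminal while the streams are playing should make it easy to see when the temperature crosses the threshold and whether the fan reading ever moves off 0.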

 

CPU core snapshot when GPU was throttling (about the same pre-throttling as well):

CPU 1 - HT 25: 27% - 31%

CPU 2 - HT 26: 2% - 8%

CPU 3 - HT 27: 33% - 21%

CPU 4 - HT 28: 6% - 2%

CPU 5 - HT 29: 14% - 17%

 

I have tried both the NVIDIA-specific production drivers and the open-source drivers. The main difference I found is that my GPU reached P0 when running the open-source drivers (it mostly runs in P2 either way), so I've stuck with those in my tests.

 

My main goal is to get the fans running, with a secondary goal of figuring out why the heck transcoding isn't going so well (probably related to thermals).

 

If you've gotten this far, thanks for reading and possibly helping out!


If anyone else is having this problem, a workaround that I found is as follows:

1) Install the Nerd Tools plugin
2) From the Nerd Tools plugin, install the IPMI tools scripts

3) Run ipmitool -I lanplus -H <idrac IP> -U <idrac user> -P <password> raw 0x30 0x30 0x01 0x00
(This turns on manual fan control)
4) Run ipmitool -I lanplus -H <idrac IP> -U <idrac user> -P <password> raw 0x30 0x30 0x02 0xff 0x23
(For reference, the last value, 0x23, is what controls the fan speed. It is the percentage written in hex, so 0x23 sets the fans to 35%. This works for me, but you might need a lower or higher value.)
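
Since the fan speed has to be given in hex, a small wrapper can save some mental arithmetic. This is only a sketch: the function names and the `IDRAC_HOST`, `IDRAC_USER`, and `IDRAC_PASS` environment variables are placeholders I made up, and the raw byte sequences are just the two from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers; IDRAC_HOST, IDRAC_USER and IDRAC_PASS are assumed
# environment variables that you would set yourself.

# Convert an integer percentage (0-100) to the hex byte the raw command
# takes as its last argument, e.g. 35 -> 0x23.
pct_to_hex() {
    case "$1" in
        ''|*[!0-9]*|0?*)
            echo "percent must be a plain integer from 0 to 100" >&2
            return 1 ;;
    esac
    if [ "$1" -gt 100 ]; then
        echo "percent must be a plain integer from 0 to 100" >&2
        return 1
    fi
    printf '0x%02x\n' "$1"
}

# Send a raw IPMI command to the iDRAC over the LAN interface.
idrac_raw() {
    ipmitool -I lanplus -H "$IDRAC_HOST" -U "$IDRAC_USER" -P "$IDRAC_PASS" raw "$@"
}

# Turn on manual fan control, then set all fans to the given percentage.
set_fan_pct() {
    hex=$(pct_to_hex "$1") || return 1
    idrac_raw 0x30 0x30 0x01 0x00 &&
    idrac_raw 0x30 0x30 0x02 0xff "$hex"
}
```

With the three variables exported, `set_fan_pct 35` should be equivalent to the two commands above. Handing control back to the iDRAC is commonly reported to be `raw 0x30 0x30 0x01 0x01`, but I have not verified that here, so treat it as an assumption.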
