Tautvis Posted July 21

Hi, I'm running into issues with my GPU thermal throttling under load and could use some help/advice.

Here are my relevant server specs:

Dell PowerEdge R720
CPUs - 2x Intel Xeon CPU E5-2697 v2 - 12 cores / 24 HT each
GPU - NVIDIA GTX 1660 SUPER
Power - 2x 750W Power Supplies

Additional info: I've got 5 cores / 10 threads pinned to Plex, and 6 cores / 12 threads pinned to a modded Minecraft server (mentioning it because the higher load could be interfering?). I followed best practices and left CPU 0 / HT 24 unpinned for system performance. The system also has 3 cores / 6 HT unpinned at the end, with all other cores pinned to specific containers. I know pinning isn't great, but I have a bunch of containers running and I want to be able to support a bunch of streams at once. Currently I can support 2-3 streams before things get risky.

Here's my testing scenario: I ran 4 Plex streams locally, all of them transcoding down to a resolution other than the original. Immediately, only one stream kept playing back normally. Additionally, the entire time Plex is running/buffering, the GPU temperature rises from the mid 50s to 90, leading to thermal throttling.

Relevant info for the test: the single successful transcode was running at 11.3 Mbps at a transcode speed of 3.6; the unsuccessful ones were trying to transcode at 9.5 Mbps, 679 kbps, and 11.4 Mbps.

GPU info as a snapshot when it started to throttle (it was comparable before the thermal throttling began as well):

Load - Memory: 6% - 9%
Encoder - Decoder: 14% - 17%
GPU - Memory (MHz): 1530 - 6801
Fan - Power: 0% - 56W
Power State - Throttling: P2 - Yes (sw_thermal_slowdown)
Active Apps: Plex

CPU core snapshot when the GPU was throttling (about the same pre-throttling as well):

CPU 1 - HT 25: 27% - 31%
CPU 2 - HT 26: 2% - 8%
CPU 3 - HT 27: 33% - 21%
CPU 4 - HT 28: 6% - 2%
CPU 5 - HT 29: 14% - 17%

I have tried both the NVIDIA-specific production drivers and the open source drivers. The main difference I found is that my GPU reached P0 when running the open source drivers (it runs mostly in P2 either way), so I've stuck with those in my tests.

My main goal is to get the GPU fan running (it reports 0% even while throttling), with a secondary goal of figuring out why the heck transcoding isn't going so well (probably related to thermals).

If you've gotten this far, thanks for reading and possibly helping out!
Tautvis (Author) Posted July 22

If anyone else is having this problem, a workaround that I found is as follows:

1) Install the Nerd Tools plugin.
2) From the Nerd Tools plugin, install the IPMI tools scripts.
3) Turn on manual fan control by running:
ipmitool -I lanplus -H <idrac IP> -U <idrac user> -P <password> raw 0x30 0x30 0x01 0x00
4) Set the fan speed by running:
ipmitool -I lanplus -H <idrac IP> -U <idrac user> -P <password> raw 0x30 0x30 0x02 0xff 0x23

For reference, the last value (0x23) is what controls the fan speed. It is the percentage written in hexadecimal, so 0x23 = 35 sets the fans to 35%. For me this works, but you might need a lower or higher value.
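
For anyone who wants to script the workaround above, here is a minimal sketch that converts a fan percentage to the hex byte and issues the same two raw commands from the post. The function names and the IDRAC_HOST, IDRAC_USER, IDRAC_PASS and DRY_RUN variables are made up for this example. By default it only prints the commands (with the password masked) instead of running them, so treat it as a starting point and not a tested tool.

```shell
#!/bin/sh
# Sketch only: wraps the two raw IPMI commands from the post above.
# IDRAC_HOST, IDRAC_USER, IDRAC_PASS and DRY_RUN are hypothetical names
# chosen for this example; they are not defined by ipmitool or iDRAC.

# Convert a fan percentage (0-100) into the hex byte used as the last raw argument.
pct_to_hex() {
  case "$1" in
    ''|*[!0-9]*)
      echo "fan percentage must be an integer from 0 to 100" >&2
      return 1 ;;
  esac
  _pct=$1
  # Strip leading zeros so the value is not misread as octal.
  while [ "${#_pct}" -gt 1 ] && [ "${_pct#0}" != "$_pct" ]; do
    _pct=${_pct#0}
  done
  if [ "${#_pct}" -gt 3 ] || [ "$_pct" -gt 100 ]; then
    echo "fan percentage must be an integer from 0 to 100" >&2
    return 1
  fi
  printf '0x%02x\n' "$_pct"
}

# Print the ipmitool command when DRY_RUN=1 (the default), otherwise run it.
run_ipmi() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # The password is deliberately not echoed.
    echo "ipmitool -I lanplus -H ${IDRAC_HOST:-<idrac IP>} -U ${IDRAC_USER:-<idrac user>} -P <password> raw $*"
  else
    ipmitool -I lanplus -H "$IDRAC_HOST" -U "$IDRAC_USER" -P "$IDRAC_PASS" raw "$@"
  fi
}

# Enable manual fan control, then apply the requested speed.
set_fan_speed() {
  _hex=$(pct_to_hex "$1") || return 1
  run_ipmi 0x30 0x30 0x01 0x00 || return 1   # turn on manual fan control
  run_ipmi 0x30 0x30 0x02 0xff "$_hex"       # set the fan speed byte
}
```

Calling `set_fan_speed 35` in dry-run mode prints the two commands from the post, ending in 0x23. To actually send them, set the three IDRAC_* variables and DRY_RUN=0. Not from the original post, but commonly reported for these Dell servers: sending `raw 0x30 0x30 0x01 0x01` hands fan control back to the iDRAC, which is worth verifying on your own hardware before relying on it.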