Extreme idle power draw with RTX GPU

I replaced my server + dedicated gaming system setup in January and couldn't be happier. I have pinned 12 of the 16 threads of my Ryzen 2700X and passed them solely to a Windows 10 VM.

 

Up until yesterday the system was drawing ~65 W at idle, which is high, but acceptable for the hardware inside my Unraid server.

That is, until I replaced the GTX 1070 I was previously using with a new RTX 2080.

Idle power draw has doubled to 125 W.

 

Please be aware that I have stubbed the entire GPU (four separate devices: GPU, audio, USB-C, and an additional controller), and gaming performance inside the Windows 10 VM is satisfactory with the 2080. Power draw is just as high when I boot the server without stubbing. There is also no noticeable difference between booting into GUI mode and normal shell mode.
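For anyone wanting to replicate the stubbing: it boils down to collecting the four [vendor:device] IDs of the card's functions and passing them via vfio-pci.ids on the kernel command line in syslinux.cfg. A sketch, using sample lspci output (the TU104 IDs below are for illustration; check your own card with `lspci -nn`):

```shell
# Sample output of: lspci -nn | grep -i nvidia   (IDs shown are the common TU104 ones, for illustration)
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e82]
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]
01:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8]
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9]'

# Extract the [vendor:device] IDs and join them for the vfio-pci.ids= kernel parameter
ids=$(echo "$sample" | grep -o '\[10de:[0-9a-f]*\]' | tr -d '[]' | paste -sd, -)

# This is the sort of append line that ends up in syslinux.cfg on the flash drive
echo "append vfio-pci.ids=$ids initrd=/bzroot"
```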

 

What I have noticed, however, is that once I boot the Windows 10 VM and log out there (stopping costly background processes in Windows), the power draw goes down. I can't measure the difference precisely, because booting the VM itself draws a few dozen extra watts, but these figures are the best I have:

 

Running a single Plex stream in a Docker container on my Unraid server increases power draw from 125 W to about 150 W. Booting the VM to the desktop increases the draw to roughly 170 W, but logging off in Windows (VM still running, Plex still streaming) drops power draw to 100 W.

 

This looks like a kernel/driver issue to me. From various reviews I can see the NVIDIA 20-series drawing only 3-5 W more than the 1070 at idle.

Any help is appreciated, because running 24/7 this doubles my power bill. I have zero clue where the extra 60 W is going...

 

EDIT: To clarify: idle means only Unraid and Docker containers running, with the array spun down.

7 minutes ago, techsperion said:

Wish I could help test this but haven't got any kidneys left to sell for a 2080!

 

have you tried running:


top

in the terminal to see running processes with the 1070 vs the 2080?

 

Wondering if there are extra resources being taken up by QEMU or the like with the RTX popped in.

 

 

Thanks for your reply.

 

top/htop shows my CPU in deep sleep, at less than 3% usage.

 

The GPU fans spin merrily. The thing is, when stubbing the GPU it becomes invisible to Unraid; there is no process that should use the GPU as long as the VM is not running.

 

 

From what Google tells me, the GPU's GDDR6 might not clock down at idle without a relatively new driver. Sadly, the Unraid Nvidia plugin is not up to date, and as long as the GPU is stubbed it will not use the driver anyway...

 

1 minute ago, bastl said:

I wouldn't really compare the 2080 with a 1070. It's way closer to a 1080 Ti, which is way more power hungry than a 1070. Also, each manufacturer tunes its cards differently, with different power stages and different overclocks.

True, though at idle both cards should consume less than 20 W:

[review chart: idle power consumption comparison]

For the 1070 it was around 7-10 W.

 

My issue lies with the 2080 consuming more than 50 W at idle.


I remember when the 2080 Ti came out, a couple of people had issues with the cards getting stuck in a certain power state, never changing the clock speeds as they were supposed to, or had random artefacts without even stressing the cards. Does anyone know if that issue also occurred on the 2080s? This could explain the high idle load. Is the card changing the clock speeds inside a VM under load and at idle?


It looks like a feature of the cards (both the RTX 2080 and the Vega 56). It can't be an Unraid problem because, as you said, when stubbed Unraid isn't even aware of the card's presence, so it just sits there uninitialised. It won't be until your Windows VM is started and a driver loaded that there's any means of controlling it. It looks like something that can only be fixed by a firmware update; I suppose the cards are not intended to sit there powered but uninitialised. How does it behave if you reboot your server into the BIOS screen, so that the card is actually producing video output? It might be that the only workaround is to start your VM in order to load the driver and so control its power usage.


I have now installed Unraid Nvidia, which installs driver version 410.78. To use this, as well as disable stubbing, I booted in shell mode (not the Unraid GUI mode). I also checked for BIOS updates; the card is up to date. The RTX 2080 is now seen by Unraid and nvidia-smi.
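In case anyone wants to reproduce this check: with a driver loaded, nvidia-smi reports the power state directly. A sketch with hard-coded sample output, since the real query needs the hardware (the wattage shown is made up for illustration):

```shell
# With the driver loaded, the actual query is:
#   nvidia-smi --query-gpu=pstate,power.draw --format=csv,noheader
# P8 is the usual idle power state; P0 means full 3D clocks.

out="P0, 54.07 W"            # sample output of the query above (illustrative, not measured)
pstate=${out%%,*}            # take the field before the first comma

if [ "$pstate" != "P8" ]; then
  echo "GPU is not idling properly (state $pstate)"
fi
```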

 

With that I can achieve 100 W at idle, still 35 W more than previously. The VM also works as before.

 

Required:

  • Non-GUI boot
  • Disabled Stubbing
  • No booted VM
  • Spun down array

 

Before returning the GPU I would appreciate input from other RTX 2080 owners! Is your idle power consumption as high as mine?

If not, I am very interested in the model you are using.

 

My card is a "Gigabyte RTX 2080 Windforce".

4 hours ago, John_M said:

It looks like a feature of the cards (both the RTX 2080 and the Vega 56). It can't be an Unraid problem because, as you said, when stubbed Unraid isn't even aware of the card's presence, so it just sits there uninitialised. It won't be until your Windows VM is started and a driver loaded that there's any means of controlling it. It looks like something that can only be fixed by a firmware update; I suppose the cards are not intended to sit there powered but uninitialised. How does it behave if you reboot your server into the BIOS screen, so that the card is actually producing video output? It might be that the only workaround is to start your VM in order to load the driver and so control its power usage.

Just tested this: in BIOS and in GUI mode the power consumption is the same, which is ridiculously high. So, what does that mean? Is there any way to fix this?


Have you tried

a) not stubbing the GPU (not using vfio-pci.ids in the flash config)

b) not plugging anything into the GPU

c) not booting GUI mode?

 

Is power draw still high when you run Unraid then? For me it is, even with a, b and c.

If so, the issue could lie somewhere in the kernel... 😫
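One quick way to confirm which of a), b), c) is in effect is to check which kernel driver (if any) has claimed the GPU. A sketch parsing sample output; slot and IDs are illustrative, on the real server you would run `lspci -nnk -s <slot>`:

```shell
# Sample output of: lspci -nnk -s 01:00.0   (slot and IDs are illustrative)
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e82]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidia_drm, nvidia'

# Stubbed: "vfio-pci". Unraid Nvidia plugin loaded: "nvidia". No driver at all: empty,
# meaning the card sits powered but uninitialised, which matches the high idle draw.
driver=$(echo "$sample" | sed -n 's/.*Kernel driver in use: //p')
echo "GPU bound to: ${driver:-none}"
```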


Is there any way of setting the default power mode of the video card? If it doesn't boot into an idle mode by default (as tested by booting into your PC BIOS), that's fundamentally broken and the fault of the video card. This is only something Nvidia and the card manufacturer can fix by changing settings in the card's BIOS.

9 hours ago, rix said:

Have you tried

a) not stubbing the GPU (not using vfio-pci.ids in the flash config)

b) not plugging anything into the GPU

c) not booting GUI mode?

 

Is power draw still high when you run Unraid then? For me it is, even with a, b and c.

If so, the issue could lie somewhere in the kernel... 😫

a) I did not have to stub the video card. Is there a particular reason to do this?

b) This I will have to try. I remember something about a "zero watt" state with no cables plugged in.

c) I tend to boot into no-GUI mode (since I only have one GPU). This is where I first noticed the problem.


I too am having this same issue. I have noticed my Unraid server actually idles at LOWER wattage (10-15 W lower) when the Windows VM with GPU passthrough is booted than when that VM is shut down or asleep. I might build a 1-CPU, 2 GB VM to run and pass the GPU through to while I am not using it.

 

I have an RTX 2070 Super, upgraded from an AMD RX 480, with which I did not notice this issue.

On 9/23/2019 at 4:28 AM, hammsandwich said:

I too am having this same issue. I have noticed my Unraid server actually idles at LOWER wattage (10-15 W lower) when the Windows VM with GPU passthrough is booted than when that VM is shut down or asleep.

 

@testdasi said:

The reason the dedicated graphics card used by Unraid doesn't idle properly is that Unraid doesn't contain proper AMD/Nvidia drivers.

If you pass the card through to a VM, then the card runs with the right drivers (in the VM) and thus idles properly (that is, assuming there's always a VM using the card at all times, which should be the case; there's no point shutting down the VM while Unraid is running). Source

 


Had the same issue with my former EVGA RTX 2080 Ti SC Gaming.

I believe the power management and fan management are tied to the drivers, so when the gaming VM is not launched the card is in a limbo state: powered, and that's all. It waits for an initialization that never comes, because its OS isn't running.

So, as has been said, building a micro-VM with just the Nvidia driver and running it while the big one is off could do the trick.


You can use the script by (I think) SpaceInvader One to lower the GPU's power draw in Unraid while the VM is shut off:

#!/bin/bash
# Check for driver
command -v nvidia-smi &> /dev/null || { echo >&2 "Nvidia driver is not installed; you will need to install it from Community Applications... exiting."; exit 1; }
echo "Nvidia drivers are installed"
echo
echo "I can see these Nvidia GPUs in your server"
echo
nvidia-smi --list-gpus
echo
echo "-------------------------------------------------------------"
# Set persistence mode for the GPUs (with persistence mode enabled the NVIDIA driver
# remains loaded even when there are no active processes; this stops the modules being
# unloaded and therefore stops settings changing when modules are reloaded)
nvidia-smi --persistence-mode=1
# Query power state
gpu_pstate=$(nvidia-smi --query-gpu="pstate" --format=csv,noheader)
# Query running processes (by PID) using the GPU
gpupid=$(nvidia-smi --query-compute-apps="pid" --format=csv,noheader)
# Check if the power state is P0 and no processes are running (no PID in the string)
if [ "$gpu_pstate" == "P0" ] && [ -z "$gpupid" ]; then
    echo "No PID in string, so no processes are running"
    fuser -kv /dev/nvidia*
    echo "Power state is"
    echo "$gpu_pstate" # show what the power state is
else
    echo "Power state is"
    echo "$gpu_pstate" # show what the power state is
fi
echo
echo "-------------------------------------------------------------"
echo
echo "Power draw is now"
# Check current power draw of GPU
nvidia-smi --query-gpu=power.draw --format=csv
exit
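The script only helps if it actually runs after the VM shuts down and the card is left in P0, so it needs to be scheduled. A hypothetical cron entry (the path is an example only; on Unraid the User Scripts plugin's "custom" schedule does the same thing):

```shell
# Re-check and reset the GPU power state every 30 minutes (example path, adjust to your setup):
# */30 * * * * bash /boot/config/plugins/user.scripts/scripts/gpu_pstate/script
```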

In the Windows VM, try the Nvidia GPU Power Management Tool or Nvidia Inspector.

 

