• 6.12 series - Increased power consumption - GPU/PCI-E Passthrough


    Interstellar
    • Minor

    Prior to updating to 6.12-beta11 or so, my average idle draw according to my UPS was around 30W.

    After updating to 6.12-beta11 and then 6.12.1, this increased to 40W.

    After downgrading to 6.11.5 again, it went back to 30W.

     

    (The UPS recording has been validated against an external power meter which also shows an increase in idle power).

     

    Nothing else changed on the server otherwise.

     

    I checked that all ASPM settings were the same, restarted multiple times, and checked CPU usage (corefreq-cli showed the CPU idling correctly at <3W with the correct package states, etc.). The GUI said the disks were spun down, but I did not physically check, so one or two may still have been running in the array.

     

    Sadly I don't have any logs for 6.12.1 (sorry!), but if you have a power meter somewhere it may be worth doing back-to-back 6.11.5 and 6.12.2 tests to see if there is something silly going on.

     

    When 6.12.2 is released I'll do some more digging, but for now I'm staying on 6.11.5.

     

     




    User Feedback

    Recommended Comments

    I have now confirmed this again.

     

    See screenshot (attachment: Screenshot 2023-09-15 at 18.30.26).

     

    There is a very clear and sudden increase in power usage after I upgraded from 6.11.5 to 6.12.4.

     

    powertop --auto-tune has been run (as normal)

    fans are running at the same speeds

    hard disks are spun down (power goes up by 5-6W each when spun up, 25W in total when all spinning.)

    corefreq shows the CPU still going into C3 state/<3W when idle.

    CPU usage over time is the same.

    All ASPM items seem to be enabled the same.
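One way to make the ASPM comparison between releases less dependent on eyeballing is to record the active policy on each version. This is only a minimal sketch, assuming Python 3 is available on the server and that the kernel exposes the policy at /sys/module/pcie_aspm/parameters/policy; the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the kernel's ASPM policy file; adjust if your system differs.
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_aspm_policy(text: str) -> Optional[str]:
    """Return the active policy, which the kernel marks with square brackets.

    Example content: "default performance [powersave] powersupersave"
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_aspm_policy(path: Path = ASPM_POLICY_PATH) -> Optional[str]:
    """Read and parse the policy file; return None if it cannot be read."""
    try:
        return parse_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("Active ASPM policy:", read_aspm_policy())
```

Running this once on 6.11.5 and once on 6.12 and comparing the two results would show whether the global policy differs. It says nothing about per-device link states, which would still need checking separately.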

     

    The only thing I can think of is that the GPU (passthrough, bound to vfio) is not going to sleep after VM shutdown, unlike in 6.11.5. My reasoning: I had to start the VM and then shut it down to get the GPU, PCI-E slot, etc. into a deeper sleep (if I didn't, I also had 9-11W more power draw).

     

    @mgutt - Have you seen anything like this?

     

    @Devs - It would be interesting to know if there are any commands or other checks I can run to see what is going on, but as it stands I think I have to go back to 6.11.5. Increasing my power usage by 30% due to a software update doesn't seem like a good thing :D
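For reference, the size of the jump can be worked out from the roughly 30W and 40W idle figures in the original report. The sketch below is just that arithmetic; the function names are illustrative, and the yearly figure assumes the extra draw is constant around the clock.

```python
def percent_increase(before_w: float, after_w: float) -> float:
    """Relative increase of after_w over before_w, in percent."""
    return (after_w - before_w) / before_w * 100.0


def extra_kwh_per_year(extra_w: float) -> float:
    """Energy used by a constant extra draw over one 365-day year, in kWh."""
    return extra_w * 24 * 365 / 1000.0


idle_6_11_w = 30.0  # approximate idle draw on 6.11.5 (from the report)
idle_6_12_w = 40.0  # approximate idle draw on 6.12.x (from the report)

print(f"Increase: {percent_increase(idle_6_11_w, idle_6_12_w):.1f}%")
print(f"Extra energy: {extra_kwh_per_year(idle_6_12_w - idle_6_11_w):.1f} kWh/year")
```

With those inputs the increase comes out at about a third, which is in line with the "30%" quoted above.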

    Edited by Interstellar
    Link to comment

    Please post the diagnostics so we can see the hardware. Interestingly, most users report lower power usage with v6.12, and I've also noticed a small decrease on both of my main servers.

    Link to comment

    Just doing that, with some additional anonymising. My name is still in a lot of the files (disk.cfg, smb config, lost, etc.) - the anonymising still needs a bit of work.

     

    DM'd you the pertinent files though.

     

    Edit: Another datapoint, although tentative. Instead of shutting the VM down I put it to sleep. Unfortunately a Plex stream started at the same time (disk spinning plus a CPU usage increase), but power usage is down a few W even then, so I'll see how it is overnight.

    Edited by Interstellar
    Link to comment

    Just for the public record: even with the small test I've done so far, if I sleep the VM the power level goes down to 'normal', i.e. the low 30s (watts).

     

    If I start it up and then shut it down, back to higher power levels.

     

    So without taking the GPU out and testing (not straightforward, as it messes up all the VFIO binds and so on for my OPNsense VM), I'm reasonably confident that some of the changes w.r.t. GPU and/or passthrough are causing this. Maybe the ReBAR (Resizable BAR) changes have something to do with it?

     

    I will confirm over the coming weeks that sleeping the VM with the GPU passed through, rather than shutting it down, brings the power usage back to 'normal' levels.


    Edit: Starting it up and sleeping it again, back to 31W idle.

     

    So I'm 90% confident that something has changed w.r.t. the state of a VFIO-bound AMD GPU after VM shutdown, which means it stays at a higher power level with 6.12 compared to 6.11.

    Edited by Interstellar
    Link to comment

    I do wonder what platform you're using. My J4125 system does consume 1W more compared to 6.11.5 (14.4-15W idle). I've double-checked that multiple times now.

     

    Oddly enough, an i3-4130 system I have consumes less power (6.6W) compared to 6.11.5 (8W or so).

    Edited by Mainfrezzer
    Link to comment

    Nothing special:

     

    MSI B650M mATX

    RX6600 - VFIO bind passthrough

    1TB NVME Corsair MP510 - VFIO bind passthrough

    i5-13500

    Intel Quad Port NIC I340-T4

    2x32GB DDR4

    5x 3.5" SATA

    1TB 850 EVO

    320GB 2.5"

     

    [Attachment: Screenshot 2023-09-16 at 09.28.07]

     

     

    In any case, now confirmed.

     

    If I shut down a VM rather than sleep it, I get an increase in power usage.

     

    If I sleep the VM, I actually get a reduction in power usage by a few W compared to 6.11.5.

     

    I suspect this is to do with the state the GPU and/or the NVMe SSD and/or the PCI-E lanes are in when the VM is asleep as opposed to shut down.

     

    As it's a complete faff to remove/add the GPU (to aid diagnostics), I'm going to make a temporary mini VM (4GB, fewer cores) that I can sleep (to stop using up 20GB of RAM unnecessarily...!).

     

    Edit: However the GPU seems to not want to give a signal out after a long period in sleep, so I had to resume and then stop the VM via WebGUI and restart it.

     

    Edit 2: I've removed the GPU [from the VM], started the VM up and slept it again. I shall see overnight what effect this has (i.e. does the state the SSD is in affect the power usage). - Power levels are a tad lower maybe, but in the noise.

     

    Edited by Interstellar
    Link to comment

    Finally restarted,

     

    Although:

    If I force a shutdown of a hibernating VM and then try to restart it, I get this in a loop...

     

    vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

    vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

    vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible

    vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible

     

    Whilst this forces a restart, it gives some hints as to the problem: I assume the GPU is in D3cold when the VM is hibernating (and in 6.11.5 was in D3cold when the VM was shut down), but it is no longer going into D3cold when the VM is shut down in 6.12.
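To see at a glance which devices and which transitions are failing, the looping messages can be tallied from a saved syslog. The following is a rough sketch that assumes the message format is exactly as in the lines quoted in this post; the function name and the sample list are illustrative only.

```python
import re
from collections import Counter

# Matches the vfio-pci message quoted in this post; the exact wording is an
# assumption taken from those log lines, not from kernel documentation.
FAILED_TRANSITION = re.compile(
    r"vfio-pci (?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"Unable to change power state from (?P<src>\S+) to (?P<dst>\S+), "
    r"device inaccessible"
)


def count_failed_transitions(lines):
    """Count failed power-state transitions per (device, from, to)."""
    counts = Counter()
    for line in lines:
        match = FAILED_TRANSITION.search(line)
        if match:
            counts[(match["addr"], match["src"], match["dst"])] += 1
    return counts


sample = [
    "vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible",
    "vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible",
    "vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible",
    "vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible",
]

for (addr, src, dst), n in sorted(count_failed_transitions(sample).items()):
    print(f"{addr}: {src} -> {dst} failed {n} time(s)")
```

To use it on a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.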

     

    I can't seem to find a command that I can run to check what state the GPU is in. Does anyone know how?
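One possible way to check, offered as a hedged sketch rather than a confirmed answer: recent Linux kernels expose a power_state attribute for each PCI device under /sys/bus/pci/devices, alongside the runtime power-management files in the power/ subdirectory. I believe this covers the kernels shipped with 6.11 and 6.12, but treat that as an assumption. The addresses below are taken from the log lines above, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Optional

# Standard sysfs location for PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_attr(path: Path) -> Optional[str]:
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def pci_power_info(address: str, root: Path = SYSFS_PCI) -> Dict[str, Optional[str]]:
    """Collect driver and power-management attributes for one PCI address."""
    dev = root / address
    driver_link = dev / "driver"
    return {
        "address": address,
        # The driver entry is a symlink whose target is named after the bound driver.
        "driver": driver_link.resolve().name if driver_link.is_symlink() else None,
        "power_state": read_attr(dev / "power_state"),              # e.g. D0, D3hot, D3cold
        "runtime_status": read_attr(dev / "power" / "runtime_status"),  # e.g. active, suspended
        "control": read_attr(dev / "power" / "control"),            # "auto" or "on"
    }


if __name__ == "__main__":
    # GPU and its audio function, as seen in the vfio-pci messages above.
    for addr in ("0000:03:00.0", "0000:03:00.1"):
        print(pci_power_info(addr))
```

Capturing this output with the VM shut down and again with the VM asleep, on both 6.11.5 and 6.12, would show whether the device really reaches D3cold in each case.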

     

     

    Link to comment

    Coming back to this, as having to keep a VM started to keep the GPU asleep is quite annoying.

     

    Are there any commands or investigations I can do to help?


    I'm going to avoid removing the GPU because it's not easy to do and the server runs the house and the internet 😃

     

    Edit: To reiterate with an AMD GPU:

     

    Shutting down the VM in 6.11.5 results in roughly the same power level as a VM asleep in 6.12.

    Shutting down the VM in 6.12 results in power levels much higher.

    Given the errors I saw in the log above, it could be that after a VM shutdown the passed-through GPU now ends up in some state other than D3cold, but I can't find any commands that would tell me which state the GPU is in.

    Edited by Interstellar
    • Upvote 1
    Link to comment

    This issue is still present in 6.12.6.

     

    I can provide a syslog at the weekend as I may have some spare time, I could also run certain commands if needed at the same time, so if there is any useful information that doesn't come with the diagnostics, please provide the commands now.

     

    Thanks.

    Link to comment
    On 12/19/2023 at 5:29 PM, Interstellar said:

    This issue is still present in 6.12.6. [...]

    I came across your thread here and your posts in the powertop thread trying to understand my own increase in power consumption that I noticed yesterday.
     

    I was running 6.9.X for a long time and decided to upgrade to 6.12.6 after some plugins wouldn’t update some weeks ago.

     

    I haven’t checked power consumption in a long time, but I was idling at between 42 and 50 watts and was surprised that it is now around 72 watts. I have two VMs running and some docker containers.

     

    I can’t really power things down as it runs the internet here as well 😁 But will try to test over the weekend to see if I can see the same issues as you.

    Link to comment
    8 hours ago, minority said:

    I came across your thread here and your posts in the powertop thread trying to understand my own increase in power consumption that I noticed yesterday. [...]

    Do you have a GPU passed through to one of them? If so, sleep that one and see what happens. It gave an immediate drop in power usage for me.

    Link to comment

    It turns out it is because of the Kingston SSD I use. I've now replaced it with a Samsung 980 Pro and power consumption has dropped. Same experience as described here. I also suspect I may have had this issue for longer than I thought, without noticing.

     

    I hope you get to the bottom of your issue.

     

    Link to comment


