Terrible gaming performance



Recently, I upgraded from an HP DL180 G6 (2x Intel X5660 @ 2.80 GHz, 72GB RAM, Nvidia GTX 1060) to an HP DL380p G8 (2x Intel E5-2690 @ 2.90 GHz, 80GB RAM, Nvidia GTX 1070).

 

On my previous server, I had a Windows VM running on 8 CPU threads with the 1060 passed through and 16GB of RAM allocated, and it ran great. I could play any game I wanted without issue; I'm not one who needs ultra settings. It even ran Oculus games fine. Everything was amazing.

 

I wanted more horsepower for another VM I have that does GIS processing, so I upgraded a bit to the new server. I pretty much just migrated my drives and RAM over to the new box with the same Unraid stick so I could keep the configs for everything. That worked fine for the most part, but I had performance issues with my Windows VM, so I rebuilt it from scratch. Even after that, its GPU performance is just awful.

 

I can get acceptable frame rates in some games, but in others (Quantum Break being the newest culprit), I can't get more than 10 fps.

 

I've tried some benchmarks and the results are hit or miss. In Unigine, my scores are right where other 1070 cards sit, but in things like Cinebench, my scores are lower than a GTX 650M!

 

Here are the results from UserBenchmark, which are quite a bit lower than they should be.

 

VM Config:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='4'>
  <name>Reynold</name>
  <uuid>be6aab76-20b4-1055-7fb7-a96ebb5fab17</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='10'/>
    <vcpupin vcpu='1' cpuset='26'/>
    <vcpupin vcpu='2' cpuset='11'/>
    <vcpupin vcpu='3' cpuset='27'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='28'/>
    <vcpupin vcpu='6' cpuset='13'/>
    <vcpupin vcpu='7' cpuset='29'/>
    <vcpupin vcpu='8' cpuset='14'/>
    <vcpupin vcpu='9' cpuset='30'/>
    <vcpupin vcpu='10' cpuset='15'/>
    <vcpupin vcpu='11' cpuset='31'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/be6aab76-20b4-1055-7fb7-a96ebb5fab17_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='6' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-id/ata-Crucial_CT525MX300SSD1_1741192D0174'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Reynold/vdisk2.img'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:1c:53:e3'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-Reynold/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x24' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x24' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0d8c'/>
        <product id='0x0014'/>
        <address bus='1' device='3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

Here's a screenshot of my topology, showing that the GPU is attached to the same NUMA node as the cores the VM is pinned to.


 

I've also attached my diagnostics.


The only thing I can think of is that I had to apply the RMRR patch; could that somehow be slowing down my VM?

 

 

Can anyone point me in the right direction?

gunhaver-diagnostics-20190105-1515.zip


When I was running a similar VM and switched hardware, I had to create a new VM from scratch. You may also want to uninstall the Nvidia drivers in the VM and then reinstall them. Also check the PCIe slot like 1812 mentioned, as well as the power cables, etc. Another thing to check is the power mode for the GPU in the Nvidia settings menu.

3 hours ago, 1812 said:

Check your BIOS to force the highest PCIe revision on each slot. Automatic can be iffy sometimes. That's where I'd start.

 

 

I had a look in there and that appears to be set already.

 

3 hours ago, ucliker said:

When I was running a similar VM and switched hardware, I had to create a new VM from scratch. […]

Yeah, this is a freshly created VM. I ran some stress tests on the GPU and it seemed it could use its full TDP without any issue.


On my system, I fixed several performance issues, and more importantly several game compatibility issues, by switching the machine type from the Unraid default of i440fx to Q35. Make a backup of the VM, then just try creating a new VM template that points to the old image file. Windows will usually take a very long time on the first boot, but usually makes it through. I had Windows license issues once doing something like that, though, so don't skip the backups.
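
For anyone wanting to try this by hand, here's a minimal sketch of the change in the domain XML, assuming a QEMU 3.0-era machine type string (the exact version suffix depends on the QEMU build Unraid ships). Only the machine attribute changes; the loader/nvram lines stay as they are:

  <os>
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
    <!-- loader/nvram unchanged from the i440fx config -->
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/be6aab76-20b4-1055-7fb7-a96ebb5fab17_VARS-pure-efi.fd</nvram>
  </os>

Q35 exposes a PCIe-based topology instead of i440fx's conventional PCI one, which is why Windows re-detects most of its devices on that first slow boot.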

 

Also, for ucliker: for most systems, the x16 vs. x8 issue is really a non-issue. See this link: https://www.gamersnexus.net/guides/2488-pci-e-3-x8-vs-x16-performance-impact-on-gpus

 

Edit: The x16 vs. x8 issue boils down to affecting loading speed, and sometimes some minor micro-stuttering when the GPU has to talk to the rest of the system. Frame rate and most of the user experience come from the speed of the card crunching on what it already has loaded into VRAM, so bus speed is mostly irrelevant for that. With many games you can game on an x4 or even an x1 slot just fine; it just takes longer to load. The PCIe 4.0 slots coming soon are mostly needed for ultra-fast network cards and NVMe RAID.

3 hours ago, Warrentheo said:

On my system, I fixed several performance issues, and more importantly several game compatibility issues, by switching the machine type from the Unraid default of i440fx to Q35. […]

I had terrible performance at x8, but only when I was running my Titan X and not my old Quadro. I do agree the Q35 machine type is better, even for macOS VMs.

13 hours ago, ucliker said:

I had terrible performance at x8, but only when I was running my Titan X and not my old Quadro. […]

A Titan might be one of the few cards it actually matters on, since it might be hungry enough to saturate the bus. Most of us can only dream of having that problem from afar... 😄


I tried wiping the VM and starting from scratch with the machine type set to Q35, and it has the same pathetic performance.

 

I ran the Shadow of War benchmark and got an average of 12 fps. From other results I've seen, I should be getting ~80 fps.


I have a very similar setup to yours and have been diagnosing NUMA headaches for longer than I care to remember!

A few things to try (all of which made my performance better):

 

  • Switch to a Q35 VM. It might not yield any performance increase right now, but there are changes in the pipeline for QEMU 3.2/4.0 which will increase the performance of passed-through PCIe devices (and which should be included in the next version of Unraid).
  • After you've flipped to Q35, add an emulatorpin value to take the pressure off core 0 (which the emulator will be using by default). Keeping it on the same NUMA node as your passed-through CPUs would most likely be best, so it'll look like this:
     
      <vcpu placement='static'>12</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='10'/>
        <vcpupin vcpu='1' cpuset='26'/>
        <vcpupin vcpu='2' cpuset='11'/>
        <vcpupin vcpu='3' cpuset='27'/>
        <vcpupin vcpu='4' cpuset='12'/>
        <vcpupin vcpu='5' cpuset='28'/>
        <vcpupin vcpu='6' cpuset='13'/>
        <vcpupin vcpu='7' cpuset='29'/>
        <vcpupin vcpu='8' cpuset='14'/>
        <vcpupin vcpu='9' cpuset='30'/>
        <vcpupin vcpu='10' cpuset='15'/>
        <vcpupin vcpu='11' cpuset='31'/>
        <emulatorpin cpuset='9,25'/>
      </cputune>

    Personally, I have my main workstation VM running off cores on NUMA node 0, so I have my emulatorpin there. With the QEMU service running on node 0 too, it might be worth testing your emulatorpin on that node as well, so 7,23 maybe. I also stub those CPU cores the same as the rest, to ensure nothing else steals cycles from my VM.

  • Add some additional Hyper-V enlightenments inside the <features> element (I can't remember if all of these are standard with Unraid, but here they are anyway):

    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='none'/>
      <frequencies state='on'/>
    </hyperv>
  • The MSI fix will most likely need to be applied to your GPU and its audio device: https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts.378044/ (use the v2 utility).
  • Last but by no means least: your storage hangs off NUMA node 0, and everything else is on node 1, so latency will be an issue here. Not sure how viable this will be, but if you can, flip your 1070 into a PCIe slot associated with NUMA node 0, change your CPUs (and your emulatorpin) to that node too, and see how things are there.
    Another alternative, if you have a spare HDD controller with only the SSD you're using on it, is to pass that controller through if you're able to, as it'll cut out the QEMU middleman between Windows and the SSD (see the sketch just after this list).
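
As a rough sketch of that last suggestion (not from the post above): passing a whole controller through uses the same hostdev syntax as the GPU entries earlier in the thread, just pointed at the controller's host address. The bus/slot values below are hypothetical placeholders; substitute your own, and note that the controller needs to be in its own IOMMU group for this to work.

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <!-- hypothetical address of the spare SATA/HBA controller; replace with your own -->
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
    </hostdev>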

I think you'll notice the biggest difference with the emulatorpin change. 
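
A related libvirt knob not covered in the list above: the numatune element can pin the guest's memory allocation to a single node, so the VM's RAM is allocated next to the CPUs it's pinned to. A minimal sketch, assuming the cores stay on node 1 as in the config at the top of the thread:

  <numatune>
    <!-- 'strict' fails the allocation if node 1 runs short; 'preferred' is the softer option -->
    <memory mode='strict' nodeset='1'/>
  </numatune>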

8 hours ago, 1812 said:

If, after that, the issue remains and you still suspect NUMA issues, try pulling a processor out.

 

And if you're bored waiting, download GPU-Z and see what speed and slot size it shows, just for fun.

FYI, GPU-Z lies...

If you really want to see what your PCIe lane situation is for your passed-through Nvidia card, have a look in NVIDIA Control Panel > Help > System Information, then scroll down to Bus.

 

This is because the PCIe root ports created on a Q35 machine are x1 ports by default. 

In QEMU 3.2 (I think), you can add some extra XML to force the root port to be x16, and in 4.0 all root ports will be x16 by default.
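
For reference, the extra XML looks something like the sketch below. The x-speed/x-width root-port properties were experimental at the time, so treat the names and values as assumptions that depend on your QEMU version, and note that a qemu:commandline block requires the xmlns:qemu namespace on the <domain> element:

  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    ...
    <qemu:commandline>
      <!-- experimental pcie-root-port link properties; names/values vary by QEMU version -->
      <qemu:arg value='-global'/>
      <qemu:arg value='pcie-root-port.x-speed=8'/>
      <qemu:arg value='-global'/>
      <qemu:arg value='pcie-root-port.x-width=16'/>
    </qemu:commandline>
  </domain>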

 

7 hours ago, billington.mark said:

I have a very similar setup to yours and have been diagnosing NUMA headaches for longer than I care to remember! […]

Unfortunately, I can't move my 1070 to the other NUMA node with the way the server case is laid out. It just won't fit.

 

But I have 12 threads dedicated on CPU 1, where the GPU is, and I put in the Hyper-V settings you listed above, with the emulatorpin set to a core+thread pair on CPU 0, and my system is running great now. Thank you so much!

 

At 1080p ultra settings in Shadow of War, I'm getting 60 fps in the benchmark now. I can actually play games again. I can't thank you enough!!

