m0ngr31

Terrible gaming performance


Recently, I upgraded my HP DL180 G6 (2x Intel X5660 @ 2.80 GHz, 72GB ram, Nvidia GTX 1060) to an HP DL380p G8 (2x Intel E5-2690 @ 2.9GHz, 80GB ram, Nvidia GTX 1070).

 

On my previous server, I had a Windows VM that was running on 8 CPU threads with the 1060 passed in and 16GB of ram allocated to it, and it ran great. I was able to play any game I wanted to without any issue. I'm not one who needs the ultra settings, and this ran without issues. Even ran Oculus games with it fine. Everything was amazing.

 

I was wanting some more horsepower for another VM I have that's doing GIS processing, so I upgraded a bit to the new server. I pretty much just migrated my drives and ram over to the new box with the same Unraid stick so I could keep the configs for everything. That worked fine for the most part, but I had performance issues with my Windows VM and so I started from scratch on it, but the GPU performance on it is just awful.

 

I can get acceptable frame rates on some games, but on some games (Quantum Break being the newest culprit), I can't get more than 10 fps.

 

I've tried some benchmarks and the results are so hit or miss. On Unigine, my scores are right where other 1070 cards are at, but on things like Cinebench, my scores are lower than the GTX 650m!

 

Here are the results from UserBenchmark, which are quite a bit lower than they should be.

 

VM Config:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='4'>
  <name>Reynold</name>
  <uuid>be6aab76-20b4-1055-7fb7-a96ebb5fab17</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='10'/>
    <vcpupin vcpu='1' cpuset='26'/>
    <vcpupin vcpu='2' cpuset='11'/>
    <vcpupin vcpu='3' cpuset='27'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='28'/>
    <vcpupin vcpu='6' cpuset='13'/>
    <vcpupin vcpu='7' cpuset='29'/>
    <vcpupin vcpu='8' cpuset='14'/>
    <vcpupin vcpu='9' cpuset='30'/>
    <vcpupin vcpu='10' cpuset='15'/>
    <vcpupin vcpu='11' cpuset='31'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/be6aab76-20b4-1055-7fb7-a96ebb5fab17_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='6' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-id/ata-Crucial_CT525MX300SSD1_1741192D0174'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Reynold/vdisk2.img'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:1c:53:e3'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-Reynold/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x24' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x24' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0d8c'/>
        <product id='0x0014'/>
        <address bus='1' device='3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

Here's my topology showing that the GPU is being used by the correct cores:

YmyrpWH.png

 

I've also attached my diagnostics.


The only thing I can think of is that I had to apply the RMRR patch, and that is somehow slowing down my VM?

 

 

Can anyone point me in the right direction?

gunhaver-diagnostics-20190105-1515.zip


Check your BIOS to force the highest PCIe revision on each slot. Automatic can be iffy sometimes. That's where I'd start.

 

 


When I was running a similar VM and switched hardware, I had to create a new VM from scratch. You may also want to uninstall the Nvidia drivers in the VM and then reinstall them. Check the PCIe slot like 1812 mentioned, but also check the power cables, etc. Another thing to check is the power mode for the GPU in the Nvidia settings menu.

3 hours ago, 1812 said:

Check your BIOS to force the highest PCIe revision on each slot. Automatic can be iffy sometimes. That's where I'd start.

 

 

I had a look in there and that appears to be set already.

 

3 hours ago, ucliker said:

When I was running a similar VM and switched hardware, I had to create a new VM from scratch. You may also want to uninstall the Nvidia drivers in the VM and then reinstall them. Check the PCIe slot like 1812 mentioned, but also check the power cables, etc. Another thing to check is the power mode for the GPU in the Nvidia settings menu.

Yeah, this is a freshly created VM. I ran some stress tests on the GPU and it seemed that it could use up all of its TDP without any issue.


My guess would be driver issue. Did you check the power supply and cables?


Yes, I re-installed the drivers and the cables look fine. Same issue. I think it might be a NUMA issue. I might make a new thread with that information...


I disabled NUMA in my BIOS and it didn't help any.


Can you check and make sure the 1070 is running at x16 and not x8?


It is running in an x8 slot, but the performance shouldn't be this bad, and it's really only bad for certain things... I don't understand it.


On my system, I fixed several performance issues, but more importantly several game compatibility issues by switching the machine type from the UnRaid default of i440FX, and used the Q35 machine type instead...  Make a backup of the VM, and just try creating a new VM template that points to the old image file...  Windows will usually take a very long time to boot the first boot, but usually makes it through...  I had Windows license issues once doing something like that though, so don't skip the backups...

 

Also for ucliker, for most systems, the x16 and x8 issue is really a non-issue...  See this link... https://www.gamersnexus.net/guides/2488-pci-e-3-x8-vs-x16-performance-impact-on-gpus

 

Edit: The x16 vs x8 issue boils down to sometimes affecting loading speed, plus some minor micro-stuttering when the GPU has to talk to the rest of the system...  Frame rate and most of the user experience come from the speed of the card crunching on what it already has loaded into VRAM, so bus speed is mostly irrelevant for that stuff...  You can mostly game on an x4 or even an x1 slot just fine with some games, it just takes longer to load...  The new PCIe v4 slots coming soon are mostly needed for ultra-speed network cards and NVMe RAID...

Edited by Warrentheo


I swear I tried that. But maybe I need to start from scratch with the Q35.


Helped me quite a bit, and as I said it fixed several compatibility issues for me...  Specifically, PUBG really didn't like the i440fx, and Windows UWP games didn't like it either (Sea of Thieves).

 

Related to that, I was literally posting this post when you were posting your original post...

 

 


I had terrible performance with x8, but only when I was running my Titan X and not my old Quadro. I do agree the Q35 machine type is better, even for macOS VMs.

13 hours ago, ucliker said:

I had terrible performance with x8, but only when I was running my Titan X and not my old Quadro. I do agree the Q35 machine type is better, even for macOS VMs.

A Titan might be one of the few cards that it actually matters on since it might actually be hungry enough to saturate the bus...  Most of us can only dream of having that problem from afar... 😄


I think you’re correct. I was testing Blender at the time in that VM and then Ghost Recon Wildlands, and it had a huge performance loss. I tested with a GTX 1060 and a Quadro K4200 and those saw no performance hit at x8.


I tried formatting the VM and starting from scratch with the machine type as Q35 and it has the same pathetic performance.

 

I tried a benchmark on Shadow of War and got an average of 12 fps. From what I've seen in other results, I should be getting ~80 fps in that.


I have similar problems with my 1080 Ti in my HP DL370. I'm thinking it's because there's no way to force VMs to stay in their own NUMA pools yet. When I get to a computer I'll update this with the link to what should fix it.



If the issue remains after that and you still suspect NUMA issues, try pulling a processor out.

 

And if you’re bored waiting, download GPU-Z and see what speed and slot size it shows, just for fun.

Edited by 1812


I have a very similar setup to you and have diagnosed NUMA headaches for longer than I care to remember! 

A few things to try.... (which made my performance better).

 

  • Switch to a Q35 VM. It might not yield any performance increase right now, but there are some changes in the pipeline for QEMU 3.2/4.0 (which should be included in the next version of Unraid) that will increase performance of passed-through PCIe devices.
  • After you've flipped to Q35, add an emulatorpin value to take the pressure off of core 0 (which it will be using by default). Keeping it on the same NUMA node as your passed-through CPUs would most likely be best, so it'll look like this:
     
      <vcpu placement='static'>12</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='10'/>
        <vcpupin vcpu='1' cpuset='26'/>
        <vcpupin vcpu='2' cpuset='11'/>
        <vcpupin vcpu='3' cpuset='27'/>
        <vcpupin vcpu='4' cpuset='12'/>
        <vcpupin vcpu='5' cpuset='28'/>
        <vcpupin vcpu='6' cpuset='13'/>
        <vcpupin vcpu='7' cpuset='29'/>
        <vcpupin vcpu='8' cpuset='14'/>
        <vcpupin vcpu='9' cpuset='30'/>
        <vcpupin vcpu='10' cpuset='15'/>
        <vcpupin vcpu='11' cpuset='31'/>
        <emulatorpin cpuset='9,25'/>
      </cputune>

    Personally, I have my main workstation VM running off of cores on NUMA node 0, so I have my emulatorpin there. With the QEMU service running on node 0 too, it might be worth testing your emulatorpin on that node as well, so 7,23 maybe. I also stub those CPU cores the same as the rest to ensure nothing else is stealing cycles from my VM.

  • Add some additional Hyper-V enlightenments (I can't remember if all of these are standard with Unraid, but here they are anyway):

    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='none'/>
      <frequencies state='on'/>
    </hyperv>
  • MSI fix will most likely need to be applied to your GPU and GPU Audio device. https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts.378044/ (Use the v2 utility)
  • Last but by no means least, your storage is on NUMA node 0 and everything else is on node 1. Latency will be an issue here. I'm not sure how viable this will be, but if you can, move your 1070 into a PCIe slot associated with NUMA node 0, change your CPUs to that node too (and your emulatorpin), and see how things are there.
    Another alternative: if you have a spare HDD controller with only the SSD you're using on it, pass that through if you're able to, as it'll cut out the QEMU middleman between Windows and the SSD.

I think you'll notice the biggest difference with the emulatorpin change. 
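
For anyone unsure where the Hyper-V block goes: in libvirt domain XML, `<hyperv>` is a child of `<features>`, alongside the `<acpi/>` and `<apic/>` entries already in the original poster's config. A minimal sketch, assuming the rest of the domain definition is left unchanged and that the installed libvirt/QEMU versions accept every flag listed:

```xml
<!-- Sketch only: the <features> section from the original VM config,
     with the suggested Hyper-V enlightenments nested inside it. -->
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
    <reset state='on'/>
    <vendor_id state='on' value='none'/>
    <frequencies state='on'/>
  </hyperv>
</features>
```

If the VM refuses to start after the change, removing the flags one at a time is a simple way to find the one the host does not support.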

Edited by billington.mark
8 hours ago, 1812 said:

If the issue remains after that and you still suspect NUMA issues, try pulling a processor out.

 

And if you’re bored waiting, download GPU-Z and see what speed and slot size it shows, just for fun.

FYI, GPU-z lies...

If you really want to see what your PCIe lane situation is for your passed-through NVIDIA card, have a look in NVIDIA Control Panel > Help > System Information, then scroll down to BUS.

 

This is because the PCIe root ports created on a Q35 machine are x1 ports by default. 

In QEMU 3.2 (I think), you can add some extra XML to force the root port to be x16, and in 4.0 all root ports will be x16 by default.

 


NUMAd needs to be added. Here is what I was talking about.

I'm sorry to make you wait so long but when I said it should help I meant I think this will help.

Here is the main thread where we were talking about it.

 

Edited by AnnabellaRenee87


Unfortunately I can't move my 1070 to the other NUMA with the way the server case is. It just won't fit.

 

But I have 12 threads dedicated on CPU1 where the GPU is, and I put in the Hyper-V settings you had above, with the emulatorpin tag set on a core+thread on CPU0 and my system is going awesome now. Thank you so much!

 

At 1080p Ultra settings in Shadow of War, I'm getting 60fps in the benchmark now. I can actually play games again. I can't thank you enough!!
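
For reference, the working layout described above (12 threads on CPU1 next to the GPU, emulator threads on a core and its sibling thread on CPU0) could be sketched like this. The exact emulatorpin pair was not stated; `7,23` is only the pair billington.mark floated earlier, so treat it as an assumption and substitute whichever CPU0 core/thread pair is free on your host:

```xml
<vcpu placement='static'>12</vcpu>
<cputune>
  <!-- Guest vCPUs: host threads 10-15 and their siblings 26-31,
       all on CPU1 (NUMA node 1), where the GTX 1070 is attached. -->
  <vcpupin vcpu='0' cpuset='10'/>
  <vcpupin vcpu='1' cpuset='26'/>
  <vcpupin vcpu='2' cpuset='11'/>
  <vcpupin vcpu='3' cpuset='27'/>
  <vcpupin vcpu='4' cpuset='12'/>
  <vcpupin vcpu='5' cpuset='28'/>
  <vcpupin vcpu='6' cpuset='13'/>
  <vcpupin vcpu='7' cpuset='29'/>
  <vcpupin vcpu='8' cpuset='14'/>
  <vcpupin vcpu='9' cpuset='30'/>
  <vcpupin vcpu='10' cpuset='15'/>
  <vcpupin vcpu='11' cpuset='31'/>
  <!-- QEMU emulator threads on one CPU0 core plus its sibling thread.
       Assumed pair: 7,23 was suggested in the thread, not confirmed. -->
  <emulatorpin cpuset='7,23'/>
</cputune>
```

Keeping the emulator threads off the cores given to the guest is the point here; which node they land on is worth testing both ways, as the thread suggests.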
