Maximizing Gaming Performance (CPU Bottleneck) - or give up and build a new gaming rig?



So, a bit of history.

 

I was planning to move house in the middle of last year and consolidated my Unraid server and my gaming rig into one box. I was shocked that performance was almost identical at the time, so I stuck with it when I moved, as the convenience was much better and I was getting more out of my PC. This was with a 3900X and GTX 1080.

 

Fast-forward to now: I have an RTX 3080 FE in my box, and it looks like I'm starting to lose an unacceptable amount of performance due to virtualization overhead. This is most noticeable in CPU-bound games, where the GPU now sits at 60% utilization and I get no performance uplift over my GTX 1080.

 

Full system spec:

 

X470 Gaming Plus Max

3900X - 6 cores (one die) isolated for VM use

32GB 3200MHz RAM

6 array drives

1TB Sabrent NVMe - stubbed to VM

 

I have tried a lot of different configs.

 

Most of them have come from this thread:

 

 

CPU Scaling Governor: On Demand vs Performance = no noticeable difference

Isolating first 6 cores vs last 6 cores = better performance on last 6 (marginal)

Assigned different combinations of cores: same CCX vs cross-CCX, including a 3C/6T configuration

 

image.png.de72431ff7322399c9991b5cfab8a069.png

 

I have tried pinning the emulator threads both on and off the isolated cores, and now leave them on 0/12.
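
For anyone wanting to try other core layouts, here is a minimal Python sketch that prints a cputune block for a list of SMT sibling pairs. It assumes host threads N and N+12 are siblings, which matches the pinning in the config posted further down; the function name cputune_xml and the PAIRS list are just illustrative, so check the pairing on your own host before using the output.

```python
def cputune_xml(sibling_pairs, emulator_cpus):
    """Build a libvirt <cputune> block pinning one vCPU per host thread."""
    lines = ["<cputune>"]
    vcpu = 0
    for core, sibling in sibling_pairs:
        # Keep both threads of a physical core on adjacent vCPUs
        for host_cpu in (core, sibling):
            lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{host_cpu}'/>")
            vcpu += 1
    cpuset = ",".join(str(c) for c in emulator_cpus)
    lines.append(f"  <emulatorpin cpuset='{cpuset}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


# Assumption: host threads N and N+12 are SMT siblings (last 6 cores: 6-11)
PAIRS = [(core, core + 12) for core in range(6, 12)]

if __name__ == "__main__":
    print(cputune_xml(PAIRS, emulator_cpus=[0, 12]))
```

Swapping PAIRS for the first six cores, or for three cores from a single CCX, gives the other layouts I tested without hand-editing twelve lines each time.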

I have benchmarked i440fx vs Q35 with little to no difference.

 

I am still seeing a significant deficit in single-core performance, but no core is maxed out in game.

 

My benchmark results (all run 3 times; BM = bare metal, VM = virtual machine):

 

Cinebench

R20 Multi - BM = 6912.3     VM = 3393.0 (49.09%)

R20 Single - BM = 502.3    VM = 461.0 (91.77%)

 

R15 Multi - BM = 254.4       VM = 1484.7 (48.6%)

R15 Single - BM = 201.7      VM = 184.0 (91.24%)

 

3DMark - Fire Strike Extreme

 

Score - BM = 17375.3          VM = 16468.0 (94.78%)

Graphics - BM = 18300.7      VM = 18574.3 (101.50%)

Physics - BM = 19214.0        VM = 17701.7 (92.13%)

Combined - BM = 9005.7     VM = 8430.3 (93.61%)

 

3DMark - Time Spy

 

Score - BM = 14876.0         VM = 13238 (88.99%)

Graphics - BM = 16085.7     VM = 16495 (102.54%)

CPU - BM = 10432.0           VM = 6248 (59.89%)

 

Mankind Divided - Ultra Preset 1440p

 

Average BM = 74.3 VM = 72.8 (97.9%)

 

RDR2 - All Ultra 1440p

Average BM = 72.7 VM = 67.5 (92.8%)

 

Civ 6

Graphics ultra

Av FPS BM = 197.3 VM = 136.0 (68.96%)

Turn Time

BM = 6.64 VM = 7.14 (93%)

 

F1 2019

Av FPS BM = 170.7 VM = 127.0 (74.41%) - note low GPU usage

 

CSGO

Av FPS Benchmark Map BM = 402.1 VM = 306.7 (76.3%) - note low GPU usage

 

Control

Average framerate BM = 82.2 VM = 75.8 (92.21%) - MAX GPU usage

 

Call of Duty: Warzone

Not benchmarked, but the game happily runs at 140-160 FPS on BM versus around 100 FPS in the VM.
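
As a sanity check on the percentages above, this small Python sketch recomputes VM as a share of BM and sorts the worst results first. The numbers are copied from this post; the helper names are made up for illustration. The R15 multi row is left out because its BM figure as posted (254.4) does not match the quoted 48.6%, so one of the two is presumably a typo.

```python
# (name, bare-metal result, VM result) copied from the results above
RESULTS = [
    ("Cinebench R20 multi", 6912.3, 3393.0),
    ("Cinebench R20 single", 502.3, 461.0),
    ("Time Spy CPU", 10432.0, 6248.0),
    ("Civ 6 average FPS", 197.3, 136.0),
    ("F1 2019 average FPS", 170.7, 127.0),
    ("CSGO average FPS", 402.1, 306.7),
]


def vm_percent(bare_metal, vm):
    """VM result as a percentage of the bare-metal result."""
    return 100.0 * vm / bare_metal


def worst_first(results):
    """Return (name, percent) rows sorted from largest loss to smallest."""
    rows = [(name, vm_percent(bm, vm)) for name, bm, vm in results]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, pct in worst_first(RESULTS):
        print(f"{name:22s} {pct:6.2f}%")
```

Bear in mind the VM only has 6 of the 3900X's 12 cores, so a multi-core result near 50% is roughly what the core count alone would predict. The gaps that look like real overhead are the game results in the 69-76% range.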

 

My KVM config:

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='4'>
  <name>XXXXXXXXX</name>
  <uuid>XXXXXXXXX</uuid>
  <description>Dual Boot WIN 10 VM</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='23'/>
    <emulatorpin cpuset='0,12'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/5f45766e-314f-6714-c52b-7b7a581ab713_VARS-pure-efi.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:1e:9d:c0'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-Sideswipe/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/domains/vbios/RTX3080FE.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x26' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

Any pointers on how to liberate a bit more CPU performance?

 

 

image.png


So after reading some more troubleshooting threads, I have also added the following options.

 

Added iothreads:

 

<vcpu placement='static'>12</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='23'/>
    <emulatorpin cpuset='0,12'/>
    <iothreadpin iothread='2' cpuset='1,13'/>
  </cputune>
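
One thing worth checking in the snippet above: as far as I understand libvirt, setting iothreads to 2 creates IOThreads with IDs 1 and 2, and an IOThread is only used by a disk or controller whose driver element carries an iothread attribute. A rough Python sketch (function names are my own, and the fragment is cut down from the config above) that lists unpinned IOThreads and any drivers that actually reference one:

```python
import xml.etree.ElementTree as ET

# Cut-down fragment of the domain XML; only the relevant elements are kept
FRAGMENT = """
<domain>
  <iothreads>2</iothreads>
  <cputune>
    <emulatorpin cpuset='0,12'/>
    <iothreadpin iothread='2' cpuset='1,13'/>
  </cputune>
</domain>
"""


def unpinned_iothreads(domain_xml):
    """IOThread IDs (assumed to run 1..N) that have no <iothreadpin>."""
    root = ET.fromstring(domain_xml)
    count = int(root.findtext("iothreads", default="0"))
    pinned = {int(pin.get("iothread")) for pin in root.iter("iothreadpin")}
    return [i for i in range(1, count + 1) if i not in pinned]


def drivers_using_iothreads(domain_xml):
    """<driver> elements that reference an IOThread via iothread='N'."""
    root = ET.fromstring(domain_xml)
    return [drv for drv in root.iter("driver") if drv.get("iothread")]


if __name__ == "__main__":
    print("unpinned:", unpinned_iothreads(FRAGMENT))
    print("drivers using iothreads:", len(drivers_using_iothreads(FRAGMENT)))
```

On this fragment it reports IOThread 1 as unpinned and no driver using an IOThread. That would be consistent with the change having no effect, since the NVMe is passed through as a PCI device rather than attached as a virtio disk.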

 

Changed the topology to 2 dies to represent the 2 CCXs I have passed through to the guest OS.

 

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='2' cores='3' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
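
A quick consistency check for the topology line: the product of sockets, dies, cores and threads should equal the vCPU count (12 here). A minimal Python sketch, with the function name being my own:

```python
import xml.etree.ElementTree as ET

# The <cpu> element from above, trimmed to the topology line
CPU_XML = """
<cpu mode='host-passthrough' check='none' migratable='on'>
  <topology sockets='1' dies='2' cores='3' threads='2'/>
</cpu>
"""


def topology_vcpus(cpu_xml):
    """Multiply out the topology attributes; a missing attribute counts as 1."""
    topo = ET.fromstring(cpu_xml).find("topology")
    total = 1
    for attr in ("sockets", "dies", "cores", "threads"):
        total *= int(topo.get(attr, "1"))
    return total


if __name__ == "__main__":
    vcpus = 12  # from <vcpu placement='static'>12</vcpu>
    print("topology matches vcpu count:", topology_vcpus(CPU_XML) == vcpus)
```

Both the original 1 x 1 x 6 x 2 layout and this 1 x 2 x 3 x 2 layout multiply out to 12, so either is valid for the 12 pinned vCPUs.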

 

No impact to single core performance.

 

I have noticed that my CPU doesn't boost past 4.2GHz, even in the single-core R15 test. I need to check if it's the same on my BM install. However, I thought a 3900X was good for 4.6GHz single core. It would be useful if anybody else with a 3900X could run the same test and report back the frequency. My cooling is more than adequate for this chip.

 

I get the same results with the performance and on-demand governors, and boosting is enabled.

 

root@Megatron:~# grep MHz /proc/cpuinfo
cpu MHz         : 3631.139
cpu MHz         : 2954.867
cpu MHz         : 2688.229
cpu MHz         : 2471.153
cpu MHz         : 2918.217
cpu MHz         : 2566.181
cpu MHz         : 2799.979
cpu MHz         : 1866.616
cpu MHz         : 2176.914
cpu MHz         : 2524.751
cpu MHz         : 2484.969
cpu MHz         : 4196.191
cpu MHz         : 3308.898
cpu MHz         : 2822.740
cpu MHz         : 2706.690
cpu MHz         : 2804.207
cpu MHz         : 2732.578
cpu MHz         : 2215.498
cpu MHz         : 2799.974
cpu MHz         : 1866.641
cpu MHz         : 2264.219
cpu MHz         : 2163.178
cpu MHz         : 2569.725
cpu MHz         : 4216.841
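
To make that output easier to read, here is a short Python sketch that parses the grep output and ranks host threads by clock. The sample text is the output above; parse_mhz and fastest are illustrative names, and it assumes the lines appear in host thread order (0 to 23). /proc/cpuinfo is only a snapshot, so it would need sampling repeatedly while the single-core test runs to say anything firm about boost.

```python
import re

# Output of `grep MHz /proc/cpuinfo` as posted above
SAMPLE = """
cpu MHz         : 3631.139
cpu MHz         : 2954.867
cpu MHz         : 2688.229
cpu MHz         : 2471.153
cpu MHz         : 2918.217
cpu MHz         : 2566.181
cpu MHz         : 2799.979
cpu MHz         : 1866.616
cpu MHz         : 2176.914
cpu MHz         : 2524.751
cpu MHz         : 2484.969
cpu MHz         : 4196.191
cpu MHz         : 3308.898
cpu MHz         : 2822.740
cpu MHz         : 2706.690
cpu MHz         : 2804.207
cpu MHz         : 2732.578
cpu MHz         : 2215.498
cpu MHz         : 2799.974
cpu MHz         : 1866.641
cpu MHz         : 2264.219
cpu MHz         : 2163.178
cpu MHz         : 2569.725
cpu MHz         : 4216.841
"""


def parse_mhz(text):
    """Return the clock of each host thread, in the order listed."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, text, flags=re.MULTILINE)]


def fastest(mhz, top=2):
    """Return the `top` (thread index, MHz) pairs with the highest clock."""
    ranked = sorted(enumerate(mhz), key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    for thread, clock in fastest(parse_mhz(SAMPLE)):
        print(f"thread {thread}: {clock:.0f} MHz")
```

In this snapshot the two highest readings are threads 23 and 11, which is the sibling pair pinned to vCPUs 10 and 11 in the config.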

 

EDIT:

 

A quick BM boot into Windows shows cores reaching 4.4GHz.

 

image.png.534a6b9e18b8e3d58a66349721c0991b.png

 

 

 

Edited by gray squirrel
update

So I have taken a look at cache and memory performance within the VM and compared it against results I have found online for a 3600X (effectively what I have passed through to the VM).

 

Results in VM

 

image.png.e65e2c2682a9fae602ef0033fc319246.png

 

Example 3600X - note the memory is running at 3466MHz; I'm using 3200MHz RAM.

 

image.png.ee447ca6e06275d600783dabaefec855.png

 

Future work:

Disable SMT to see if that makes a difference.

Benchmark one die to get a closer representation of my VM.


Disabling SMT is a bad idea: very poor multi-core performance with no improvement to single-core or gaming performance.

 

Unraid also doesn't like this, showing very high utilization across the non-isolated cores, even with the Docker service off.

 

I have noticed that assigning more emulator pins does provide about a 15% uplift in CSGO.

 

However, GPU usage is still very low in COD and F1 2019 (in the 50-70% range), yet no CPU core goes over 60% utilization.

Edited by gray squirrel
  • 8 months later...

I had the issue that a few games were really slow (30 FPS) in a Windows 10/11 VM (passthrough) on an R9 3900X + RTX 3070.

 

This fixed it:

 

"Fixed by disabling Hyper-V / Sandbox / Subsystem for Linux in Windows Features.

If enabled -> GPU limited by ~50% and Windows feels laggy. Disabled Sandbox + Hyper-V (Windows Features!) + Subsystem for Linux (I left Hyper-V enabled in the Unraid template) and everything runs flawlessly!

And don't pass through the NVMe via VFIO; pass it via Unassigned Devices + adjust the XML."

 

--> I configured the VM with Hyper-V enabled in the template; the rest did not apply to my VMs. I also upgraded the VM to i440fx-6.2, enabled PCIe ACS override, and bound the GPU + audio.

 

Now I have really good performance, for example in Cyberpunk 2077 and Gears 2 (previously around 30 FPS at Full HD), and outstanding performance in benchmarks. I think it's similar to bare metal now.
