Strange GPU problem, that i can`t get my head around it


Recommended Posts

Hi, i have very strange problem with windows vm and GPU passthrough. 

First this is my hardware:

MB:Jingsha/KLISSRE x99-D8 C612 chipset

CPU: E5-2678 V3 12C 24T 2.5-3.3GHz

RAM: 32GB DDR4 2133 QUAD CHANEL

GPU1: RX480 8GB

GPU2; R9 290 4GB

NVME: 240GB

PSU: EVGA SUPERNOVA 1000G2

UNRAID: Version: 6.8.3

 

 

The windows VM is:

8 cores pined and isolated (whit corresponding treads)(4/8 for unraid)

16GB ram

GPU RX480 passed to VM (edited xml so to make it multifunctional and on the same bus)

NVME passed to VM as pcie device isolated with VFIO-config plugin

USB 3.0 card passed to VM as pcie device isolated with VFIO-config plugin

GPU and NVME set to high from msi utility V3

CPU Scaling Governor: Performance

Enable Intel Turbo/AMD Performance Boost: YES

no other VMs or Dockers

 

The vm feels perfect unigine haeven result was:

2799 baremetal (same windows)(same NVME)

2646 in VM (same windows)(same NVME)

 

 

GTA-V feels the same, FPS the same no noticeable drops in fps or frametimes.

 

The problem is with Battlefield 1.

 

There is unimaginable  stutter and drops in frametimes.

In Riva tuner i see in the graphic for the GPU that every time that is drop in frame time, there is drop in gpu load from 100 to like 6 percent. This is every second, the graphic for gpu load looks like comb. 

The cpu load stays 60-70 percent.

Bf1 is set to use the maximum 11 treads that the game engine supports.

I tried:

First moved from vfio hard on the cache drive (sata ssd) to  nvme passthrough.  - no difference

Changing pcie slot for the GPU on the motherboard. - no difference

VFIO allow unsafe interrupts: yes/no - no difference

XEON TURBOBOOST UNLOCK HACK - no difference with i without it (except cinebench)

Tried smaller/bigger number cores and ram.

tried: 6.9.0-rc2 - no difference

tried: different graphic drivers - there is difference in other games but not in BF1

 

Because the nvme is passed, i can tried in VM and baremetal. In baremetal no stutter in bf1, same cpu load and same number of cores are loaded because of game engine limitation. I just can`t understand what makes this game behave so poor in VM. And everything else feel the same.

 

This is my VM xml.

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='1'>
  <name>ASROCK-VM-NVME</name>
  <uuid>bac5863c-dead-e638-e244-c94a7c331d46</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>17301504</memory>
  <currentMemory unit='KiB'>17301504</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='19'/>
    <vcpupin vcpu='8' cpuset='8'/>
    <vcpupin vcpu='9' cpuset='20'/>
    <vcpupin vcpu='10' cpuset='9'/>
    <vcpupin vcpu='11' cpuset='21'/>
    <vcpupin vcpu='12' cpuset='10'/>
    <vcpupin vcpu='13' cpuset='22'/>
    <vcpupin vcpu='14' cpuset='11'/>
    <vcpupin vcpu='15' cpuset='23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/bac5863c-dead-e638-e244-c94a7c331d46_VARS-pure-efi.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='8' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:0a:aa:54'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-ASROCK-VM-NVME/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source startupPolicy='optional'>
        <vendor id='0x2318'/>
        <product id='0x2808'/>
        <address bus='3' device='3'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

 

I attach diagnostic too.

 

 

x99-diagnostics-20210201-0633.zip

Edited by ilarion
Link to comment

Nope it wasn't this, but 10x for the answer. I found resolution for the time.... if i switch to DX11 the performance is nearly the same as in baremetal on DX11. The solution is not perfect because on baremetal on dx12 i have much more good fps, but it ok for now because the frames are stable though lower.

I tried to put all cores to the VM and the performance on dx12 was much better which is very strange because cpu usage was like 20%. So there is some kind of cpu bottleneck in dx12 in this game on my system.

Link to comment

I didn't run Battlefiled 1 , but any DX12 game I tried runs perfectly fine.

 

Have you tried any other DX12 game ?

 

It still worth checking that you clock timer used is TSC because some gaming APIs are dependent on this timer during draw calls.

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.