Windows 10 KVM high DPC latency


slize


Hey, I successfully created a Windows 10 gaming VM with GPU passthrough, but I'm experiencing microstutters in games compared to bare-metal gameplay. I think the issue could be related to high DPC latency. One screenshot was taken while gaming and the other while copying some huge files from another Unraid server to my VM. Does anybody know how I can fix the high DPC latencies or what is causing them?

Boot parameters:

append initrd=/bzroot pcie_acs_override=downstream,multifunction isolcpus=1-5,7-11 vfio-pci.ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9
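
To see at a glance which logical CPUs the `isolcpus=1-5,7-11` parameter covers (and that CPUs 0 and 6 are left to the host), the list can be expanded with a small helper. This is only a sketch; `expand_cpu_list` is a hypothetical function name, not something shipped with Unraid.

```shell
#!/bin/bash
# Expand a kernel CPU list such as "1-5,7-11" into space-separated CPU numbers.
# expand_cpu_list is a hypothetical helper written for illustration.
expand_cpu_list() {
    local list="$1" part start end cpu out=""
    local IFS=','
    for part in $list; do
        if [[ "$part" == *-* ]]; then
            # A range such as "7-11": take both ends.
            start="${part%-*}"
            end="${part#*-}"
        else
            # A single CPU such as "3".
            start="$part"
            end="$part"
        fi
        for (( cpu = start; cpu <= end; cpu++ )); do
            out+="$cpu "
        done
    done
    # Drop the trailing space before printing.
    echo "${out% }"
}
```

For example, `expand_cpu_list 1-5,7-11` lists ten CPUs, which matches the ten vCPU pins in the XML further down.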



Devices of the system:

IOMMU group 0:	[8086:3ec2] 00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
IOMMU group 1:	[8086:1901] 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
IOMMU group 2:	[8086:a36d] 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
[8086:a36f] 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
IOMMU group 3:	[8086:a360] 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
IOMMU group 4:	[8086:a352] 00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
IOMMU group 5:	[8086:a340] 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
IOMMU group 6:	[8086:a338] 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
IOMMU group 7:	[8086:a330] 00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
IOMMU group 8:	[8086:a305] 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
[8086:a348] 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
[8086:a323] 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
[8086:a324] 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
[8086:15bc] 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
IOMMU group 9:	[10de:1e87] 01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
IOMMU group 10:	[10de:10f8] 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
IOMMU group 11:	[10de:1ad8] 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
IOMMU group 12:	[10de:1ad9] 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
IOMMU group 13:	[1987:5016] 02:00.0 Non-Volatile memory controller: Phison Electronics Corporation E16 PCIe4 NVMe Controller (rev 01)
IOMMU group 14:	[144d:a808] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983


CPU Thread Pairings

Pair 1:	cpu 0 / cpu 6
Pair 2:	cpu 1 / cpu 7
Pair 3:	cpu 2 / cpu 8
Pair 4:	cpu 3 / cpu 9
Pair 5:	cpu 4 / cpu 10
Pair 6:	cpu 5 / cpu 11


USB Devices

Bus 001 Device 001:	ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002:	ID 1e71:170e NZXT NZXT USB Device
Bus 002 Device 001:	ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 002 Device 002:	ID 0781:5583 SanDisk Corp. Ultra Fit


SCSI Devices

[0:0:0:0]	disk    SanDisk  Ultra Fit        1.00  /dev/sda   15.3GB
[N:0:1:1]	disk    Force MP600__1                             /dev/nvme0n1  2.00TB
[N:1:4:1]	disk    Samsung SSD 970 EVO 1TB__1                 /dev/nvme1n1  1.00TB


 

This is the XML of the VM:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='3'>
  <name>Windows 10</name>
  <uuid>ee1750e8-629d-9ca9-860f-d590e2b60e77</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>57671680</memory>
  <currentMemory unit='KiB'>57671680</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>10</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='8'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='9'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='10'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='11'/>
    <emulatorpin cpuset='0,6'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/ee1750e8-629d-9ca9-860f-d590e2b60e56_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='5' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/mnt/disks/Force_MP600_19288230000128565326/Windows 10/vdisk1.img' index='3'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c2:44:ce'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/disk1/isos/Gigabyte.RTX2080.8192.190116_EDIT.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x3'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

Anmerkung 2020-01-29 180002.jpg

Anmerkung 2020-01-29 180513.jpg

Anmerkung 2020-01-29 211925.png

Anmerkung 2020-01-29 2119251.png

Anmerkung 2020-01-29 2119252.png

Edited by slize

Try reducing your VM core count to 8 (e.g. remove cores 1 and 7) and pin the emulator to the 2 removed cores (e.g. 1,7).

Anything that shares core 0 (and its hyperthreading sibling, i.e. 6) is doomed to lag under I/O load because Unraid itself needs processing power during heavy I/O.
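
To double-check which logical CPU is the hyperthreading sibling of CPU 0 on a given host, the pairing can be read from sysfs. The sketch below assumes the standard Linux path `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the function names are made up for illustration.

```shell
#!/bin/bash
# Print the hyperthread siblings of one logical CPU, e.g. "0,6".
# $1 = logical CPU number
# $2 = sysfs CPU directory (optional, defaults to the standard Linux location)
thread_siblings() {
    local cpu="$1"
    local root="${2:-/sys/devices/system/cpu}"
    cat "$root/cpu$cpu/topology/thread_siblings_list"
}

# Print the sibling list for every logical CPU found under the sysfs directory.
all_thread_pairs() {
    local root="${1:-/sys/devices/system/cpu}"
    local dir
    for dir in "$root"/cpu[0-9]*; do
        [ -r "$dir/topology/thread_siblings_list" ] || continue
        echo "${dir##*/}: $(cat "$dir/topology/thread_siblings_list")"
    done
}
```

Running `thread_siblings 0` on the host shows which CPUs to keep away from the VM's vCPU pins.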

 

Also try creating a new template and pick Q35 machine type. It tends to work better with PCIe devices.

Edited by testdasi
2 hours ago, testdasi said:

Try reducing your VM core count to 8 (e.g. remove cores 1 and 7) and pin the emulator to the 2 removed cores (e.g. 1,7).

Anything that shares core 0 (and its hyperthreading sibling, i.e. 6) is doomed to lag under I/O load because Unraid itself needs processing power during heavy I/O.

 

Also try creating a new template and pick Q35 machine type. It tends to work better with PCIe devices.

Thank you for your reply!

I tried to switch to Q35 but it does not seem to make a difference.

I'm currently experimenting with the settings in the XML below (as far as I can tell so far, including or excluding CPUs 1 and 7 doesn't make a difference either). I got the best results by adding a custom scheduler (which needs some tricks on Unraid).

 

SSH into the server and run the following (it would be nice if you, @testdasi, or someone else could tell me how to make these commands persist after a reboot):

ulimit -r 99
sysctl -w kernel.sched_rt_runtime_us=-1

Then add the scheduler line inside <cputune> in the VM XML:

<vcpusched vcpus='0-7' scheduler='rr' priority='99'/>

The full XML:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='8'>
  <name>Windows 10</name>
  <uuid>ee1750e8-629d-9ca9-860f-d590e2b60e56</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>57671680</memory>
  <currentMemory unit='KiB'>57671680</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='11'/>
    <emulatorpin cpuset='1,7'/>
    <iothreadpin iothread='1' cpuset='1'/>
    <iothreadpin iothread='2' cpuset='7'/>
    <vcpusched vcpus='0' scheduler='rr' priority='99'/>
    <vcpusched vcpus='1' scheduler='rr' priority='99'/>
    <vcpusched vcpus='2' scheduler='rr' priority='99'/>
    <vcpusched vcpus='3' scheduler='rr' priority='99'/>
    <vcpusched vcpus='4' scheduler='rr' priority='99'/>
    <vcpusched vcpus='5' scheduler='rr' priority='99'/>
    <vcpusched vcpus='6' scheduler='rr' priority='99'/>
    <vcpusched vcpus='7' scheduler='rr' priority='99'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/ee1750e8-6211-9c38-860f-d590e2b60e56_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/mnt/disks/Force_MP600_19288230000124356/Windows 10/vdisk1.img' index='1'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:b3:52:21'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-8-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/disk1/isos/Gigabyte.RTX2080.8192.190116_EDIT.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x3'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

Anmerkung 2020-01-30 180147.jpg

5 hours ago, slize said:

SSH into the server and run the following (it would be nice if you, @testdasi, or someone else could tell me how to make these commands persist after a reboot):


ulimit -r 99
sysctl -w kernel.sched_rt_runtime_us=-1

<vcpusched vcpus='0-7' scheduler='rr' priority='99'/>

 

 

Install the User Scripts plugin and create a script that runs at array start (that is one of the schedule options in the plugin).
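
A minimal sketch of what such a script could contain is shown below. It reapplies only the `sysctl` setting from the quoted post. `ulimit -r 99` is left out because `ulimit` only affects the shell it runs in and that shell's children, so whether it helps from a startup script is not established in this thread. `DRY_RUN`, `run`, and `apply_rt_settings` are hypothetical names added for illustration.

```shell
#!/bin/bash
# Sketch of a User Scripts entry scheduled to run at array start.
# DRY_RUN is a hypothetical switch: set DRY_RUN=1 to print the commands
# instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_rt_settings() {
    # Remove the kernel's cap on CPU time for real-time tasks.
    # The setting is lost on reboot, hence the startup script.
    run sysctl -w kernel.sched_rt_runtime_us=-1
}

# In the real script, add a final line that calls: apply_rt_settings
```

Testing with `DRY_RUN=1 apply_rt_settings` first prints the command without changing anything on the host.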

 

Interesting, though. vcpusched isn't something frequently used. Would you mind explaining what those settings do?


Thanks for the hint about User Scripts.
As for vcpusched: I searched the whole internet for solutions and found one post where someone mentioned it. I tried "rr" and "fifo"; for me, "rr" works best. But it's still not perfect. I'm getting small spikes with maximum execution times of up to 3248 µs, but most are in the 20 µs to 200 µs range, which is quite good.
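
For context on what the `sysctl` changes: by default the Linux kernel limits real-time tasks (such as vCPU threads running under `rr`) to `sched_rt_runtime_us` microseconds out of every `sched_rt_period_us`, normally 950000 out of 1000000, and a value of -1 removes that limit. The sketch below only illustrates that arithmetic from the kernel's documented behaviour. It is not based on anything measured in this thread, and the function names are hypothetical.

```shell
#!/bin/bash
# Print the share of CPU time real-time tasks may use, given the two sysctl values.
# $1 = kernel.sched_rt_runtime_us, $2 = kernel.sched_rt_period_us
rt_budget() {
    local runtime="$1" period="$2"
    if [ "$runtime" -lt 0 ]; then
        # -1 disables real-time throttling entirely.
        echo "unlimited"
    else
        echo "$(( runtime * 100 / period ))%"
    fi
}

# Read the live values from the host (standard procfs paths).
show_rt_budget() {
    rt_budget "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
              "$(cat /proc/sys/kernel/sched_rt_period_us)"
}
```

With throttling disabled, a priority-99 vCPU thread is never forced to yield to normal host tasks on its core, which is one reason to keep the pinned cores isolated from the host.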

  • 1 year later...

I have been having these exact same latency problems ever since I upgraded to 6.9.2, but everything was flawless before that update. When I roll the server back to 6.8.3, the problem goes away.

 

I have tried all the things mentioned here, but unfortunately they made no difference for me. The "vcpusched" change actually made the latency worse. I have also tried a fresh Windows 10 installation, but the latency issues remained.

 

 

Oddly enough, I happened to try playing around with a Windows 11 VM (following the @SpaceInvaderOne video), and the Windows 11 VM has had absolutely no latency problems, despite passing through the exact same hardware. In fact, the XML for the Windows 11 and Windows 10 VMs is nearly identical, and yet the Windows 10 VM shows all these latency issues while the other does not.

 

Edit: Upon further testing, the same latency issues do persist in the Windows 11 Insider Build.

Edited by NerdyGriffin
Corrections to my previous claim
