celborn Posted July 9, 2020

Hello all,

I have been battling a GPU issue with a Windows 10 VM: poor performance, with a Passmark score of 7413 instead of the 12000+ I should be getting.

My server specs are as follows:

Unraid 6.9.0-beta 22
Motherboard: X9DRE-LN4F
Dual E5-2643 v2
16x4GB PC3-12800R

The VM has:

A 1TB M.2 SSD on a PCIe adapter (in a slot for CPU 2) assigned to it as the OS drive
Latest Nvidia drivers
8GB RAM

I have all the Windows performance settings set to High/never sleep. I believe I have isolated all of the cores on CPU 2 and assigned most of them to the VM. I've made sure that the video card is installed in the PCIe x16 slot assigned to CPU 2.

Can someone help me? I'm at a complete loss as to why the graphics card is so wildly underperforming.

Also, here is what the CPU pairings are showing in Unraid (screenshot not included).

Here is the VM's XML:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='1'>
  <name>Gaming</name>
  <uuid>2d3c072c-ebaf-d861-862b-c0eb1270347d</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>12</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <vcpupin vcpu='6' cpuset='9'/>
    <vcpupin vcpu='7' cpuset='21'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <vcpupin vcpu='9' cpuset='22'/>
    <vcpupin vcpu='10' cpuset='11'/>
    <vcpupin vcpu='11' cpuset='23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-5.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/2d3c072c-ebaf-d861-862b-c0eb1270347d_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/nvme-ADATA_SX8200PNP_2K1220143762' index='3'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Gaming/vdisk2.img' index='2'/>
      <backingStore/>
      <target dev='hdd' bus='sata'/>
      <alias name='sata0-0-3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disks/ST2000LM007-1R8174_ZDZ260WC/Gaming/vdisk3.img' index='1'/>
      <backingStore/>
      <target dev='hde' bus='virtio'/>
      <alias name='virtio-disk4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:80:a1:8e'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio-net'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-Gaming/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/isos/Videocard BIOS/Gigabyte.GTX1660Super.6144.190918-DumpedRom.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc049'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc226'/>
        <address bus='2' device='8'/>
      </source>
      <alias name='hostdev5'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc227'/>
        <address bus='2' device='9'/>
      </source>
      <alias name='hostdev6'/>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x0a4d'/>
        <address bus='2' device='3'/>
      </source>
      <alias name='hostdev7'/>
      <address type='usb' bus='0' port='5'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>
Jerky_san Posted July 9, 2020

Hmm, I'd say first check in CPU-Z that all the L caches appear correct (they should, since it's Intel). Check GPU-Z as well and make sure the card is boosting properly while you're running Passmark. I'd also run a CPU-Z benchmark with and without the GPU passed through to see whether the score changes at all and, if so, by how much.
celborn (Author) Posted July 9, 2020

Also, if this matters much, here is the result from running numactl --hardware:

root@Tower:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32200 MB
node 0 free: 4389 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32253 MB
node 1 free: 31673 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
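One way to cross-check the pinning against this output is sketched below. It is only an illustration, not something posted in the thread: the helper names `parse_node_cpus` and `nodes_used` are made up, and the pinned CPU list is copied by hand from the `<vcpupin>` entries in the VM's XML. It maps each host CPU to its NUMA node and reports which nodes the pinned CPUs fall on.

```python
import re

# Relevant part of the `numactl --hardware` output posted above.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32200 MB
node 0 free: 4389 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32253 MB
node 1 free: 31673 MB
"""

# Host CPUs taken from the cpuset values of the VM's <vcpupin> entries.
PINNED_CPUS = [6, 18, 7, 19, 8, 20, 9, 21, 10, 22, 11, 23]


def parse_node_cpus(text):
    """Return {node_id: set_of_cpu_ids} from `numactl --hardware` output."""
    nodes = {}
    for match in re.finditer(r"^node (\d+) cpus:([ \d]*)$", text, re.MULTILINE):
        nodes[int(match.group(1))] = {int(cpu) for cpu in match.group(2).split()}
    return nodes


def nodes_used(pinned, node_cpus):
    """Return the set of NUMA nodes that contain at least one pinned CPU."""
    pinned = set(pinned)
    return {node for node, cpus in node_cpus.items() if cpus & pinned}


node_cpus = parse_node_cpus(NUMACTL_OUTPUT)
used = nodes_used(PINNED_CPUS, node_cpus)
print("pinned CPUs live on node(s):", sorted(used))
# prints: pinned CPUs live on node(s): [1]
```

With the values from this thread, every pinned CPU sits on node 1, so the vCPU pinning itself does not cross NUMA nodes. The check says nothing about where the guest's memory is allocated.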
celborn (Author) Posted July 9, 2020 (edited)

Just now, Jerky_san said: hmm I'd say if you could first make sure in CPUZ all the L caches appear correct (they should since it's an intel). Check GPUZ and make sure it's boosting properly when you're running the passmark as well. Also I'd say run a CPUZ benchmark with and without the GPU passed through to see if the score changes at all. If so to what degree?

Here are the screenshots of CPU-Z and GPU-Z. It looks like the GPU clock speed peaks at 1905 MHz.

Edited July 9, 2020 by celborn
Jerky_san Posted July 9, 2020 (edited)

@celborn I'll be honest on this: I have only dealt with QEMU/KVM on AMD. But given that you have two NUMA nodes, much like the 2990WX, I wonder whether RAM is being allocated from the wrong NUMA node. I say this because, at least when I was doing this with my 2990WX, it ALWAYS allocated from node 0 unless node 0 was out of RAM; only then would it start taking from the other node.

To fix this you need to describe the topology to the VM and tell it where to get its memory. I am going to paste an example XML below. In <numatune> you'll need nodeset='0,1', and you'll need to adjust the <memnode> entries accordingly as well. Under <numa> I believe you'll simply have cell id='1' with cpus='0-11', and divide the memory in half since you only have 8GB allocated.

I HIGHLY recommend you copy your whole XML config into Notepad++ or something similar before manually tinkering, so you can always just paste it back in. Remember this is just a guide and not intended to be copied and pasted directly. I should also say I have no idea whether this will fix your performance problems, but I assume crossing NUMA nodes to reach RAM isn't working great for you.
<cputune>
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='36'/>
  <vcpupin vcpu='2' cpuset='5'/>
  <vcpupin vcpu='3' cpuset='37'/>
  <vcpupin vcpu='4' cpuset='6'/>
  <vcpupin vcpu='5' cpuset='38'/>
  <vcpupin vcpu='6' cpuset='7'/>
  <vcpupin vcpu='7' cpuset='39'/>
  <vcpupin vcpu='8' cpuset='8'/>
  <vcpupin vcpu='9' cpuset='40'/>
  <vcpupin vcpu='10' cpuset='9'/>
  <vcpupin vcpu='11' cpuset='41'/>
  <vcpupin vcpu='12' cpuset='10'/>
  <vcpupin vcpu='13' cpuset='42'/>
  <vcpupin vcpu='14' cpuset='11'/>
  <vcpupin vcpu='15' cpuset='43'/>
  <emulatorpin cpuset='4-11,36-43'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0,2'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='2'/>
</numatune>
<os>
  <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
  <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
  <nvram>/etc/libvirt/qemu/nvram/99642e81-2f13-a916-682c-90191636d75f_VARS-pure-efi.fd</nvram>
  <boot dev='hd'/>
</os>
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <vpindex state='on'/>
    <synic state='on'/>
    <stimer state='on'/>
    <reset state='on'/>
    <vendor_id state='on' value='KVM Hv'/>
    <frequencies state='on'/>
  </hyperv>
</features>
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' dies='1' cores='8' threads='2'/>
  <cache mode='passthrough'/>
  <feature policy='require' name='topoext'/>
  <numa>
    <cell id='0' cpus='0-7' memory='16777216' unit='KiB'/>
    <cell id='1' cpus='8-15' memory='16777216' unit='KiB'/>
  </numa>
</cpu>

Edited July 9, 2020 by Jerky_san
celborn (Author) Posted July 9, 2020 (edited)

@Jerky_san With what you said, my additions should look like the below (update: added emulatorpin):

<cputune>
  <vcpupin vcpu='0' cpuset='6'/>
  <vcpupin vcpu='1' cpuset='18'/>
  <vcpupin vcpu='2' cpuset='7'/>
  <vcpupin vcpu='3' cpuset='19'/>
  <vcpupin vcpu='4' cpuset='8'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <vcpupin vcpu='6' cpuset='9'/>
  <vcpupin vcpu='7' cpuset='21'/>
  <vcpupin vcpu='8' cpuset='10'/>
  <vcpupin vcpu='9' cpuset='22'/>
  <vcpupin vcpu='10' cpuset='11'/>
  <vcpupin vcpu='11' cpuset='23'/>
  <emulatorpin cpuset='6-11,18-23'/>
</cputune>
<resource>
  <partition>/machine</partition>
</resource>
<numatune>
  <memory mode='strict' nodeset='0,1'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>
<os>

and

<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' dies='1' cores='6' threads='2'/>
  <cache mode='passthrough'/>
  <feature policy='require' name='topoext'/>
  <numa>
    <cell id='0' cpus='6-11' memory='4194304' unit='KiB'/>
    <cell id='1' cpus='18-23' memory='4194304' unit='KiB'/>
  </numa>
</cpu>

Edited July 9, 2020 by celborn
Jerky_san Posted July 9, 2020

Just now, celborn said: @Jerky_san With what you said, my additions should look like the below

<cputune>
  <vcpupin vcpu='0' cpuset='6'/>
  <vcpupin vcpu='1' cpuset='18'/>
  <vcpupin vcpu='2' cpuset='7'/>
  <vcpupin vcpu='3' cpuset='19'/>
  <vcpupin vcpu='4' cpuset='8'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <vcpupin vcpu='6' cpuset='9'/>
  <vcpupin vcpu='7' cpuset='21'/>
  <vcpupin vcpu='8' cpuset='10'/>
  <vcpupin vcpu='9' cpuset='22'/>
  <vcpupin vcpu='10' cpuset='11'/>
  <vcpupin vcpu='11' cpuset='23'/>
</cputune>
<resource>
  <partition>/machine</partition>
</resource>
<numatune>
  <memory mode='strict' nodeset='0,1'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>
<os>

and

<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' dies='1' cores='6' threads='2'/>
  <cache mode='passthrough'/>
  <feature policy='require' name='topoext'/>
  <numa>
    <cell id='0' cpus='6-11' memory='4194304' unit='KiB'/>
    <cell id='1' cpus='18-23' memory='4194304' unit='KiB'/>
  </numa>
</cpu>

OK, I may have been slightly stupid on this. Because you're not spanning both nodes, you should be able to simplify this. Change <numatune> to

<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>

and drop the <numa> part. The reason I say this is that I was unable to come up with a way for you to say <cell id='0' cpus='' memory='0' unit='KiB'/>, since cpus= can't be blank. So a little switching around here. You may also want to consider <emulatorpin cpuset="7-9"/> to keep the emulator from jumping out of the NUMA node as well.
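The simplification suggested above would normally be typed straight into the VM's XML view. Purely as an illustration, the sketch below applies the same two changes with Python's standard `xml.etree.ElementTree`: a single strict `<numatune>` bound to node 1 and an `<emulatorpin>` of `7-9`. The function name `pin_to_node` and the trimmed `DOMAIN_XML` stand-in are assumptions made for the example, not part of libvirt or Unraid.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the VM's domain XML; only the parts touched here.
DOMAIN_XML = """
<domain type='kvm'>
  <cputune>
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='18'/>
  </cputune>
  <os/>
</domain>
"""


def pin_to_node(domain, node, emulator_cpus):
    """Bind guest memory to one host NUMA node and pin the emulator threads."""
    cputune = domain.find("cputune")
    # Add <emulatorpin> only if the config does not already have one.
    if cputune.find("emulatorpin") is None:
        ET.SubElement(cputune, "emulatorpin", cpuset=emulator_cpus)

    # Drop any existing <numatune> so only the single-node form remains.
    for old in domain.findall("numatune"):
        domain.remove(old)
    numatune = ET.Element("numatune")
    ET.SubElement(numatune, "memory", mode="strict", nodeset=str(node))

    # Place <numatune> directly after <cputune> for readability.
    domain.insert(list(domain).index(cputune) + 1, numatune)
    return domain


domain = pin_to_node(ET.fromstring(DOMAIN_XML), node=1, emulator_cpus="7-9")
print(ET.tostring(domain, encoding="unicode"))
```

As recommended earlier in the thread, keeping a copy of the original XML before changing anything makes it easy to paste the old config back if the VM refuses to start.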
Jerky_san Posted July 9, 2020

Btw, if you're wondering where you can get info on all this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt
celborn (Author) Posted July 14, 2020

On 7/9/2020 at 2:57 PM, Jerky_san said: Btw, if you're wondering where you can get info on all this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt

Thank you, that was a good read. I ended up having to reset my BIOS (I messed up enabling UEFI to dual-boot off the VM's M.2). Luckily, switching back to legacy had no negative effects on Unraid. I then created a 'new' Win10 VM with the old drives and, wouldn't you know it, the video card is acting just like it should and performing properly... *sigh* I'm not sure exactly which step 'fixed' the issue, but it's up and running now.

@Jerky_san thank you for your input. My brain hurts just a little from the added knowledge, but I enjoyed every minute of it.