SweetPeachez
Posts posted by SweetPeachez
-
-
Hey all, I am having an issue with a new build when my gaming VM is under load.
It is very easy to recreate: all I have to do is boot up the game and play for a bit, and usually within 20 minutes I get a crash that shuts down the VM and seems to crash Unraid as well (I am unable to access it through the web UI or through SSH).
I have upgraded my BIOS and played around with various settings in both the BIOS and Unraid, with no luck so far:
C-States have been disabled
Power supply idle control is set to Typical Current Idle
Tried setting rcu_nocbs=0-47 in the syslinux config
Specs are
MSI TRX40 PRO 10G
AMD 3960X
128GB of Corsair 3200MHz RAM
2TB Samsung 970 EVO Plus NVMe
GTX 980 Ti being passed through to a Windows 10 VM
I am attaching the diagnostics zip that I downloaded after rebooting the server following the crash (I'm not sure whether it is still useful given that I captured it after the reboot).
Update: I am unable to ping the system after the crash.
-
On 1/21/2020 at 9:30 AM, PeteUnraid said:
@SweetPeachez Interesting and good info from testdasi. Maybe as a last test, if you have time, you could set the
<vcpu placement='static'>24</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='1-47'/>
<vcpupin vcpu='1' cpuset='1-47'/>
...
<vcpupin vcpu='23' cpuset='1-47'/>
</cputune>
and
<topology sockets='1' cores='24' threads='1'/>
I would be interested to know how that performs with all the other tweaks you have done, if you have the time. You would have to ensure you didn't run any other VMs while doing the benchmark though. The above does exclude core 0 (for Unraid to use).
P
I'll be able to test this at some point this weekend...if y'all are wanting a guinea pig to test stuff on the 3960x please let me know, I'm more than willing to try whatever or test code and such.
-
20 minutes ago, PeteUnraid said:
My bad, apologies. I should have noticed. Try with all 48 cores pinned as well.
No biggie! Pinning all cores and turning isolation off for all cores got me 10646/12950
1 hour ago, Jerky_san said:
It does appear that, because you have the newer arch, it isn't doing the caching properly. On the 2990wx I have (and my old 1700) it was basically a requirement. What you could "technically" do is try the old way we fixed it before they fixed it in QEMU. Adjust cores below to match whatever you're doing; it will be half of whatever you have assigned. This will pass it as an "EPYC" processor. See if CPUZ sees your cache the same as your bare metal with this. If not, you may have to wait till they resolve the issue.
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC</model>
  <topology sockets='1' cores='22' threads='2'/>
  <feature policy='require' name='topoext'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='svm'/>
  <feature policy='disable' name='x2apic'/>
</cpu>
Cache matching with settings. If your cache doesn't match you'll get hitching and stuff due to cache misses. It is ESPECIALLY important for L1 and L2 but also important for L3 given how much cache the 3960 has.
2990wx Baremetal CPUZ
2990wx VM CPUZ
My cache layout doesn't match bare metal, and the settings you provided put Windows into an unusable state.
So it seems like I will be getting best performance with just banking with even cores on one VM and odd cores on the other? At least until another update comes out?
-
It's "she/her" by the way... anyway, yeah, I'll check out Jerky's latest suggestion in a moment...
Most of the time I won't need all 48 CPUs, but I do a lot of very heavy data processing (a lot of it on the GPU), and at times I still find I need as many CPU cores as I can get as well.
For typical daily use the machine will be serving 2-3 Gaming VMs though, so still in my best interest to get everything working correctly. Anyways, I'll get to trying stuff after I grab some lunch.
-
Just reran my most recent configuration (the one that gave me the highest score) with the 2 threads / 12 cores settings, and it scored about 300 points lower, so I'm going to change that back and look into trying the settings that Jerky suggested.
-
15 minutes ago, PeteUnraid said:
Interesting. His suggestions were quite good, so I'm surprised there was no improvement from that.
The current best config looks like your original, except you have changed machine type and are using 46 cores now... Did you try with 44 cores (unpin cores 0,1,2,3) and see how the performance was? I would be interested to know if there is actually a noticeable difference between 46 and 44 cores.
It appears currently you are getting about 82% of the performance from the VM as opposed to bare metal - would you agree?
If you can confirm exactly where we are I will go away and try to think of what other improvements could be tried and come back to you
P
Per my post right before this one I ran that same XML but with just the odd bank cores and got a pretty large increase that I think brings me within range of bare metal (even though I am not using half the cores)
EDIT: going to fix my threads per core in the XML and rerun
-
1 hour ago, testdasi said:
One thing I have seen is if you don't load the CCX evenly, you will end up losing performance if your software doesn't scale too well. 3DMark Time Spy doesn't really scale that well beyond about 12 cores or so.
From my own testing (albeit not with 3DMark Time Spy but with a workload that similarly doesn't scale too well beyond 12-16 cores or so), 7 uneven is about the same as 6 even (i.e. the extra core performance is essentially "lost", so to speak).
It's impossible to spread 22 physical cores evenly + your VM benchmark performance is approximately 2/3 that of bare metal (your CPU has 8 CCX with 3 cores each) = it sounds like the uneven load is causing you to "lose" a core's performance for each CCX (which is 1/3 of each CCX), which is kinda similar to my testing.
You might want to test assigning the odd bank of the 48 logical cores (e.g. cpu 0 + cpu 1 = 1 physical core -> assign cpu 1 to VM and so on -> assign all the odd cpu to your VM = the odd bank) to your VM and see if it helps (i.e. your VM has 24 cores instead of 44).
Wow! I ran 3DMark Time Spy on the odd bank just now and got 11309 in the VM, whereas bare metal ran at 12950.
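The odd-bank assignment testdasi describes can be written out mechanically. Below is a minimal Python sketch, assuming host CPUs are numbered 0-47 and that the odd-numbered ones are the CPUs to hand to the VM. The helper name `odd_bank_pins` is hypothetical, and which logical CPUs actually share a physical core depends on the host (the pinning XML elsewhere in this thread pairs CPU 1 with CPU 25), so check the real sibling layout before relying on it.

```python
def odd_bank_pins(logical_cpus=48):
    """Return one <vcpupin> line per odd-numbered host CPU.

    Assumption: the "odd bank" is simply host CPUs 1, 3, 5, ...
    """
    odd_cpus = range(1, logical_cpus, 2)
    return [
        f"<vcpupin vcpu='{vcpu}' cpuset='{host_cpu}'/>"
        for vcpu, host_cpu in enumerate(odd_cpus)
    ]


pins = odd_bank_pins()

# Print a fragment that could be pasted into the VM's XML.
print(f"<vcpu placement='static'>{len(pins)}</vcpu>")
print("<cputune>")
for line in pins:
    print("  " + line)
print("</cputune>")
```

With 48 logical CPUs this yields 24 vCPUs, so the `<topology>` element would need to describe 24 vCPUs as well.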
-
5 minutes ago, PeteUnraid said:
With that latest xml you just posted - is that giving you the 10089 score?
The XML I had just posted before this post was my XML with the changes that Jerky suggested... In this post I'll post the XML that got me the highest score so far, which was my VM minus Jerky's suggestions.
Here is the XML for the "best" config; this config gives me the highest score in 3DMark Time Spy.
<?xml version='1.0' encoding='UTF-8'?> <domain type='kvm' id='2'> <name>Windows 10_A</name> <uuid>af734937-bee3-267c-5a93-9fa189e66e7d</uuid> <metadata> <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/> </metadata> <memory unit='KiB'>126877696</memory> <currentMemory unit='KiB'>126877696</currentMemory> <memoryBacking> <nosharepages/> </memoryBacking> <vcpu placement='static'>46</vcpu> <cputune> <vcpupin vcpu='0' cpuset='1'/> <vcpupin vcpu='1' cpuset='25'/> <vcpupin vcpu='2' cpuset='2'/> <vcpupin vcpu='3' cpuset='26'/> <vcpupin vcpu='4' cpuset='3'/> <vcpupin vcpu='5' cpuset='27'/> <vcpupin vcpu='6' cpuset='4'/> <vcpupin vcpu='7' cpuset='28'/> <vcpupin vcpu='8' cpuset='5'/> <vcpupin vcpu='9' cpuset='29'/> <vcpupin vcpu='10' cpuset='6'/> <vcpupin vcpu='11' cpuset='30'/> <vcpupin vcpu='12' cpuset='7'/> <vcpupin vcpu='13' cpuset='31'/> <vcpupin vcpu='14' cpuset='8'/> <vcpupin vcpu='15' cpuset='32'/> <vcpupin vcpu='16' cpuset='9'/> <vcpupin vcpu='17' cpuset='33'/> <vcpupin vcpu='18' cpuset='10'/> <vcpupin vcpu='19' cpuset='34'/> <vcpupin vcpu='20' cpuset='11'/> <vcpupin vcpu='21' cpuset='35'/> <vcpupin vcpu='22' cpuset='12'/> <vcpupin vcpu='23' cpuset='36'/> <vcpupin vcpu='24' cpuset='13'/> <vcpupin vcpu='25' cpuset='37'/> <vcpupin vcpu='26' cpuset='14'/> <vcpupin vcpu='27' cpuset='38'/> <vcpupin vcpu='28' cpuset='15'/> <vcpupin vcpu='29' cpuset='39'/> <vcpupin vcpu='30' cpuset='16'/> <vcpupin vcpu='31' cpuset='40'/> <vcpupin vcpu='32' cpuset='17'/> <vcpupin vcpu='33' cpuset='41'/> <vcpupin vcpu='34' cpuset='18'/> <vcpupin vcpu='35' cpuset='42'/> <vcpupin vcpu='36' cpuset='19'/> <vcpupin vcpu='37' cpuset='43'/> <vcpupin vcpu='38' cpuset='20'/> <vcpupin vcpu='39' cpuset='44'/> <vcpupin vcpu='40' cpuset='21'/> <vcpupin vcpu='41' cpuset='45'/> <vcpupin vcpu='42' cpuset='22'/> <vcpupin vcpu='43' cpuset='46'/> <vcpupin vcpu='44' cpuset='23'/> <vcpupin vcpu='45' cpuset='47'/> </cputune> <resource> <partition>/machine</partition> </resource> <os> 
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type> </os> <features> <acpi/> <apic/> <hyperv> <relaxed state='on'/> <vapic state='on'/> <spinlocks state='on' retries='8191'/> <vendor_id state='on' value='none'/> </hyperv> </features> <cpu mode='host-passthrough' check='none'> <topology sockets='1' cores='46' threads='1'/> </cpu> <clock offset='localtime'> <timer name='hypervclock' present='yes'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/local/sbin/qemu</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='writeback'/> <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='3'/> <backingStore/> <target dev='hdc' bus='virtio'/> <boot order='1'/> <alias name='virtio-disk2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='writeback'/> <source file='/mnt/user/domains/Windows 10/vdisk2.img' index='2'/> <backingStore/> <target dev='hdd' bus='virtio'/> <alias name='virtio-disk3'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <disk type='file' device='cdrom'> <driver name='qemu' type='raw'/> <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/> <backingStore/> <target dev='hdb' bus='ide'/> <readonly/> <alias name='ide0-0-1'/> <address type='drive' controller='0' bus='0' target='0' unit='1'/> </disk> <controller type='usb' index='0' model='qemu-xhci' ports='15'> <alias name='usb'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </controller> <controller type='pci' index='0' model='pci-root'> <alias name='pci.0'/> </controller> <controller type='ide' index='0'> <alias name='ide'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> </controller> <controller type='virtio-serial' index='0'> <alias 
name='virtio-serial0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </controller> <interface type='bridge'> <mac address='52:54:00:26:17:8b'/> <source bridge='br0'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/0'/> <target type='isa-serial' port='0'> <model name='isa-serial'/> </target> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/0'> <source path='/dev/pts/0'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <channel type='unix'> <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-2-Windows 10_A/org.qemu.guest_agent.0'/> <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/> <alias name='channel0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='mouse' bus='ps2'> <alias name='input0'/> </input> <input type='keyboard' bus='ps2'> <alias name='input1'/> </input> <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/> </source> <alias name='hostdev0'/> <rom file='/mnt/user/vBios/myVBios.rom'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/> </source> <alias name='hostdev1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x04d9'/> <product id='0x0245'/> <address bus='1' device='4'/> </source> <alias name='hostdev2'/> <address type='usb' bus='0' port='1'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x0db0'/> <product id='0x543d'/> 
<address bus='7' device='2'/> </source> <alias name='hostdev3'/> <address type='usb' bus='0' port='2'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x13fe'/> <product id='0x5500'/> <address bus='2' device='4'/> </source> <alias name='hostdev4'/> <address type='usb' bus='0' port='3'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1462'/> <product id='0x7c60'/> <address bus='7' device='3'/> </source> <alias name='hostdev5'/> <address type='usb' bus='0' port='4'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1b1c'/> <product id='0x1b2a'/> <address bus='1' device='6'/> </source> <alias name='hostdev6'/> <address type='usb' bus='0' port='5'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1b1c'/> <product id='0x1b2e'/> <address bus='1' device='5'/> </source> <alias name='hostdev7'/> <address type='usb' bus='0' port='6'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x264a'/> <product id='0x1fa5'/> <address bus='9' device='8'/> </source> <alias name='hostdev8'/> <address type='usb' bus='0' port='7'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x264a'/> <product id='0x1fa6'/> <address bus='9' device='10'/> </source> <alias name='hostdev9'/> <address type='usb' bus='0' port='8'/> </hostdev> <memballoon model='none'/> </devices> <seclabel type='dynamic' model='dac' relabel='yes'> <label>+0:+100</label> <imagelabel>+0:+100</imagelabel> </seclabel> </domain>
-
-
10 minutes ago, testdasi said:
One thing I have seen is if you don't load the CCX evenly, you will end up losing performance if your software doesn't scale too well. 3DMark Time Spy doesn't really scale that well beyond about 12 cores or so.
From my own testing (albeit not with 3DMark Time Spy but with a workload that similarly doesn't scale too well beyond 12-16 cores or so), 7 uneven is about the same as 6 even (i.e. the extra core performance is essentially "lost", so to speak).
It's impossible to spread 22 physical cores evenly + your VM benchmark performance is approximately 2/3 that of bare metal (your CPU has 8 CCX with 3 cores each) = it sounds like the uneven load is causing you to "lose" a core's performance for each CCX (which is 1/3 of each CCX), which is kinda similar to my testing.
You might want to test assigning the odd bank of the 48 logical cores (e.g. cpu 0 + cpu 1 = 1 physical core -> assign cpu 1 to VM and so on -> assign all the odd cpu to your VM = the odd bank) to your VM and see if it helps (i.e. your VM has 24 cores instead of 44).
I'll give this a shot... my intention, after making sure I have the VM tuned correctly, is to run 2 VMs that use half the resources each (of course saving some resources for Unraid as well).
-
7 minutes ago, Jerky_san said:
Which part? The CPU thing is basically required for better performance/feel of the machine. The stock system doesn't detect cache right, so your VM will be running with all sorts of wonky cache. It also doesn't detect hyper-threading right. Keep in mind that these settings can revert every time you change something on the GUI side instead of the XML side. The timer stuff is to lower CPU usage at idle and gives a slight increase in performance. CPU pinning is required to make sure Unraid doesn't use those cores and dockers don't either, so you need to make sure that you did that. Lastly, you have a 3970, so at least you don't have to deal with all the NUMA tuning crap like me with a 2990wx and others on the board. Though we've basically got that down to a science now as well.
I tried all of your suggestions above, also...I'm on the 3960x.
My CPUZ bench in the VM
Single thread is 508.8
Multi Thread is 16174.8
My CPUZ bench on bare metal
Single thread is 518.8
Multi thread is 16823.0
So this leads me to believe that there may be something wrong with 3dMark benchmarks running in VMs?
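To put the CPU-Z gap next to the Time Spy gap, here is a small Python sketch that expresses each VM score as a percentage of bare metal. The CPU-Z figures are the ones in this post; the Time Spy pair (10089 in the VM, 12950 on bare metal) is taken from elsewhere in the thread, and the helper name `percent_of_bare_metal` is just for illustration.

```python
def percent_of_bare_metal(vm_score, bare_metal_score):
    """Return the VM score as a percentage of the bare-metal score."""
    return 100.0 * vm_score / bare_metal_score


cpuz_single = percent_of_bare_metal(508.8, 518.8)
cpuz_multi = percent_of_bare_metal(16174.8, 16823.0)
time_spy = percent_of_bare_metal(10089, 12950)

print(f"CPU-Z single thread: {cpuz_single:.1f}% of bare metal")
print(f"CPU-Z multi thread:  {cpuz_multi:.1f}% of bare metal")
print(f"3DMark Time Spy CPU: {time_spy:.1f}% of bare metal")
```

CPU-Z lands at roughly 98% and 96% of bare metal while Time Spy sits near 78%, which is the discrepancy being asked about.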
Also, are you saying that the CPU pinning as displayed in the unraid pinning menu can be incorrect? attaching a screenshot of my pinning
my current XML
<?xml version='1.0' encoding='UTF-8'?> <domain type='kvm' id='4'> <name>Windows 10</name> <uuid>bfdb9f54-3503-24a2-979a-261040b9f2af</uuid> <metadata> <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/> </metadata> <memory unit='KiB'>126877696</memory> <currentMemory unit='KiB'>126877696</currentMemory> <memoryBacking> <nosharepages/> </memoryBacking> <vcpu placement='static'>46</vcpu> <cputune> <vcpupin vcpu='0' cpuset='1'/> <vcpupin vcpu='1' cpuset='25'/> <vcpupin vcpu='2' cpuset='2'/> <vcpupin vcpu='3' cpuset='26'/> <vcpupin vcpu='4' cpuset='3'/> <vcpupin vcpu='5' cpuset='27'/> <vcpupin vcpu='6' cpuset='4'/> <vcpupin vcpu='7' cpuset='28'/> <vcpupin vcpu='8' cpuset='5'/> <vcpupin vcpu='9' cpuset='29'/> <vcpupin vcpu='10' cpuset='6'/> <vcpupin vcpu='11' cpuset='30'/> <vcpupin vcpu='12' cpuset='7'/> <vcpupin vcpu='13' cpuset='31'/> <vcpupin vcpu='14' cpuset='8'/> <vcpupin vcpu='15' cpuset='32'/> <vcpupin vcpu='16' cpuset='9'/> <vcpupin vcpu='17' cpuset='33'/> <vcpupin vcpu='18' cpuset='10'/> <vcpupin vcpu='19' cpuset='34'/> <vcpupin vcpu='20' cpuset='11'/> <vcpupin vcpu='21' cpuset='35'/> <vcpupin vcpu='22' cpuset='12'/> <vcpupin vcpu='23' cpuset='36'/> <vcpupin vcpu='24' cpuset='13'/> <vcpupin vcpu='25' cpuset='37'/> <vcpupin vcpu='26' cpuset='14'/> <vcpupin vcpu='27' cpuset='38'/> <vcpupin vcpu='28' cpuset='15'/> <vcpupin vcpu='29' cpuset='39'/> <vcpupin vcpu='30' cpuset='16'/> <vcpupin vcpu='31' cpuset='40'/> <vcpupin vcpu='32' cpuset='17'/> <vcpupin vcpu='33' cpuset='41'/> <vcpupin vcpu='34' cpuset='18'/> <vcpupin vcpu='35' cpuset='42'/> <vcpupin vcpu='36' cpuset='19'/> <vcpupin vcpu='37' cpuset='43'/> <vcpupin vcpu='38' cpuset='20'/> <vcpupin vcpu='39' cpuset='44'/> <vcpupin vcpu='40' cpuset='21'/> <vcpupin vcpu='41' cpuset='45'/> <vcpupin vcpu='42' cpuset='22'/> <vcpupin vcpu='43' cpuset='46'/> <vcpupin vcpu='44' cpuset='23'/> <vcpupin vcpu='45' cpuset='47'/> </cputune> <resource> <partition>/machine</partition> </resource> <os> 
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type> </os> <features> <acpi/> <apic/> <hyperv> <vpindex state='on'/> <synic state='on'/> <stimer state='on'/> <reset state='on'/> <vendor_id state='on' value='KVM Hv'/> <frequencies state='on'/> </hyperv> </features> <cpu mode='host-passthrough' check='none'> <topology sockets='1' cores='23' threads='2'/> <cache mode='passthrough'/> <feature policy='require' name='topoext'/> <feature policy='disable' name='monitor'/> <feature policy='require' name='hypervisor'/> <feature policy='disable' name='svm'/> <feature policy='disable' name='x2apic'/> </cpu> <clock offset='localtime'> <timer name='hypervclock' present='yes'/> <timer name='hpet' present='yes'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/local/sbin/qemu</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='writeback'/> <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='4'/> <backingStore/> <target dev='hdc' bus='virtio'/> <boot order='1'/> <alias name='virtio-disk2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='writeback'/> <source file='/mnt/user/domains/Windows 10/vdisk2.img' index='3'/> <backingStore/> <target dev='hdd' bus='virtio'/> <alias name='virtio-disk3'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </disk> <disk type='file' device='cdrom'> <driver name='qemu' type='raw'/> <source file='/mnt/user/isos/Windows10.iso' index='2'/> <backingStore/> <target dev='hda' bus='sata'/> <readonly/> <boot order='2'/> <alias name='sata0-0-0'/> <address type='drive' controller='0' bus='0' target='0' unit='0'/> </disk> <disk type='file' device='cdrom'> <driver name='qemu' type='raw'/> <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/> <backingStore/> <target dev='hdb' bus='sata'/> 
<readonly/> <alias name='sata0-0-1'/> <address type='drive' controller='0' bus='0' target='0' unit='1'/> </disk> <controller type='pci' index='0' model='pci-root'> <alias name='pci.0'/> </controller> <controller type='sata' index='0'> <alias name='sata0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </controller> <controller type='virtio-serial' index='0'> <alias name='virtio-serial0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </controller> <controller type='usb' index='0' model='qemu-xhci' ports='15'> <alias name='usb'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </controller> <interface type='bridge'> <mac address='52:54:00:94:be:2f'/> <source bridge='br0'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/0'/> <target type='isa-serial' port='0'> <model name='isa-serial'/> </target> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/0'> <source path='/dev/pts/0'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <channel type='unix'> <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-Windows 10/org.qemu.guest_agent.0'/> <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/> <alias name='channel0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='mouse' bus='ps2'> <alias name='input0'/> </input> <input type='keyboard' bus='ps2'> <alias name='input1'/> </input> <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/> </source> <alias name='hostdev0'/> <rom file='/mnt/user/vBios/myVBios.rom'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </hostdev> <hostdev mode='subsystem' 
type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/> </source> <alias name='hostdev1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x04d9'/> <product id='0x0245'/> <address bus='1' device='4'/> </source> <alias name='hostdev2'/> <address type='usb' bus='0' port='1'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x0db0'/> <product id='0x543d'/> <address bus='7' device='2'/> </source> <alias name='hostdev3'/> <address type='usb' bus='0' port='2'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x13fe'/> <product id='0x5500'/> <address bus='2' device='4'/> </source> <alias name='hostdev4'/> <address type='usb' bus='0' port='3'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1462'/> <product id='0x7c60'/> <address bus='7' device='3'/> </source> <alias name='hostdev5'/> <address type='usb' bus='0' port='4'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1b1c'/> <product id='0x1b2a'/> <address bus='1' device='6'/> </source> <alias name='hostdev6'/> <address type='usb' bus='0' port='5'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1b1c'/> <product id='0x1b2e'/> <address bus='1' device='5'/> </source> <alias name='hostdev7'/> <address type='usb' bus='0' port='6'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x264a'/> <product id='0x1fa5'/> <address bus='9' device='5'/> </source> <alias name='hostdev8'/> <address type='usb' bus='0' port='7'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x264a'/> <product id='0x1fa6'/> <address bus='9' device='7'/> </source> <alias name='hostdev9'/> <address type='usb' bus='0' port='8'/> 
</hostdev> <memballoon model='none'/> </devices> <seclabel type='dynamic' model='dac' relabel='yes'> <label>+0:+100</label> <imagelabel>+0:+100</imagelabel> </seclabel> </domain>
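One way to double-check the pinning shown in the GUI is to compare each adjacent pair of pinned host CPUs against what the host kernel reports as hyper-thread siblings. The sketch below is an assumption-laden illustration: on a Linux host the sibling list for CPU N can be read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, but here the sibling data is hard-coded as "N and N+24" to mirror the pairing used in the XML above, and both helper names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as '1,25' or '0-3,24' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def mismatched_pairs(pinned, siblings_of):
    """Return adjacent (first, second) pinned CPUs that are not siblings.

    pinned: host CPUs in vCPU order (vCPU 0, vCPU 1, ...).
    siblings_of: maps a host CPU to its sibling-list text.
    """
    mismatches = []
    for first, second in zip(pinned[0::2], pinned[1::2]):
        if second not in parse_cpu_list(siblings_of[first]):
            mismatches.append((first, second))
    return mismatches


# Host CPUs in the order they are pinned in the XML above: 1,25,2,26,...,23,47.
pinned = [cpu for core in range(1, 24) for cpu in (core, core + 24)]

# Assumed sibling layout; on a real host, read this from sysfs instead.
siblings_of = {n: f"{n},{n + 24}" for n in range(24)}

print(mismatched_pairs(pinned, siblings_of))
```

An empty result means every vCPU pair sits on one physical core under the assumed layout, which is what `threads='2'` in the topology implies to the guest.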
-
5 minutes ago, Jerky_san said:
Try what I said above and you should get within 90-95% of baremetal.
Just tried it, with the different machine type / seabios machine and it gave me a decrease to about the level I was seeing before....working on getting you CPU-z benchmarks in a moment....also, I'm sure I should be on the latest stable version of unraid as I just downloaded it this last week.
Edit: version 6.8.1
-
41 minutes ago, PeteAsking said:
Ok my bad - have you tried pinning again and pinning all the cpu’s except for cpu’s 0,1,2 and 3? I think currently you had first 2 free and last 2 free.
Another thing I noticed is you are using a different machine type than most people use for windows. I normally see pc-i440fx-4.2 for example and seabios (maybe ovmf for the graphics passthrough). I also didnt see the <hyperv> section in your xml (related to machine type?) have you tried changing that and enabling the hyperv settings? Normally they are added in by default.
I checked the forums and randomly it seems some people say they get better performance sometimes with one or the other so maybe you just have to try both
P
Just made a fresh VM with that machine type, SeaBIOS and Hyper-V on, and I got a decent boost... now at 10089 on the VM compared to 12950 on bare metal.
-
9 hours ago, PeteAsking said:
I just did a test with a 2 cpu vm and ran a process to max both cores continually while I could monitor the cpu load in htop on the unraid server. For me I could see the 2 cpu cores at 100% roam around the host CPU cores ‘randomly’ so it does allow threads to move around. No clue if it will be faster that way or slower but would be keen to know also.
edit: you should also try with threads = 2 just in case, but I'm thinking logically that doesn't make sense in this scenario now.
ok, just tried this both ways and no dice..it put windows in an unusable state (Extremely laggy / unresponsive)
-
1 minute ago, PeteAsking said:
Well you could disable pinning and let the CPU use its own optimisation algorithm to roam the threads as it sees fit. It seems like a newer cpu so it might be better than trying to manually do it (or not in which case just change it back). To do this I would set threads back to 1 and cores to 44 then in each of the 44 lines regarding the cpu pin (cpuset) let them roam any of the 48 cores as they need to... so I will edit the first 4, but you can do all 44 lines.
<vcpu placement='static'>44</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='0-47'/>
<vcpupin vcpu='1' cpuset='0-47'/>
<vcpupin vcpu='2' cpuset='0-47'/>
<vcpupin vcpu='3' cpuset='0-47'/>
Also, the vcpu numbers can now be sequential, 0,1,2,3...43 (43 is the 44th CPU, as 0 counts as the first one), rather than out of order as in your current config. I am thinking this might be faster, as a maxed core in the VM is no longer constrained to a domain on the CPU where its HT partner might also be maxed, which would (possibly) hamper performance. Sorry in advance if this is wrong, but the Red Hat documentation suggests this scenario is possible and this could work around such an eventuality. Don't hate me if it's wrong, I haven't tested it.
p
This seems interesting, and I certainly won't mind trying it. It'll be the first thing I try in the morning, thanks! And of course I'll be back to report findings.
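Writing 44 near-identical lines by hand is error-prone, so here is a short Python sketch that generates them. It assumes the intent is exactly what PeteAsking describes (44 vCPUs, each allowed to run on any of host CPUs 0-47); `floating_pins` is a hypothetical helper name.

```python
def floating_pins(vcpus, host_cpus):
    """Return <vcpupin> lines that let every vCPU run on the same host CPU range."""
    return [
        f"<vcpupin vcpu='{vcpu}' cpuset='{host_cpus}'/>"
        for vcpu in range(vcpus)
    ]


lines = floating_pins(44, "0-47")

print(f"<vcpu placement='static'>{len(lines)}</vcpu>")
print("<cputune>")
for line in lines:
    print("  " + line)
print("</cputune>")
```

The vCPU indices run from 0 to 43, matching the note in the quote that 43 is the 44th CPU.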
-
2 minutes ago, PeteAsking said:
I have been reading the KVM documentation and have another idea... it's fairly dramatic and might make performance worse. Do you want to try it anyway, if you have time for testing?
Yeah, what's the idea? I have tomorrow off of work so I plan on hammering away at this issue as much as I can!
-
Fixing the threads per core count gave me very negligible improvement.
-
6 minutes ago, PeteAsking said:
It was actually a question. I don't know what other people are using for this CPU; maybe they are tweaking the settings. Maybe some more experienced people can double check, or compare with how other people on the forum are passing CPUs. The CPU has 24 cores and 48 threads, so I'm wondering if you should be passing 22 cores and threads 2 to match that, but I don't really know. That would be like passing 22*2 = 44 cores (per your XML).
ok yeah, that's what I assumed...testing this in a moment.
-
2 minutes ago, PeteAsking said:
Is threads = 1 correct for this cpu?
oh my...I just noticed that....so am I to change this to the number of threads I am allocating to the VM? or the amount of threads total the CPU has?
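In libvirt's `<topology>` element, `threads` is the number of threads per core, not a total, and sockets × cores × threads is expected to equal the vCPU count. A minimal Python sketch of that consistency check, using the vCPU counts that appear in this thread (the helper name `topology_matches` is hypothetical):

```python
def topology_matches(vcpus, sockets, cores, threads):
    """True when sockets * cores * threads equals the number of vCPUs."""
    return sockets * cores * threads == vcpus


# 44 vCPUs presented as 44 single-threaded cores.
print(topology_matches(44, sockets=1, cores=44, threads=1))

# 44 vCPUs presented as 22 cores with 2 threads each.
print(topology_matches(44, sockets=1, cores=22, threads=2))

# Setting threads=2 without halving cores describes 88 vCPUs, not 44.
print(topology_matches(44, sockets=1, cores=44, threads=2))
```

So when switching to `threads='2'`, the `cores` value is halved rather than the thread count being set to the total number of threads assigned.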
-
29 minutes ago, david279 said:
Check what governor it's running using the Tips and Tweaks plugin. Make sure it's On Demand and not something like Power Save.
I had it set to performance before, changed it to on demand and it benchmarks the same.
-
Hey all, I have just started using Unraid with my new home server. I plan on hosting 2 gaming VMs simultaneously on the machine and have gotten GPU passthrough working correctly.
Currently I am running benchmarks on one of my VMs and am getting much lower benchmarks for VM compared to a bare metal score I was getting on the same machine.
Specs are
MSI trx40 pro 10g
AMD 3960x (Passing through all but 2 physical cores)
128GB of Corsair 3200MHz RAM (I have tried just passing through 32GB and 124GB)
2TB Samsung EVO 970 Plus NVME
I have gone through guides online and I believe I am isolating and pinning cores correctly, and I have tried a variety of options.
My 3dmark Time Spy benchmark on bare metal for CPU reads 12950
and on my VM I'm getting 8131
Am I configuring something wrong? Or is this kind of expected performance hit through VM?
attached is my XML
<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='3'>
  <name>Box</name>
  <uuid>a534ebe7-3862-b961-720b-5706768c7147</uuid>
  <description></description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>33030144</memory>
  <currentMemory unit='KiB'>33030144</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>44</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='11'/>
    <vcpupin vcpu='8' cpuset='6'/>
    <vcpupin vcpu='9' cpuset='12'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='19'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='20'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='21'/>
    <vcpupin vcpu='16' cpuset='16'/>
    <vcpupin vcpu='17' cpuset='22'/>
    <vcpupin vcpu='18' cpuset='17'/>
    <vcpupin vcpu='19' cpuset='23'/>
    <vcpupin vcpu='20' cpuset='18'/>
    <vcpupin vcpu='21' cpuset='24'/>
    <vcpupin vcpu='22' cpuset='25'/>
    <vcpupin vcpu='23' cpuset='31'/>
    <vcpupin vcpu='24' cpuset='26'/>
    <vcpupin vcpu='25' cpuset='32'/>
    <vcpupin vcpu='26' cpuset='27'/>
    <vcpupin vcpu='27' cpuset='33'/>
    <vcpupin vcpu='28' cpuset='28'/>
    <vcpupin vcpu='29' cpuset='34'/>
    <vcpupin vcpu='30' cpuset='29'/>
    <vcpupin vcpu='31' cpuset='35'/>
    <vcpupin vcpu='32' cpuset='36'/>
    <vcpupin vcpu='33' cpuset='42'/>
    <vcpupin vcpu='34' cpuset='37'/>
    <vcpupin vcpu='35' cpuset='43'/>
    <vcpupin vcpu='36' cpuset='38'/>
    <vcpupin vcpu='37' cpuset='44'/>
    <vcpupin vcpu='38' cpuset='39'/>
    <vcpupin vcpu='39' cpuset='45'/>
    <vcpupin vcpu='40' cpuset='40'/>
    <vcpupin vcpu='41' cpuset='46'/>
    <vcpupin vcpu='42' cpuset='41'/>
    <vcpupin vcpu='43' cpuset='47'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/a534ebe7-3862-b961-720b-5706768c7147_VARS-pure-efi.fd</nvram>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='44' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Samanthas Box/vdisk1.img' index='4'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Samanthas Box/vdisk2.img' index='3'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows10.iso' index='2'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x8'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:0f:13:1f'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-Samanthas Box/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='8'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/vBios/myVBios.rom'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x04d9'/>
        <product id='0x0245'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0db0'/>
        <product id='0x543d'/>
        <address bus='7' device='2'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x13fe'/>
        <product id='0x5500'/>
        <address bus='8' device='2'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1462'/>
        <product id='0x7c60'/>
        <address bus='7' device='3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2a'/>
        <address bus='1' device='6'/>
      </source>
      <alias name='hostdev6'/>
      <address type='usb' bus='0' port='5'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2e'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev7'/>
      <address type='usb' bus='0' port='6'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa5'/>
        <address bus='9' device='5'/>
      </source>
      <alias name='hostdev8'/>
      <address type='usb' bus='0' port='7'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa6'/>
        <address bus='9' device='7'/>
      </source>
      <alias name='hostdev9'/>
      <address type='usb' bus='0' port='9'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x232a'/>
        <address bus='9' device='3'/>
      </source>
      <alias name='hostdev10'/>
      <address type='usb' bus='0' port='10'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>
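With this many vcpupin entries it is easy to lose track of what is pinned where, so a quick script can help sanity-check the XML before benchmarking. What follows is only a rough sketch: the helper names (parse_cpuset, summarize_pinning) and the trimmed SAMPLE domain are made up for illustration, and it only checks that the vCPU count, the topology product and the pin list agree, and which host CPUs are left unpinned. It does not say anything about whether the chosen pairs are actual sibling threads on the host.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain used only to demonstrate the helpers.
SAMPLE = """
<domain type='kvm'>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='9'/>
  </cputune>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
</domain>
"""


def parse_cpuset(spec):
    """Expand a cpuset string such as '2', '1-47' or '0-3,^2' into a set of ints."""
    included, excluded = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = included
        if part.startswith("^"):  # '^N' removes a CPU from the listed set
            target, part = excluded, part[1:]
        if "-" in part:
            low, high = part.split("-", 1)
            target.update(range(int(low), int(high) + 1))
        else:
            target.add(int(part))
    return included - excluded


def summarize_pinning(domain_xml, host_cpu_count):
    """Return basic consistency facts about the vCPU pinning in a domain XML."""
    root = ET.fromstring(domain_xml)
    topo = root.find("cpu/topology")
    topo_total = (
        int(topo.get("sockets")) * int(topo.get("cores")) * int(topo.get("threads"))
    )
    # Map each vCPU number to the set of host CPUs it may run on.
    pins = {
        int(pin.get("vcpu")): parse_cpuset(pin.get("cpuset"))
        for pin in root.findall("cputune/vcpupin")
    }
    used = set().union(*pins.values())
    return {
        "vcpus": int(root.findtext("vcpu")),
        "topology_total": topo_total,
        "pinned_vcpus": len(pins),
        "unpinned_host_cpus": sorted(set(range(host_cpu_count)) - used),
    }


if __name__ == "__main__":
    # 12 host CPUs is an arbitrary number chosen for the sample above.
    print(summarize_pinning(SAMPLE, host_cpu_count=12))
```

To run it against a real VM, pass the text of the domain XML and the host's logical CPU count (48 on a 3960x with SMT enabled) instead of SAMPLE and 12.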
VM Very low performance compared to bare metal
in VM Engine (KVM)
Hello, unfortunately I ended up giving up because I could not get close to bare-metal performance. This was mainly due to the cache not being mapped correctly for the 3960x under the Linux kernel that was in use at the time. Funnily enough, I'm checking this week to see whether this is still an issue, and I will post if there's any progress.
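For anyone who wants to look at how the host kernel groups CPUs by L3 cache, here is a minimal sketch. It assumes a Linux host that exposes the usual sysfs cache directories under /sys/devices/system/cpu, and the function name l3_groups is made up for this example. It only shows the host's view of L3 sharing; it does not show what the guest sees, so treat it as a starting point rather than a diagnosis.

```python
import glob
import os
from collections import defaultdict


def l3_groups(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Group host CPUs by the shared_cpu_list reported for their L3 cache."""
    groups = defaultdict(list)
    # 'cpu[0-9]*' matches cpu0, cpu12, ... but skips entries like cpufreq.
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cache", "index*")
    for index_dir in glob.glob(pattern):
        with open(os.path.join(index_dir, "level")) as handle:
            if handle.read().strip() != "3":
                continue  # only L3 entries are of interest here
        with open(os.path.join(index_dir, "shared_cpu_list")) as handle:
            shared = handle.read().strip()
        # index_dir looks like .../cpu12/cache/index3; extract the 12.
        cpu_name = os.path.basename(os.path.dirname(os.path.dirname(index_dir)))
        groups[shared].append(int(cpu_name[3:]))
    return {shared: sorted(cpus) for shared, cpus in groups.items()}


if __name__ == "__main__":
    for shared, cpus in sorted(l3_groups().items()):
        print(f"L3 shared by {shared}: reported by CPUs {cpus}")
```

Each key is the kernel's own CPU list string for one L3 cache, which can then be compared by hand against the cpuset values used in the vcpupin entries.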