VM Very low performance compared to bare metal



Hey all, I recently started using Unraid on my new home server. I plan on hosting 2 gaming VMs simultaneously on the machine and have gotten GPU passthrough working correctly.

I am currently running benchmarks on one of my VMs, and the VM scores much lower than the bare metal score I got on the same machine.

 

Specs:

MSI TRX40 PRO 10G

AMD Ryzen Threadripper 3960X (passing through all but 2 physical cores)

128GB of Corsair 3200MHz RAM (I have tried passing through both 32GB and 124GB)

2TB Samsung 970 EVO Plus NVMe

 

I have gone through guides online, and I believe I am isolating and pinning cores correctly. I have also tried a variety of options.

 

My 3DMark Time Spy CPU score on bare metal is 12950, and in my VM I'm getting 8131.

 

Am I configuring something wrong, or is this kind of performance hit expected in a VM?

Attached is my XML:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='3'>
  <name>Box</name>
  <uuid>a534ebe7-3862-b961-720b-5706768c7147</uuid>
  <description></description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>33030144</memory>
  <currentMemory unit='KiB'>33030144</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>44</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='8'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='5'/>
    <vcpupin vcpu='7' cpuset='11'/>
    <vcpupin vcpu='8' cpuset='6'/>
    <vcpupin vcpu='9' cpuset='12'/>
    <vcpupin vcpu='10' cpuset='13'/>
    <vcpupin vcpu='11' cpuset='19'/>
    <vcpupin vcpu='12' cpuset='14'/>
    <vcpupin vcpu='13' cpuset='20'/>
    <vcpupin vcpu='14' cpuset='15'/>
    <vcpupin vcpu='15' cpuset='21'/>
    <vcpupin vcpu='16' cpuset='16'/>
    <vcpupin vcpu='17' cpuset='22'/>
    <vcpupin vcpu='18' cpuset='17'/>
    <vcpupin vcpu='19' cpuset='23'/>
    <vcpupin vcpu='20' cpuset='18'/>
    <vcpupin vcpu='21' cpuset='24'/>
    <vcpupin vcpu='22' cpuset='25'/>
    <vcpupin vcpu='23' cpuset='31'/>
    <vcpupin vcpu='24' cpuset='26'/>
    <vcpupin vcpu='25' cpuset='32'/>
    <vcpupin vcpu='26' cpuset='27'/>
    <vcpupin vcpu='27' cpuset='33'/>
    <vcpupin vcpu='28' cpuset='28'/>
    <vcpupin vcpu='29' cpuset='34'/>
    <vcpupin vcpu='30' cpuset='29'/>
    <vcpupin vcpu='31' cpuset='35'/>
    <vcpupin vcpu='32' cpuset='36'/>
    <vcpupin vcpu='33' cpuset='42'/>
    <vcpupin vcpu='34' cpuset='37'/>
    <vcpupin vcpu='35' cpuset='43'/>
    <vcpupin vcpu='36' cpuset='38'/>
    <vcpupin vcpu='37' cpuset='44'/>
    <vcpupin vcpu='38' cpuset='39'/>
    <vcpupin vcpu='39' cpuset='45'/>
    <vcpupin vcpu='40' cpuset='40'/>
    <vcpupin vcpu='41' cpuset='46'/>
    <vcpupin vcpu='42' cpuset='41'/>
    <vcpupin vcpu='43' cpuset='47'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/a534ebe7-3862-b961-720b-5706768c7147_VARS-pure-efi.fd</nvram>
    <smbios mode='host'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='44' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Samanthas Box/vdisk1.img' index='4'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Samanthas Box/vdisk2.img' index='3'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows10.iso' index='2'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x8'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:0f:13:1f'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-Samanthas Box/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='8'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/vBios/myVBios.rom'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x04d9'/>
        <product id='0x0245'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0db0'/>
        <product id='0x543d'/>
        <address bus='7' device='2'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x13fe'/>
        <product id='0x5500'/>
        <address bus='8' device='2'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1462'/>
        <product id='0x7c60'/>
        <address bus='7' device='3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2a'/>
        <address bus='1' device='6'/>
      </source>
      <alias name='hostdev6'/>
      <address type='usb' bus='0' port='5'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2e'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev7'/>
      <address type='usb' bus='0' port='6'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa5'/>
        <address bus='9' device='5'/>
      </source>
      <alias name='hostdev8'/>
      <address type='usb' bus='0' port='7'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa6'/>
        <address bus='9' device='7'/>
      </source>
      <alias name='hostdev9'/>
      <address type='usb' bus='0' port='9'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x232a'/>
        <address bus='9' device='3'/>
      </source>
      <alias name='hostdev10'/>
      <address type='usb' bus='0' port='10'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

 

11 minutes ago, SweetPeachez said:

Oh my... I just noticed that. So should I change this to the number of threads I am allocating to the VM, or to the total number of threads the CPU has?

It was actually a question. I don't know what other people are using for this CPU; maybe they are tweaking the settings. Maybe some more experienced people can double-check, or compare with how others on the forum are passing CPUs through. The CPU has 24 cores and 48 threads, so I'm wondering if you should be passing cores='22' and threads='2' to match that, but I don't really know. That would be 22*2 = 44 vCPUs (per your XML).
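A sketch of what that could look like, assuming the host numbers its hyperthread siblings as CPU n and CPU n+24 (the pairing used in the second XML later in this thread; running `lscpu -e` on the Unraid host should confirm it) and that host cores 0 and 1 (CPUs 0, 1, 24 and 25) stay with Unraid. With threads='2', QEMU treats consecutive vCPUs (0 and 1, 2 and 3, and so on) as the two threads of one virtual core, so each consecutive pair is pinned to one host sibling pair:

```xml
<vcpu placement='static'>44</vcpu>
<cputune>
  <!-- vCPUs 0+1 = virtual core 0, pinned to host siblings 2 and 26 (assumed pair) -->
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='26'/>
  <!-- vCPUs 2+3 = virtual core 1, pinned to host siblings 3 and 27 -->
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='27'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='28'/>
  <!-- ...continue the pattern: vcpu 2k goes to host CPU 2+k, vcpu 2k+1 to host CPU 26+k,
       ending with vcpu 42 on CPU 23 and vcpu 43 on CPU 47 -->
</cputune>
<cpu mode='host-passthrough' check='none'>
  <!-- sockets x cores x threads should equal the vcpu count: 1 x 22 x 2 = 44 -->
  <topology sockets='1' cores='22' threads='2'/>
</cpu>
```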

Edited by PeteAsking

 


OK, yeah, that's what I assumed. Testing this in a moment.

2 minutes ago, PeteAsking said:

I have been reading the KVM documentation and have another idea... it's fairly dramatic and might make performance worse. Do you want to try it anyway, if you have time for testing?

Yeah, what's the idea? I have tomorrow off work, so I plan on hammering away at this issue as much as I can!


Well, you could disable pinning and let the host's scheduler roam the threads across cores as it sees fit. It's a newer CPU, so that might work better than pinning manually (or not, in which case just change it back). To do this, I would set threads back to 1 and cores to 44, then change each of the 44 vcpupin lines (cpuset) so they can roam any of the 48 host threads as needed. I'll edit the first 4 to show an example, but you can do all 44 lines.

 

<vcpu placement='static'>44</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0-47'/>
  <vcpupin vcpu='1' cpuset='0-47'/>
  <vcpupin vcpu='2' cpuset='0-47'/>
  <vcpupin vcpu='3' cpuset='0-47'/>
  <!-- ...and so on for every vcpu up to vcpu='43' -->
</cputune>


Also, the vcpu numbers simply run 0, 1, 2, 3 ... 43 (that's 44 vCPUs, since 0 counts as the first one), with no out-of-order host CPU numbers like in your current config. I am thinking this might be faster, because a maxed-out vCPU is no longer stuck on a host thread whose HT partner may also be maxed out, which could (possibly) hamper performance. Sorry in advance if this is wrong, but the Red Hat documentation suggests this scenario is possible, and this could work around it. Don't hate me if it's wrong; I haven't tested it.
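For reference, libvirt can also express "let every vCPU roam" without 44 separate lines: the `cpuset` attribute on `<vcpu>` sets the default host CPUs that the vCPUs may run on, and any vCPU without its own `<vcpupin>` line uses it. A sketch, assuming the existing `<vcpupin>` lines are deleted from `<cputune>`:

```xml
<!-- All 44 vCPUs may float across host CPUs 0-47; no per-vCPU vcpupin lines -->
<vcpu placement='static' cpuset='0-47'>44</vcpu>
<cpu mode='host-passthrough' check='none'>
  <!-- 44 single-threaded cores, matching the 44 vCPUs -->
  <topology sockets='1' cores='44' threads='1'/>
</cpu>
```

Because 0-47 covers every host thread, this behaves like no pinning at all; a narrower range would keep the roaming off specific host CPUs.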
 

p

Edited by PeteAsking

This seems interesting, and I certainly won't mind trying it. It'll be the first thing I try in the morning, thanks! And of course I'll be back to report findings :)


I just did a test with a 2-vCPU VM and ran a process to max out both cores continuously while monitoring the CPU load in htop on the Unraid server. I could see the two cores at 100% roam around the host CPU cores 'randomly', so it does allow threads to move around. No clue whether it will be faster or slower that way, but I'd be keen to know too.
 

Edit: you should also try with threads='2' just in case, but logically I don't think that makes sense in this scenario.

Edited by PeteAsking

OK, just tried this both ways and no dice. It put Windows in an unusable state (extremely laggy / unresponsive).


OK, my bad. Have you tried pinning again, but pinning all the CPUs except CPUs 0, 1, 2 and 3? I think currently you have the first 2 and the last 2 free.
 

Another thing I noticed is that you are using a different machine type (pc-q35-4.2) than most people use for Windows. I normally see pc-i440fx-4.2, for example, with SeaBIOS (maybe OVMF for the graphics passthrough). I also didn't see the <hyperv> section in your XML (related to the machine type?). Have you tried changing that and enabling the Hyper-V settings? Normally they are added by default.
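For comparison, a sketch of the `<os>` block with the i440fx machine type. It keeps the OVMF loader and NVRAM paths from the XML above (the NVRAM file name is tied to this VM's UUID). Dropping the `<loader>` and `<nvram>` lines gives a SeaBIOS VM instead, which is what the second XML later in the thread uses. Switching firmware or machine type under an existing Windows install may not go smoothly, so trying it on a fresh VM is the safer test.

```xml
<os>
  <!-- i440fx instead of pc-q35-4.2 -->
  <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
  <!-- OVMF (UEFI) firmware; remove these two lines for SeaBIOS -->
  <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
  <nvram>/etc/libvirt/qemu/nvram/a534ebe7-3862-b961-720b-5706768c7147_VARS-pure-efi.fd</nvram>
</os>
```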
 

I checked the forums, and it seems fairly random: some people say they get better performance with one, some with the other, so maybe you just have to try both :(

 

P

Edited by PeteAsking

OK, a couple of things right off the bat. You need to be on the latest Unraid for this to work, by the way.

 

<cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='44' threads='1'/>
  </cpu>

needs to be:

<cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='22' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='x2apic'/>
</cpu>

Set your clock to this:

  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='yes'/>
  </clock>

Set your features to this:

  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='KVM Hv'/>
      <frequencies state='on'/>
    </hyperv>
  </features>

Finally, if you have not already, you need to reserve all the CPU cores your VM will use, so make sure that is set in Settings/CPU Pinning. Please report back with a CPU-Z bench of your bare metal and a CPU-Z bench of your VM so I can help you further.
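Besides isolating the VM's cores, libvirt's `<emulatorpin>` element (inside `<cputune>`) can keep QEMU's own emulator threads on the cores left for Unraid, so they don't compete with the vCPUs. A sketch, assuming Unraid keeps host cores 0 and 1 (CPUs 0, 1, 24 and 25 if siblings are numbered n and n+24); the vCPU pins are abbreviated:

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='26'/>
  <!-- ...remaining vcpupin lines unchanged... -->
  <!-- QEMU emulator threads stay on the host-reserved CPUs (assumed 0, 1, 24, 25) -->
  <emulatorpin cpuset='0-1,24-25'/>
</cputune>
```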

41 minutes ago, PeteAsking said:

Another thing I noticed is that you are using a different machine type (pc-q35-4.2) than most people use for Windows. I normally see pc-i440fx-4.2, for example, with SeaBIOS (maybe OVMF for the graphics passthrough). I also didn't see the <hyperv> section in your XML (related to the machine type?). Have you tried changing that and enabling the Hyper-V settings? Normally they are added by default.
 


Just made a fresh VM with that machine type, SeaBIOS, and Hyper-V on, and I got a decent boost: now at 10089 in the VM compared to 12950 on bare metal.

5 minutes ago, Jerky_san said:

Try what I said above and you should get within 90-95% of bare metal.

Just tried it with the different machine type / SeaBIOS VM, and it dropped back to about the level I was seeing before. Working on getting you CPU-Z benchmarks in a moment. Also, I should be on the latest stable version of Unraid, since I just downloaded it last week.

 

Edit: version 6.8.1

Edited by SweetPeachez

Which part? The CPU block is basically required for better performance and feel of the machine. The stock config doesn't detect the cache correctly, so your VM will be running with all sorts of wonky cache. It also doesn't detect hyperthreading correctly. Keep in mind that these settings can revert every time you change something in the GUI instead of the XML. The timer changes lower CPU usage at idle and give a slight increase in performance. CPU pinning is required to make sure neither Unraid nor Docker containers use those cores, so make sure you did that. Lastly, you have a 3970, so at least you don't have to deal with all the NUMA tuning crap like me with a 2990WX and others on the board, though we've basically got that down to a science now as well.


One thing I have seen is that if you don't load the CCXs evenly, you end up losing performance when your software doesn't scale well. 3DMark Time Spy doesn't really scale well beyond about 12 cores.

 

From my own testing (albeit not with 3DMark Time Spy, but with a workload that similarly doesn't scale well beyond 12-16 cores), 7 cores loaded unevenly perform about the same as 6 cores loaded evenly (i.e. the extra core's performance is essentially "lost", so to speak).

Your CPU has 8 CCXs with 3 cores each, so it's impossible to spread 22 physical cores evenly across them. Combined with your VM benchmark being approximately 2/3 of bare metal, it sounds like the uneven load is causing you to "lose" one core's worth of performance per CCX (1/3 of each CCX), which is kinda similar to my testing.
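The arithmetic behind that, as a rough check (numbers from this thread; "one core lost per CCX" is testdasi's heuristic, not a measured value, and 6 x 3 + 2 x 2 is just one possible uneven split):

```latex
\begin{aligned}
22 &= 6 \times 3 + 2 \times 2 && \text{(22 cores over 8 CCXs of 3: six full, two with 2)}\\
\frac{8131}{12950} &\approx 0.628 && \text{(Time Spy CPU, VM vs. bare metal)}\\
\frac{24 - 8}{24} &= \frac{2}{3} \approx 0.667 && \text{(24 cores, minus one per CCX)}
\end{aligned}
```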

 

You might want to test assigning only the odd bank of the 48 logical cores to your VM and see if it helps. For example, if CPU 0 + CPU 1 = 1 physical core, assign CPU 1 to the VM, and so on for every pair, so that all the odd CPUs (the odd bank) go to your VM. Your VM would then have 24 cores instead of 44.
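A sketch of that layout. The example above assumes siblings are adjacent (CPU 0 + CPU 1), in which case the odd CPUs 1, 3, ..., 47 are the bank to use. The pinning in the second XML further down pairs CPU n with CPU n+24 instead (1 with 25, 2 with 26, and so on). If `lscpu -e` confirms that pairing on this host, one thread of every physical core is simply CPUs 24-47, which is what this sketch assumes. Unraid's CPU isolation settings would need to match whichever set is used.

```xml
<vcpu placement='static'>24</vcpu>
<cputune>
  <!-- one vCPU per physical core: host CPUs 24-47 (assumed second thread of cores 0-23) -->
  <vcpupin vcpu='0' cpuset='24'/>
  <vcpupin vcpu='1' cpuset='25'/>
  <vcpupin vcpu='2' cpuset='26'/>
  <vcpupin vcpu='3' cpuset='27'/>
  <vcpupin vcpu='4' cpuset='28'/>
  <vcpupin vcpu='5' cpuset='29'/>
  <vcpupin vcpu='6' cpuset='30'/>
  <vcpupin vcpu='7' cpuset='31'/>
  <vcpupin vcpu='8' cpuset='32'/>
  <vcpupin vcpu='9' cpuset='33'/>
  <vcpupin vcpu='10' cpuset='34'/>
  <vcpupin vcpu='11' cpuset='35'/>
  <vcpupin vcpu='12' cpuset='36'/>
  <vcpupin vcpu='13' cpuset='37'/>
  <vcpupin vcpu='14' cpuset='38'/>
  <vcpupin vcpu='15' cpuset='39'/>
  <vcpupin vcpu='16' cpuset='40'/>
  <vcpupin vcpu='17' cpuset='41'/>
  <vcpupin vcpu='18' cpuset='42'/>
  <vcpupin vcpu='19' cpuset='43'/>
  <vcpupin vcpu='20' cpuset='44'/>
  <vcpupin vcpu='21' cpuset='45'/>
  <vcpupin vcpu='22' cpuset='46'/>
  <vcpupin vcpu='23' cpuset='47'/>
</cputune>
<cpu mode='host-passthrough' check='none'>
  <!-- 24 single-threaded cores: 1 x 24 x 1 = 24 vCPUs -->
  <topology sockets='1' cores='24' threads='1'/>
</cpu>
```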

Edited by testdasi
7 minutes ago, Jerky_san said:

Which part? The CPU block is basically required for better performance and feel of the machine. The stock config doesn't detect the cache correctly, so your VM will be running with all sorts of wonky cache. It also doesn't detect hyperthreading correctly. Keep in mind that these settings can revert every time you change something in the GUI instead of the XML. The timer changes lower CPU usage at idle and give a slight increase in performance. CPU pinning is required to make sure neither Unraid nor Docker containers use those cores, so make sure you did that. Lastly, you have a 3970, so at least you don't have to deal with all the NUMA tuning crap like me with a 2990WX and others on the board, though we've basically got that down to a science now as well.

I tried all of your suggestions above. Also, I'm on the 3960X.

My CPU-Z bench in the VM:

 Single thread: 508.8
 Multi thread: 16174.8

My CPU-Z bench on bare metal:

 Single thread: 518.8
 Multi thread: 16823.0

 

So this leads me to believe there may be something wrong with 3DMark benchmarks running in VMs?
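Putting the CPU-Z numbers above side by side as simple VM-to-bare-metal ratios:

```latex
\begin{aligned}
\text{CPU-Z single thread:}\quad & \frac{508.8}{518.8} \approx 0.981\\
\text{CPU-Z multi thread:}\quad & \frac{16174.8}{16823.0} \approx 0.961
\end{aligned}
```

So CPU-Z lands within about 4% of bare metal, while the Time Spy CPU scores in this thread ranged from roughly 0.63 to 0.78 of bare metal.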

Also, are you saying that the CPU pinning displayed in the Unraid pinning menu can be incorrect? Attaching a screenshot of my pinning.

 

My current XML:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' id='4'>
  <name>Windows 10</name>
  <uuid>bfdb9f54-3503-24a2-979a-261040b9f2af</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>126877696</memory>
  <currentMemory unit='KiB'>126877696</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>46</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='26'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='27'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='28'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='29'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='30'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='31'/>
    <vcpupin vcpu='14' cpuset='8'/>
    <vcpupin vcpu='15' cpuset='32'/>
    <vcpupin vcpu='16' cpuset='9'/>
    <vcpupin vcpu='17' cpuset='33'/>
    <vcpupin vcpu='18' cpuset='10'/>
    <vcpupin vcpu='19' cpuset='34'/>
    <vcpupin vcpu='20' cpuset='11'/>
    <vcpupin vcpu='21' cpuset='35'/>
    <vcpupin vcpu='22' cpuset='12'/>
    <vcpupin vcpu='23' cpuset='36'/>
    <vcpupin vcpu='24' cpuset='13'/>
    <vcpupin vcpu='25' cpuset='37'/>
    <vcpupin vcpu='26' cpuset='14'/>
    <vcpupin vcpu='27' cpuset='38'/>
    <vcpupin vcpu='28' cpuset='15'/>
    <vcpupin vcpu='29' cpuset='39'/>
    <vcpupin vcpu='30' cpuset='16'/>
    <vcpupin vcpu='31' cpuset='40'/>
    <vcpupin vcpu='32' cpuset='17'/>
    <vcpupin vcpu='33' cpuset='41'/>
    <vcpupin vcpu='34' cpuset='18'/>
    <vcpupin vcpu='35' cpuset='42'/>
    <vcpupin vcpu='36' cpuset='19'/>
    <vcpupin vcpu='37' cpuset='43'/>
    <vcpupin vcpu='38' cpuset='20'/>
    <vcpupin vcpu='39' cpuset='44'/>
    <vcpupin vcpu='40' cpuset='21'/>
    <vcpupin vcpu='41' cpuset='45'/>
    <vcpupin vcpu='42' cpuset='22'/>
    <vcpupin vcpu='43' cpuset='46'/>
    <vcpupin vcpu='44' cpuset='23'/>
    <vcpupin vcpu='45' cpuset='47'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='KVM Hv'/>
      <frequencies state='on'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='23' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='x2apic'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='4'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10/vdisk2.img' index='3'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows10.iso' index='2'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.171.iso' index='1'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:94:be:2f'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/vBios/myVBios.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x21' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x04d9'/>
        <product id='0x0245'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev2'/>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0db0'/>
        <product id='0x543d'/>
        <address bus='7' device='2'/>
      </source>
      <alias name='hostdev3'/>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x13fe'/>
        <product id='0x5500'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev4'/>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1462'/>
        <product id='0x7c60'/>
        <address bus='7' device='3'/>
      </source>
      <alias name='hostdev5'/>
      <address type='usb' bus='0' port='4'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2a'/>
        <address bus='1' device='6'/>
      </source>
      <alias name='hostdev6'/>
      <address type='usb' bus='0' port='5'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b2e'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev7'/>
      <address type='usb' bus='0' port='6'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa5'/>
        <address bus='9' device='5'/>
      </source>
      <alias name='hostdev8'/>
      <address type='usb' bus='0' port='7'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x264a'/>
        <product id='0x1fa6'/>
        <address bus='9' device='7'/>
      </source>
      <alias name='hostdev9'/>
      <address type='usb' bus='0' port='8'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>

[Attached image: pinning.PNG]

10 minutes ago, testdasi said:

One thing I have seen is that if you don't load the CCXs evenly, you end up losing performance when your software doesn't scale well. 3DMark Time Spy doesn't really scale well beyond about 12 cores.

From my own testing (not with 3DMark Time Spy, but with a workload that similarly doesn't scale well beyond 12-16 cores or so), 7 cores loaded unevenly performed about the same as 6 cores loaded evenly (i.e. the extra core's performance is essentially "lost", so to speak).

It's impossible to spread 22 physical cores evenly across your CPU, which has 8 CCXs with 3 cores each, and your VM benchmark score is approximately 2/3 of bare metal. That suggests the uneven load is making you "lose" one core's worth of performance in each CCX (1/3 of each CCX), which is fairly similar to what I saw in my testing.
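
The per-CCX arithmetic above can be sketched in Python. This is a crude model and an assumption made purely for illustration: it treats every CCX as contributing only as much as the CCX with the fewest VM cores, which is one way to read the "lose a core per CCX" explanation. It also assumes the 22 passed-through cores are spread as evenly as possible (i.e. the two host cores sit on different CCXs); if both sat on the same CCX the imbalance would be worse. The helper names are hypothetical.

```python
def spread_cores(total_cores: int, num_ccx: int) -> list[int]:
    """Split cores across CCXs as evenly as possible (best case)."""
    base, extra = divmod(total_cores, num_ccx)
    # The first `extra` CCXs each receive one additional core.
    return [base + 1 if i < extra else base for i in range(num_ccx)]


def effective_cores(per_ccx: list[int]) -> int:
    """Crude, assumed model: each CCX only contributes as much as the CCX
    with the fewest cores, so the extra cores elsewhere are 'lost'."""
    return min(per_ccx) * len(per_ccx)


# Threadripper 3960X: 8 CCXs x 3 cores = 24 physical cores on bare metal.
CCX_COUNT = 8
BARE_METAL_CORES = CCX_COUNT * 3

vm_layout = spread_cores(22, CCX_COUNT)  # all but 2 physical cores
print(vm_layout)                                      # [3, 3, 3, 3, 3, 3, 2, 2]
print(effective_cores(vm_layout) / BARE_METAL_CORES)  # 0.6666666666666666 (~2/3)
print(round(8131 / 12950, 3))                         # 0.628, measured VM / bare metal

# Hypothetical: the "7 uneven vs 6 even" case, assuming two 4-core CCXs.
print(effective_cores(spread_cores(7, 2)), effective_cores(spread_cores(6, 2)))  # 6 6
```

Under this model even the best-case spread lands near the measured ratio, which fits the explanation, but it is a back-of-the-envelope check, not a benchmark.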


You might want to test assigning only the odd bank of the 48 logical cores to your VM and see if it helps. For example, if CPU 0 + CPU 1 = 1 physical core, assign CPU 1 to the VM, and do the same for every pair, so that all the odd-numbered CPUs (the odd bank) go to the VM. Your VM would then have 24 cores instead of 44.
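
A small Python sketch of the odd-bank suggestion, generating the libvirt `<vcpupin>` entries to replace the existing ones in `<cputune>`. It assumes, as the suggestion does, that sibling threads are numbered in adjacent pairs (0+1, 2+3, ...). That numbering is not guaranteed, so check the real pairs first, e.g. with `lscpu -e` (CORE column) or `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The function names are hypothetical.

```python
def odd_bank(logical_cpus: int) -> list[int]:
    """Odd-numbered host CPUs, i.e. one thread per physical core,
    ASSUMING siblings are numbered in adjacent pairs (0+1, 2+3, ...)."""
    return list(range(1, logical_cpus, 2))


def vcpupin_xml(host_cpus: list[int]) -> str:
    """Render a libvirt <cputune> block pinning guest vCPU i to host_cpus[i]."""
    lines = ["<cputune>"]
    for vcpu, cpu in enumerate(host_cpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{cpu}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


cpus = odd_bank(48)
print(len(cpus))          # 24
print(vcpupin_xml(cpus))  # one <vcpupin> line per odd host CPU
```

The `<vcpu>` count in the domain XML would also need to change to 24, and any `<topology>` under `<cpu>` has to agree with it. Note that the odd bank takes one thread from every physical core, including the two cores currently kept back for Unraid.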

I'll give this a shot. Once I've made sure the VM is tuned correctly, my intention is to run 2 VMs that each use half of the resources (while of course saving some resources for Unraid as well).


On the performance comparison: 10089 on the VM compared to 12950 on bare metal.

The 12950 was scored with all 48 logical cores, so the equivalent for 44 is 12950 / 48 * 44 ≈ 11870. You are therefore comparing 10089 to 11870, which is 85% of the speed of bare metal, so we are getting closer.
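
The scaling arithmetic as a short Python sketch. Like the post, it assumes the score scales linearly with thread count, which is a simplification given the earlier point that Time Spy stops scaling at around 12 cores. The function names are hypothetical.

```python
def scaled_baseline(bare_metal_score: float, host_threads: int, vm_threads: int) -> float:
    """Scale a bare-metal score to the VM's thread count.
    ASSUMPTION: the score scales linearly with thread count."""
    return bare_metal_score / host_threads * vm_threads


def relative_speed(vm_score: float, baseline: float) -> float:
    """VM score as a fraction of the scaled bare-metal baseline."""
    return vm_score / baseline


baseline_44 = scaled_baseline(12950, 48, 44)
print(int(baseline_44))                             # 11870
print(f"{relative_speed(10089, baseline_44):.0%}")  # 85%

# If the 10089 run actually used 46 threads (see the note at the end of this post):
baseline_46 = scaled_baseline(12950, 48, 46)
print(f"{relative_speed(10089, baseline_46):.0%}")  # 81%
```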


Can we get a new XML from the configuration that gave you the fastest speeds, so we can review and tweak from there? The suggestions by testdasi and Jerky both sound good, but they should be tested separately from each other.

P

Edit: I just saw you posted one while I was writing this. Also, the cores are now 46, so feel free to update the values to match what you are now using.

Edited by PeteUnraid