VM CPU speed vs bare metal


TType85


I am having a small issue with my Windows 10 VM's CPU speed compared with running bare metal.

 

Specs of the machine are

HP Z420 

E5-2667 V2

64GB DDR3 ECC 1866 RDIMMS

4x8TB drives in storage array

2x480GB SSD cache

1x800GB SSD Windows 10 installed

1x GTX 1060 6GB

1x GT 720 

1x USB3 PCIe Card

 

Currently the VM is on an SSD and can boot either bare metal or via KVM. The GTX 1060 and USB3 card are passed through and are working well. I have 4 cores/8 threads assigned to the VM and isolated at boot. The emulator is pinned to the first core (0,8); I plan on trying it on 1,9 based on some of the reading I have done tonight.

 

The problem is the CPU is over 30% slower than bare metal.  This seems like a lot.

Cinebench R15 single core is 150 on bare metal and 109 in the VM. The CPU-Z benchmark shows a similar gap.
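For reference, the size of the gap depends on which score is used as the baseline. A minimal sketch of the arithmetic, using only the two Cinebench scores quoted above (the function names are made up for illustration):

```python
def percent_lower(bare_metal: float, vm: float) -> float:
    """How far the VM score falls below the bare-metal score, in percent."""
    return (bare_metal - vm) / bare_metal * 100.0


def percent_faster(bare_metal: float, vm: float) -> float:
    """How much higher the bare-metal score is than the VM score, in percent."""
    return (bare_metal - vm) / vm * 100.0


# Cinebench R15 single-core scores quoted in the post.
BARE_METAL, VM = 150, 109

print(f"VM score is {percent_lower(BARE_METAL, VM):.1f}% lower than bare metal")
print(f"Bare metal is {percent_faster(BARE_METAL, VM):.1f}% faster than the VM")
```

So the VM score is about 27% below bare metal, or equivalently bare metal is about 38% ahead of the VM.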

 

No errors in the logs for the VM or libvirt. The VM runs fine, and both Witcher 3 and World of Warcraft are fully playable.

 

 

Link to comment

I forgot to include the XML for my VM:

 

<domain type='kvm' id='1'>
  <name>Windows 10</name>
  <uuid>1c9fc3c3-ca85-6c62-ebf9-6a07ff771a54</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='12'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='0,8'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/1c9fc3c3-ca85-6c62-ebf9-6a07ff771a54_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-id/ata-VK0800GDJYA_BTWL4380036J800RGN'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:36:07:75'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+0:+100</label>
    <imagelabel>+0:+100</imagelabel>
  </seclabel>
</domain>
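
A quick sanity check for a definition like this one is to confirm that the emulator pin does not overlap any of the vCPU pins. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<cputune>` fragment is copied from the XML above, while the helper names (`parse_cpuset`, `pinning_report`) are hypothetical. It handles comma lists and simple ranges only, not libvirt's `^` exclusion syntax.

```python
import xml.etree.ElementTree as ET

# <cputune> section copied from the domain XML in this post.
CPUTUNE_XML = """
<domain type='kvm'>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='12'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='0,8'/>
  </cputune>
</domain>
"""


def parse_cpuset(text):
    """Turn a cpuset string such as '0,8' or '4-7' into a set of CPU ids."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pinning_report(xml_text):
    """Return (vcpu -> host CPUs, emulator CPUs, CPUs used by both)."""
    root = ET.fromstring(xml_text)
    vcpu_pins = {
        int(pin.get("vcpu")): parse_cpuset(pin.get("cpuset"))
        for pin in root.findall("./cputune/vcpupin")
    }
    emulator = root.find("./cputune/emulatorpin")
    emulator_cpus = parse_cpuset(emulator.get("cpuset")) if emulator is not None else set()
    pinned = set()
    for cpus in vcpu_pins.values():
        pinned |= cpus
    return vcpu_pins, emulator_cpus, pinned & emulator_cpus


vcpu_pins, emulator_cpus, overlap = pinning_report(CPUTUNE_XML)
for vcpu, cpus in sorted(vcpu_pins.items()):
    print(f"vCPU {vcpu} -> host CPU(s) {sorted(cpus)}")
print(f"emulator -> host CPU(s) {sorted(emulator_cpus)}")
print("overlap:", sorted(overlap) if overlap else "none")
```

With the pinning shown here, the emulator threads (0,8) and the eight vCPUs (4-7 and 12-15) use disjoint host CPUs.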
 

Link to comment
6 hours ago, 1812 said:

download the "tips and tweaks" plugin, set the cpu to performance and retest to see if that changes anything (and to rule out a thing or two.)

 

cpu may not be going to turbo.... also, what guide did you follow to setup the vm?

I forgot to mention: I did make sure it was set to performance and turbo is on. I am seeing the right turbo speed from an SSH shell. I followed SpaceInvaderOne's (Gridrunner?) guides and also checked how Linus was doing it in his x Gamers 1 CPU videos.

Link to comment

Frame rates are a bit lower but acceptable. The problem with World of Warcraft is that it is pretty single-core dependent, and busy areas hammer one core. The CPU is turboing to 3.6-3.7GHz vs 4.0GHz, so that may be it.

 

I will be replacing it soon with a Threadripper 1920X so I can roll my wife's machine into it too (4 cores for Unraid/dockers, 4 cores for me, 4 cores for her).

Link to comment

I am using: watch -n 1 grep MHz /proc/cpuinfo and I usually see all cores between 3500 and 3600 MHz. A couple will spike up to 3700 MHz at times, so turbo is working.
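
The same check can be scripted if you want a one-shot summary instead of watching the output. This is only a sketch: it assumes an x86 Linux host where `/proc/cpuinfo` contains `cpu MHz` lines, and the function names are invented for the example.

```python
import re
from pathlib import Path


def core_mhz(cpuinfo_text):
    """Extract every 'cpu MHz' value from /proc/cpuinfo-style text."""
    return [
        float(value)
        for value in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    ]


def summarize(values):
    """Return (min, max, mean) of the per-thread clock readings."""
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    readings = core_mhz(cpuinfo.read_text()) if cpuinfo.exists() else []
    if readings:
        low, high, mean = summarize(readings)
        print(f"{len(readings)} threads: min {low:.0f} MHz, max {high:.0f} MHz, mean {mean:.0f} MHz")
    else:
        print("no 'cpu MHz' lines found on this system")
```

Like the grep, this is a single snapshot, so short turbo spikes can be missed unless it is run repeatedly.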

 

I made a fresh install of Unraid with no dockers running, and when I give the VM 4 cores the performance is the same. The odd thing with CPU-Z is that if I give the VM 6c/12t instead of 4c/8t, the single-threaded score goes from 290 to 320 (bare metal is 360). Cinebench R15 single core is the same no matter the c/t count.

 

I put in a GTX 1070. In WoW I am seeing about a 20fps drop in the busy areas vs running bare metal when using 6 cores. The Heaven benchmark is 2400 for bare metal and 2250 for the VM, which is fine. Hopefully the 1920X system I am building will eke out a bit more performance and I can roll my wife's machine into it too (she plays WoW and uses Facebook, Plex and Netflix, so she will be more than fine I think).

Link to comment
17 hours ago, TType85 said:

I am using: watch -n 1 grep MHz /proc/cpuinfo and I usually see all cores between 3500 and 3600 MHz. A couple will spike up to 3700 MHz at times, so turbo is working.

 

I made a fresh install of Unraid with no dockers running, and when I give the VM 4 cores the performance is the same. The odd thing with CPU-Z is that if I give the VM 6c/12t instead of 4c/8t, the single-threaded score goes from 290 to 320 (bare metal is 360). Cinebench R15 single core is the same no matter the c/t count.

 

I put in a GTX 1070. In WoW I am seeing about a 20fps drop in the busy areas vs running bare metal when using 6 cores. The Heaven benchmark is 2400 for bare metal and 2250 for the VM, which is fine. Hopefully the 1920X system I am building will eke out a bit more performance and I can roll my wife's machine into it too (she plays WoW and uses Facebook, Plex and Netflix, so she will be more than fine I think).

That sounds about right. The E5-2667 v2 has a 3.3GHz base clock with turbo profile 3/3/3/3/4/5/6/7, so you should expect 4.0GHz with 1 core active, 3.9 with 2, 3.8 with 3, 3.7 with 4 and 3.6 with 5 or more.

In a VM environment, you will almost always be in the lowest turbo bin because there is almost always something occupying a core. This is even more applicable to Linux, which in my experience tries to make use of its cores (and threads).
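
To make the turbo profile concrete, here is a small sketch of how those numbers fall out. It assumes each turbo bin is 100 MHz and that the profile lists bins from all 8 cores active down to 1 core active, which is the reading that reproduces the clocks listed above; the names are made up for illustration.

```python
BASE_MHZ = 3300          # E5-2667 v2 base clock
BIN_MHZ = 100            # assumption: one turbo bin = 100 MHz
# Turbo profile 3/3/3/3/4/5/6/7, read as bins for 8, 7, 6, 5, 4, 3, 2, 1 active cores.
TURBO_BINS = [3, 3, 3, 3, 4, 5, 6, 7]


def turbo_mhz(active_cores: int) -> int:
    """Maximum turbo clock in MHz for a given number of active cores (1..8)."""
    if not 1 <= active_cores <= len(TURBO_BINS):
        raise ValueError("active_cores must be between 1 and 8")
    bins = TURBO_BINS[len(TURBO_BINS) - active_cores]
    return BASE_MHZ + bins * BIN_MHZ


for cores in range(1, len(TURBO_BINS) + 1):
    print(f"{cores} active core(s): {turbo_mhz(cores) / 1000:.1f} GHz")
```

With Unraid, dockers and the emulator threads keeping other cores awake, the VM will rarely see the 1-core or 2-core bins, which fits the 3.6-3.7GHz readings.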

 

As for the 1920X, I read that it has a 4-core turbo of 4GHz (or 4.2 with XFR) and an all-core turbo of 3.7GHz. (There is no single-core turbo because the chip design differs from Intel's.) So I would imagine it to have similar performance to your E5-2667 v2, perhaps slightly better.

 

The 1920X is essentially two 6-core dies glued together, with each die made up of two 3-core CCXs glued together. Any time you have to jump across CCXs, there is a latency penalty, significant enough to wipe out any benefit of having another core. (Shameless plug: see my test results)

  • So for the 1920X, you are better off assigning 3 cores to a VM.
  • If you want a quad-core VM, you are better off getting the 1950X for best gaming performance.

Note, however, that the core numbering currently shown for the 1950X/1920X might not be correct, so you will want to test various configurations for best performance.
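
One way to cross-check the numbering before pinning is to ask the kernel which logical CPUs share an L3 cache, since each CCX has its own L3. The sketch below reads `shared_cpu_list` from Linux sysfs. It assumes `cache/index3` is the L3 entry (typical, but the `level` file next to it is the authoritative source), and the CPU numbers in the usage example are hypothetical, not a real 1920X layout.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Turn a kernel CPU list such as '0-2,12-14' into a sorted list of ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(shared_lists):
    """Collapse per-CPU shared_cpu_list strings into unique L3 sharing groups."""
    return sorted({tuple(parse_cpu_list(text)) for text in shared_lists.values()})


def read_shared_lists(sysfs_root=Path("/sys/devices/system/cpu")):
    """Read cache/index3/shared_cpu_list for every cpuN directory that has one."""
    result = {}
    for cpu_dir in sysfs_root.glob("cpu[0-9]*"):
        shared = cpu_dir / "cache" / "index3" / "shared_cpu_list"
        if shared.exists():
            result[int(cpu_dir.name[3:])] = shared.read_text().strip()
    return result


if __name__ == "__main__":
    for group in l3_groups(read_shared_lists()):
        print("CPUs sharing one L3:", list(group))
```

For example, `l3_groups({0: "0-2,12-14", 3: "3-5,15-17"})` returns two groups, which would suggest keeping a 3-core VM within one of them. Benchmarking each candidate set is still the final word.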

 

Below is a summary of my test results on the 2990WX

 

Edited by testdasi
Link to comment

Thanks for the reply.

 

I think going 3-core will be fine for the most part. I got the 1920X for $340, so I will deal with the 2x6-core vs 2x8-core layout. Maybe 6 cores for my VM, 3 for my wife's and 3 for Unraid and dockers. :)

I moved some hardware around so all the dockers and my data are on another machine (E5-1620 v2 / 16GB RAM), so I can experiment without upsetting the wife and family by taking down Plex :)

Link to comment
  • 4 months later...

Let me add to this thread, as my questions are closely related. Let me know if that is fine with you, @TType85 and @testdasi.

 

I am running a Windows VM, which I use for simple Windows tasks as well as a gaming machine to game-stream from my Unraid server to light devices in other rooms. I play both Windows games (Forza Horizon) and Dolphin emulator games (all streamed from my VM to devices across the house). My specs are as follows:

 

i7 7800X @ 3.5GHz (no OC)

32GB ram

A bunch of 10TB disks in an Unraid array

128GB Plextor M6E as cache drive (used for docker containers)

1TB Samsung NVME (hosting the VM, not barebone)

Nvidia 1060TI with 6GB ram

 

I have assigned 5C/5T, 16GB, and the Nvidia GPU (passthrough) to the VM. I have not done any CPU isolation or pinning.

 

Docker and Unraid performance seems flawless. Gaming performance is good, but not perfect. I am aiming for 1080p/60fps. Emulator games (Dolphin) in particular suffer occasional stutters or low frame rates. I am looking for ways to improve this:

 

* Upgrade the CPU or GPU, or add/assign additional RAM?

* Do something around CPU pinning or isolation? I would need a recommendation on settings.

* Pass through my NVMe drive directly and install the VM on it? I would need advice on whether this can be done with the existing VM or whether I need to start from scratch.

* Any other ideas to improve performance?

 

Thanks in advance for your help! Your setups and use cases seem very similar.

Link to comment
