dlandon

Performance Improvements in VMs by adjusting CPU pinning and assignment

239 posts in this topic


On 4/10/2019 at 5:11 PM, Rick Sanchez said:

Thanks, for your help!

 

How can I prevent the VM from sleeping?

Those settings would be handled by the VM itself. The host OS will not automatically hibernate the VM.


Sorry if this has already been asked, I couldn't find an answer (please point me in the right direction if there is one); I have a 4-core CPU with HT, so 8 threads total. Most of my VMs have a VERY light workload and would be totally fine with a single hyperthread without needing the full core. In fact, it's been working that way for a while, until I decided to question whether it was optimal.

 

Let's say I have 10 VMs with the following CPU pinnings:

VM 1: vCPU 0, 4

VM 2: vCPU 0, 4

VM 3: vCPU 1, 5

VM 4: vCPU 1, 5

VM 5: vCPU 2

VM 6: vCPU 3

VM 7: vCPU 6

VM 8: vCPU 7

VM 9: vCPU 7

VM 10: vCPU 7

 

Is there anything wrong here, assuming the VMs sharing the same vCPU never really use more than ~10% load? Or is assigning a single hyperthread to a VM without assigning the full core simply wrong (and why)?

Edited by dnLL


@dnLL Keep in mind that if a VM fully utilises, let's say, an HT thread, the performance of the whole physical core is also affected. If you don't have such high workloads you might be fine and won't see any issues, but in general it's best to give a VM both the physical and the HT thread of a core at the same time. Also, core 0 is always used by unRAID itself; if you fully utilise this core, you affect all other VMs. All the I/O load, networking and storage access, for example, is handled by unRAID and can easily reduce your overall system performance this way. With 10 VMs running on 8 threads, sooner or later you will have some issues.

1 hour ago, bastl said:

@dnLL Keep in mind, if a VM fully utilises let's say a HT core, the performance of that physical core is also affected. [...]

In fact I have 7 VMs and 1 Docker container. Some of the VMs barely use 100MB of RAM and hardly any CPU at all; they could be Docker containers instead, but that's another debate.

 

I made some changes now: all my VMs use a full core with its hyperthread. Core 0 (with HT) is pinned to nothing, Core 1 (with HT) is pinned to my two most important VMs (still light workload), Core 2 (with HT) is pinned to a VM with a higher workload, and Core 3 (with HT) is pinned to my last 3 VMs, which aren't that important. This way unRAID has Core 0 for itself.
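For reference, pinning a VM to a core plus its HT sibling looks like this in the libvirt XML; the thread numbers below are just an illustration for a 4c/8t layout where Core 1's siblings are CPUs 1 and 5:

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='5'/>
</cputune>
```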

 

Now, as for my Plex Docker container, I used something I don't see suggested at all in the FAQ: the --cpus=6 parameter, which gives access to 75% (6/8) of the total CPU time rather than locking the container to specific cores/threads. So even if my Plex container is doing very hard work, it will never use more than the equivalent of 6 full threads, leaving at least 2 threads' worth of CPU to the VMs and to unRAID itself. I think I'm covering most of my workload possibilities this way after doing some quick tests (obviously, if Plex is taking 75% of everything and a VM needs its full core, they will contend for CPU resources, but that's fine).
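For anyone curious, the two Docker CPU-limiting approaches look like this on the command line (the image name is just the official Plex image as an example; your unRAID template will differ):

```shell
# Cap the container at 6 CPUs' worth of total CPU time;
# the scheduler may still float the work across all 8 threads.
docker run -d --cpus=6 --name plex plexinc/pms-docker

# Alternative: hard-pin the container to specific threads
# instead of capping its total CPU time.
docker run -d --cpuset-cpus="1-3,5-7" --name plex plexinc/pms-docker
```

The first form caps aggregate CPU time; the second reserves whole threads. They solve different contention problems.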

 

One last question: I was reading in some old posts that the VM couldn't know whether its 2 vCPUs were a hyperthread pair on one core or just 2 random cores/threads. People were adding parameters to their XMLs to make sure KVM let the VM know it's a core with HT. Is that still necessary?

Edited by dnLL

23 minutes ago, dnLL said:

One last question: I was reading in some old posts that the VM couldn't know whether its 2 vCPUs were a hyperthread pair on one core or just 2 random cores/threads. People were adding parameters to their XMLs to make sure KVM let the VM know it's a core with HT. Is that still necessary?

Not actually sure what you mean by that. You can define a core topology in the XML to "emulate" different kinds of CPU models and core topologies. For example, you can emulate a 2-socket CPU with, let's say, 2 cores each, or specific CPU features. A good start, if you want to dive deeper into the topic, is the Red Hat documentation.

 

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-manipulating_the_domain_xml-cpu_model_and_topology

3 minutes ago, bastl said:

Not actual sure what you mean by that. You can define a core topology in the xml to "emulate" different kind of CPU models and core topolgies. [...]

Sorry, that probably wasn't very clear. Basically, OSes handle threads differently if they belong to the same core, but for VMs it's usually just vCPUs, with no way for the guest to know whether the 2 threads are from the same core or not. I saw some "hacks" (i.e. not limiting yourself to the unRAID webUI) to make sure the VM would treat the 2 threads as a full core with HT rather than 2 distinct threads from 2 separate cores.

Edited by dnLL


@dnLL the "topology" section basically does this in the xml.

 

1 core with HT:

<topology sockets='1' cores='1' threads='2'/>

2 cores without HT:

<topology sockets='1' cores='2' threads='1'/>
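For context, that line lives inside the <cpu> element of the domain XML; a full (hypothetical) block for a 2-core/4-thread guest with host passthrough might look like:

```xml
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='2' threads='2'/>
</cpu>
```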

 


Funnily enough, after pinning all my VMs to a full core, some switched their topology correctly to 1c/2t while others still had 2c/1t. Not sure how the webUI handles all of this when not using the XML view.


Not 100% sure how Unraid handles it. Maybe it depends on the template you chose during the VM setup, or on the cores you selected. In general I manually adjust it for the cores I have selected.

On 5/12/2016 at 6:13 AM, dlandon said:

There have been several posts on the forum about VM performance improvements by adjusting CPU pinning and assignments in cases of VMs stuttering on media playback and gaming.  I've put together what I think is the best of those ideas.  I don't necessarily think this is the total answer, but it has helped me with a particularly latency sensitive VM.

 

Windows VM Configuration

 

You need to have a well configured Windows VM in order to get any improvement with CPU pinning.  Have your VM configured as follows:

  • Set the machine type to the latest i440fx.
  • Boot with OVMF and not SeaBIOS for Windows 8 and Windows 10.  Your GPU must support UEFI boot if you are doing GPU passthrough.
  • Set Hyper-V to 'yes' unless you need it off for Nvidia GPUs.
  • Don't initially assign more than 8 GB of memory, and set 'Initial' and 'Max' memory to the same value so memory ballooning is off.
  • Don't assign more than 4 CPUs total.  Assign CPUs in pairs to your VM if your CPU supports Hyperthreading.
  • Be sure you are using the latest GPU driver.
  • I have had issues with virtio network drivers newer than 0.1.100 on Windows 7.  Try that driver first, then update once your VM is performing properly.

 

Get the best performance you can by adjusting the memory and CPU settings.  Don't over provision CPUs and memory.  You may find that the performance will decrease.  More is not always better.

 

If you have more than 8GB of memory in your unRAID system, I also suggest installing the 'Tips and Tweaks' plugin and setting the 'Disk Cache' settings to the suggested values for VMs.  Click the 'Help' button for the suggestions.  Also set 'Disable NIC flow control' and 'Disable NIC offload' to 'Yes'.  These settings are known to cause VM performance issues in some cases.  You can always go back and change them later.

 

Once you have your VM running correctly, you can then adjust CPU pinning to possibly improve the performance.  Unless you have your VM configured as above, you will probably be wasting your time with CPU pinning.

 

What is Hyperthreading?

 

Hyperthreading is a means of sharing one CPU core between multiple processes.  The architecture of a hyperthreaded core is one core running two hyperthreads.  It looks like this:

 

HT ---- core ---- HT

 

It is not a base core and a HT:

 

core ---- HT

 

When isolating CPUs, the best performance is gained by isolating and assigning both threads of a pair to a VM, not just what some think of as the "core".

 

Why Isolate and Assign CPUs

 

Some VMs suffer from latency because of sharing hyperthreaded CPUs.  The method I have described here helps with the latency caused by CPU sharing and context switching between hyperthreads.

 

If you have a VM that is suffering from stuttering or pauses in media playback or gaming, this procedure may help.  Don't assign more CPUs to a VM that has latency issues; that is generally not the fix.  I also don't recommend assigning more than 4 CPUs to a VM.  I don't know why any VM would need that kind of horsepower.

 

In my case I have a Xeon 4 core processor with Hyperthreading.  The CPU layout is:

 


0,4
1,5
2,6
3,7
 

The Hyperthread pairs are (0,4), (1,5), (2,6) and (3,7).  This means that one core runs two hyperthreads.  When assigning CPUs to a high-performance VM, CPUs should be assigned in Hyperthread pairs.
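If you're not sure what your own pairs are, the kernel exposes the topology under sysfs; this is a quick check you can run from the unRAID console (standard Linux paths):

```shell
# Print each core's hyperthread sibling set, one pair per line (e.g. "0,4")
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sort -u
```

Each output line is one physical core; the CPUs listed together on a line are the pair you should keep together when pinning.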

 

I isolated some CPUs to be used by the VM from Linux with the following in the syslinux configuration on the flash drive:

 


append isolcpus=2,3,6,7 initrd=/bzroot
 

This tells Linux that logical CPUs 2, 3, 6 and 7 are not to be managed or used by the Linux scheduler.
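After rebooting, you can verify the isolation took effect; recent kernels report the isolated set under sysfs (the output is empty if nothing is isolated):

```shell
# Show which logical CPUs the kernel has isolated via isolcpus=
cat /sys/devices/system/cpu/isolated
```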

 

There is an additional setting for vcpus called 'emulatorpin'.  The 'emulatorpin' entry puts the emulator tasks on other CPUs and off the VM CPUs.

 

I then assigned the isolated CPUs to my VM and added the 'emulatorpin':

 


  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='0,4'/>
  </cputune>
 

What ends up happening is that the 4 logical CPUs (2,3,6,7) are not used by Linux but are available to assign to VMs.  I then assigned them to the VM and pinned emulator tasks to CPUs (0,4).  This is the first CPU pair.  Linux tends to favor the low numbered CPUs.

 

Make your CPU assignments in the VM editor and then edit the xml and add the emulatorpin assignment.  Don't change any other CPU settings in the xml.  I've seen recommendations to change the topology:

 


  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
 

Don't make any changes to this setting.  The VM manager does it appropriately.  There is no advantage in making changes and it can cause problems like a VM that crashes.

 

This has greatly improved the performance of my Windows 7 Media Center VM serving Media Center Extenders.

 

I am not a KVM expert and this may not be the best way to do this, but in reading some forum posts and searching the internet, this is the best I've found so far.

 

I would like to see LT offer some performance tuning settings in the VM manager that would help with these settings to improve performance in a VM without all the gyrations I've done here to get the performance I need in my VM.  They could at least offer some 'emulatorpin' settings.

 

Note: I still see confusion about physical CPUs, vcpus, and hyperthreaded pairs.  CPU pairs like 3,7 are two threads that share a core.  It is not a core with a hyperthread.

 

When isolating and assigning CPUs to a VM, do it in pairs.  Don't isolate and assign one (3) but not its pair (7), unless you don't assign 7 to any other VM; otherwise this is not going to give you what you want.

 

vcpus are relative to the VM only.  You don't isolate vcpus, you isolate physical CPUs that are then assigned to VM vcpus.

3 years later, are there any changes and/or additions to the VM settings available in unRAID's vm creation template that you'd suggest?


Dear all

 

I'm really struggling to get good performance on a W10 machine. It is used mainly for gaming, but it's always lagging in games.


Syslinux:

kernel /bzimage
append isolcpus=4,5,6,7,8,9,10,11,12,13,14,15,20,21,22,23,24,25,26,27,28,29,30,31 pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot

W10 machine : 16GB assigned.

--------
<type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
--------
<vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='13'/>
    <vcpupin vcpu='3' cpuset='29'/>
    <vcpupin vcpu='4' cpuset='14'/>
    <vcpupin vcpu='5' cpuset='30'/>
    <vcpupin vcpu='6' cpuset='15'/>
    <vcpupin vcpu='7' cpuset='31'/>
    <emulatorpin cpuset='0,16'/>
  </cputune>

The system is on an SSD:

<disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-CT1000MX500SSD1_1902E1E1D374'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>

An Nvidia 960 is passed through.

 

 

Please, any help is really appreciated. I thought it would be better... performance is not as good as expected.

8 hours ago, mucflyer said:

I'm really struggling to get good performance on W10 machine. It is used mainly for gaming. But always lagging in games. [...]

I'm no expert...

but have you tried running the system with Hyperthreading disabled? (It gave my system a little "boost".)

And only using 8GB instead of 16GB?

 

In my setup I always got FPS drops from 60 to 12, don't know why...

I hope I could help.

11 hours ago, mucflyer said:

I'm really struggling to get good performance on W10 machine. It is used mainly for gaming. But always lagging in games. [...]

Q35 machine. Do you have this section in the XML, just before </domain>? If not, add it; otherwise your GPU runs at PCIe x1 speed.

 

 

<qemu:commandline>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.speed=8'/>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.width=16'/>
</qemu:commandline>


No feedback yet on @mucflyer's request, but are you able to pass the graphics card through without using pcie_acs_override? Enabling that is known to cause lag.

