Performance Improvements in VMs by adjusting CPU pinning and assignment



There have been several posts on the forum about VM performance improvements by adjusting CPU pinning and assignments in cases of VMs stuttering on media playback and gaming.  I've put together what I think is the best of those ideas.  I don't necessarily think this is the total answer, but it has helped me with a particularly latency sensitive VM.

 

Windows VM Configuration

 

You need to have a well configured Windows VM in order to get any improvement with CPU pinning.  Have your VM configured as follows:

  • Set machine type to the latest i440fx.
  • Boot in OVMF and not SeaBIOS for Windows 8 and Windows 10.  Your GPU must support UEFI boot if you are doing GPU passthrough.
  • Set Hyper-V to 'yes' unless you need it off for Nvidia GPUs.
  • Don't initially assign more than 8 GB of memory and set 'Initial' and 'Max' memory at the same value so memory ballooning is off.
  • Don't assign more than 4 CPUs total.  Assign CPUs in pairs to your VM if your processor supports Hyperthreading.
  • Be sure you are using the latest GPU driver.
  • I have had issues with virtio network drivers newer than 0.1.100 on Windows 7.  Try that driver first and then update once your VM is performing properly.

 

Get the best performance you can by adjusting the memory and CPU settings.  Don't overprovision CPUs and memory; you may find that performance actually decreases.  More is not always better.

 

If you have more than 8GB of memory in your unRAID system, I also suggest installing the 'Tips and Tweaks' plugin and setting the 'Disk Cache' settings to the suggested values for VMs.  Click the 'Help' button for the suggestions.  Also set 'Disable NIC flow control' and 'Disable NIC offload' to 'Yes'.  NIC flow control and offload are known to cause VM performance issues in some cases.  You can always go back and change them later.

 

Once you have your VM running correctly, you can then adjust CPU pinning to possibly improve the performance.  Unless you have your VM configured as above, you will probably be wasting your time with CPU pinning.

 

What is Hyperthreading?

 

Hyperthreading is a means of sharing one CPU core between two hardware threads, each of which Linux sees as a separate CPU.  The architecture of a hyperthreaded core is a core and two hyperthreads.  It looks like this:

 

HT ---- core ---- HT

 

It is not a base core and a HT:

 

core ---- HT

 

When isolating CPUs, the best performance is gained by isolating and assigning both threads of a pair to a VM, not just the one some think of as the 'core'.

 

Why Isolate and Assign CPUs

 

Some VMs suffer from latency because of sharing the hyperthreaded cpus.  The method I have described here helps with the latency caused by cpu sharing and context switching between hyperthreads.

 

If you have a VM that is suffering from stuttering or pauses in media playback or gaming, this procedure may help.  Don't assign more cpus to a VM that has latency issues; too few cpus is generally not the cause.  I also don't recommend assigning more than 4 cpus to a VM.  I don't know why any VM needs that kind of horsepower.

 

In my case I have a Xeon 4 core processor with Hyperthreading.  The CPU layout is:

 

0,4
1,5
2,6
3,7
 

The Hyperthread pairs are (0,4) (1,5) (2,6) and (3,7).  This means that one core is used for two Hyperthreads.  When assigning CPUs to a high performance VM, CPUs should be assigned in Hyperthread pairs.
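If you would rather read the pairs from the host than from the dashboard, here is a minimal Python sketch.  It assumes your kernel exposes the standard sysfs file 'thread_siblings_list' for each logical CPU; the function names are my own and only for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,4' or '2-3' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sibling_lists):
    """Collapse one sibling list per logical CPU into unique, sorted groups."""
    return sorted({tuple(parse_cpu_list(text)) for text in sibling_lists})


def read_host_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every logical CPU that exposes one."""
    paths = Path(root).glob("cpu[0-9]*/topology/thread_siblings_list")
    return [path.read_text() for path in paths]


if __name__ == "__main__":
    # Print one line per core, listing the logical CPUs that share it.
    for group in sibling_groups(read_host_sibling_lists()):
        print(",".join(str(cpu) for cpu in group))
```

On a layout like mine I would expect this to list the same four pairs the dashboard shows, but treat the dashboard as the reference if they ever disagree.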

 

I isolated some CPUs to be used by the VM from Linux with the following in the syslinux configuration on the flash drive:

 

append isolcpus=2,3,6,7 initrd=/bzroot
 

This tells Linux not to schedule any processes on logical CPUs 2, 3, 6 and 7 unless they are explicitly assigned there.
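As a rough sketch, the isolcpus value can be built from whole hyperthread pairs and then checked against the kernel command line after a reboot.  The helper names below are hypothetical, and the parser only understands plain CPU numbers and ranges; any other tokens in the isolcpus value are simply skipped.

```python
def isolcpus_value(pairs):
    """Build the isolcpus value from hyperthread pairs, keeping both threads."""
    cpus = sorted(cpu for pair in pairs for cpu in pair)
    return ",".join(str(cpu) for cpu in cpus)


def isolated_from_cmdline(cmdline):
    """Return the CPUs named by isolcpus= in a kernel command line string."""
    cpus = set()
    for arg in cmdline.split():
        if not arg.startswith("isolcpus="):
            continue
        for part in arg.split("=", 1)[1].split(","):
            first, _, last = part.partition("-")
            # Accept '2' or '2-3'; skip anything that is not numeric.
            if first.isdigit() and (last.isdigit() or not last):
                cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)
```

Passing the contents of /proc/cmdline to isolated_from_cmdline() after booting should return the CPUs you put on the append line; an empty list would suggest the syslinux change did not take effect.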

 

There is an additional setting for vcpus called 'emulatorpin'.  The 'emulatorpin' entry puts the emulator tasks on other CPUs and off the VM CPUs.

 

I then assigned the isolated CPUs to my VM and added the 'emulatorpin':

 

  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='0,4'/>
  </cputune>
 

What ends up happening is that the 4 logical CPUs (2,3,6,7) are not used by Linux but are available to assign to VMs.  I then assigned them to the VM and pinned emulator tasks to CPUs (0,4).  This is the first CPU pair.  Linux tends to favor the low numbered CPUs.
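For anyone who wants to generate the cputune block rather than type it by hand, here is a small Python sketch using only the standard library.  It assumes the same element and attribute names as the xml above; build_cputune is a made-up helper, not part of libvirt or the VM manager.

```python
import xml.etree.ElementTree as ET


def build_cputune(vm_cpus, emulator_cpus):
    """Build a <cputune> element pinning vcpus in order and the emulator apart."""
    overlap = set(vm_cpus) & set(emulator_cpus)
    if overlap:
        # The point of emulatorpin here is to keep emulator tasks off VM CPUs.
        raise ValueError(f"emulator CPUs overlap VM CPUs: {sorted(overlap)}")
    cputune = ET.Element("cputune")
    for vcpu, host_cpu in enumerate(vm_cpus):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_cpu))
    ET.SubElement(
        cputune, "emulatorpin", cpuset=",".join(str(cpu) for cpu in emulator_cpus)
    )
    return cputune


if __name__ == "__main__":
    element = build_cputune([2, 3, 6, 7], [0, 4])
    print(ET.tostring(element, encoding="unicode"))
```

The output is a single unindented line, so paste it into the xml editor and let the VM manager reformat it.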

 

Make your CPU assignments in the VM editor and then edit the xml and add the emulatorpin assignment.  Don't change any other CPU settings in the xml.  I've seen recommendations to change the topology:

 

  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
 

Don't make any changes to this setting.  The VM manager does it appropriately.  There is no advantage in making changes and it can cause problems like a VM that crashes.

 

This has greatly improved the performance of my Windows 7 Media Center VM serving Media Center Extenders.

 

I am not a KVM expert and this may not be the best way to do this, but in reading some forum posts and searching the internet, this is the best I've found so far.

 

I would like to see LT offer some performance tuning settings in the VM manager that would help with these settings to improve performance in a VM without all the gyrations I've done here to get the performance I need in my VM.  They could at least offer some 'emulatorpin' settings.

 

Note: I still see confusion about physical CPUs, vcpus, and hyperthreaded pairs.  CPU pairs like 3,7 are two threads that share a core.  It is not a core with a hyperthread.

 

When isolating and assigning CPUs to a VM, do it in pairs.  Don't isolate and assign one CPU (3) without its pair (7) unless you leave 7 unassigned to any other VM.  Splitting a pair between VMs is not going to give you what you want.

 

vcpus are relative to the VM only.  You don't isolate vcpus, you isolate physical CPUs that are then assigned to VM vcpus.
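The pairing rule in the note above can be checked mechanically.  This is only a sketch with a hypothetical helper name: it reports every hyperthread pair that a VM's assignment uses partially, so you can decide whether the leftover thread really is unassigned elsewhere.

```python
def split_pairs(assigned, pairs):
    """Return the hyperthread pairs that are only partly covered by 'assigned'."""
    assigned = set(assigned)
    return [
        tuple(pair)
        for pair in pairs
        if 0 < len(assigned & set(pair)) < len(pair)
    ]


if __name__ == "__main__":
    layout = [(0, 4), (1, 5), (2, 6), (3, 7)]
    # CPU 3 is assigned without its pair 7, so (3, 7) is reported.
    print(split_pairs([2, 3, 6], layout))
```

An empty list means every pair is either fully assigned to the VM or not used by it at all.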

Edited by dlandon
Link to comment

That's interesting.

I did some testing a few days ago, pinning hyperthreaded pairs and then pinning non-hyperthreaded pairs, and running the PassMark CPU benchmark afterwards.

 

These are my results using a 12 core Xeon @ 2.4 GHz.

 

Using 8 threads (4 cpu cores when paired):

 

First, 8 threads as paired hyperthreads: the PassMark score is 7256.

Next, 8 threads that are not paired, one thread from each of 8 different cores: the PassMark score is 10417.

 

As the threads were all on separate cores the speed was faster, but I guess it wouldn't have been if the other thread on each core was being used by another process.

 

I know this is a bit off topic, but I thought it interesting as I never expected this result.

Link to comment


This is exactly what I would expect in this situation where you are testing maximum speed.  8 cores will be faster than 4 cores with Hyperthreading.  If the cores are being used by other processes, you will see a passmark decrease.  In your first case, you were actually sharing the 4 cores with yourself.

Link to comment

That's interesting indeed as I'm trying to improve performance of a specific VM for gaming.

 

Is there a reference or technique for determining which pairs match? I have an AMD FX Piledriver and a quick google didn't reveal much of use.

 

Thanks

 

Link to comment


The unRaid dashboard shows the paired CPUs.

Link to comment


So can they be pinned to dockers, or only to VMs, after isolcpus?

 

Link to comment


I don't know.  I've not tried.

Link to comment


I emailed the Limetech guys about this and thought I would post what they replied in case others are interested.

 

isolcpus prevents the host OS from assigning the logical CPUs specified in the parameter to any processes (kernel or user space) at boot.  Users can manually assign processes back to those CPUs using Linux's cpuset capabilities.  Docker and virtual machines both support this.  So you can use isolcpus to reserve a set of logical CPUs and then assign docker containers to some and VMs to others and leave the unisolated CPUs for host-based tasks, if you so desire.  There may be some advantages to this, there may be some disadvantages to this.

 

With respect to adding isolcpus capability post-boot, so that you can toggle its usage on and off, that is an interesting concept that we have been investigating, but it's not something that's built into Linux's capability set today just yet.  We could manually manipulate some things in the user space for this, but it's not something we're tackling as a project right now.

Link to comment


Thank you dlandon for that excellent info. I have isolated 8 cores with isolcpus for my VMs, left the rest for unRAID and dockers, and pinned emulator tasks to 2 of those cores.

Wow, I get an extra 1000 on my Geekbench 3 score on my OS X VM, and I get no latency on my Windows 10 VM.

What a difference! Great info, thanks so much for sharing :)

Link to comment

Yes, this should be stickied!

Yes, dockers can be pinned as well.

Click the advanced view, then add this to Extra Parameters, followed by the cpus you want:

--cpuset-cpus=

 

The cpus can be pinned whether or not you have used isolcpus. I don't pin to isolcpus cores myself, just to the ones used by unRAID. I just avoid pinning dockers to the first pair of threads, as I have heard the Linux OS prefers these itself.
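To make that concrete, here is a small Python sketch that works out which cpus are left for dockers once the isolated cpus and the first pair are excluded, and formats them as Docker's --cpuset-cpus flag.  The helper name and the 8 cpu layout in the example are assumptions for illustration only.

```python
def docker_cpuset_flag(all_cpus, isolated, first_pair):
    """Return a --cpuset-cpus flag for non-isolated CPUs outside the first pair."""
    excluded = set(isolated) | set(first_pair)
    allowed = [cpu for cpu in sorted(set(all_cpus)) if cpu not in excluded]
    if not allowed:
        raise ValueError("no CPUs left for containers after exclusions")
    return "--cpuset-cpus=" + ",".join(str(cpu) for cpu in allowed)


if __name__ == "__main__":
    # 8 logical CPUs, 2,3,6,7 isolated for VMs, first pair 0,4 left to Linux.
    print(docker_cpuset_flag(range(8), isolated=[2, 3, 6, 7], first_pair=[0, 4]))
```

With the example layout this prints --cpuset-cpus=1,5, which is what would go into the extra parameters box.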

 

Edit.....oops sorry squid see you just replied before me!!

Link to comment

Wow, this is great information. Thanks for taking the time to investigate, as I have noticed significant differences when using previous CPU pinning options. I have a 5930K, and if I isolate 1,7,2,8 and assign them to the Windows 10 VM I notice considerable stutter in games and on YouTube. If I change this to 4 physical cores, i.e. 1,2,3,4, the issues no longer occur. I did not use 0,6 as I have also heard that unRAID prefers these cores. I am going to test out the new method you have posted and will report back.

Link to comment
  • 2 weeks later...

Robj has stickied this thread, so hopefully it will be easier to find.

 

I am reconsidering my position on emulator pinning cpus and I am doing some testing right now to see if my idea has merit.  If it does, I will suggest a very straightforward VM manager change to LT that will help people get through this without so many gyrations.

 

If someone wants help in assigning cpus please post the following information and I will make recommendations on your setup.

  • Post your cpu pairing from the dashboard.
  • List your VMs and what each VM is doing.
  • Any issues you are having with VMs as far as performance and stuttering or choppy performance.

 

I will review and make recommendations about how to set up your configuration.

Link to comment

Not pinning the HT cores fixed my problems with fps drops in games! I can now play GTA V (3440x1440, ultra settings, FXAA, no MSAA/TXAA) at ~35-45 fps (before only ~20-30 fps).

I also remember the user gridrunner saying that he was getting better performance when not pinning them in HT pairs :o

Link to comment


If I recall the post, he was referring to only using one of the cpu pairs and letting the other be unassigned.  I believe it looked like this:

 

0,4
1,5
2,6
3,7

 

Isolate 2,3,6,7

 

Assign 2,3 to the VM and leave 6 and 7 unassigned.

 

That would leave only one hyperthread running on each core and the other idle.  That would mean that there would not be any context switching delays between the two threads of the pairs (2,6) and (3,7).

 

This could in theory work to reduce latency, but I'm not sure that is generally necessary.

 

 

Link to comment

I've changed the OP to reflect some new thinking I have about the emulator pinning.  I think that it would be best to pin the emulator cpus to only the first cpu pair.  In fact I am thinking all VMs should have the emulator cpus pinned to the first cpu pair.

 

I have done this for both my VMs and it is working quite well.

 

I am thinking that we should not pin any VM cpus to the first pair and leave those for Linux, unRAID, and emulator cpus.

 

Let me know if pinning the emulator cpus only to the first cpu pair works for you.

Link to comment

Hey dlandon, I had a quick question about the emulator pinning.

 

I've noticed on my system that the Dashboard's System Status in the unRAID webGUI can frequently show the first pair (0,8) topping out at 2400/2400MHz on my 8 core xeon. I know that you recommended using only the first pair for emulator pinning (even with multiple VMs), but I was curious if it's possible to emulator pin more than one pair per VM? Also, what would the xml for a VM like that look like?

 

I have all the other pairs (sans 0,1,8,9) isolated in the syslinux config.

I'm currently using this xml for the VM in question.

<emulatorpin cpuset='0,8'/>

 

Could I emulator pin more pairs using something like this?

<emulatorpin cpuset='0,1,8,9'/>

 

Bit of a side note; I've also noticed that though 1,9 isn't explicitly assigned to anything, that it idles around 2200/2200MHz since I isolated and pinned the other cpus. Normally they all rest at 1200/1200MHz.

Link to comment

I've read something about disabling hyperthreading may solve some problems regarding latency. Is that done in BIOS?

 

What performance gain/loss will that result in? Someone said 10% gain in using hyperthreading?

 

Then I guess just 6 cpus will show up, instead of 12 in my system, and it makes pinning and assignment somewhat easier to manage.

Link to comment


Yes, you can disable HT in the BIOS!

Link to comment


Linux tends to favor the low numbered cpus, so I would expect the first pair would be loaded more.  My recommendation is based on an idea I have presented to LT as a feature request to emulator pin the first pair in all VMs using the VM manager.  The idea is that in a lot of cases doing this would remove latency from some VMs and cut down on the gyrations needed to get VMs running better.  I was looking for feedback like yours to see if my recommendation has merit.

 

Yes, you can emulatorpin other cpus.  I have been unable to find anything on the internet about how much load a VM's emulator tasks put on the cpus, but my gut feel is not much.

 

Linux will use additional cpus as it needs them, so any that are not isolated will be available for Linux.

 

Don't get too concerned about cpu frequency.  The cpu scaling driver/governor that adjusts the cpu frequency can switch up the cpu frequency under a relatively small load.  The Intel pstate driver I am using on my Xeon will ramp up to full speed under a 10-15% load.  Full speed on a cpu does not necessarily mean it is fully loaded.
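If you want to look at actual load instead of frequency, one option is to compare two samples of a cpu's line from /proc/stat.  The sketch below is only an illustration with made-up helper names; it treats idle and iowait time as not busy and everything else as busy.

```python
def cpu_times(stat_line):
    """Return (total, idle) jiffies from one 'cpuN ...' line of /proc/stat."""
    fields = stat_line.split()
    # user, nice, system, idle, iowait, irq, softirq, steal
    values = [int(value) for value in fields[1:9]]
    idle = values[3] + values[4]
    return sum(values), idle


def busy_percent(before, after):
    """Percentage of time the CPU was busy between two /proc/stat samples."""
    total_before, idle_before = cpu_times(before)
    total_after, idle_after = cpu_times(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 100.0 * (elapsed - (idle_after - idle_before)) / elapsed
```

Sampling the line for cpu0 a second or so apart and passing both samples to busy_percent() gives a load figure you can compare against the frequency shown on the dashboard.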

Link to comment

Maybe the emulator load could be tested by isolating 1 or 2 cores, assigning them only to the emulatorpin, and seeing how much usage the VM puts on them?

Link to comment
