Performance Improvements in VMs by adjusting CPU pinning and assignment


41 minutes ago, 1812 said:

 

I don't use those, so you'll have to try and see.

 

 

nope. 

 

 

Thanks for that.

Last question:

Can I isolate 2 CPUs and use 4 CPUs for that VM?

I mean, just like now, where I'm passing all 4 CPUs to the VM, except that 2 of them would be isolated for this VM only?

 

Thanks.

Link to comment
Yes. Whether or not a CPU is isolated from unRAID doesn't keep it from being assigned to a VM.

If you do that, my suggestion would be to ensure that vCPU 0 of the VM is an isolated core, as Windows and most other operating systems will use that as a primary resource and favor it, especially during initial booting of the VM. If you did not change it, then the VM and unRAID would both try to use that core as primary, resulting in decreased performance. Also note that using the shared (non-isolated) cores may contribute to latency, diminishing the point of isolating cores.
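
A quick way to sanity-check a layout like this is a small script. The following is only a sketch in Python, assuming you paste the VM's cputune block in as a string and supply the isolated host CPUs yourself. The helper names and the sample layout (CPUs 1 and 3 isolated) are made up for illustration and are not part of unRAID or libvirt.

```python
import xml.etree.ElementTree as ET


def parse_cpu_list(text):
    """Expand a CPU list such as '0-1,4' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def vcpu_pins(cputune_xml):
    """Map each vCPU number to the host CPUs it is pinned to."""
    root = ET.fromstring(cputune_xml.strip())
    return {
        int(pin.get("vcpu")): parse_cpu_list(pin.get("cpuset"))
        for pin in root.findall("vcpupin")
    }


def vcpu0_is_isolated(cputune_xml, isolated):
    """True if vCPU 0 is pinned only to CPUs from the isolated set."""
    pins = vcpu_pins(cputune_xml)
    return 0 in pins and set(pins[0]) <= set(isolated)


# Hypothetical layout: host CPUs 1 and 3 isolated, 0 and 2 shared.
SAMPLE = """
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='0'/>
  <vcpupin vcpu='3' cpuset='2'/>
</cputune>
"""

print(vcpu0_is_isolated(SAMPLE, isolated={1, 3}))
```

If the check comes back false, swapping the cpuset values so that vCPU 0 lands on an isolated CPU is the change suggested above.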

Link to comment
Yes, I know that.

I just don't want the VM to be too slow because it is using 'only' 2 hyperthreaded cores (1 CPU).

I will try those 2 configurations and test each to see which performs better.

 

Thanks for the help!

Edited by amstel
Link to comment
Well, I have changed CPUs 1 and 3 to be isolated.

Now I'd like to assign all 4 CPUs to the VM.

 

  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='0'/>
    <vcpupin vcpu='3' cpuset='2'/>
    <emulatorpin cpuset='2'/>
  </cputune>

 

Do I still need to use the emulatorpin feature?

If yes, which CPU should I assign there?

 

Thanks.

Edited by amstel
Link to comment
Your core assignments look soooo wonky, which is fine because Windows 10 doesn't care where the cores come from (as shown through benchmarking tests) or even whether they are threaded pairs.

If you're using all the cores, there is no real need to specify an emulator pin. There might be a super small gain? But if you want to, make it 2 or 3.

Link to comment
Mmm,

well, I tested 4 different combinations with the Geekbench CPU benchmark

and these are the results (single-core, multi-core):

4 CPUs, no isolation: 3805, 6951

2 isolated CPUs (1,3): 3323, 4041

2 isolated CPUs (1,3) + 2 shared CPUs: 3632, 6588

2 isolated CPUs (1,3) + 2 shared CPUs + emulatorpin=2: 3554, 6781

It seems that the normal configuration gives the best benchmark result.

What am I missing here?

 

Thanks.

Link to comment

Typically you should run each test a minimum of three times to find the average. Your system could have been doing something in the background at any time, causing a variance. Your first and last tests are within 3% of each other on the multi-core score, and both use all the cores.

VM performance is more than just a raw benchmark score. On one of my transcoding cluster servers, I usually give it all processor cores since the entire server is only doing 1 task assigned remotely. This does not take into account any audio/video latency that could occur, because I never use those interfaces. CPU pinning is typically done to improve the ability to run multiple instances of something, be it VMs, Dockers, etc., and to find a balance that lets them all run at the same time. So it's not surprising that your tests using all cores are similar in speed.

Additionally, your numbers will differ depending on what you have running in the background. If you have a Docker or two trying to use 30% of 2 cores vs. 30% of 4 cores, your VM will show a noticeable difference in benchmark scores.
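
To put numbers on that, here is a rough Python sketch that compares each configuration against the no-isolation baseline and averages repeated runs. The scores in `results` are the ones posted above. The three-run list at the end is invented purely to show the averaging step, and the function names are just for illustration.

```python
from statistics import mean


def percent_drop(baseline, score):
    """Percentage by which score falls below baseline."""
    return (baseline - score) / baseline * 100


def average_runs(runs):
    """Average several runs of the same test to smooth out background noise."""
    return mean(runs)


# (single-core, multi-core) Geekbench scores quoted in this thread.
results = {
    "4 CPUs, no isolation": (3805, 6951),
    "2 isolated": (3323, 4041),
    "2 isolated + 2 shared": (3632, 6588),
    "2 isolated + 2 shared + emulatorpin=2": (3554, 6781),
}

base_single, base_multi = results["4 CPUs, no isolation"]
for name, (single, multi) in results.items():
    print(
        f"{name}: single {percent_drop(base_single, single):.1f}% lower, "
        f"multi {percent_drop(base_multi, multi):.1f}% lower"
    )

# Hypothetical repeat runs of one test, only to show the averaging.
print(average_runs([3800, 3810, 3805]))
```

With single runs, differences of a few percent between the all-core configurations are hard to tell apart from background noise, which is why repeating each test matters.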

 

 

 

 

 

 

Link to comment

You are moving the VM/host functions off the cores being utilized for the VM, so you are allowing more processing power to be dedicated to the VM, regardless of the OS.

But I've found that when benchmarking with and without an emulator pin on a VM with many cores, there is only a very small difference in performance. My suspicion is that you'll find the most benefit when you have 1-3 cores that are being pushed to full utilization, or on a host with low CPU power.

 

Link to comment

I'm using a higher core count Xeon (22 cores/44 threads). I'm wondering if I should take into account the routing of these cores. In this generation of Xeon, there are multiple rings (see attached).

 

It seems like you would want to keep VM cores near each other in the ring, so that they would use the fabric more efficiently.  In the simplest case, I would create two VMs out of this topology using ring 1 and ring 2.  When we go to 4 VMs, maybe align to the 4 columns.  

 

Of course, I can just as easily convince myself that the CPUs should be spread out so that each one has more distributed bandwidth.  I think this works out if there are multiple VMs that are not necessarily running full tilt at the same time.  Spreading out the CPUs ensures that each CPU lives in a "quiet neighborhood."  Therefore, the isolated VM with spread out CPUs has more paths to memory and fewer rivals.  But when all VMs are enabled and under load, they will end up stepping all over each other.

 

2017-03-21_113639.png

Link to comment

I took an educated guess on the cpu numbering, then mapped all 8 VMs + 1 core for unraid.  It looks like the diagram.  I mainly mapped things based on what was easy in Visio.  The final mapping does the following:

Unraid:  Gets first 2 threads

Vanaheim and Muspelheim:  Get the middle of each ring.  Consume some of the other VM mappings if I want to run just these two.

Alfheim and Svartalfheim:  My next two VMs getting prepped for a new FOVE headset

Niflheim, Helheim, Asgard, Midgard: Minimum size VMs are placeholders for future expansion

Utgard:  I'd like to have a dual-boot option into windows (machine name Jotunheim).  Utgard is a VM that can read and execute from Jotunheim's unmanaged disk.

 

2017-03-22_114615.thumb.png.3826fd29953395748aa802d26a88f9ca.png

 

 

 

Edited by harperhendee
Removed early draft of mapping
Link to comment

BTW, I was able to confirm a few things about CPU numbering and physical layout from Anandtech and a few other sources:

1)  The CPUs are numbered in roughly consecutive order around the rings.  The numbering is determined by a correlated variable, so they might not be exactly as you'd expect, but there's no functional difference between 0,1,2,3,4,5 and 1,0,3,2,5,4 ordering.

2)  Base layouts give you basically 1, 1.5, or 2 iterations of the ring structure.  If there are 12 cores in a ring, there will be 3 versions with 12/18/24 physical cores. 

3)  The CPUs are always fused off in equal numbers from each ring.  This includes half-rings.  So the 22 core Xeon has two rings of 11 cores each.  There's never a 10 and 12 core ring.

4)  There is a small latency price to pay when crossing rings.  Try to minimize cross-ring traffic.  From a topological point of view, if you cross a ring, you generate 2x the bandwidth.

 

 

 

Link to comment
  • 1 month later...

Having read through all this, I'm trying to work out the best approach for me.

 

I have a 14 core E5-2695 v3 and 80GB of RAM for doing various VM type scenarios. I often run a few different Windows VMs at once.

 

cpu 0 / 14
cpu 1 / 15
cpu 2 / 16
cpu 3 / 17
cpu 4 / 18
cpu 5 / 19
cpu 6 / 20
cpu 7 / 21
cpu 8 / 22
cpu 9 / 23
cpu 10 / 24
cpu 11 / 25
cpu 12 / 26
cpu 13 / 27

 

I am thinking of assigning the pairs from 3/17 onwards to VMs, always as whole pairs. So a beefy server would get 12/26 and 13/27, for instance. Another server might get 10/24 and 11/25. That way I am not splitting a core's threads across VMs. Not sure it is worth pinning the emulator stuff? Should I add the isolcpus bits to syslinux so that unRAID can only use, say, 0/14, 1/15 and 2/16 for itself and Dockers...
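
For what it's worth, the isolcpus value for a split like that can be worked out mechanically. This is only a Python sketch, assuming the 14-core numbering listed above (thread sibling = core number + 14) and that the first three pairs stay with the host. The function and variable names are made up for the example, and whether isolating is a good idea at all is a separate question.

```python
def to_ranges(cpus):
    """Compress CPU numbers into a kernel-style list, e.g. [3, 4, 5, 17] -> '3-5,17'."""
    cpus = sorted(set(cpus))
    parts = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = cpu
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


CORES = 14  # assumption: sibling of core N is N + 14, as in the list above
pairs = [(core, core + CORES) for core in range(CORES)]

host_pairs = pairs[:3]  # 0/14, 1/15, 2/16 stay with unRAID and Dockers
vm_pairs = pairs[3:]    # 3/17 onwards go to VMs as whole pairs

isolcpus = to_ranges(cpu for pair in vm_pairs for cpu in pair)
print(f"isolcpus={isolcpus}")
```

Handing out `vm_pairs` two at a time then gives each VM whole physical cores, which is the "no threads split across VMs" idea from the post.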

 

 

 

Link to comment
  • 4 weeks later...

Hello, I have an HP Z800 with 2 Xeon X5650s; my System Devices page shows the pairing like this:

 

cpu 0 <===> cpu 12
cpu 1 <===> cpu 13
cpu 2 <===> cpu 14
cpu 3 <===> cpu 15
cpu 4 <===> cpu 16
cpu 5 <===> cpu 17
cpu 6 <===> cpu 18
cpu 7 <===> cpu 19
cpu 8 <===> cpu 20
cpu 9 <===> cpu 21
cpu 10 <===> cpu 22
cpu 11 <===> cpu 23

 

I don't know how to tell the core from the HT... Help, please!

Link to comment

That's because there is no difference between a core and a hyperthread. They are two sides of the same coin; it depends which side is face up.

Both 0 and 12 are threads of the same physical core. Whichever one is operating at the time can be considered the "core", and the one not currently operating can be considered the "hyperthread".
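
If you want to see the pairings from the command line rather than from the web UI, Linux exposes them under /sys. A minimal Python sketch, assuming a Linux host with the standard sysfs layout (the helper names are my own):

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a CPU list such as '0,12' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(root=Path("/sys/devices/system/cpu")):
    """Return one tuple of logical CPU numbers per physical core."""
    groups = set()
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


for group in sibling_groups():
    # Every number printed on one line belongs to the same physical core.
    print(" <===> ".join(f"cpu {cpu}" for cpu in group))
```

Neither number in a group is "the real core". They are simply the two logical CPUs that share one physical core, so pin them together.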

Link to comment
  • 3 weeks later...

Hi,

 

I've followed most of the advice in this thread and am still falling short of native performance by roughly 17% on an i7-6700K without OC.

For CPU single thread I score 393 (average of three tests with nothing else running), while the reference for my processor is 474 according to CPU-Z. The CPU multi-thread test is irrelevant, as not all the cores are assigned to the VM.

Do any of you manage to get smaller losses (~5%)?

 

My config:

  <name>Windows 10</name>
  <uuid>d560712f-74e7-e728-d16e-7f42e6209349</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>20971520</memory>
  <currentMemory unit='KiB'>20971520</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='3'/>
    <vcpupin vcpu='1' cpuset='7'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='1'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <emulatorpin cpuset='0,4'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/d560712f-74e7-e728-d16e-7f42e6209349_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='3' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>

 

Untitled.png

Edited by mathieuNls
Link to comment
  • 3 weeks later...

Hi, I have a 1080 Ti and I have Hyper-V on.

In the OP I see:

  • Set Hyper-V to 'yes' unless you need it off for Nvidia GPUs.

Can anyone tell me why that is? I have read it's been fixed in v6.2 and can be left enabled.

 

Except for the audio lag, everything else works fine, including gaming.

I still suffer audio issues, mostly when Windows detects a USB device being plugged in or right after logging in.
I already have Enable+ on both the GPU and the GPU's audio chip GM200.

 

I have an i7-6700k

cpu 0 / 4
cpu 1 / 5
cpu 2 / 6
cpu 3 / 7

 

syslinux.cfg:

append isolcpus=1,2,3,5,6,7 initrd=/bzroot,/bzroot-gui

 

vm xml:
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='0-1,4-5'/>
  </cputune>

 

I believe that I have core 1 (CPU 0 / 4) for unRAID and Dockers (Docker names start with --cpuset=0,4). Did I understand this correctly?

core 2 = 1 / 5, reserved for later virtual machines

core 3 = 2 / 6 for the Win 10 gaming VM

core 4 = 3 / 7 for the Win 10 gaming VM

Edited by Thomas van Dalen
Link to comment
Hi, no, you don't need to disable Hyper-V for Nvidia cards now.

IMO don't use isolcpus on quad-core servers. There are not enough cores for this. You can achieve what you want by careful pinning of containers and VMs.

However, you are not pinning your containers correctly. This needs to be done in the extra parameters of the Docker container template.

You say you are leaving core 2 (1/5) for the future. Don't!! Worry about future VMs when you come to set them up. Not now! :)

Anyway, will you ever run 2 VMs at once or just one at a time? Get what you have working first...

That core (1/5) is totally wasted. Because you have used isolcpus on three of your cores, they can only be used when you manually pin a process to them. The host can't touch them. The only process you have put on this core is the emulatorpin. That is very light. So this core is really idle.

Then unRAID and all your containers are on the first core (0/4). You're working that first core hard but giving the second core a day off ;)

Use both these cores for your containers, unRAID and the emulator pin.

Also remember that when you have a normal bare-metal Windows PC and you want to play a game (I am guessing you are a gamer from the 1080 Ti!), you wouldn't expect great performance if you were also running HandBrake encoding video at the same time. You would stop that, play the game and start it again later.

Same with unRAID VMs and containers. We sometimes have to start and stop them.

I explain how to set up different profiles for containers, so that when gaming a container will use one core and when not it can use all cores. It was in the second video in the server tuning series that I did. Maybe watch those for some ideas.
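
To make the "wasted core" point concrete, here is a small Python sketch that lists isolated CPUs with no vCPU pinned to them. The numbers are taken from the syslinux line and VM XML posted above. The function name is made up, and the emulator pin is deliberately ignored because it is so light.

```python
def idle_isolated(isolated, vcpu_pins):
    """Return isolated host CPUs that no vCPU is pinned to."""
    used = {cpu for cpus in vcpu_pins.values() for cpu in cpus}
    return sorted(set(isolated) - used)


# From: append isolcpus=1,2,3,5,6,7 initrd=/bzroot,/bzroot-gui
isolated = {1, 2, 3, 5, 6, 7}

# From the posted <cputune>: vCPU number -> host CPUs
vcpu_pins = {0: [2], 1: [3], 2: [6], 3: [7]}

print(idle_isolated(isolated, vcpu_pins))  # [1, 5]
```

CPUs 1 and 5 are the two threads of core 2: the host cannot schedule onto them and the VM does not use them either.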

 

Edited by gridrunner
Link to comment
3 hours ago, gridrunner said:

IMO dont isocpu cores on quad core servers. There are not enough cores for this. You can achieve what you want by careful pinning of containers and vms.

However you are not pinning your containers correctly. This needs to be done in the extra parameters of the docker container template.

I have seen the video and adjusted the extra parameters of all Dockers to --cpuset-cpus=0,4

Dockers: deluge, filezilla, xeoma and zoneminder, nothing more.

With append isolcpus=1,2,3,5,6,7 initrd=/bzroot,/bzroot-gui unRAID won't use the other cores, right? So that should be good, right? Even the OP uses this line in syslinux.cfg for a quad-core setup.

 

Quote

You say you are leaving core 2 (1/5) for the future. Don't!!   Worry about future the VMs when you come to set them up. Not now.! :)

I have also given that extra core 2 to Win 10; I am not sure whether cores 3 and 4 alone are enough for gaming at ultra-high settings.

Capture.PNG

 

Quote

Anyway, will you ever run 2 vms at once or just one at a time? Get working what you have first...

Probably not, no. But if 2 cores (cores 3 and 4) are enough for Windows gaming at ultra settings, I'd like to be able to launch Kali next to it on a single core (core 2).

In the Windows XML I changed the line to:

<emulatorpin cpuset='0,4'/> so now the VM uses cores 2, 3 and 4.

2.PNG

 

The video was useful.

Sorry, I am still having trouble understanding the big picture. And thank you for replying before.

Anyway, I am still experiencing the audio lag when USB devices get attached.

Edited by Thomas van Dalen
Link to comment
Yes, isolcpus will isolate CPU cores from unRAID. But 'unRAID' doesn't just mean unRAID running its NAS duties. unRAID runs your Docker engine and VMs too.

So yes, the cores are isolated from unRAID this way. If cores 2, 3 and 4 are isolated, that means unRAID can't use them itself when running Docker containers, NAS functions, VMs etc.

It isn't so noticeable for the VMs because of how the unRAID VM manager handles creating VMs: you only have the option to pin cores.

However, when not using the template manager, in KVM you can have the host handle the vCPUs with the scheduler itself. You don't 'have' to pin cores. If you didn't, unRAID would handle a VM in the same way it would a Docker container (that hasn't been pinned).

The unRAID VM manager makes you manually pin the vCPU cores, as in 99% of cases this will give the best results.

Anyway, even when you pin the vCPUs with the template, it only pins the vCPUs. As you know, it doesn't pin the emulator functions. You have to do this manually in the XML.

So the problem can occur when you have isolated, say, 3 out of the 4 cores: unRAID can't use them. Its emulator functions will only be able to run on the remaining non-isolated core, because that's all unRAID has access to. Had the cores not been isolated, unRAID could have used all 4 cores to put that function where it sees best.

Normally we pin the emulator function so it stays off the VM's cores. But when you have isolated cores it stays off those anyway, so there isn't any point pinning that function, unless you want to pin it back to the isolated cores.

Another reason not to isolate cores is that when the VM isn't running they are doing nothing. You may only be using a few Docker containers now, but later you may use something like Plex, and when it wants to transcode some streams, one core wouldn't be enough. If all cores were free, unRAID could then allow Plex to use more resources as it needs. However, if you didn't want it to, you would pin it to only the cores you want it to use.

To be honest, with server tuning there is no right way. No one size fits all. You have to mess around with various things and find what's best for you.

You say you have an audio lag when you plug USB devices into the server. So I assume that you have passed through a physical USB 3 controller to your VM? What do you mean by audio lag? Do you mean the sound Windows makes when the device is plugged in is 'strange' and then everything is OK? Or that when you plug in a USB device all the sound goes out of sync on the VM (such as a video playing etc.)?

Also, when you post bits of your XML, it's best to just copy and paste it all into the post so people can see everything that's in the VM :)

That way we can see if you are using your onboard motherboard sound or only your HDMI 1080 Ti sound, the USB passthrough, etc.

 

 

 

Link to comment
Thank you for your time and explanation. My English is not great, and I am reading it over and over to try to understand more... I am still on the trial, but yes, once my last hiccups are gone I am planning to buy the software.

Please tell me whether or not I have to remove the isolcpus=1,2,3,5,6,7 initrd=/bzroot,/bzroot-gui. The audio lag is another issue and I don't feel it belongs in this thread, so I made a new post here:

 

Link to comment
Yes, I would remove the isolcpus, so change

append  isolcpus=1,2,3,5,6,7 initrd=/bzroot

back to

append  initrd=/bzroot
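
If you would rather script that edit than do it by hand, the change amounts to dropping one token from the append line. A Python sketch that works on the line as a string only; the function name is made up, and reading or writing the actual syslinux.cfg (after backing it up) is left to you.

```python
def strip_isolcpus(append_line):
    """Remove any isolcpus=... token from a syslinux append line."""
    tokens = append_line.split()
    kept = [token for token in tokens if not token.startswith("isolcpus=")]
    return " ".join(kept)


before = "append  isolcpus=1,2,3,5,6,7 initrd=/bzroot"
print(strip_isolcpus(before))
```

Note that this also collapses repeated spaces, and a line without isolcpus passes through with only its spacing normalized.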

 

Link to comment
I have done that. Should I keep the

<emulatorpin cpuset='0,4'/> on the Windows VM and --cpuset-cpus=0,4 in the Dockers, just to start with? Or leave those fields blank and only select the vCPUs from the template? Again, thank you.

Link to comment
