Performance Improvements in VMs by adjusting CPU pinning and assignment



I'm still seeing some fluctuations in my Win7 VM when dockers are in use. I've set my VM to use cores 7 and 15 (both isolated), set the emulator pin to 0,1,8,9 and set all dockers to use all of the cores that are left over.

 

Generally the VM is rock solid. I can use the Emby docker and watch transcoded video, for instance, without it affecting the VM. But if I do something such as update or restart a docker, I see a spike of latency on the VM.

 

I'm not sure what I can do to address this. I'm assuming it's because the docker engine is using some CPU time from 0,1,8,9 to do its upgrade/restart.

Link to comment

You don't need that many cores assigned to the emulator pin. It's not going to hurt, but it's really not needed. I run a 23 core VM that has 100% utilization on 1 emulator pin.

 

Post your diagnostics so we can see how things are actually set up.

 

Are you using a spinning disk for your cache drive? If so, is your VM stored there with your dockers? (Maybe you answered this, but I haven't followed this thread in a few weeks.) That would account for your latency if a docker and a VM are trying to access a spinning disk at the same time.

Link to comment

I'm running my VM on its own SSD via the unassigned drives plugin. Docker appdata is on my cache drives.

 

My CPU layout is:

 

cpu 0 / 8
cpu 1 / 9
cpu 2 / 10
cpu 3 / 11
cpu 4 / 12
cpu 5 / 13
cpu 6 / 14
cpu 7 / 15

 

<domain type='kvm'>
  <name>Windows 7</name>
  <uuid>fc50096a-8ce6-42f8-cb09-6ea353933498</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 7" icon="windows7.png" os="windows7"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='7'/>
    <vcpupin vcpu='1' cpuset='15'/>
    <emulatorpin cpuset='1,9'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.7'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='1' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disks/Samsung_SSD_850_PRO_512GB_S250NX0H420722F/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d2:8b:a3'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046e'/>
        <product id='0x5577'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

Link to comment

 

 

Also, it makes sense that if a docker is using 100% CPU utilization on a core, and that core is running your VM emulator functions, it's going to cause problems. If you want to eliminate that, you can pin dockers to specific cores only. If you search the board you can find out how. (It might actually exist in this thread.)

Link to comment
Just now, 1812 said:

 

 

Also, it makes sense that if a docker is using 100% CPU utilization on a core, and that core is running your VM emulator functions, it's going to cause problems. If you want to eliminate that, you can pin dockers to specific cores only. If you search the board you can find out how. (It might actually exist in this thread.)

Yes I've done that, all of my dockers are pinned to cores that exclude 0,8 (emulatorpin) and 7,15 (isolated in boot and assigned to vm).

Link to comment
6 minutes ago, allanp81 said:

Yes I've done that, all of my dockers are pinned to cores that exclude 0,8 (emulatorpin) and 7,15 (isolated in boot and assigned to vm).

 

 

I don't pin dockers, so I don't know if this is your problem or not. Someone else who knows will have to answer.

 

time="2017-02-26T12:52:59.139060211Z" level=error msg="Handler for POST /images/create returned error: Error parsing reference: \"--cpuset-cpus=0,2,3,4,5,6,8,10,11,12,13,14 linuxserver/plex:latest\" is not a valid repository/tag" 

 

I'll try to have another look later...

Link to comment
5 minutes ago, 1812 said:

 

 

I don't pin dockers, so I don't know if this is your problem or not. Someone else who knows will have to answer.

 


time="2017-02-26T12:52:59.139060211Z" level=error msg="Handler for POST /images/create returned error: Error parsing reference: \"--cpuset-cpus=0,2,3,4,5,6,8,10,11,12,13,14 linuxserver/plex:latest\" is not a valid repository/tag" 

 

I'll try to have another look later...

That in theory can be ignored, the time stamp is from a few days ago and I've since removed plex docker as I prefer emby.

Link to comment
On 2/28/2017 at 9:00 AM, allanp81 said:

I'm running my VM on its own SSD via the unassigned drives plugin. Docker appdata is on my cache drives.

 

My CPU layout is:

 

cpu 0 / 8
cpu 1 / 9
cpu 2 / 10
cpu 3 / 11
cpu 4 / 12
cpu 5 / 13
cpu 6 / 14
cpu 7 / 15

 



 

 

It would be better to emulatorpin 0/8.  Linux prefers the lower numbered cpu.  I'm not sure what your VM is doing, but assigning CPUs 6/14 may help.  This gives the VM two cores and 4 threads.
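For reference, a rough sketch of how that suggestion might look in the pinning section of the XML posted above. Only the emulatorpin on 0/8 and the use of 6/14 alongside 7/15 come from the suggestion itself; the vcpu ordering and the `<topology>` values are assumptions and not something confirmed in this thread.

```xml
<domain type='kvm'>
  <!-- Only the elements related to CPU pinning are shown; everything else stays as posted. -->
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <!-- Assumed ordering: each hyperthread pair (6/14 and 7/15) sits on adjacent vcpus. -->
    <vcpupin vcpu='0' cpuset='6'/>
    <vcpupin vcpu='1' cpuset='14'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='15'/>
    <!-- Emulator threads moved to the lowest numbered pair. -->
    <emulatorpin cpuset='0,8'/>
  </cputune>
  <cpu mode='host-passthrough'>
    <!-- Assumed: two cores with two threads each, to match the four pinned threads. -->
    <topology sockets='1' cores='2' threads='2'/>
  </cpu>
</domain>
```

Whether 6/14 also need to be isolated at boot for this to help is left open here.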

Link to comment
22 minutes ago, dlandon said:

 

It would be better to emulatorpin 0/8.  Linux prefers the lower numbered cpu.  I'm not sure what your VM is doing, but assigning CPUs 6/14 may help.  This gives the VM two cores and 4 threads.

I assigned emulatorpin to 6 and 14 with the win7 VM on 7 and 15 and that worked great. All 4 of those threads are isolated in my boot config.

 

I then experimented and assigned emulatorpin to 6 and then 7,14 and 15 to the VM. This again seems to work great, no latency.

 

The basic outcome of this is that, for me personally, the only way I can get a VM with no latency issues is to pin the emulatorpin to isolated cores. If I assign it to anything else it's generally OK, but it can be affected by other tasks such as things running under dockers, even if those dockers are set to not use the cores I've allocated to the VM.
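As a sketch, the first of the two working setups described above (emulatorpin on 6 and 14, VM on 7 and 15) would presumably look like this in the pinning section. The vcpupin lines are taken from the XML posted earlier in the thread; it assumes threads 6, 7, 14 and 15 are all isolated in the boot config, as stated.

```xml
<domain type='kvm'>
  <!-- Only the pinning section is shown; the rest of the domain is unchanged. -->
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='7'/>
    <vcpupin vcpu='1' cpuset='15'/>
    <!-- Emulator threads moved off the shared cores and onto the isolated pair 6/14. -->
    <emulatorpin cpuset='6,14'/>
  </cputune>
</domain>
```

The second variant (emulatorpin on 6, VM on 7, 14 and 15) would need three vcpupin lines and a matching vcpu count. It is not shown because the vcpu ordering used was not given.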

Link to comment

I've noticed another oddity that may be just the way it's reported rather than an actual issue:

 

When copying files within the VM locally or across the network I see sudden huge spikes in latency that stop the minute copying has finished. I've tried switching controllers from virtio to SATA, reverting to older virtio drivers and even switching between q35 and i440fx.

 

It's strange: if I have a YouTube video playing at the same time it doesn't seem to cause any issues, so I wonder what causes Windows to think it has massive latency spikes.

 

*EDIT* Strangely it appears that swapping out the AMD graphics card (HD 5450) for another cheap Nvidia card (GT710) seems to have solved this. How odd!

Edited by allanp81
Link to comment
On 3/1/2017 at 10:27 AM, allanp81 said:

The basic outcome of this is that, for me personally, the only way I can get a VM with no latency issues is to pin the emulatorpin to isolated cores.

 

This confuses me :S. My understanding was that isolcpus is to tell the unRaid OS "Hands off these cpus. They're reserved for someone else". Then emulatorpin is to put the emulation task on other cpus and leave the virtual cpus (that have been isolated from unRaid) free to take care of the vm.

 

Now, your statement above tells me that you are isolating a set of cores from the unRaid OS, then telling the VM to have the emulation task put on those cores that the unRaid OS is not supposed to be able to get at... Am I interpreting that right?

 

Not trying to be a jerk :D. I'm just always looking at improving things and would love to understand the logic behind this!

 

Edited by DoeBoye
Link to comment

latencymon: http://www.resplendence.com/latencymon

dpc latency checker: http://www.thesycon.de/eng/latency_check.shtml

 

I've done a bit of experimenting again today. If I assign the emulatorpin to a core that is shared with the rest of unraid/dockers then I get latency spikes all over the place. If I assign the emulator pin to an isolated core then for the most part it's fine, green all of the way and no issues with youtube or playing games. I still have issues trying to pass through a DVB-T2 tuner card though but that might be an issue with that particular card or drivers so I might order a different make card at some point and try that instead.

Link to comment

Hey,

 

I'm using an i3-6100 CPU on my unRAID server and passing it through to a Win10 VM.

I have the following cpus: (0,2),(1,3)

 

should I isolate cpu (1,3) ?

should I also write "<emulatorpin cpuset='0,2'/>"?

will isolating cpu (1,3) decrease the unraid performance?

would appreciate any information.

 

Link to comment
51 minutes ago, amstel said:

should I isolate cpu (1,3) ?

 

if you want to run your vm on those threads and want to increase performance and decrease latency, yes.

 

52 minutes ago, amstel said:

should I also write "<emulatorpin cpuset='0,2'/> ?

 

You only need 1 CPU for the emulator pin; use just CPU 2.

 

53 minutes ago, amstel said:

does isolating cpu (1,3) will decrease the unraid performance?

 

Yes and no. Yes if you are doing tasks which require more cpu power, like transcoding via plex docker or heavy downloading/etc... No if you're only running a vm and network file transfers since you need less cpu resources for unRaid.
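Put together, a minimal sketch of the pinning section for that i3-6100 setup might look like the following. It assumes (1,3) is a sibling thread pair as listed in the question and that both threads are isolated at boot; the two-vcpu count is also an assumption.

```xml
<domain type='kvm'>
  <!-- Only the pinning section is shown; the rest of the VM definition is omitted. -->
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <!-- VM threads on the isolated pair 1/3. -->
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <!-- A single, non-isolated CPU for the emulator threads. -->
    <emulatorpin cpuset='2'/>
  </cputune>
</domain>
```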

Link to comment
22 minutes ago, 1812 said:

 

if you want to run your vm on those threads and want to increase performance and decrease latency, yes.

 

 

you only need  1 cpu for emulator pin, use just cpu 2

 

 

Yes and no. Yes if you are doing tasks which require more cpu power, like transcoding via plex docker or heavy downloading/etc... No if you're only running a vm and network file transfers since you need less cpu resources for unRaid.

 

Thanks for the reply.

 

I'm using unraid (besides the VM) for:

TVHEADEND: record and stream DVB-T TV.

Transmission: download/upload files 24/7.

SMB: sharing local files for home TV streaming.

quassel-core: connected to irc server.

 

I guess that it should work well, shouldn't it?

 

 

another question,

can the unraid server use those isolated cpus while the VM is not running?

 

Thanks.

Link to comment
5 minutes ago, amstel said:

I'm using the unraid (beside for the VM) for:

TVHEADEND: record and stream DVB-T TV.

Transmission: download/upload files 24/7.

SMB: sharing local files for home TV streaming.

quassel-core: connected to irc server.

 

I don't use those, so you'll have to try and see.

 

6 minutes ago, amstel said:

another question,

can the unraid server use those isolated cpus while the VM is not running?

 

nope. 

Link to comment
