CPU Pinning question - strange behaviour with dockers/VM



Hi all,

 

So I've been trying really hard to get my main gaming VM running in tandem with my Plex docker/transcoder, but I'm getting stuck.

 

Some info on the configuration:

 

SysLinux Config added this line:

isolcpus=12-15,28-31

 

VM XML added this:

    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='13'/>
    <vcpupin vcpu='2' cpuset='14'/>
    <vcpupin vcpu='3' cpuset='15'/>
    <vcpupin vcpu='4' cpuset='28'/>
    <vcpupin vcpu='5' cpuset='29'/>
    <vcpupin vcpu='6' cpuset='30'/>
    <vcpupin vcpu='7' cpuset='31'/>
    <emulatorpin cpuset='0,16'/>

 

The Plex docker is left at defaults, so it can use all the other available cores. When Plex transcodes are occurring, I immediately see FPS drops/lag on the VM. As soon as I stop the transcodes, I get a solid 70 FPS and the VM works beautifully.**

 

**EDIT: Actually, it's not working perfectly. I'm noticing lipsync issues when watching YouTube videos, so something isn't right...

 

I thought this might be down to Plex using the cores the emulator pin is using, so I set this extra parameter on the Plex docker:

--cpuset-cpus=1-11,17-27

 

This made no difference. As soon as there's any transcoding activity, my VM is affected.

 

Any ideas what I'm doing wrong? I'll also attach the diagnostics output.

 

 

 

tower-diagnostics-20170121-1521.zip

Link to comment

If you are using HDMI sound from your video card like I am, you may need to locate your video card in the Windows registry and change the MSI (Message Signaled Interrupts) setting from 0 to 1. In my screenshot, the red box is the path to my video card, orange is my video card, and green is the MSI value I am changing from 0 to 1. That fixed my distorted sound/video when playing games or watching videos.

Capture.PNG.b9d78095ff872699d202a7ae59bdb69c.PNG

Link to comment

Could you post a screenshot of how unRAID sees all your cores? Something does not look right in your VM template. Here is a screenshot of what I am looking for, when you have time.

Capture.PNG.36c0847efe149ea38cf3abb80e5b7c59.PNG

Link to comment

Try putting your VM on cores 8-15, and make sure the topology in your XML doesn't present them as hyperthreaded pairs, but rather as 8 cores. For some reason my dual-processor system doesn't like HT pairs and prefers cores in straight numerical order. It's counter to the advice in the CPU pinning thread, but experimentation is often better across disparate hardware.
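
For anyone wanting to see how a pinning order lines up with a threaded topology, here is a minimal Python sketch. It assumes the guest numbers its vCPUs with threads varying fastest (so with threads='2', vCPUs 0 and 1 form one guest core), and that the host HT pairs are N/N+16 (12/28, 13/29, ...) as in the core allocation list posted later in this thread. The function name is made up for illustration; it is not part of any unRAID or libvirt tool.

```python
def guest_sibling_groups(vcpu_to_host, threads_per_core):
    """Group host CPUs by the guest core their vCPUs would belong to.

    Assumption: vCPU IDs are enumerated with threads varying fastest,
    so vCPUs 0..threads_per_core-1 are the threads of guest core 0.
    """
    groups = {}
    for vcpu in sorted(vcpu_to_host):
        groups.setdefault(vcpu // threads_per_core, []).append(vcpu_to_host[vcpu])
    return [groups[core] for core in sorted(groups)]


# vcpupin values from the original post: vCPU number -> host CPU
pinning = {0: 12, 1: 13, 2: 14, 3: 15, 4: 28, 5: 29, 6: 30, 7: 31}

# Assumed host hyperthread pairs: 12/28, 13/29, 14/30, 15/31
host_pairs = {frozenset((cpu, cpu + 16)) for cpu in range(12, 16)}

for group in guest_sibling_groups(pinning, threads_per_core=2):
    if frozenset(group) in host_pairs:
        status = "matches a host HT pair"
    else:
        status = "spans different host cores"
    print(group, "->", status)
```

Under these assumptions, each guest "core" in the posted pinning spans two different host cores. Whether that has anything to do with the stutter is untested; it is only one more thing to rule out.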

 

Link to comment

See below my VM XML in full:

 

<domain type='kvm' id='2'>
  <name>Windows 10</name>
  <uuid>4b6733a6-c5f8-50b1-416e-2844a5019f17</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='13'/>
    <vcpupin vcpu='2' cpuset='14'/>
    <vcpupin vcpu='3' cpuset='15'/>
    <vcpupin vcpu='4' cpuset='28'/>
    <vcpupin vcpu='5' cpuset='29'/>
    <vcpupin vcpu='6' cpuset='30'/>
    <vcpupin vcpu='7' cpuset='31'/>
    <emulatorpin cpuset='0,16'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/4b6733a6-c5f8-50b1-416e-2844a5019f17_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-Samsung_SSD_850_EVO_250GB_S21PNXAG664859P'/>
      <backingStore/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:17:3a:15'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-Windows 10/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

And also see below the CPU usage with the PC on and three transcodes running in tandem. I feel like it should be using way less processing power considering these are 2x Xeon E5-2670s...

 

Thanks in advance!

CPU_load_3transcodes.JPG.b732e64dd087b3f265a2057b072df1aa.JPG

Link to comment

Looking at your screenshot, and going off how I do my own pinning (I don't have as many cores or dual CPUs), I would try something like this. See the screenshot I attached.

 

isolcpus=2,3,4,5,6,7,10,11,12,13,14,15,18,19,20,21,22,23,26,27,28,29,30,31

 

Red = CPU for UNRAID

 

VM1 = Green  and add this to the VM template  <emulatorpin cpuset='0,1,16,17'/>

VM2 = Blue  and add this to the VM template  <emulatorpin cpuset='0,1,16,17'/>

VM3 = Orange and add this to the VM template  <emulatorpin cpuset='8,9,24,25'/>

VM4 = Yellow and add this to the VM template  <emulatorpin cpuset='8,9,24,25'/>

 

Also remember to enable MSI for the audio in Windows if you are using GPU audio and still experiencing sound and video playback issues. I had to enable it on mine to get it working well; it should be a registry edit.

 

Capture.PNG.3a8e13ed415a520752aef7297cdf1d4e.PNG

Link to comment

Ok so an update on where I am currently:

 

Suggestion from Squid -

 

Installed cAdvisor but have no idea how to use it. Checked the support thread but couldn't see anything there. Any link to instructions?

 

Suggestions from Darkun -

 

The MSI/HDMI issue probably isn't relevant, as I'm passing through a separate sound card and using the DVI output on the GPU.

 

Suggestions by 1812 -

 

Changing core allocation to 8-15 and adjusting topology to reflect using physical cores and no HT made performance worse and maxed out CPU usage on simple tasks so reverted back.

 

I've also pinned every docker to cores not being used by my VM so my CPU core allocation is as follows:

 

cpu 0 / 16 - Emulator Pin / Unraid

cpu 1 / 17 - Unraid

cpu 2 / 18 - All dockers

cpu 3 / 19 - All dockers

cpu 4 / 20 - All dockers

cpu 5 / 21 - All dockers

cpu 6 / 22 - All dockers

cpu 7 / 23 - All dockers

cpu 8 / 24 - All dockers

cpu 9 / 25 - All dockers

cpu 10 / 26 - All dockers

cpu 11 / 27 - All dockers

cpu 12 / 28 - Win10VM

cpu 13 / 29 - Win10VM

cpu 14 / 30 - Win10VM

cpu 15 / 31 - Win10VM

 

And also of course added isolcpus=12-15,28-31 to syslinux file.

 

Now the main two problem behaviours I'm still seeing:

 

YouTube lipsync drifts way out around 2-3 minutes into any video

Games run at a stable 60+FPS BUT as soon as any transcoding occurs, I experience immediate performance issues and FPS impact (can drop to 40/stall/lag etc).

 

Now this absolutely shouldn't be happening since I've isolated cores 12-15 and 28-31. Also dockers shouldn't be interfering with the emulator pin as they're all set to --cpuset-cpus=2-11,18-27.
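
As a sanity check on the numbers above, here is a small Python sketch that expands the three CPU lists from this post and confirms they don't overlap. It only checks the configuration as written; it says nothing about what the kernel or Docker actually schedules at runtime. The helper name is mine, not from any tool.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '2-11,18-27' into a set of integers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


docker_cpus = parse_cpu_list("2-11,18-27")    # --cpuset-cpus for all dockers
vm_cpus = parse_cpu_list("12-15,28-31")       # isolcpus / vcpupin targets
emulator_cpus = parse_cpu_list("0,16")        # emulatorpin cpuset

print("dockers overlapping VM cores:      ", sorted(docker_cpus & vm_cpus))
print("dockers overlapping emulator cores:", sorted(docker_cpus & emulator_cpus))
```

Both lists come out empty, so on paper the sets are disjoint and the interference has to be coming from somewhere other than a simple overlap in these settings.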

 

Also completely rebuilt VM from scratch to eliminate OS issue. Also tried both versions of virtio drivers (1.2.6/1.3.0) and installed QEMU agent.

 

So unfortunately I've not really made any progress... If someone could guide me in the right direction to see why dockers/unRAID processes are interfering with the VM vCPUs, that would be great! cAdvisor could help I guess, but again, I'm not sure how to use it!

 

Thanks!

Link to comment

ok.... grasping at straws but:

 

in my logs my pcpu-alloc on a dual 6 core processor board looks like this:

pcpu-alloc: s91480 r8192 d31400 u131072 alloc=1*2097152
Jan 22 08:46:11 Brahms1 kernel: pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 
Jan 22 08:46:11 Brahms1 kernel: pcpu-alloc: [0] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Jan 22 08:46:11 Brahms1 kernel: Built 1 zonelists in Node order, mobility grouping on.  Total pages: 18576862

 

 

 

yours shows:

 

Jan 21 14:33:28 Tower kernel: pcpu-alloc: s91480 r8192 d31400 u131072 alloc=1*2097152
Jan 21 14:33:28 Tower kernel: pcpu-alloc: [0] 00 01 02 03 04 05 06 07 16 17 18 19 20 21 22 23 
Jan 21 14:33:28 Tower kernel: pcpu-alloc: [1] 08 09 10 11 12 13 14 15 24 25 26 27 28 29 30 31 
Jan 21 14:33:28 Tower kernel: Built 2 zonelists in Node order, mobility grouping on.  Total pages: 8248817

 

 

Is it a real problem? I don't know. But, how is it managing cpu cores that you've isolated? Is it by the dashboard or by this set of numbers, and which are sharing a HT?  (also, maybe mine is the one that is messed up, we'd need more info from others with dual processors to verify)
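
To make the comparison less eyeball-dependent, here is a rough Python sketch that parses the pcpu-alloc lines from the Tower log and reports which group each pinned CPU falls in. I'm assuming the bracketed index corresponds to the NUMA node; that matches the "Booting SMP configuration" node lists from the same log quoted further down, but I haven't confirmed it beyond that. The function name is made up for illustration.

```python
import re


def parse_pcpu_alloc(log_text):
    """Return {group_index: [cpu, ...]} from kernel 'pcpu-alloc: [N] ...' lines."""
    groups = {}
    for line in log_text.splitlines():
        match = re.search(r"pcpu-alloc: \[(\d+)\]((?: \d+)+)", line)
        if match:
            cpus = [int(tok) for tok in match.group(2).split()]
            groups.setdefault(int(match.group(1)), []).extend(cpus)
    return groups


tower_log = """\
Jan 21 14:33:28 Tower kernel: pcpu-alloc: s91480 r8192 d31400 u131072 alloc=1*2097152
Jan 21 14:33:28 Tower kernel: pcpu-alloc: [0] 00 01 02 03 04 05 06 07 16 17 18 19 20 21 22 23 
Jan 21 14:33:28 Tower kernel: pcpu-alloc: [1] 08 09 10 11 12 13 14 15 24 25 26 27 28 29 30 31 
"""

groups = parse_pcpu_alloc(tower_log)
cpu_to_group = {cpu: idx for idx, cpus in groups.items() for cpu in cpus}

vm_cpus = [12, 13, 14, 15, 28, 29, 30, 31]   # vcpupin targets
emulator_cpus = [0, 16]                       # emulatorpin cpuset

print("VM vCPU groups:     ", sorted({cpu_to_group[c] for c in vm_cpus}))
print("emulator pin groups:", sorted({cpu_to_group[c] for c in emulator_cpus}))
```

On the Tower log this puts all the VM cores in group 1 and the emulator pin in group 0. Whether that split matters for the stutter is not something this thread has established.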

 

----edit

 

to make it more interesting, here is my dual quad core machine:

Jan 12 17:54:45 Brahms3 kernel: PERCPU: Embedded 32 pages/cpu @ffff880533c00000 s91480 r8192 d31400 u131072
Jan 12 17:54:45 Brahms3 kernel: pcpu-alloc: s91480 r8192 d31400 u131072 alloc=1*2097152
Jan 12 17:54:45 Brahms3 kernel: pcpu-alloc: [0] 00 02 04 06 08 10 12 14 16 18 20 22 24 26 28 30 
Jan 12 17:54:45 Brahms3 kernel: pcpu-alloc: [1] 01 03 05 07 09 11 13 15 17 19 21 23 25 27 29 31 
Jan 12 17:54:45 Brahms3 kernel: Built 2 zonelists in Node order, mobility grouping on.  Total pages: 10319326

 

so maybe this has no bearing. in fact, just ignore this unless someone wants to come along and explain to you and me both....

 

------------

 

 

 

I would also try changing

 

isolcpus=12-15,28-31

 

in your syslinux to actually typing out the numbers like

 

isolcpus=4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 initrd=/bzroot

 

(That's mine as an example.) I've had wonky things happen when I've used a shortened range like 4-23. I don't know why, but typing the numbers out shouldn't hurt. I think the suggestion was made earlier? If you try this, please post a full copy of your syslinux.cfg.
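
If you don't want to type the list by hand, a few lines of Python will expand the range form into the explicit form. This is just a convenience sketch; the function name is mine.

```python
def expand_cpu_list(spec):
    """Turn '12-15,28-31' into '12,13,14,15,28,29,30,31'."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return ",".join(str(cpu) for cpu in cpus)


print("isolcpus=" + expand_cpu_list("12-15,28-31"))
# prints: isolcpus=12,13,14,15,28,29,30,31
```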

 

 

Also, this error exists in your logs:

 

Jan 21 14:33:28 Tower kernel: x86: Booting SMP configuration:
Jan 21 14:33:28 Tower kernel: .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7
Jan 21 14:33:28 Tower kernel: .... node  #1, CPUs:    #8
Jan 21 14:33:28 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 21 14:33:28 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 21 14:33:28 Tower kernel: CMCI storm detected: switching to poll mode
Jan 21 14:33:28 Tower kernel:  #9 #10 #11 #12 #13 #14 #15
Jan 21 14:33:28 Tower kernel: .... node  #0, CPUs:   #16 #17 #18 #19 #20 #21 #22 #23
Jan 21 14:33:28 Tower kernel: .... node  #1, CPUs:   #24 #25 #26 #27 #28 #29 #30 #31
Jan 21 14:33:28 Tower kernel: x86: Booted up 2 nodes, 32 CPUs

 

These errors probably need investigating. The log lists some of the cores you're putting your Windows VM on. Coincidence? A quick test might be to flip your pinning around: put Plex/dockers on these cores and run the VM on unaffected/unlisted ones.

 

If you run a search on the board for mce: [Hardware Error] there are a couple threads about this.

 

 

You also have the following on every CPU:

 

Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:00 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:01 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:02 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:03 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:04 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:05 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:06 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:07 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:08 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:09 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0a is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0b is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0c is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0d is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0e is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:0f is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:1e is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:1f is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:20 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:21 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:22 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:23 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:24 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:25 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:26 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:27 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:28 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:29 is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:2a is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:2b is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:2c is not defined
Jan 21 14:52:44 Tower root: ACPI group processor / action LNXCPU:2d is not defined

 

Might be just something relevant to your processor/board. More info here: https://lime-technology.com/forum/index.php?topic=53037.0

 

 

 

 

 
