Isolating CPU Not Working



Good Evening,

 

I've followed @gridrunner's excellent guide on maximizing performance in both unRaid and in a guest VM.  However, I'm seeing that Plex is not respecting the fact that the CPUs are isolated, and it often uses them when transcoding... this brings my VM to a screeching halt.  It was my understanding that isolating the CPUs meant that nothing could use them, with the exception of any VM you assigned them to... am I mistaken?

 

Please see my core assignments and my VM XML below.  Thanks in advance.

 

~Spritz

 

CPU Thread Pairings

Pair 1:	cpu 0 / cpu 16
Pair 2:	cpu 1 / cpu 17
Pair 3:	cpu 2 / cpu 18
Pair 4:	cpu 3 / cpu 19
Pair 5:	cpu 4 / cpu 20
Pair 6:	cpu 5 / cpu 21
Pair 7:	cpu 6 / cpu 22
Pair 8:	cpu 7 / cpu 23
Pair 9:	cpu 8 / cpu 24
Pair 10:	cpu 9 / cpu 25
Pair 11:	cpu 10 / cpu 26
Pair 12:	cpu 11 / cpu 27
Pair 13:	cpu 12 / cpu 28
Pair 14:	cpu 13 / cpu 29
Pair 15:	cpu 14 / cpu 30
Pair 16:	cpu 15 / cpu 31

 

<domain type='kvm'>
  <name>Brawn</name>
  <uuid>aa4f920a-0dfe-d619-f00b-46c900a1055c</uuid>
  <description>Gaming PC</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='17'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='18'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='19'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='20'/>
    <emulatorpin cpuset='15,31'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.10'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/aa4f920a-0dfe-d619-f00b-46c900a1055c_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/disks/Brawn_SSD_1/Brawn/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/Data/OS_ISOs/Windows_10.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/Data/OS_ISOs/virtio-win-0.1.141-1.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:40:f9:bb'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Oh, and my syslinux config:

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append isolcpus=1,2,3,4,17,18,19,20 vfio-pci.ids=1b6f:7052 initrd=/bzroot

 

Link to comment
15 minutes ago, 1812 said:

 

And you're running Plex in a Docker container? A bit odd.

 

Without addressing the root problem, you could specify CPU pinning in Docker until the real solution is found: https://www.reddit.com/r/unRAID/comments/6hhvh5/cpu_pinning_to_specific_dockers/

 

Yup, using the linuxserver.io container.

 

Thanks for the suggestion; I had thought of doing the same thing myself.
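
For anyone finding this later, the pinning that Reddit thread describes boils down to Docker's --cpuset-cpus flag; a minimal sketch (the container name and core list here are illustrative, not my actual settings):

# Restrict a container to host cores 5 and 21 (illustrative values);
# this only limits the container -- it does not reserve those cores.
docker run -d --name=plex --cpuset-cpus=5,21 linuxserver/plex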

 

~Spritz~

Link to comment

So I've pinned both the Plex and NZBGet containers to specific CPUs, and that seems to have put a band-aid on the issue.  However, unRaid (and I assume Docker) is still using those supposedly isolated cores for other tasks, as even with the VM powered off those cores see some activity.  For the moment I can live with that, as whatever is hitting them is not a heavy hitter.

 

All that said, I'd still like to figure out why this isn't functioning as expected.  When I run the command (which escapes me at the moment) to verify that the CPUs are isolated, it returns the expected result.  I can also see the system parsing the isolated-CPU line during boot, without error.  Yet when I look at cAdvisor (and I don't know if this is accurate or not), it shows all CPUs as available for the containers to use.
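
For anyone following along, these are the sorts of checks I mean (not necessarily the exact command I used; the sysfs file only exists on newer kernels):

# Confirm the kernel actually received the isolcpus= parameter at boot
cat /proc/cmdline

# On newer kernels, list the cores the scheduler treats as isolated
cat /sys/devices/system/cpu/isolated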

 

I'm kind of at a loss on this one.  Any assistance would be appreciated.

 

Thanks!

 

~Spritz

Link to comment
31 minutes ago, Spritzup said:

So I've pinned both the Plex and NZBGet containers to specific CPUs, and that seems to have put a band-aid on the issue.  However, unRaid (and I assume Docker) is still using those supposedly isolated cores for other tasks,

With a container, pinning the app to a specific core does not mean that the core is for the container's exclusive use.  It only means that the container is limited to running on that core.  Everything else in the system still has access to that core.
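
A quick way to see this for yourself, assuming a box without isolcpus set (PID 1 is just a convenient example; taskset ships with util-linux):

# Pinning a container never shrinks any other process's affinity mask.
# Any unpinned host process can still run on the "pinned" cores:
taskset -cp 1
# pid 1's current affinity list: 0-31   (example output on a 32-thread box)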

 

33 minutes ago, Spritzup said:

Yet when I look at cAdvisor (and I don't know if this is accurate or not),

It is

 

You really want to peruse the Docker FAQ, specifically this post on how to fine-tune CPU pinning for Docker applications.

Link to comment

@Squid Thanks for the reply.  Unfortunately I think there is some confusion.  The issue is that Docker (and I assume by extension unRaid) is not respecting the "isolcpus" parameter in my syslinux file.  What should have been happening is that 8 cores would be isolated for VM use, and everything else would run on the remaining 24.

 

However, that did not appear to be happening, as I could observe both NZBGet and Plex using the supposedly isolated CPUs, thus bringing my VM to a screeching halt.  As a band-aid, I've pinned CPUs for specific container use, but this is not ideal IMO.

 

So TL;DR - the "isolated cores" in this case are those isolated for a VM using the "isolcpus" parameter.  The Docker CPU pinning is a band-aid, but it is working as expected.

 

~Spritz

Link to comment
2 minutes ago, Squid said:

I was too lazy to read the entire thread.  Only read the last couple posts.

 

haha, I've had days like that as well.  If you have any insight, I'd appreciate it.  I did read the previous link that you provided, and it was an interesting read, thanks for that :)

 

~Spritz

Link to comment

I don't run my system with any isolated cores, and as I said in the post, I'm not sure exactly how Docker pinning works in conjunction with CPU isolation (i.e., do the cores get renumbered, or what?).  Been waiting for someone to really experiment and figure that one out.
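
One quick experiment along those lines (container name illustrative): read the allowed-CPU list from inside a pinned container. The kernel reports host core numbers there, so nothing appears to get renumbered:

# From inside a container started with --cpuset-cpus=5,21:
docker exec plex grep Cpus_allowed_list /proc/self/status
# Cpus_allowed_list:   5,21   (host numbering, example output)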

Link to comment
  • 4 months later...

I am also seeing this situation. 

 

What's the best way/command to see who is using the CPU?

I assume commands like htop/top can show the process name, but I want to know at a higher level,

like the name of the Docker container or the VM.

 

My configuration for the 32 cores I have was to dedicate some specifically to VMs and others to unRaid and Docker containers:

 

append intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1  isolcpus=2-5,8-14,18-21,24-30 vfio-pci.ids=8086:244e,1033:0194,1b73:1100 modprobe.blacklist=i2c_i801,i2c_smbus initrd=/bzroot

 

I have this line in the Extra Parameters field of all my Docker containers:

--cpuset-cpus=0,1,6,7,16,17,22,23 --log-opt max-size=50m --log-opt max-file=1
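
To confirm the extra parameter actually took effect on a running container, docker inspect can report the applied cpuset (container name illustrative):

docker inspect --format '{{.HostConfig.CpusetCpus}}' plex
# 0,1,6,7,16,17,22,23   (expected output if the flag applied)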

 

Link to comment

Netdata does report all that under the Applications section.  IIRC, there may have to be a slight change to the template to have it report the name of the app instead of its docker ID.  Check the support thread.  Or for a simpler view, try the cAdvisor app.
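
For a quick one-off from the shell, something along these lines also works (the core number and PID here are illustrative):

# List processes whose last-run core (the PSR column) was core 3:
ps -eo pid,psr,comm | awk '$2 == 3'

# Map a PID back to its container via its cgroup path:
PID=1234                # illustrative; take one from the output above
cat /proc/$PID/cgroup   # a docker/<container-id> entry names the container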

Link to comment
  • 2 months later...

Bumping this because I'm seeing this CPU usage "leakage" as well.

 

It appears some Docker containers respect the isolcpus setting, and others only sort of do...

 

For example:

 

I have 12 threads, isolating 1-5 and 7-11, leaving 0 and 6.

 

Netdata has logged time on CPU 7 even though it shouldn't, and shows access to more cores than just 0 and 6:

 

[Screenshot: Netdata CPU usage chart showing time on cores beyond 0 and 6]

 

 

ZeroTier only has 0 and 6:

 

[Screenshot: ZeroTier CPU usage confined to cores 0 and 6]

 

Nextcloud, MariaDB, Let's Encrypt, and DuckDNS also only report usage on 0 and 6.

 

 

But CloudBerry has also snuck time in on CPU 7, and shows usage on 2 and 5 as well:

 

[Screenshot: CloudBerry CPU usage showing time on cores 2, 5, and 7]

 

 

There is no CPU pinning specified for any of these containers, so clearly something is allowing them to move outside of the isolated CPU set.
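
If anyone wants to chase this further, one sketch (container name illustrative; taskset ships with util-linux) is to check the actual affinity of a suspect container's main process:

# Get the host PID of the container's init process, then its affinity:
pid=$(docker inspect --format '{{.State.Pid}}' cloudberry)
taskset -cp "$pid"
# If isolcpus were fully honored, the isolated cores should be absent here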

 

 

 

To verify it isn't just a problem with one server, I checked another, and sure enough, CPU 3, which is isolated in the syslinux.cfg, is being used by Netdata again:

[Screenshot: Netdata CPU usage on isolated CPU 3 on the second server]

 

 

I have Plex on this server, but it appears to only have access to the cores it should.

 

 

 

Now, these amounts of usage are not enough to impact my performance/usability, but they do support the idea that some containers are finding their way onto other threads.

 

 

Link to comment
