FPS drops, stuttering, and other things that make me sad



I know I mentioned this a bit ago, but this really needs to be addressed at the unRAID OS level: distinguishing CPU core/thread pairs...

 

The best-case scenario would be that these are determined on boot and then represented correctly in the 'Create VM' section of the GUI.

If that would be too much, just some documentation on the wiki explaining how to determine the pairs so we can pin them correctly would be enough.

 

@JonP... is this worth worrying about? Has it been addressed in 6.2?

Not for 6.2, no.

 

OK, so what would you suggest in the short term?

Use the CPU latencies script to identify thread pairs if you are having trouble, or disable hyperthreading to eliminate the need.
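If you're not sure whether hyperthreading is currently enabled, lscpu will tell you (a quick sketch; lscpu is part of util-linux, which unRAID ships):

lscpu | grep -i 'thread(s) per core'

A value of 2 means hyperthreading is on; 1 means it's off (or the CPU doesn't support it).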

Link to comment

Any input from those who have tried it and got results that don't help to distinguish the pairs?

See here: http://lime-technology.com/forum/index.php?topic=46664.msg450726#msg450726

Tried multiple times with the same results. That run was done at the console with nothing else loaded (no Docker containers or VMs) and no plugins (except the ones needed to run the script).

Link to comment

First off, you ignore the 10s. Of course the latency from a logical CPU to itself will be good; that's like pinging localhost. A logical CPU isn't a thread pair with itself.

 

Unfortunately, your results don't clearly show which other logical CPU is paired with each one. This is a challenge.

Link to comment


I understand the X and Y coordinates here... ;D

Yep, those were my thoughts too. Well, you've confirmed them; I'll plan to post in the VFIO group to get some additional eyes and thoughts on this!

 

 

Link to comment

I appear to be having similar issues. I can play Portal 2 just fine, but as soon as I try to run a game that is super CPU-intensive, either of my VMs will have large amounts of game lag or stuttering, even though my FPS count stays very high throughout. I opened a ticket with Lime Tech and they are looking into it. I've included what I sent to them below. I've tried Battlefront on both PCs and am usually running >60 FPS on the lesser machine and >100 FPS on my personal VM.

 

 

Lag Example:

 

 

Case Labs Magnum M8 with pedestal

ASUS Rampage V Extreme X99 (LGA 2011-3)

Intel Xeon E5-2697 v3 14-core / 28-thread CPU

32 GB Kingston 2133 MHz ECC DDR4

1 x EVGA 980 Ti Classified, 1 x ASUS Strix GTX 970, 1 x AMD 240 (for VGA output)

NZXT Kraken X60 280 mm rad

1000 W EVGA SuperNOVA G2

SSD: 3 x Crucial 500 GB SSDs, 2 x Samsung 250 GB EVO SSDs, 1 x 500 GB Mushkin SSD

HDD: 1 x 3 TB Toshiba (parity disk), 1 x 2 TB Seagate

coletower-diagnostics-20160303-0745.zip

coletower-syslog-20160303-2338.zip

Devices.txt

Link to comment

What will this tell me? ;-)

 

root@Tower:/boot# ./cpu-latencies.sh
  |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
--+------------------------------------------------
 0| 10 11 12 12 12 12 12 12  9 12 12 12 12 12 12 12
 1| 10 10 10 11 10 10 10 10 10  8 10 10 10 10 10 10
 2| 10  9 10 10  9  9  9  9  8  9  6  7  9  9  9  9
 3|  9  9 10 10 10 10 10 10  9 10 10  7 10 10  9  9
 4|  9  9  9 10 10 10 10  9  9  9 10  9  7 10 10  9
 5|  8  9  9 10  9 10  9  9  9  9  9  9  9  7  9  9
 6| 10  9  9 10 10 10 10  9  9  9 10  9 10  9  7  9
 7|  9  9  9  9  9  9  9 10  9  9 10  9  9  9  9  7
 8|  7 11  9 11 11 11 10 11 10 11 10 10 11 11 11 12
 9|  9  7  9  9  9 10 10  9 10 10 10 10 10 10 10  9
10|  9 10  7 10 10 10 10 10  9 10 10  9 10 10 10 10
11| 10 10  9  7 10  9  9  9  9 10 10 10 10  9 10 10
12|  9  9 10  9  6  9  9  9  8  8  8  8 10  9  7  7
13| 11 10 11 11 10  8 11 11 10  9 10 10 11 10 10 10
14| 10  8  9 10  9  9  7  9 10  9  9  9  9  9 10  9
15|  9  9  9  9  9 10  9  6  9  9  8  9  9  8  9 10

Link to comment

I was doing some Googling on the issue of KVM latency, and a number of places recommend turning HT off. HT is only a marginal boost, so I can live without it.

 

I also see elsewhere that JonP says the next unRAID will have an isolcpus toggle in the webUI, as that is supposed to help.

 

https://lime-technology.com/forum/index.php?topic=43126.msg418893#msg418893

 

So maybe a combination of those and the script?
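In the meantime, isolcpus can be set by hand: add it to the append line in /boot/syslinux/syslinux.cfg on the flash drive and reboot (a sketch only; the core list here is just an example, use the cores you pin to your VMs):

append isolcpus=2,3,6,7 initrd=/bzroot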

Link to comment

Getting more info here, but I still have some questions.

I was advised to do the following:

 

Run "virsh capabilities" and look for <cpus> tag. It lists core siblings. Its what you are looking for right? For me this script also is little inconclusive, however seems like libvirt lists siblings right as i cant pin one core to two physical cores that are not siblings.

 

Which output the following (relevant information only):

    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>32839588</memory>
          <cpus num='12'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='11'/>
          </cpus>
        </cell>
      </cells>
    </topology>

 

Now the siblings value here just equals the cpu id value, so I didn't get the expected output described above.

However, the core_id values show that 0-1, 2-3, 4-5, ... are the correct pairings.

I'm checking to make sure that's right, and why my output differs from the described form, where each cpu id lists its sibling (e.g. cpu id 0 showing siblings='0-1').

 

Edit: it is supposed to list the siblings as explained, but for whatever reason mine does not.

Talk of its creation here: http://libvir-list.redhat.narkive.com/WBIG9szS/libvirt-add-hyperthreaded-sibling-info-to-virsh-capabilities
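For anyone who just wants the sibling lines without scrolling the whole capabilities dump, a quick filter works (a minimal sketch; assumes virsh is on the PATH, as it is on a stock unRAID install):

virsh capabilities | grep siblings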

Link to comment

This is my output. By this, I guess my cores and siblings are mapped 1:1, so that doesn't help me in the slightest. The only things to try next are disabling HT and/or waiting for 6.2.

 

          <memory unit='KiB'>32908952</memory>
          <cpus num='20'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='11'/>
            <cpu id='12' socket_id='0' core_id='6' siblings='12'/>
            <cpu id='13' socket_id='0' core_id='6' siblings='13'/>
            <cpu id='14' socket_id='0' core_id='7' siblings='14'/>
            <cpu id='15' socket_id='0' core_id='7' siblings='15'/>
            <cpu id='16' socket_id='0' core_id='8' siblings='16'/>
            <cpu id='17' socket_id='0' core_id='8' siblings='17'/>
            <cpu id='18' socket_id='0' core_id='9' siblings='18'/>
            <cpu id='19' socket_id='0' core_id='9' siblings='19'/>
          </cpus>
        </cell>

Link to comment

Well, I'm starting to find this interesting, primarily from an efficiency point of view!

 

A couple of things. First,

<emulatorpin cpuset= >

seems to be widely used to reduce latency, so it may be worth trying within your XML.

 

lstopo (also known as hwloc-ls) displays the hierarchical topology map of the current system.

Getting this to run on Slackware would give us the output we want to see.

hwloc is available for Slackware: https://www.open-mpi.org/projects/hwloc/

 

 

Lots of good info below; if someone has enough time to kill (I do not right now), you should follow the steps in the second link.

Primer: https://a20.net/bert/tag/libvirt/

Guide: http://docs.openstack.org/developer/nova/testing/libvirt-numa.html

Link to comment

My output from hwloc is below; I'm still none the wiser. Does this tell me that 0,10 / 1,11, etc. are paired?

 

  Package L#0 + L3 L#0 (25MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#10)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#11)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#12)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#13)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#14)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#15)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#16)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#17)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#18)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#19)

 

Link to comment


I think that is what it means: PU P#0 and P#10 share Core L#0, so 0,10 / 1,11, etc. are your pairs.

Link to comment

Installing packages by themselves, without a plugin, is not recommended: there is no tracking of what is installed, and they could cause conflicts.

 

OK, that's the warning others would mention if I didn't...

Now that that's out of the way! ;)

 

If for whatever reason others are curious and like that shiny picture representation, do the following.

You need all the packages in the attached picture (I think there may be one unneeded one in the list); you can find them for Slackware64 14.1 at http://packages.slackware.com/

Then you can manually install each one (a pain, but the way I did it) with:

upgradepkg --install-new packagename
Example: upgradepkg --install-new libXext-1.3.2-x86_64-1.txz

You need to be in the same directory at the console/SSH as where the files are stored; for me this was /boot/hwloc.
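If you'd rather not type each package out, a small loop does the same thing (a sketch; assumes the .txz files are sitting in /boot/hwloc as above):

for p in /boot/hwloc/*.txz; do upgradepkg --install-new "$p"; done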

 

OR (much easier):

Put all the needed packages in the /boot/extra folder on your USB drive (create it if it doesn't exist) and reboot. Everything will be installed at the next boot.

 

Then SSH in (or use the console) and type

lstopo namethepic.png

and it will output the image to the directory you are in at the command line.

If you get an error about a shared library not being available, you missed a package. You can search for the one it is asking for using the search box at the top of the Slackware packages page.

 

I removed all the packages afterwards with:

removepkg nameofpackage
Example: removepkg cairo-1.12.16-x86_64-1_slack14.1.txz

 

Also, here is a list of the options and output formats you can use with lstopo:

http://manpages.ubuntu.com/manpages/quantal/man1/lstopo.1.html
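For example, lstopo can also print a plain-text tree straight to the console instead of rendering an image (a sketch; output-format support can vary by build):

lstopo --of console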

 

This was far too much hassle for what it accomplishes, but I was determined :P and thought I'd share.

lstopo-packages.png

Link to comment

The latency script is gibberish as far as I can tell.

 

  |  0  1  2  3  4  5  6  7
--+------------------------
 0| 10  5  5  5  3  7  5  5
 1|  5 10  6  6  5  4  6  5
 2|  5  5 10  5  5  6  5  5
 3|  5  5  5 10  5  5  6  4
 4|  3  5  5  5 10  5  5  5
 5|  6  3  5  5  5 10  5  5
 6|  5  5  5  7  5  5 10  5
 7|  5  5  5  3  6  6  5 10

 

But based on this thread:

  https://unix.stackexchange.com/questions/57920/how-do-i-know-which-processors-are-physical-cores

I took a look at /proc/cpuinfo, where there is a "core id" field for each logical processor.  So for my Xeon E3-1240 v3 it looks like these are my pairs:

 

0,4
1,5
2,6
3,7

 

And that matches what I see in these files:

  /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
  /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
  /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
  /sys/devices/system/cpu/cpu3/topology/thread_siblings_list
  etc.

 

Does this seem like it is on the right track?

 

Link to comment


Seems like this should give you the list of siblings:

 

cat /sys/devices/system/cpu/*/topology/thread_siblings_list | sort -u
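On the E3-1240 v3 above, that should print exactly the four pairs:

0,4
1,5
2,6
3,7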

Link to comment


Beautiful... I honestly didn't know that existed; must be new. Well, we have now added that under the system devices page for a future release. Awesome find!!

Link to comment

Don't want to seem negative, but I'm not sure this gets us any closer to fixing the problems. So now we know that for (any?) given Intel HT CPU, the pairs are N and N+n, where N is the core number and n is the number of cores in the CPU,

i.e.

4-core = 0+4, 1+5, 2+6, 3+7

6-core = 0+6, 1+7, 2+8, etc.

8-core = 0+8, etc.

I think people with stuttering etc. have probably already got that far. But if the latencies script is no use... well, where the heck are we going?
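For what it's worth, that rule is easy to turn into a quick sanity check (a minimal sketch; it assumes HT is on and that the kernel enumerates all first threads before all siblings, which every box in this thread so far follows; /sys remains the authoritative source):

logical=$(nproc)          # total logical CPUs the kernel sees
cores=$((logical / 2))    # physical cores, assuming HT doubles the count
for ((i = 0; i < cores; i++)); do
    echo "pair: $i,$((i + cores))"
done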

Link to comment


Agreed. I tried all permutations and still get horrible lag, especially above 4 cores. I might have to stick with HT disabled until 6.2, assuming that makes a difference.

Link to comment

We now have LOTS of ways to do this! ;D

 

You can also do this (credit to Samuel Holland on the VFIO list):

grep . /sys/bus/cpu/devices/*/topology/core_id

Results:

/sys/bus/cpu/devices/cpu0/topology/core_id:0
/sys/bus/cpu/devices/cpu1/topology/core_id:1
/sys/bus/cpu/devices/cpu10/topology/core_id:4
/sys/bus/cpu/devices/cpu11/topology/core_id:5
/sys/bus/cpu/devices/cpu2/topology/core_id:2
/sys/bus/cpu/devices/cpu3/topology/core_id:3
/sys/bus/cpu/devices/cpu4/topology/core_id:4
/sys/bus/cpu/devices/cpu5/topology/core_id:5
/sys/bus/cpu/devices/cpu6/topology/core_id:0
/sys/bus/cpu/devices/cpu7/topology/core_id:1
/sys/bus/cpu/devices/cpu8/topology/core_id:2
/sys/bus/cpu/devices/cpu9/topology/core_id:3

 

or this:

grep . /sys/bus/cpu/devices/*/topology/thread_siblings_list

Results:

/sys/bus/cpu/devices/cpu0/topology/thread_siblings_list:0,6
/sys/bus/cpu/devices/cpu1/topology/thread_siblings_list:1,7
/sys/bus/cpu/devices/cpu10/topology/thread_siblings_list:4,10
/sys/bus/cpu/devices/cpu11/topology/thread_siblings_list:5,11
/sys/bus/cpu/devices/cpu2/topology/thread_siblings_list:2,8
/sys/bus/cpu/devices/cpu3/topology/thread_siblings_list:3,9
/sys/bus/cpu/devices/cpu4/topology/thread_siblings_list:4,10
/sys/bus/cpu/devices/cpu5/topology/thread_siblings_list:5,11
/sys/bus/cpu/devices/cpu6/topology/thread_siblings_list:0,6
/sys/bus/cpu/devices/cpu7/topology/thread_siblings_list:1,7
/sys/bus/cpu/devices/cpu8/topology/thread_siblings_list:2,8
/sys/bus/cpu/devices/cpu9/topology/thread_siblings_list:3,9

 

 

Both of these, plus the one mentioned a couple of posts up, and lstopo are all in agreement.

I consider the question of how to discover the layout well covered now.

I think all of this has derailed the thread a bit, but it will certainly help with lag; it's just not the only issue going on.

 

Link to comment


Try this:

Edit your XML to pin the VM's vCPUs to one thread of each pair and the emulator to the HT siblings.

For an assignment of 3 cores to the VM and 3 HT threads (for the emulator), it'd look as follows (for me):

<vcpu placement='static'>3</vcpu>
<cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <emulatorpin cpuset='6-8'/>
</cputune>
<cpu mode='host-passthrough'>
    <topology sockets='1' cores='3' threads='1'/>
</cpu>
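To apply this, edit the VM's XML from the GUI or, from the console, with virsh (note: edits take effect the next time the VM starts, and <vm name> is a placeholder for your VM's name):

virsh edit <vm name>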

Link to comment


Does that config perform better for you?

Link to comment


I actually don't have this issue; I haven't gamed on this yet, just curious really.

These recommendations come from here: https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html

I played with some of these settings, pinning the emulator to the same pins as the vCPUs, and to the host. I noticed a small (but negligible) difference in CPU Mark rating; however, this is still more about removing latency, which I think is what is primarily causing your issues.

I've also never isolated any CPUs from unRAID; I just assign my VMs to cores that are less likely to be used (for my main one I used the highest set, 4-5/10-11), as I haven't had an issue that I felt it would help resolve.
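If you want to verify what a running VM is actually pinned to, virsh can report it; with no CPU list argument, vcpupin just prints the current vCPU-to-host-CPU mapping (<vm name> is a placeholder):

virsh vcpupin <vm name>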

Link to comment
