FPS drops, stuttering, and other things that make me sad



I found this thread whilst searching for issues with passthrough of my onboard or Creative Z-series sound card and my 980 Ti.

Basically, if I pass through just my 980 Ti, gaming is fine, nice and smooth. As soon as I add ANY sound card, be it PCIe or onboard Intel, gaming performance is very poor, with lots of stuttering and lag. I am pretty sure it's down to CPU core assignment, but I simply don't know what to try next.

 

This is what I have tried...

 

MSI fix (all USB, audio and gfx have negative value)

Assigning cores 0-1 to the W10 VM (reduced lag in gaming but still not perfect)

 

If I assign cores 0-3 and set threads to 8 (quad core), I get worse performance than assigning cores 0-1 and 4 threads (dual core).

 

There must be a way of assigning the correct cores to a VM? I have a Xeon E5-2660 v3 with 10 cores (20 threads with Hyper-Threading). If anyone has an idea where I have gone wrong, let me know. Thx.

Link to comment

Yeah, figured that they've been pretty busy. I'm going to send them an email momentarily just to have them give this thread a glance; I think there are some real issues here with CPU frequency governors and core assignments that, at the very least, could use some clarification.

 

I still haven't run latency tests to determine exactly what the pairings are, although from what I've read, Intel's convention seems to be all physical cores followed by their hyperthreaded siblings. I.e., for a 4-core hyperthreaded CPU, the pairings are (0,4) (1,5) (2,6) (3,7); a 2-core would be (0,2) (1,3), a 6-core (0,6) (1,7), and so on.
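That convention boils down to a tiny calculation: logical CPU n pairs with n + (number of physical cores). A minimal sketch of it, assuming the enumeration really is "all physical cores first" (verify against /sys/devices/system/cpu/cpuN/topology/thread_siblings_list on your own host, since firmware can enumerate cores differently):

```shell
#!/bin/sh
# Print the (physical, hyperthread) pairings under the assumed
# "all physical cores first" enumeration: CPU n pairs with n + cores.
pairs() {
  cores=$1                     # number of physical cores
  i=0
  sep=''
  while [ "$i" -lt "$cores" ]; do
    printf '%s(%d,%d)' "$sep" "$i" $((i + cores))
    sep=' '
    i=$((i + 1))
  done
  printf '\n'
}

pairs 4    # -> (0,4) (1,5) (2,6) (3,7)
pairs 6    # -> (0,6) (1,7) (2,8) (3,9) (4,10) (5,11)
```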

 

But I haven't run the tests because I don't know what to do with the information. If I do isolcpus=0-3,6-9 in syslinux, to give unRAID the core pairs (4,10) and (5,11), things don't act the way I think they should when I start setting governor profiles. Some cores won't even report their frequency with /usr/bin/cpufreq-info unless I give isolcpus totally sequential cores (isolcpus=0-7 in my case now).
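For what it's worth, the kernel reports both pieces of state directly through sysfs, so you can see what isolcpus actually isolated and which governor each core is running without relying on cpufreq-info. A sketch using standard sysfs paths (the cpufreq entries can be absent when no frequency-scaling driver is loaded, hence the guards):

```shell
#!/bin/sh
# Show which CPUs the kernel actually isolated (the isolcpus= set)
# and each CPU's current frequency governor.
show_cpu_state() {
  echo "isolated: $(cat /sys/devices/system/cpu/isolated 2>/dev/null)"
  for gov in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$gov" ] || continue    # skip CPUs without a cpufreq driver
    cpu=${gov%/cpufreq/*}
    echo "${cpu##*/}: $(cat "$gov")"
  done
  return 0
}

show_cpu_state
```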

 

So, in addition to CPU core pairs, I'm very curious what happens when we isolate those cores from host operations. How isolated are these cores? When I give unRAID (4,10) and (5,11) to play with, and I go to the webGUI dashboard, do cores (0,1),(2,3) correspond to (4,10),(5,11)? And in the same instance, if I run cpufreq-info, does isolating CPUs change that numbering assignment, or what?

 

And does any of this really make any difference?  Haha

 

Link to comment

OK, so the mystery deepens somewhat. I spent the last few hours pinning down what was causing my system to stutter and lag, and it appears to be related to how many CPU cores I assign to my VM. If I allocate cores 0-3 out of my 20 available, the W10 VM runs smoothly. All games powered by my EVGA 980 Ti and my Asus Essence II (Oxygen) sound card run smooth, with no stutter, lag, pops or crackles to be seen or heard anywhere. As soon as I increase my core count from 0-3 to 0-7 for 8 cores, I get serious lag and performance drops from both the GPU and the sound card. There is definitely something funky going on with CPU assignment, but I just can't figure out what. Maybe v6.2 will fix all our woes?

Link to comment

OK, so the mystery deepens somewhat. I spent the last few hours pinning down what was causing my system to stutter and lag, and it appears to be related to how many CPU cores I assign to my VM. If I allocate cores 0-3 out of my 20 available, the W10 VM runs smoothly. All games powered by my EVGA 980 Ti and my Asus Essence II (Oxygen) sound card run smooth, with no stutter, lag, pops or crackles to be seen or heard anywhere. As soon as I increase my core count from 0-3 to 0-7 for 8 cores, I get serious lag and performance drops from both the GPU and the sound card. There is definitely something funky going on with CPU assignment, but I just can't figure out what. Maybe v6.2 will fix all our woes?

 

Welcome to vCPU 101. When you assign more than one vCPU, the guest must wait longer for the required number of vCPUs to become available. More is not better: it is faster to get one (or four) vCPUs than eight.

Link to comment
Welcome to vCPU 101. When you assign more than one vCPU, the guest must wait longer for the required number of vCPUs to become available. More is not better: it is faster to get one (or four) vCPUs than eight.

 

Even when isolating CPUs in syslinux.cfg?

Link to comment

OK, so the mystery deepens somewhat. I spent the last few hours pinning down what was causing my system to stutter and lag, and it appears to be related to how many CPU cores I assign to my VM. If I allocate cores 0-3 out of my 20 available, the W10 VM runs smoothly. All games powered by my EVGA 980 Ti and my Asus Essence II (Oxygen) sound card run smooth, with no stutter, lag, pops or crackles to be seen or heard anywhere. As soon as I increase my core count from 0-3 to 0-7 for 8 cores, I get serious lag and performance drops from both the GPU and the sound card. There is definitely something funky going on with CPU assignment, but I just can't figure out what. Maybe v6.2 will fix all our woes?

 

Welcome to vCPU 101. When you assign more than one vCPU, the guest must wait longer for the required number of vCPUs to become available. More is not better: it is faster to get one (or four) vCPUs than eight.

 

Really? Coming from VMware ESXi, if I added more vCPUs I got better performance, not worse.

 

Link to comment

New weirdness.  Thought I'd gotten things pretty much sorted out.  Was watching a video on Kodi on my Windows VM in the living room, and went over to transcode a video on the Mac VM to save on my ancient iPad.  As soon as I started the transcode and the Mac's CPU meter approached 100%, the Windows VM started to sputter and lag.

 

I've given the Windows VM cores 0-3, Mac VM uses 4-11, and unraid starts with isolcpus=0-7 in syslinux.  Why would the Mac VM kill the Windows VM once it starts running something CPU intensive like a video transcode?  It shouldn't be touching those cores at all.
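One way to check where the threads actually land: each qemu thread's last-run processor is visible from the host, so you can watch whether the Mac VM's threads really stay off the Windows cores while a transcode runs. A sketch (the PSR column is the processor the thread last executed on; awk is used instead of grep so the command still succeeds when no VMs are running):

```shell
#!/bin/sh
# List every qemu thread with the host processor (PSR) it last ran on,
# to verify that pinned VM threads are staying on their assigned cores.
qemu_threads() {
  ps -eLo pid,psr,comm | awk 'NR == 1 || /qemu/'
}

qemu_threads
```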

Link to comment

New weirdness.  Thought I'd gotten things pretty much sorted out.  Was watching a video on Kodi on my Windows VM in the living room, and went over to transcode a video on the Mac VM to save on my ancient iPad.  As soon as I started the transcode and the Mac's CPU meter approached 100%, the Windows VM started to sputter and lag.

 

I've given the Windows VM cores 0-3, Mac VM uses 4-11, and unraid starts with isolcpus=0-7 in syslinux.  Why would the Mac VM kill the Windows VM once it starts running something CPU intensive like a video transcode?  It shouldn't be touching those cores at all.

You have not left any cores for unraid? Unraid seems to prefer to have core 0 available.

Link to comment

New weirdness.  Thought I'd gotten things pretty much sorted out.  Was watching a video on Kodi on my Windows VM in the living room, and went over to transcode a video on the Mac VM to save on my ancient iPad.  As soon as I started the transcode and the Mac's CPU meter approached 100%, the Windows VM started to sputter and lag.

 

I've given the Windows VM cores 0-3, Mac VM uses 4-11, and unraid starts with isolcpus=0-7 in syslinux.  Why would the Mac VM kill the Windows VM once it starts running something CPU intensive like a video transcode?  It shouldn't be touching those cores at all.

You have not left any cores for unraid? Unraid seems to prefer to have core 0 available.

There has also been some discussion about the difference between physical cores, hyperthreads and the virtual cores as seen by VMs. If you have an Intel processor with 6 physical cores and hyperthreading, then it is likely that virtual cores 0 and 6 are on the same physical core, and so on. This might affect the ideal core allocations to a VM to avoid interactions. I'm not sure what the typical associations are for AMD processors. I think there was a mention of a script that could be run to try and determine this on your own system, but I cannot find the details.
Link to comment

The script can be found here.

You have to install netperf and change the binary path so it fits with the Slackware package.

The highest numbers indicate the matching cores, not the smallest. I made that mistake and it increased my latency.

 

I'm quite sure your problem is that you didn't leave any cores for unRAID, which is running the qemu binary that also needs some CPU time for the VMs to run.
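If all you need is the sibling groupings rather than the measured latencies, the kernel already knows them; a quick sketch using lscpu's parseable output, no netperf required (line ordering is arbitrary):

```shell
#!/bin/sh
# Group logical CPUs by physical core ID using lscpu's parseable output.
# Each printed line lists the logical CPUs (the HT siblings) that share
# one physical core.
core_siblings() {
  lscpu -p=CPU,CORE | awk -F, '
    /^[^#]/ { siblings[$2] = siblings[$2] " " $1 }
    END     { for (c in siblings) printf "core %s:%s\n", c, siblings[c] }
  '
}

if command -v lscpu >/dev/null 2>&1; then
  core_siblings
else
  echo "lscpu not available"
fi
```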

Link to comment

The script can be found here.

You have to install netperf and change the binary path so it fits with the Slackware package.

The highest numbers indicate the matching cores, not the smallest. I made that mistake and it increased my latency.

 

I'm quite sure your problem is that you didn't leave any cores for unRAID, which is running the qemu binary that also needs some CPU time for the VMs to run.

 

So what are my core and hyperthread pairs on my 10-core (20-thread) Xeon?

 

Is it

 

0,9

1,10

2,11 and so on? If so, how do you specify this in the XML? At the moment I am pinning CPU cores 2-5 for my W10 VM.

 

My XML. Please check this is optimal based on my Xeon E5-2660 v3 10-core processor, thx.

 

  <memory unit='KiB'>12582912</memory>

  <currentMemory unit='KiB'>12582912</currentMemory>

  <memoryBacking>

    <nosharepages/>

    <locked/>

  </memoryBacking>

  <vcpu placement='static'>4</vcpu>

  <cputune>

    <vcpupin vcpu='0' cpuset='2'/>

    <vcpupin vcpu='1' cpuset='3'/>

    <vcpupin vcpu='2' cpuset='4'/>

    <vcpupin vcpu='3' cpuset='5'/>

    <emulatorpin cpuset='0-1'/>

  </cputune>

  <cpu mode='host-passthrough'>

    <topology sockets='1' cores='4' threads='1'/>
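As a sanity check once the VM is running, libvirt can report the effective pinning back to you from the host. A sketch; "Windows10" is a hypothetical domain name, so substitute the actual name of your VM:

```shell
#!/bin/sh
# Ask libvirt which host CPUs each vCPU and the emulator threads are
# actually pinned to. "Windows10" is a placeholder domain name.
check_pins() {
  if command -v virsh >/dev/null 2>&1; then
    virsh vcpupin Windows10 || true      # per-vCPU pinning
    virsh emulatorpin Windows10 || true  # emulator thread pinning
  else
    echo "virsh not available on this host"
  fi
}

check_pins
```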

Link to comment

The script can be found here.

You have to install netperf and change the binary path so it fits with the slackware package.

The highest number is the matching cores, not the smallest. I made that mistake and it increased my latency.

 

I'm quite sure your problem is that you didn't leave any cores for unraid, which is running the qemu binary that also needs some CPU time for the VMS to run.

 

So what are my core and hyperthread pairs on my 10-core (20-thread) Xeon?

 

Is it

 

0,9

1,10

2,11 and so on? If so, how do you specify this in the XML? At the moment I am pinning CPU cores 2-5 for my W10 VM.

 

My XML. Please check this is optimal based on my Xeon E5-2660 v3 10-core processor, thx.

 

  <memory unit='KiB'>12582912</memory>

  <currentMemory unit='KiB'>12582912</currentMemory>

  <memoryBacking>

    <nosharepages/>

    <locked/>

  </memoryBacking>

  <vcpu placement='static'>4</vcpu>

  <cputune>

    <vcpupin vcpu='0' cpuset='2'/>

    <vcpupin vcpu='1' cpuset='3'/>

    <vcpupin vcpu='2' cpuset='4'/>

    <vcpupin vcpu='3' cpuset='5'/>

    <emulatorpin cpuset='0-1'/>

  </cputune>

  <cpu mode='host-passthrough'>

    <topology sockets='1' cores='4' threads='1'/>

You have to run the script and look at the output to determine the cores to use.

Link to comment

I decided to run the latency script to see if it gave me any insight into the groupings for my 5930k.

I initially did this from SSH, but then decided it'd be best to do this from the console, no plugins, and certainly no VM's or Docker running.

Anyhow, the results were the same regardless, and in my case aren't conclusive or helpful at all!

For the most part (with higher being better here), the core to itself is a 10, everything else is a 4 or 5, without any differences in between.

 

If anyone else has run this, did you see the expected 8 or 9's between logical and HT cores?

I don't have any issues per se; however, I expected this to show something that it does not.

No special parameters in my syslinux file that would affect it.

[Attached image: screenshot of the latency script's output]

Link to comment

I decided to run the latency script to see if it gave me any insight into the groupings for my 5930k.

I initially did this from SSH, but then decided it'd be best to do this from the console, no plugins, and certainly no VM's or Docker running.

Anyhow, the results were the same regardless, and in my case aren't conclusive or helpful at all!

For the most part (with higher being better here), the core to itself is a 10, everything else is a 4 or 5, without any differences in between.

 

If anyone else has run this, did you see the expected 8 or 9's between logical and HT cores?

I don't have any issues per se; however, I expected this to show something that it does not.

No special parameters in my syslinux file that would affect it.

 

Yes, I did this and got exactly the same numbers for each core as you. My proc is a Xeon E5-2660 v3 2.6 GHz 10-core. All my threads/cores are paired 1:1 (e.g. 1,1 2,2 3,3) etc., like yours.

2D performance, CPU and memory benchmarks are very poor.

Link to comment

Yes, I did this and got exactly the same numbers for each core as you. My proc is a Xeon E5-2660 v3 2.6 GHz 10-core. All my threads/cores are paired 1:1 (e.g. 1,1 2,2 3,3) etc., like yours.

2D performance, CPU and memory benchmarks are very poor.

 

Interesting, yeah something is weird here...

If I get REALLY bored, I could boot arch or Ubuntu and run this without UnRAID to find out for sure.

I'm pretty certain mine are 0-6, 1-7, 2-8, etc., as I see no reason for Intel to change the grouping between the 4-core and 6-core CPUs, which share the same basic architecture.

 

Maybe everything will "just work" in 6.2? After all, it wasn't as if Linus was talking about CPU pinning and latencies in his 7 Gamers 1 CPU video.

 

Since LTT is kinda anonymous here (I did PM JonP more than a week ago), I wonder....

 

No... They also didn't tell you about the terrible reset issues those R9 Nano cards have, and that the build is likely not reliable for reset/shutdown (kind of a deal breaker)...

Info: http://vfio.blogspot.com/search?updated-min=2016-01-01T00:00:00-07:00&updated-max=2017-01-01T00:00:00-07:00&max-results=1

....You can paint a picture any way you'd like to make it look the way you prefer. However, admittedly it was more a proof of concept than something meant for someone to actually do.

Link to comment

I know I mentioned this a bit ago, but this really needs to be addressed at an unRAID OS level to distinguish CPU core/thread pairs...

 

The best case scenario would be that these are determined on boot and then are represented correctly in the 'Create VM' element of the GUI.

If that would be too much, just some documentation on the wiki on how to determine the pairs so we can pin them correctly would be enough.

 

@JonP... is this worth worrying about? Has it been addressed in 6.2?

 

Link to comment

I know I mentioned this a bit ago, but this really needs to be addressed at an unRAID OS level to distinguish CPU core/thread pairs...

 

The best case scenario would be that these are determined on boot and then are represented correctly in the 'Create VM' element of the GUI.

If that would be too much, just some documentation on the wiki on how to determine the pairs so we can pin them correctly would be enough.

 

@JonP... is this worth worrying about? Has it been addressed in 6.2?

Not for 6.2, no.

Link to comment

I know I mentioned this a bit ago, but this really needs to be addressed at an unRAID OS level to distinguish CPU core/thread pairs...

 

The best case scenario would be that these are determined on boot and then are represented correctly in the 'Create VM' element of the GUI.

If that would be too much, just some documentation on the wiki on how to determine the pairs so we can pin them correctly would be enough.

 

@JonP... is this worth worrying about? Has it been addressed in 6.2?

Not for 6.2, no.

 

Ok, so what would you suggest in the short term?

Link to comment
