FPS drops, stuttering, and other things that make me sad



Hey everyone, I've posted a few times here getting my system up and everyone's been a great help, thanks.

 

I'm putting aside part of my unRAID server to use as a gaming and HTPC rig: a Core i7-5820K overclocked and stable at 4.5GHz. Everything's running, but unfortunately not smoothly enough for me to actually use.

 

Here's what I've done:

 

Isolate cores 0-7 (out of 12 total) from host operations using isolcpus=0-7 in syslinux.

Pass through cores 0-3 and 8GB RAM (out of 32GB total) to the Windows machine

Windows 10 VM image located on an unshared NVMe SSD (fast fast fast)

Disable xHCI in the BIOS to split apart the USB controllers, one of which is passed through to the Windows machine using <hostdev>

Nvidia GTX760 + audio passed through to Windows machine using <qemu:commandline>

MSI stuff done (GTX760 and audio controller show negative IRQ in device manager, lspci -v -s shows MSI: Enable+)
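For anyone following along, the isolcpus= step in this list can be checked against what the kernel actually booted with, rather than what syslinux.cfg says. A minimal bash sketch; `isolated_cpus` is a made-up helper name, and it only reads the kernel command line:

```bash
#!/usr/bin/env bash
# Print the CPU list passed as isolcpus= on the kernel command line.
# Reads /proc/cmdline by default; an optional path lets it run against a
# saved copy. Prints nothing if isolcpus= is absent.
isolated_cpus() {
  local cmdline_file="${1:-/proc/cmdline}" arg
  local -a args
  # Split the single command-line string into words without glob expansion.
  read -r -a args < "$cmdline_file"
  for arg in "${args[@]}"; do
    case "$arg" in
      isolcpus=*) echo "${arg#isolcpus=}" ;;
    esac
  done
}
```

On the server itself, `isolated_cpus` should print `0-7` for the setup above, and `virsh vcpupin "Windows 10"` lists which host CPU each vCPU is pinned to, so the two can be compared.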

 

DPC latency tests are generally good, under 1000us for the most part with the occasional spike.  Was much, much worse but enabling MSI on the GTX760 largely fixed that.

 

System Interrupts in Resource Monitor seem a little high. They're averaging about 4% CPU right now, but last night during tinkering they were up around 10% at times.

 

I'm gaming with Dolphin, which is an emulator and generally CPU-bound.  Running at 100%, 60FPS, my CPU usage hovers around 35-40%, so I've got plenty of overhead there.  But I'm getting dips in framerate that I'm thinking are GPU related...  because what else could it be?

 

Oh, another weird thing, who knows, maybe related.  I get weird mouse stuttering sometimes.  Like the pointer gets stuck for a second, then boing, it's off on the other side of the screen overshooting whatever I'm trying to click on.  That's pretty frustrating too. 

 

Here's my XML. Nothing weird. (Yeah, I haven't deleted the virtio drivers ISO part yet; I did try that in a previous VM running Win7 with the same stuttering problems, and it didn't help.)

 

<domain type='kvm' id='16' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Windows 10</name>
  <uuid>8cca1c77-5110-27f1-aa77-5386c6405f85</uuid>
  <metadata>
    <vmtemplate name="Custom" icon="windows7.png" os="windows7"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/nvme/vm_images/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/Misc/kvm/virtio-win.iso'/>
      <backingStore/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <alias name='ide0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c0:89:32'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Games Machine.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1a' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=07:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=07:00.1,bus=root.1,addr=00.1'/>
  </qemu:commandline>
</domain>

 

I'm out of ideas..  Anyone know what I might be missing?

Link to comment

Crappy 3DMark scores, too. I didn't make a note of the score before closing the window, but I was getting 30fps-ish at 720p. Hmmmm, maybe a PCIe bus width issue? Loading up the NVIDIA Control Panel and selecting System Information shows "Bus: PCI Express x1". CPU-Z doesn't show anything in the Bus Width section, though...

 

lspci -vv results:

 

Subsystem: ZOTAC International (MCO) Ltd. Device 3265
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 47
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at f0000000 (64-bit, prefetchable) [size=128M]
        Region 3: Memory at f8000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at c000 [size=128]
        Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00498  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] #19
        Kernel driver in use: vfio-pci

 

Hmmm.. LnkCap shows x16 and LnkSta shows x8. I've got two GPUs in here and the 5820K is short on lanes, so I'm not surprised about the x8; as far as unRAID's end is concerned, it's connected at x8.
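Pulling those two fields out of `lspci -vv` can be scripted, which is handy when comparing slots or cards. A rough sketch, assuming the `LnkCap:`/`LnkSta:` line layout shown above (it varies a little between pciutils versions); `link_widths` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Read `lspci -vv` output for one device on stdin and print the link width
# the device supports (LnkCap) next to the width it actually trained at
# (LnkSta). LnkCap2/LnkSta2 lines don't match because of the colon.
link_widths() {
  awk '
    /LnkCap:/ { for (i = 1; i < NF; i++) if ($i == "Width") cap = $(i + 1) }
    /LnkSta:/ { for (i = 1; i < NF; i++) if ($i == "Width") sta = $(i + 1) }
    END {
      # Drop the trailing comma lspci puts after the width value.
      sub(/,$/, "", cap); sub(/,$/, "", sta)
      printf "capable=%s negotiated=%s\n", cap, sta
    }
  '
}
```

Usage: `lspci -vv -s 07:00.0 | link_widths`, where 07:00.0 is the GPU's host address from the qemu:commandline section of the XML. Run it as root; otherwise lspci hides the capability blocks.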

 

And looking here, http://www.linux-kvm.org/page/PCITodo, "Support for different PCI express link width/speed settings" is on their to-do list.  Specifically....

 

"Issue: QEMU currently emulates all links at minimal width and speed. This means we don't need to emulate link negotiation, but might in theory confuse guests for assigned devices."

 

This page is undated, though, so I don't know if this is still the case....

Link to comment

I really don't think it's a CPU problem.  Like I said, CPU usage is in the 40% range (as indicated in Windows) and things are still stuttering.  I've run Prime95 to rule out CPU usage being misreported...  when stress testing usage is pegged at 100%.  I have tried giving it more cores, all cores, even tried less on a whim.  No changes.

 

Slowdowns are repeatable. They'll occur at the same point in a game map, for example... I'll load up Super Mario Galaxy, and if I walk to a certain place with the camera pointed in a certain direction, my FPS drops from 60 to 50, and stays at 50 if I don't move again. All the while my CPU meter never goes above 40%. Nothing running in the background.

 

I've ruled out Dolphin as a culprit. I've tried both DX and OpenGL backends, tweaked every setting, and this is the best I've been able to get it. I'm getting bad GPU benchmark scores in 3DMark and Cinebench.

 

Might not be a GPU issue but it sure seems like it.  Just don't know what to try next.

Link to comment

  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>

 

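I forget where I saw it, but I recall a post where someone mentioned that changing this from "threads=2" to "threads=1" addressed some performance issues they were having. Give that a go? (As posted, the topology describes 1 × 4 × 2 = 8 vCPUs while <vcpu> is set to 4, so threads='1' would also make the two agree.)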

Link to comment

While this may not be directly your problem, something to keep in mind is that logical CPUs are not necessarily grouped with their hyperthread sibling as 0-1, 2-3, 4-5, etc.

This varies by chip/manufacturer, and some testing is needed to figure it out (there is a latency-testing script that does this, though it doesn't run natively on unRAID).

You take a performance hit when two hyperthreads that share a physical core's execution resources and cache are running completely different workloads.

Anyhow, with a 6-core CPU it is likely that the pairs are 0-6, 1-7, 2-8, 3-9, 4-10, 5-11; however, again, this is not a universal thing.

 

I have not done this testing on my CPU, but we likely have the same configuration, as the 5820K and 5930K are very similar.

I do notice some stuttering in my main Windows 10 VM with 4 cores assigned (8-12), however don't game on it, so it hasn't bothered me enough to investigate.
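The kernel already records which logical CPUs share a physical core, in each CPU's `topology/thread_siblings_list` file under /sys, so the pairing can be read directly instead of guessed or measured. A sketch; `show_thread_siblings` is an illustrative name, and the `SYSFS_ROOT` override exists only so it can be tried against a copy of the tree:

```bash
#!/usr/bin/env bash
# Print each physical core's logical CPUs, one core per line, as reported by
# the kernel in cpuN/topology/thread_siblings_list (e.g. "0,6" or "0-1").
# Both siblings report the same list, so duplicates are dropped.
# SYSFS_ROOT defaults to /sys; it can point at a copy of the tree instead.
show_thread_siblings() {
  local root="${SYSFS_ROOT:-/sys}" f
  for f in "$root"/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] && cat "$f"
  done | LC_ALL=C sort -u -t, -k1,1n   # sort by first CPU number, one line per core
}
```

Each output line is one physical core; for example, `0,6` would mean CPUs 0 and 6 are the two hyperthreads of the same core. Pinning a VM's vCPUs to whole lines keeps both threads of a core in the same guest.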

 

Some info:

JonP mentioned in another thread that there is a script to check latencies between cores, to help work out which logical CPUs share a physical core.

https://github.com/awilliam/cpu-latencies

This does not run natively from an SSH session on unRAID; I assume netperf needs to be installed, or something of that nature.

Talk of it https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html

https://www.redhat.com/archives/vfio-users/2015-September/msg00175.html

Link to comment


Thank you for this, I always just assumed that each core was grouped with its thread.

 

However, surely this needs to be addressed at the unRAID OS level, so hyperthreaded cores can be distinguished in the 'Create VM' GUI?

 

Link to comment

The mouse stuttering and other symptoms seem to me to indicate moments of 100% CPU usage. You think you are seeing 40% CPU, but remember that figure is usually an average over a second or two, which is a long time for a CPU. It could very well be bouncing consistently between 10% and 100%.
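One way to test that theory from the host is to sample the pinned CPUs at much shorter intervals than a task manager does. A sketch reading /proc/stat; the function names and the 0.2 s default are assumptions, and it shows the host's view of each CPU, not Windows' view:

```bash
#!/usr/bin/env bash
# Busy percentage of one CPU between two "cpuN ..." lines from /proc/stat.
# Fields after the name: user nice system idle iowait irq softirq steal
# guest guest_nice. guest/guest_nice are already counted inside user/nice,
# so only the first eight are summed; idle + iowait count as not busy.
busy_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x); split(b, y)
    for (i = 2; i <= 9 && i <= n; i++) t += y[i] - x[i]
    idle = (y[5] - x[5]) + (y[6] - x[6])
    pct = (t > 0) ? 100 * (t - idle) / t : 0
    printf "%d\n", pct
  }'
}

# Print the busy percentage of host CPU $1 every $2 seconds (default 0.2)
# until interrupted with Ctrl-C.
sample_cpu() {
  local cpu="$1" interval="${2:-0.2}" prev cur
  prev=$(grep "^cpu$cpu " /proc/stat)
  while sleep "$interval"; do
    cur=$(grep "^cpu$cpu " /proc/stat)
    echo "cpu$cpu $(busy_pct "$prev" "$cur")%"
    prev=$cur
  done
}
```

For example, `sample_cpu 2` while reproducing the slowdown. /proc/stat counts in clock ticks (normally 100 per second), so intervals much shorter than 0.1 s become too coarse to be meaningful.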

 

However, I have no idea why your DPC latency numbers are not showing problems.  They should be if the mouse is freezing.  But I don't have a lot of experience here, perhaps there are other explanations.

Link to comment

I was having issues with stuttering in YouTube, and after some testing I noticed the issues got worse if I used Plex on another device while running a Unigine benchmark on my Windows 10 VM. To resolve this I pinned CPU cores to certain Docker containers, and pinned cores 6-11 to the Win 10 VM.

 

CPU Pinning

Windows 10 VM 6-11

Plex 4-5

Sonarr 3

Sabnzbd 2

 

I have left cores 0-1 unpinned as I believe this is a good idea for unRAID to function correctly.
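The same kind of container pinning can be applied to running containers with `docker update --cpuset-cpus`. A sketch; `pin_containers` is a hypothetical helper, the container names in the usage line are examples only, and `DRY_RUN=1` prints the commands instead of running them:

```bash
#!/usr/bin/env bash
# Pin running containers to host CPUs. Each argument is name=cpus, where
# cpus uses docker's cpuset syntax (e.g. 4,5 or 4-5).
# With DRY_RUN set to a non-empty value, the commands are only printed.
pin_containers() {
  local spec name cpus
  for spec in "$@"; do
    name="${spec%%=*}" cpus="${spec#*=}"
    ${DRY_RUN:+echo} docker update --cpuset-cpus "$cpus" "$name"
  done
}
```

Usage: `pin_containers plex=4,5 sonarr=3 sabnzbd=2`. As far as I know, `docker update` only changes the existing container; one that gets re-created from its template comes back with whatever the template specifies, so the same `--cpuset-cpus` flag would need to go there as well.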

 

Link to comment

After a long week of banging my head against walls at work, I've got a couple days off to bang my head against walls with this instead.

 

Thanks for the suggestion, bungee91.  I downloaded the script you linked to, managed to install netperf but couldn't find a build of netserver that would work.  Instead, I just tried some trial and error, but didn't manage to see any improvement.

 

I'm going to ask around on the Dolphin forums as well to see if someone over there might know some way to improve things..  It's very puzzling.  I'm starting to think that unraid just isn't going to be able to do this.  Holding out hope that 6.2 will help -- OVMF instead of seabios improved things a little for me -- but something here just doesn't add up.

Link to comment

The only thing I can think to try next, if we can't ensure we are passing through the correct pairs of hyper-threaded cores, is to disable hyper-threading so each core you pass through is actually a true core... But then I feel like I'm giving up performance as a whole to solve the problem.

 

@jonp, are you able to shed any light on how we would make sure we are passing through the hyperthreaded pairs? Surely this is going to have an impact on other stuff outside of the VM if we are pinning CPUs to Docker containers as well as VMs...

 

 

Link to comment
I downloaded the script you linked to, managed to install netperf but couldn't find a build of netserver that would work.

 

I managed to get this to work last night, but I don't really know what to do with the results.

 

Unraid is based on Slackware, so you can use the Slackware package manager to install what you need.

 

First, you'll need the netperf package: http://pkgs.org/slackware-14.1/slackonly-x86_64/netperf-2.6.0-x86_64-1_slack.txz.html

Then, you'll need the bc package: http://pkgs.org/slackware-14.1/slackware-x86_64/bc-1.06.95-x86_64-2.txz.html

 

Install each package with

 

upgradepkg --install-new {packagename}

 

And then you'll need to modify the script so it points to the binaries in "/usr/bin" as opposed to "/usr/local/bin" (for both NETPERF and NETSERV variables).
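The edit to the script can be done with sed. A sketch that assumes the two variables are assigned at the start of a line as `NETPERF=/usr/local/bin/...` and `NETSERV=/usr/local/bin/...` (optionally double-quoted); I haven't checked the script's exact layout, so look over the result. `repoint_netperf_paths` is a made-up name:

```bash
#!/usr/bin/env bash
# Point the NETPERF and NETSERV variables of the cpu-latencies script at
# /usr/bin instead of /usr/local/bin, keeping a .bak copy of the original.
# ASSUMPTION: each is assigned at the start of a line, e.g.
#   NETPERF=/usr/local/bin/netperf   or   NETSERV="/usr/local/bin/netserver"
# Other lines, including other /usr/local/bin paths, are left alone.
repoint_netperf_paths() {
  sed -i.bak -E 's#^(NETPERF|NETSERV)=("?)/usr/local/bin/#\1=\2/usr/bin/#' "$1"
}
```

Usage: `repoint_netperf_paths path/to/the/script`, then `diff path/to/the/script.bak path/to/the/script` to confirm only those two lines changed.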

 

Let me know what you make of the results. I'd be interested to know how I'm supposed to read it.

 

Link to comment

Holy crap I think I may have figured it out.  Seems to possibly be a problem with the host not scaling the CPU frequency as efficiently/intelligently as the guest would like.

 

Check out this shizz.  http://unix.stackexchange.com/questions/64297/host-cpu-does-not-scale-frequency-when-kvm-guest-needs-it

 

So here's what I did to test, and got immediate results.

 

cd /sys/devices/system/cpu/cpu0/cpufreq

 

That's config info for cpu0.  You can monkey with your cpu in here.  "cat scaling_max_freq" resulted in 4300000.  So I thought I'd give this a try.

 

echo 4300000 > scaling_min_freq

 

Basically what this does is set the minimum frequency to be the same as the maximum, so it'll run full tilt constantly.  Did it for the other CPUs I was passing to the Windows VM as well.  Went back to the VM, and noticed in CPU-Z that my CPU was now running at max frequency.  At first glance, all my stuttering and slowdown problems were gone as well.  I still have to do more testing but in-emulator benchmarks have immediately improved 33%.  This is exciting.
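The same change for several CPUs at once, as a sketch: `pin_min_to_max` is a made-up name, it needs root against the real /sys, and like any sysfs write it is lost on reboot. `SYSFS_ROOT` is only there so it can be tried on a scratch copy:

```bash
#!/usr/bin/env bash
# Raise scaling_min_freq to scaling_max_freq for each CPU number given,
# so those CPUs stay at full clock. CPUs without a cpufreq directory are
# reported on stderr and skipped.
# SYSFS_ROOT defaults to /sys; it can point at a copy of the tree instead.
pin_min_to_max() {
  local root="${SYSFS_ROOT:-/sys}" cpu dir
  for cpu in "$@"; do
    dir="$root/devices/system/cpu/cpu$cpu/cpufreq"
    if [ -r "$dir/scaling_max_freq" ]; then
      cat "$dir/scaling_max_freq" > "$dir/scaling_min_freq"
    else
      echo "cpu$cpu: no cpufreq directory, skipping" >&2
    fi
  done
}
```

Usage: `pin_min_to_max 0 1 2 3` for the CPUs pinned in the XML above. To undo it, write each CPU's `cpuinfo_min_freq` back into its `scaling_min_freq`.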

 

Now, I probably shouldn't leave things like this.  But, what I surmise was happening to me is that the hypervisor wasn't triggering a jump to the highest multiplier.  It could if it wanted to, though..  because Prime95 did it.  So.  What do we do with this information?  It should be possible to change the frequency scaling rules, shouldn't it?

Link to comment

Good info. Reading that link, the top comment mentions that because you're distributing the load across multiple cores, no one core goes above 95%, so it stays throttled. Or at least the scaling kicks in later than it should for persistent loads.

 

Bringing that threshold down to about 50% will make it kick in sooner.

 

echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold

 

Changes are lost after a reboot, so there's no harm in trying it to see if there's a performance boost.
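Since that tunable only exists on some setups, a guarded version is safer to paste. A sketch with a hypothetical function name; as far as I know the global `ondemand/` directory only appears while the ondemand governor is in use, and on many Intel systems the `intel_pstate` driver is active instead and offers only `performance` and `powersave`, which is why it prints `scaling_driver` when the file is missing:

```bash
#!/usr/bin/env bash
# Set the ondemand governor's up_threshold (percent, default 50) if that
# tunable exists; otherwise report which cpufreq driver is in use and
# return 1. SYSFS_ROOT defaults to /sys; it can point at a copy instead.
set_ondemand_threshold() {
  local root="${SYSFS_ROOT:-/sys}" pct="${1:-50}" driver
  local f="$root/devices/system/cpu/cpufreq/ondemand/up_threshold"
  if [ -w "$f" ]; then
    echo "$pct" > "$f"
  else
    driver=$(cat "$root/devices/system/cpu/cpu0/cpufreq/scaling_driver" 2>/dev/null)
    echo "no ondemand tunables (scaling_driver: ${driver:-unknown})" >&2
    return 1
  fi
}
```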

Link to comment

That was the first thing I tried, actually, but it seems that there are some differences between unRaid and Ubuntu when it comes to how CPU multipliers are handled.  That file doesn't exist, so we can't change things that way.

 

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors returns "performance" and "powersave".. 

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is set to "powersave" by default.

 

I'm giving this a try now...

 

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

 

...for all cores..  cpu0/cpufreq, cpu1/cpufreq, etc etc
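Doing that for every core by hand gets old; a loop over the cpufreq directories does the same thing. A sketch (hypothetical function name, root needed on the real /sys, not persistent across reboots) that skips any CPU whose driver doesn't list the requested governor:

```bash
#!/usr/bin/env bash
# Set the cpufreq governor ($1, default "performance") on every CPU that
# offers it in scaling_available_governors. CPUs that don't offer it are
# reported on stderr and left unchanged.
# SYSFS_ROOT defaults to /sys; it can point at a copy of the tree instead.
set_governor_all() {
  local root="${SYSFS_ROOT:-/sys}" gov="${1:-performance}" f dir
  for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -w "$f" ] || continue
    dir="${f%/*}"
    if grep -qw -- "$gov" "$dir/scaling_available_governors" 2>/dev/null; then
      echo "$gov" > "$f"
    else
      echo "${dir}: governor '$gov' not offered, leaving it alone" >&2
    fi
  done
}
```

Usage: `set_governor_all performance`, and `set_governor_all powersave` to go back to the default reported above.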

Link to comment

Thanks for the info!

 

Some notes that may be beneficial for those that are looking into this further https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html

Link to comment

Scrapz.. Yep, it appears the 'ondemand' and 'conservative' governors aren't available for my CPU. All I have are 'performance' and 'powersave'.

 

Also, I found some tools already installed in unRAID to manage CPU frequency: /usr/bin/cpufreq-set, which lets you set minimum and maximum frequencies for all cores or individually, as well as change governors; /usr/bin/cpufreq-info, which gives the current settings; and /usr/bin/cpufreq-aperf, which seems to be a performance monitoring tool.

 

Much easier than catting and echoing!
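With `cpufreq-set`, the governor change for just the VM's CPUs is one command per CPU: `-c` picks the CPU and `-g` the governor. A sketch wrapping that in a loop; the function name is hypothetical and `DRY_RUN=1` prints the commands instead of running them:

```bash
#!/usr/bin/env bash
# Apply governor $1 to each CPU number that follows, via cpufreq-set
# (needs root). With DRY_RUN set to a non-empty value, only print.
vm_cpus_governor() {
  local gov="$1" cpu
  shift
  for cpu in "$@"; do
    ${DRY_RUN:+echo} cpufreq-set -c "$cpu" -g "$gov"
  done
}
```

Usage: `vm_cpus_governor performance 0 1 2 3`, then `cpufreq-info` to confirm the change took.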

Link to comment

I had some issues with this on my FX-9590.

 

I kept thinking it HAD to be something to do with the CPU, reassigned different cores, etc. Popping only seemed to occur when I had at least one VM using 4 cores (so it HAD to be CPU, right?).

 

Turns out that once I updated the sound card drivers for my Creative Sound Blaster X-Fi Titanium Fatal1ty, popping is non-existent!

 

Something to keep in mind if someone else is pulling their hair out, try updating drivers!

Link to comment
