Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 . Ladies and gentlemen, we got'em. Massive thanks to reddit user setzer for helping on this. I don't think he's on Unraid, but his help was invaluable. Latency is now down to manageable levels. I'll continue tweaking. His .xml = https://pastebin.com/GT1dySwt My .xml = https://pastebin.com/yGcL0GNj He also sent along some additional reading for us. https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443 Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 9 hours ago, Tritech said: tell me where things are plugged into your rear USB 1 Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) @bastl Thanks! I'll try switching some things around and see if that improves anything. Check my post above yours for some updates. IIRC I had my Unraid USB where yours is, but I moved it so I can pass through the whole controller. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 @Tritech That level1 forum is the one we talked about earlier btw 😂 I haven't had any time to test yet, but from what I read, some of these fixes are available with QEMU 3.2 and will be defaults in 4.0. Not sure when we'll see this in Unraid. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Yea, I didn't grasp the concept the initial post was making about creating a PCIe root bus and assigning it vs a card. The more recent activity there does make it seem like the bulk of improvements will come with QEMU updates...whenever we get those. The guy I got it from said that the last lines in his xml were for a patched QEMU. I was also recommended "hugepages", but after a cursory search it seems that Unraid enables that by default. Couldn't get a VM to load with it enabled. <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> Edited February 13, 2019 by Tritech Quote Link to comment
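For anyone else chasing the hugepages angle: in libvirt the guest side is a memoryBacking block in the domain XML, and the host needs pages reserved before the VM can start (which might explain a VM refusing to load). A minimal sketch, assuming 2MB pages; the counts are illustrative, not recommendations:

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ...name, memory, vcpu, etc... -->
  <memoryBacking>
    <!-- back guest RAM with preallocated hugepages instead of normal 4K pages -->
    <hugepages/>
  </memoryBacking>
</domain>
```

On the host, something like `sysctl vm.nr_hugepages=8192` would reserve 16GB worth of 2MB pages; if fewer pages are free than the VM's memory size, the VM will fail to start, which matches the symptom described above.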
bastl Posted February 13, 2019 Share Posted February 13, 2019 (edited) @Tritech <memory mode='strict' nodeset='1'/> Btw the numatune doesn't really work as it's supposed to. It always grabs RAM from the other node too. First VM "strict" 8GB from node0 and second "preferred" 16GB from node1. If I remember right, strict should cause an error when not enough RAM is available on the specified node, but it doesn't for me. No clue how to fix this yet. Edit: Another thing I noticed in your xml <numa> <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/> </numa> Isn't that line telling the VM to use 16GB from node0 for cores 0-15, where you're using cores 8-15 and 24-32? Edited February 13, 2019 by bastl Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 You're right... mine seems to be grabbing almost 1.5 GB from node0. Quote Link to comment
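For anyone wanting to check this on their own box: per-node memory usage of a running VM can be inspected from the Unraid shell with numastat. The VM name below is a placeholder -- substitute your own:

```shell
# Per-NUMA-node memory usage (in MB) of the qemu process backing the VM.
# Replace "Windows10" with your VM's name as shown in the Unraid GUI.
numastat -p "$(pgrep -f 'qemu.*Windows10')"

# Overview of total/free memory per node, to see what's left on each:
numactl --hardware
```

If numatune were working as advertised, a "strict" VM should show (almost) all of its pages on the nodeset you specified; stray allocations on the other node are what's being described above.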
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 42 minutes ago, bastl said: Edit: Another thing I noticed in your xml <numa> <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/> </numa> Isn't that line telling the VM to use 16GB from node0 for cores 0-15, where you're using cores 8-15 and 24-32? I saw that, and yes, that's what it looks like to me. Lemme test. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Evidently it does start with 1. I think the "0-15" refers to the vcpupin'd vCPUs, not the physical cores. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 @Tritech Ok, the "<numa>" tag is only needed if you have more vCPUs than one NUMA node has and you want a NUMA topology inside the VM. Let's say 2 cores from each node, and then you can tell the VM which "virtual" node uses how much RAM. This should not affect us. Example from https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/virtualization_tuning_and_optimization_guide/ Btw. a really useful guide. 4 available nodes (0-3) Node 0: CPUs 0 4, size 4000 MiB Node 1: CPUs 1 5, size 3999 MiB Node 2: CPUs 2 6, size 4001 MiB Node 3: CPUs 3 7, size 4005 MiB In this scenario, use the following Domain XML setting: <cputune> <vcpupin vcpu="0" cpuset="1"/> <vcpupin vcpu="1" cpuset="5"/> <vcpupin vcpu="2" cpuset="2"/> <vcpupin vcpu="3" cpuset="6"/> </cputune> <numatune> <memory mode="strict" nodeset="1-2"/> </numatune> <cpu> <numa> <cell id="0" cpus="0-1" memory="3" unit="GiB"/> <cell id="1" cpus="2-3" memory="3" unit="GiB"/> </numa> </cpu> Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 I cross-referenced that several times. Really helpful stuff. I reapplied the EPYC "hack" and that brought my latency down further, to ~300µs, as low as 125ish. https://pastebin.com/dLWncwhV Quote Link to comment
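For context, the EPYC "hack" people usually mean on Threadripper is overriding the CPU model reported to the guest so Windows sees a sane cache/SMT topology. A sketch of what that block tends to look like -- the topology values here are hypothetical and have to match your actual pinned core count:

```xml
<cpu mode='custom' match='exact' check='none'>
  <!-- report EPYC to the guest instead of the host's Threadripper model -->
  <model fallback='allow'>EPYC</model>
  <!-- 8 cores x 2 threads = 16 vCPUs; adjust to your vcpupin layout -->
  <topology sockets='1' cores='8' threads='2'/>
  <!-- topoext exposes AMD's SMT topology so the guest pairs siblings correctly -->
  <feature policy='require' name='topoext'/>
</cpu>
```

Exact feature flags vary between setups, so treat this as a starting point rather than a drop-in config.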
bastl Posted February 13, 2019 Share Posted February 13, 2019 The <emulatorpin> tag specifies which host physical CPUs the emulator (a subset of a domain, not including vCPUs) will be pinned to. The <emulatorpin> tag provides a method of setting a precise affinity to emulator thread processes. As a result, vhost threads run on the same subset of physical CPUs and memory, and therefore benefit from cache locality. @Tritech Does that mean emulatorpin outside the range of the already used vCPUs? I already have it set up for my main VM so that the emulatorpin cores are separated from the cores the VM uses, same die. Difficult if it's not your main language ^^ Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 I get what you're saying. I think it's saying they should be in the included range. You know how you left out cores 8/24? Well, I think they have to be on the same "domain" to be used at all, or at least to get the most out of them. At least that's how I interpret it. I've tweaked my config for now so they're all on the same domain. I'll fix it later when I change my isolcpus at reboot. Here's an update as well: seems that storport.sys is what's giving me the highest execution time. Gonna see if I can track down any gains there. Quote Link to comment
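To illustrate the layout being discussed -- core numbers are hypothetical, based on the 8-15/24-31 range mentioned earlier in the thread -- the idea is to give the emulator threads their own core pair on the same die, outside the vcpupin range:

```xml
<cputune>
  <!-- guest vCPUs pinned to host cores 9-15 plus their SMT siblings 25-31 -->
  <vcpupin vcpu='0' cpuset='9'/>
  <vcpupin vcpu='1' cpuset='25'/>
  <vcpupin vcpu='2' cpuset='10'/>
  <vcpupin vcpu='3' cpuset='26'/>
  <!-- ...remaining vcpupin entries follow the same pattern... -->
  <!-- emulator threads on core 8 and its sibling 24: same die/NUMA node,
       but not shared with any vCPU, so QEMU housekeeping can't preempt the guest -->
  <emulatorpin cpuset='8,24'/>
</cputune>
```

Keeping those cores in isolcpus as well stops the host scheduler from putting other work on them.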
bastl Posted February 13, 2019 Share Posted February 13, 2019 If storport.sys handles all the disk IO, then maybe changing/tweaking the iothreadpin can bring improvements. Quote Link to comment
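A minimal sketch of what that could look like in the domain XML -- the core numbers are placeholders; the point is a dedicated iothread pinned on the same NUMA node as the VM, and the virtio disk has to reference it or it won't be used:

```xml
<domain type='kvm'>
  <!-- ...name, memory, vcpu... -->
  <iothreads>1</iothreads>
  <cputune>
    <!-- pin the disk I/O thread to host core 8 and its SMT sibling 24 -->
    <iothreadpin iothread='1' cpuset='8,24'/>
  </cputune>
  <devices>
    <disk type='file' device='disk'>
      <!-- the iothread attribute ties this disk's I/O to iothread 1 -->
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <!-- ...source/target as in your existing config... -->
    </disk>
  </devices>
</domain>
```

Note iothreads only apply to virtio-blk/virtio-scsi disks, not to passed-through controllers.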
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Actually, I let it run a bit longer and both of the highest execution times are network-related: ndis.sys and adf.sys. Come to think of it, you're using a different ethernet port than I am. I wonder if that may be part of the issue. I'm using the 10G port, which I don't really have a use for right now since the rest of my network is gigabit. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 I don't think so. I don't directly pass through a NIC. It's a virtual NIC emulated by Unraid. I guess it's the same for you. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 Yea, I was just wondering if it's driver-related on the host side/vfio. Quote Link to comment
billington.mark Posted February 13, 2019 Share Posted February 13, 2019 3 hours ago, Tritech said: Yea, I didn't grasp the concept the initial post was making about creating a PCIe root bus and assigning it vs a card. The more recent activity there does make it seem like the bulk of improvements will come with QEMU updates...whenever we get those. The guy I got it from said that the last lines in his xml were for a patched QEMU. I was also recommended "hugepages", but after a cursory search it seems that Unraid enables that by default. Couldn't get a VM to load with it enabled. <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> I've been pushing for the changes detailed in that level1techs forum post for a while... https://forums.unraid.net/topic/77499-qemu-pcie-root-port-patch/ Feel free to post in there to push the issue. The next stable release of QEMU doesn't look like it's coming until April/May: https://wiki.qemu.org/Planning/4.0. So fingers crossed there's an Unraid release offering that soon after. The alternative is for the @limetech guys to be nice to us and include QEMU from the master branch rather than from a stable release in the next RC... Considering how many issues it would fix around Threadripper, as well as the PCIe passthrough performance increases, it would make a lot of people happy... 1 Quote Link to comment
Jerky_san Posted February 13, 2019 Share Posted February 13, 2019 47 minutes ago, billington.mark said: I've been pushing for the changes detailed in that level1techs forum post for a while... https://forums.unraid.net/topic/77499-qemu-pcie-root-port-patch/ Feel free to post in there to push the issue. The next stable release of QEMU doesn't look like it's coming until April/May: https://wiki.qemu.org/Planning/4.0. So fingers crossed there's an Unraid release offering that soon after. The alternative is for the @limetech guys to be nice to us and include QEMU from the master branch rather than from a stable release in the next RC... Considering how many issues it would fix around Threadripper, as well as the PCIe passthrough performance increases, it would make a lot of people happy... Yeah, I really hope they do like they did previously with a special build just for Threadripper. I'll stay on that till QEMU 4.0 makes it into Unraid, if it gets the performance increases talked about. It will be exciting. Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 2 minutes ago, Jerky_san said: build just for threadripper Yeah, if I remember correctly they pushed the "ugly patch" into Unraid before it was built into the kernel. Let's hope the devs still love their Threadripper systems and are playing around with them. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 Step one would be keeping this on the first page and visible here as well 😁 Devs, you pickin' up what we're putting down? Quote Link to comment
billington.mark Posted February 14, 2019 Share Posted February 14, 2019 Having a build with QEMU from master would benefit everyone, not just you guys with Threadripper builds. Quote Link to comment
Jerky_san Posted February 14, 2019 Share Posted February 14, 2019 (edited) 8 hours ago, billington.mark said: Having a build with QEMU from master would benefit everyone, not just you guys with Threadripper builds Last night I worked for almost 3 hours to do just that. It's a lot harder than I expected to get it working on Unraid. What I was hoping to do was just see what I could do, knowing that when I restart it all blows away anyway. Should mention it failed because I'm guessing LimeTech compiles with special options or something. It would constantly error when I tried to start the VM, saying the field name wasn't a valid field. Looking at the log of a working machine, it's apparently parsing the XML and presenting that as a command. I'll keep working on it, but the LimeTech QEMU executable is much larger too, so I'm missing something. Edited February 14, 2019 by Jerky_san Quote Link to comment
Tritech Posted February 16, 2019 Author Share Posted February 16, 2019 <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> Tested the new RC4 and it looks like it's working! Quote Link to comment
bastl Posted February 16, 2019 Share Posted February 16, 2019 Same for me. Test VM with a 1050ti and the Nvidia driver shows the correct PCIe Bus speeds. Quote Link to comment
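For reference, one way to confirm the negotiated link speed from inside a Linux guest (or on the host before binding to vfio) is lspci; the bus address below is a placeholder for your GPU's:

```shell
# LnkCap = what the port advertises, LnkSta = what was actually negotiated.
# Find your GPU's address first with: lspci | grep VGA
lspci -vv -s 09:00.0 | grep -E 'LnkCap:|LnkSta:'
```

With the pcie-root-port.speed=8 / width=16 overrides active, LnkSta should report "Speed 8GT/s, Width x16" (8 GT/s being PCIe gen3) instead of the 2.5 GT/s the emulated root port used to advertise.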