Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 . Ladies and gentlemen, we got'em. Massive thanks to reddit user setzer for helping on this. I don't think he's on Unraid, but his help was invaluable. Latency is now down to manageable levels. I'll continue tweaking. His .xml = https://pastebin.com/GT1dySwt My .xml = https://pastebin.com/yGcL0GNj He also sent along some additional reading for us. https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443 Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 9 hours ago, Tritech said: tell me where things are plugged into your rear USB 1 Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) @bastl Thanks! I'll try switching some things around and see if that improves anything. Check my post above yours for some updates. IIRC I had my Unraid USB where yours is, but I moved it so I can pass through the whole controller. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 @Tritech That level1 forum is the one we talked about earlier btw 😂 I haven't had any time to test yet, but from what I read, some of these fixes are available with QEMU 3.2 and will be defaults in 4.0. Not sure when we'll see this in Unraid. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Yea, I didn't grasp the concept the initial post was making about creating a PCIe root bus and assigning it vs a card. The more recent activity there does make it seem like the bulk of improvements will come with QEMU updates...whenever we get those. The guy I got it from said that the last lines in his xml were for a patched QEMU. I was also recommended "hugepages", but after a cursory search it seems that Unraid enables that by default. Couldn't get a VM to load with it enabled. <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> Edited February 13, 2019 by Tritech Quote Link to comment
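For anyone else chasing the hugepages angle: in libvirt the guest side is a memoryBacking block in the domain XML, and the host needs pages reserved before the VM can start (which might explain a VM refusing to load). A minimal sketch, assuming 2MB pages; the counts are illustrative, not recommendations:

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ...name, memory, vcpu, etc... -->
  <memoryBacking>
    <!-- back guest RAM with preallocated hugepages instead of normal 4K pages -->
    <hugepages/>
  </memoryBacking>
</domain>
```

On the host, something like `sysctl vm.nr_hugepages=8192` would reserve 16GB worth of 2MB pages; if fewer pages are free than the VM's memory size, the VM will fail to start, which matches the symptom described above.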
bastl Posted February 13, 2019 Share Posted February 13, 2019 (edited) @Tritech <memory mode='strict' nodeset='1'/> Btw the numatune doesn't really work as it's supposed to. It always grabs RAM from the other node too. First VM "strict" 8GB from node0 and second "preferred" 16GB from node1. If I remember right, strict should cause an error when not enough RAM is available on the specified node, but it doesn't for me. No clue how to fix this yet. Edit: Another thing I noticed in your xml <numa> <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/> </numa> Isn't that line telling the VM to use 16GB from node0 for cores 0-15, where you're using cores 8-15 and 24-32? Edited February 13, 2019 by bastl Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 You're right... mine seems to be grabbing almost 1.5 GB from node0. Quote Link to comment
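For anyone wanting to check this on their own box: per-node memory usage of a running VM can be inspected from the Unraid shell with numastat. The VM name below is a placeholder -- substitute your own:

```shell
# Per-NUMA-node memory usage (in MB) of the qemu process backing the VM.
# Replace "Windows10" with your VM's name as shown in the Unraid GUI.
numastat -p "$(pgrep -f 'qemu.*Windows10')"

# Overview of total/free memory per node, to see what's left on each:
numactl --hardware
```

If numatune were working as advertised, a "strict" VM should show (almost) all of its pages on the nodeset you specified; stray allocations on the other node are what's being described above.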
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 42 minutes ago, bastl said: Edit: Another thing I noticed in your xml <numa> <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/> </numa> Isn't that line telling the VM to use 16GB from node0 for cores 0-15, where you're using cores 8-15 and 24-32? I saw that, and yes, that's what it looks like to me. Lemme test. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Evidently it does start with 1. I think the "0-15" refers to the vcpupin'd vCPUs, not the physical cores. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 @Tritech Ok, the "<numa>" tag is only needed if you have more vCPUs than one NUMA node has and you want a NUMA topology inside the VM. Let's say 2 cores from each node, and then you can tell the VM which "virtual" node uses how much RAM. This should not affect us. Example from https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/virtualization_tuning_and_optimization_guide/ Btw. a really useful guide. 4 available nodes (0-3) Node 0: CPUs 0 4, size 4000 MiB Node 1: CPUs 1 5, size 3999 MiB Node 2: CPUs 2 6, size 4001 MiB Node 3: CPUs 3 7, size 4005 MiB In this scenario, use the following Domain XML setting: <cputune> <vcpupin vcpu="0" cpuset="1"/> <vcpupin vcpu="1" cpuset="5"/> <vcpupin vcpu="2" cpuset="2"/> <vcpupin vcpu="3" cpuset="6"/> </cputune> <numatune> <memory mode="strict" nodeset="1-2"/> </numatune> <cpu> <numa> <cell id="0" cpus="0-1" memory="3" unit="GiB"/> <cell id="1" cpus="2-3" memory="3" unit="GiB"/> </numa> </cpu> Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 I cross-referenced that several times. Really helpful stuff. I reapplied the EPYC "hack" and that brought my latency down further, to ~300µs, as low as 125ish. https://pastebin.com/dLWncwhV Quote Link to comment
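For context, the EPYC "hack" people usually mean on Threadripper is overriding the CPU model reported to the guest so Windows sees a sane cache/SMT topology. A sketch of what that block tends to look like -- the topology values here are hypothetical and have to match your actual pinned core count:

```xml
<cpu mode='custom' match='exact' check='none'>
  <!-- report EPYC to the guest instead of the host's Threadripper model -->
  <model fallback='allow'>EPYC</model>
  <!-- 8 cores x 2 threads = 16 vCPUs; adjust to your vcpupin layout -->
  <topology sockets='1' cores='8' threads='2'/>
  <!-- topoext exposes AMD's SMT topology so the guest pairs siblings correctly -->
  <feature policy='require' name='topoext'/>
</cpu>
```

Exact feature flags vary between setups, so treat this as a starting point rather than a drop-in config.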
bastl Posted February 13, 2019 Share Posted February 13, 2019 The <emulatorpin> tag specifies which host physical CPUs the emulator (a subset of a domain, not including vCPUs) will be pinned to. The <emulatorpin> tag provides a method of setting a precise affinity to emulator thread processes. As a result, vhost threads run on the same subset of physical CPUs and memory, and therefore benefit from cache locality. @Tritech Does that mean emulatorpin outside the range of the already used vCPUs? I already have it set up for my main VM so that the emulatorpin cores are separated from the cores the VM uses, same die. Difficult if it's not your main language ^^ Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 I get what you're saying. I think it's saying they should be in the included range. You know how you left out cores 8/24? Well, I think they have to be on the same "domain" to be used at all, or at least to get the most out of them. At least that's how I interpret it. I've tweaked my config for now so they're all on the same domain. I'll fix it later when I change my isolcpus at reboot. Here's an update as well: seems that storport.sys is what's giving me the highest execution time. Gonna see if I can track down any gains there. Quote Link to comment
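To illustrate the layout being discussed -- core numbers are hypothetical, based on the 8-15/24-31 range mentioned earlier in the thread -- the idea is to give the emulator threads their own core pair on the same die, outside the vcpupin range:

```xml
<cputune>
  <!-- guest vCPUs pinned to host cores 9-15 plus their SMT siblings 25-31 -->
  <vcpupin vcpu='0' cpuset='9'/>
  <vcpupin vcpu='1' cpuset='25'/>
  <vcpupin vcpu='2' cpuset='10'/>
  <vcpupin vcpu='3' cpuset='26'/>
  <!-- ...remaining vcpupin entries follow the same pattern... -->
  <!-- emulator threads on core 8 and its sibling 24: same die/NUMA node,
       but not shared with any vCPU, so QEMU housekeeping can't preempt the guest -->
  <emulatorpin cpuset='8,24'/>
</cputune>
```

Keeping those cores in isolcpus as well stops the host scheduler from putting other work on them.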
bastl Posted February 13, 2019 Share Posted February 13, 2019 If storport.sys handles all the disk IO, then maybe changing/tweaking the iothreadpin can bring improvements. Quote Link to comment
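A minimal sketch of what that could look like in the domain XML -- the core numbers are placeholders; the point is a dedicated iothread pinned on the same NUMA node as the VM, and the virtio disk has to reference it or it won't be used:

```xml
<domain type='kvm'>
  <!-- ...name, memory, vcpu... -->
  <iothreads>1</iothreads>
  <cputune>
    <!-- pin the disk I/O thread to host core 8 and its SMT sibling 24 -->
    <iothreadpin iothread='1' cpuset='8,24'/>
  </cputune>
  <devices>
    <disk type='file' device='disk'>
      <!-- the iothread attribute ties this disk's I/O to iothread 1 -->
      <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
      <!-- ...source/target as in your existing config... -->
    </disk>
  </devices>
</domain>
```

Note iothreads only apply to virtio-blk/virtio-scsi disks, not to passed-through controllers.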
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 (edited) Actually, I let it run a bit longer and both of the highest execution times are network-related: ndis.sys and adf.sys. Come to think of it, you're using a different ethernet port than I am. I wonder if that may be part of the issue. I'm using the 10G port, which I don't really have a use for right now since the rest of my network is gigabit. Edited February 13, 2019 by Tritech Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 I don't think so. I don't directly pass through a NIC. It's a virtual NIC emulated by Unraid. I guess it's the same for you. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 Yea, I was just wondering if it's driver-related on the host side/vfio. Quote Link to comment
billington.mark Posted February 13, 2019 Share Posted February 13, 2019 3 hours ago, Tritech said: Yea, I didn't grasp the concept the initial post was making about creating a PCIe root bus and assigning it vs a card. The more recent activity there does make it seem like the bulk of improvements will come with QEMU updates...whenever we get those. The guy I got it from said that the last lines in his xml were for a patched QEMU. I was also recommended "hugepages", but after a cursory search it seems that Unraid enables that by default. Couldn't get a VM to load with it enabled. <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> I've been pushing for the changes detailed in that level1techs forum post for a while... https://forums.unraid.net/topic/77499-qemu-pcie-root-port-patch/ Feel free to post in there to push the issue. The next stable release of QEMU doesn't look like it's coming until April/May: https://wiki.qemu.org/Planning/4.0. So fingers crossed there's an Unraid release offering that soon after. The alternative is for the @limetech guys to be nice to us and include QEMU from the master branch rather than from a stable release in the next RC... Considering how many issues it would fix around Threadripper, as well as the PCIe passthrough performance increases, it would make a lot of people happy... 1 Quote Link to comment
Jerky_san Posted February 13, 2019 Share Posted February 13, 2019 47 minutes ago, billington.mark said: I've been pushing for the changes detailed in that level1techs forum post for a while... https://forums.unraid.net/topic/77499-qemu-pcie-root-port-patch/ Feel free to post in there to push the issue. The next stable release of QEMU doesn't look like it's coming until April/May: https://wiki.qemu.org/Planning/4.0. So fingers crossed there's an Unraid release offering that soon after. The alternative is for the @limetech guys to be nice to us and include QEMU from the master branch rather than from a stable release in the next RC... Considering how many issues it would fix around Threadripper, as well as the PCIe passthrough performance increases, it would make a lot of people happy... Yeah, I really hope they do like they did previously with a special build just for Threadripper. I'll stay on that till QEMU 4.0 makes it into Unraid, if it gets the performance increases talked about. It will be exciting. Quote Link to comment
bastl Posted February 13, 2019 Share Posted February 13, 2019 2 minutes ago, Jerky_san said: build just for threadripper Yeah, if I remember correctly they pushed the "ugly patch" into Unraid before it was built into the kernel. Let's hope the devs still love their Threadripper systems and are playing around with them. Quote Link to comment
Tritech Posted February 13, 2019 Author Share Posted February 13, 2019 Step one would be keeping this on the first page and visible here as well 😁 Devs, you pickin' up what we're putting down? Quote Link to comment
billington.mark Posted February 14, 2019 Share Posted February 14, 2019 Having a build with QEMU from master would benefit everyone, not just you guys with Threadripper builds. Quote Link to comment
Jerky_san Posted February 14, 2019 Share Posted February 14, 2019 (edited) 8 hours ago, billington.mark said: Having a build with QEMU from master would benefit everyone, not just you guys with Threadripper builds Last night I worked for almost 3 hours to do just that. It's a lot harder than I expected to get it working on Unraid. What I was hoping to do was just see what I could do, knowing that when I restart it all blows away anyway. Should mention it failed because I'm guessing LimeTech compiles with special options or something. It would constantly error when I tried to start the VM, saying the field name wasn't a valid field. Looking at the log of a working machine, it's apparently parsing the XML and presenting that as a command. I'll keep working on it, but the LimeTech QEMU executable is much larger too, so I'm missing something. Edited February 14, 2019 by Jerky_san Quote Link to comment
Tritech Posted February 16, 2019 Author Share Posted February 16, 2019 <qemu:commandline> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.speed=8'/> <qemu:arg value='-global'/> <qemu:arg value='pcie-root-port.width=16'/> </qemu:commandline> Tested the new RC4 and it looks like it's working! Quote Link to comment
bastl Posted February 16, 2019 Share Posted February 16, 2019 Same for me. Test VM with a 1050ti and the Nvidia driver shows the correct PCIe Bus speeds. Quote Link to comment
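For reference, one way to confirm the negotiated link speed from inside a Linux guest (or on the host before binding to vfio) is lspci; the bus address below is a placeholder for your GPU's:

```shell
# LnkCap = what the port advertises, LnkSta = what was actually negotiated.
# Find your GPU's address first with: lspci | grep VGA
lspci -vv -s 09:00.0 | grep -E 'LnkCap:|LnkSta:'
```

With the pcie-root-port.speed=8 / width=16 overrides active, LnkSta should report "Speed 8GT/s, Width x16" (8 GT/s being PCIe gen3) instead of the 2.5 GT/s the emulated root port used to advertise.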