billington.mark Posted January 29, 2019 (edited)

Please can the following patch be applied to QEMU (until QEMU 4.0 is bundled with Unraid, as this fix is already present in master). PCIe root ports are only exposed to VM guests as x1, which results in GPU pass-through performance degradation, and in some cases on higher-end NVIDIA cards the driver doesn't initialise some features of the card.

https://patchwork.kernel.org/cover/10683043/

Once applied, the following would be added to the VM's XML to modify the PCIe root ports to be x16 ports:

<qemu:commandline>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.speed=8'/>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.width=16'/>
</qemu:commandline>

The patch is well documented over here too: https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443

This would also increase performance of any other passed-through PCIe devices which need more bandwidth than an x1 port provides (NVMe, 10Gb NICs, etc). If we could have QEMU compiled from master instead of the releases, though... that would be even better!
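(For anyone running QEMU directly rather than through libvirt, the same overrides would go straight on the command line. This is only a sketch, not a complete invocation: the machine options, port IDs, and the vfio-pci host address are placeholders, and `speed`/`width` only exist once the patch above is applied.)

```shell
# Sketch: equivalent of the <qemu:commandline> block when launching QEMU
# by hand. speed=8 means 8 GT/s (PCIe Gen3); width=16 advertises an x16
# link to the guest. IDs and the 41:00.0 host address are placeholders.
qemu-system-x86_64 \
  -machine q35 \
  -device pcie-root-port,id=root_port1,chassis=1,bus=pcie.0 \
  -device vfio-pci,host=41:00.0,bus=root_port1 \
  -global pcie-root-port.speed=8 \
  -global pcie-root-port.width=16
```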
Jerky_san Posted January 29, 2019

3 hours ago, billington.mark said:
PCIe root ports are only exposed to VM guests as x1, which results in GPU pass-through performance degradation...

Inside this thread he talks about how to do it without the patch. It was a big pain in the damn ass, but I think I got it working..
1812 Posted January 29, 2019

15 minutes ago, Jerky_san said:
It was a big pain in the damn ass but I think I got it working..

How about you share with the class?
Jerky_san Posted January 29, 2019

2 hours ago, 1812 said:
How about you share with the class?

Changed the Windows 10 machine to a q35-3.1 machine. Inserted this below the rest of the controllers:

<controller type='pci' index='8' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='8' port='0x1f'/>
  <alias name='pci.8'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' function='0x0' multifunction='on'/>
</controller>

Went to where my hostdevs are and inserted this. Change the address in the <source></source> to your GPU's (and to your GPU audio's in the second one) and it should be more or less plug and play:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <rom file='/mnt/user/domains/1080ti.rom'/>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x00' function='0x1'/>
  </source>
  <alias name='hostdev1'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</hostdev>
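(Not mentioned above, but a quick way to check which link a device actually negotiated, from a Linux guest or against the physical card on the Unraid host, is lspci. This is a sketch: the `41:00.0` address comes from the example XML above and will differ on your system.)

```shell
# Show the advertised (LnkCap) vs currently negotiated (LnkSta) PCIe
# link speed and width for the GPU. The device address is the example
# one from the XML above -- substitute your own from `lspci | grep VGA`.
lspci -s 41:00.0 -vv | grep -E 'LnkCap:|LnkSta:'
```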
1812 Posted January 29, 2019

6 minutes ago, Jerky_san said:
Changed the Windows 10 machine to a q35-3.1 machine. Inserted this below the rest of the controllers...

Thanks for this. I'm wondering if this is a Windows-only issue, as on macOS it reports the correct lane width (at least on one machine I have) and hits at or near bare-metal benchmarks for the GPU. I'll try to play around with it over the next few days!
billington.mark (Author) Posted January 29, 2019 (edited)

That's still not fixed (as much as I'd like for it to have been that easy!). Have a look in the NVIDIA control panel under system info at the bus in use (I'd put money on it being x1!). (The image is from the level1techs forum, as I'm not at home and can't take a screenshot currently.)

You can also do a speed test using the EVGA utility: https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx

The patch to add the ability to set PCIe root port speeds wasn't present in the 3.1 release (which is what we're on, as of 6.7.0-rc2).
GHunter Posted January 29, 2019

I'd like this as well if this really does work!! This could possibly fix many of the problems some people have with GPU passthrough performance.
Jerky_san Posted January 29, 2019

2 hours ago, billington.mark said:
That's still not fixed... Have a look in the NVIDIA control panel under system info at the bus speed in use. (I'd put money on it being x1!)

Ah shit, you're right... still x1 #_# That's depressing.
billington.mark (Author) Posted January 29, 2019

Yep, and because of that, the NVIDIA driver is reining in performance. I don't use macOS, so I'm not sure if you're able to see this info in the driver... but in either case, x1 root ports will be presented to the VM guest regardless of the OS it's running. Depending on what checks the driver is doing on macOS, it might have different performance implications than on Windows.
Jerky_san Posted January 29, 2019

2 hours ago, billington.mark said:
Yep, and because of that, the NVIDIA driver is reining in performance...

Welp, hope we get it then, or maybe a way to just run the RC over QEMU 4.0 and have a switch that turns it on and off or something.
unrateable Posted January 29, 2019 (edited)

I am confused. PCIe 3.0 x16 should give about 15,500 MByte/s. I ran the linked tool in my Win10 VM guest and it shows me the following speed [screenshot]. It's off by some degree, but probably OK since it's a VM guest with VT-d?

GPU-Z in Win10 reports PCIe 3.0 x16, and when I use the built-in test it switches to x1 when I pause and back to x16 when I continue. In NVIDIA system info it says [screenshot]. Does that mean passthrough works as it should, GPU on PCIe 3.0 x16?! 😕
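(For reference, the ~15,500 MByte/s figure above comes straight from the PCIe link math: transfer rate times encoding efficiency times lane count. A small sketch; real-world tools report somewhat less because of protocol overhead.)

```python
# Theoretical PCIe throughput: transfer rate (GT/s) x encoding efficiency
# x lane count. PCIe 3.0 runs at 8 GT/s with 128b/130b encoding;
# PCIe 1.x/2.0 use 8b/10b instead.
def pcie_bandwidth_mbs(gt_per_s, lanes, encoding=(128, 130)):
    payload, total = encoding
    bytes_per_s = gt_per_s * 1e9 * payload / total / 8  # bits -> bytes
    return bytes_per_s * lanes / 1e6  # decimal MB/s

print(round(pcie_bandwidth_mbs(8, 1)))   # x1  -> 985 MB/s
print(round(pcie_bandwidth_mbs(8, 16)))  # x16 -> 15754 MB/s
```

So an x1 link caps the GPU at roughly 1/16th of the bandwidth the benchmark numbers in this thread are chasing.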
Jerky_san Posted January 29, 2019 (edited)

2 hours ago, unrateable said:
GPU-Z in Win10 reports PCIe 3.0 x16, and when I use the built-in test it switches to x1 when I pause and back to x16 when I continue...

I read the whole thing he posted (it was a hell of a lot). Basically, when the VM boots it sets a bunch of registers. Those registers impact how the driver and Windows interact with the card. Latency and many things are impacted, not just speed. Basically the patch tells the card when it boots "hey, you're in an x16 slot, so set the registers accordingly!" and so it does.
m0ngr31 Posted January 29, 2019

Would be great to have this.
billington.mark (Author) Posted January 30, 2019

12 hours ago, unrateable said:
GPU-Z in Win10 reports PCIe 3.0 x16... does that mean passthrough works as it should, GPU on PCIe 3.0 x16?

Are you using Q35 or i440fx? The issue here is that the NVIDIA driver behaves differently if the bus reported is anything less than x8. Also, latency on the VM as a whole is greatly improved when using Q35 with the patches. It's a long read, but you can see the evolution of these changes on the level1techs forum I linked in the original post.
Jerky_san Posted January 30, 2019

6 hours ago, billington.mark said:
Are you using Q35 or i440fx?

i440fx gives me the 0x while q35 gives me 1x, so I assume his is the same.
unrateable Posted January 30, 2019

9 hours ago, billington.mark said:
Are you using Q35 or i440fx?

I am using i440fx-2.7 without any patch. Still puzzling. Anybody here that can run the CLI tool and show their results of a working-as-supposed-to PCIe 3.0 x16 GPU?
Jerky_san Posted January 30, 2019

43 minutes ago, unrateable said:
I am using i440fx-2.7 without any patch. Still puzzling.

When the VM starts, the card does a rate negotiation. The negotiation eventually works its way up to the proper speed, but when the card starts it only sees x1, so it sets its registers to that effect. Please read the post he linked if you'd like further detail; the guys in those posts go into great depth about it.
billington.mark (Author) Posted January 30, 2019 (edited)

4 hours ago, unrateable said:
I am using i440fx-2.7 without any patch. Still puzzling.

i440fx doesn't have any PCIe 'slots' as such; it presents the GPU to the OS on a PCI slot, again causing latency and a performance hit compared to bare metal. The CLI tool is to show that when you use Q35, the PCIe root ports are x1, not x16. The issue here is that the NVIDIA driver doesn't correctly initialise the card (on Windows, anyway) unless it detects it's on an x8 or x16 slot. The comments on the patch do a good job of explaining what's going on and what's being changed here: https://patchwork.kernel.org/cover/10683043/

I'm by no means complaining, but if there's a way to improve performance and get as close to bare metal as possible, I think it's worth implementing. 👍
bastl Posted February 13, 2019

I tried a lot of things to improve the performance of my VMs over the last couple of days and stumbled across that level1techs forum, as I guess everybody here did. Great in-depth information, and I hope Limetech is able to push that fix to us Unraid users as soon as possible 😉

GIVE US THE FIX NOOOOOOW

Just kidding. Don't push features if they aren't tested in your product. Since I've been using Unraid, even all the RC builds I tested (every public RC since early 2018) were stable for my needs. Sure, there are always performance improvements possible, often on the edge of stability. Always using bleeding-edge technology is fun, sure, and for a techie it's nice to play with, but for the general user it's often hard to handle. It's hard for @limetech and any other tech company to find a good middle way. I believe in you guys 👍
Tritech Posted February 13, 2019

Just chiming in to say I am waiting for this and any other Threadripper performance-related changes as well.
jordanmw Posted February 13, 2019

Also hoping we get this ported in SOON.
billington.mark (Author) Posted February 14, 2019

The original topic of this post was to highlight a particular problem I was having (and still am), but the main underlying point here is that over the last couple of years, development on QEMU, the introduction of new hardware from AMD, and the general love for virtualisation on workstation hardware have meant development in this space is moving at quite a pace.

Short term, a build which included virtualisation modules from master would make a lot of people happy, but the same is inevitably going to happen when 3rd-gen Ryzen, 3rd-gen Threadripper, PCIe 4, PCIe 5, etc. drop in the coming months. Personally, I think the long-term holy grail here is the ability to choose which branch we run key modules like QEMU, libvirt, and Docker from, and then to update and get the latest patches/performance improvements independently of an Unraid release.

Short term though... a build to keep us all quiet would be lovely.
limetech Posted February 15, 2019

QEMU 3.1.0 with the aforementioned patch will be available starting with 6.7.0-rc4.
m0ngr31 Posted February 15, 2019

That is amazing news. Thanks!