
Everything posted by billington.mark

  1. billington.mark

    QEMU PCIe Root Port Patch

     Please can the following patch be applied to QEMU (until QEMU 4.0 is bundled with Unraid, as this fix is already present in master)? PCIe root ports are only exposed to VM guests as x1, which results in GPU pass-through performance degradation, and in some cases on higher-end NVIDIA cards the driver doesn't initialise some features of the card. https://patchwork.kernel.org/cover/10683043/

     Once applied, the following would be added to the VM's XML to modify the PCIe root ports to be x16 ports:

     <qemu:commandline>
       <qemu:arg value='-global'/>
       <qemu:arg value='pcie-root-port.speed=8'/>
       <qemu:arg value='-global'/>
       <qemu:arg value='pcie-root-port.width=16'/>
     </qemu:commandline>

     The patch is well documented over here too: https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443

     This would also increase the performance of any other passed-through PCIe device that uses more bandwidth than an x1 port provides (NVMe, 10Gb NICs, etc.). If we could have QEMU compiled from master instead of the releases though... that would be even better!
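     One gotcha for anyone adding this by hand: libvirt won't accept <qemu:commandline> unless the QEMU namespace is declared on the <domain> element. A minimal sketch of where the override sits (everything other than the namespace and the commandline block is omitted here):

     <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
       <!-- ...the rest of the VM definition stays exactly as it is... -->
       <qemu:commandline>
         <!-- make every pcie-root-port advertise 8GT/s x16 to the guest -->
         <qemu:arg value='-global'/>
         <qemu:arg value='pcie-root-port.speed=8'/>
         <qemu:arg value='-global'/>
         <qemu:arg value='pcie-root-port.width=16'/>
       </qemu:commandline>
     </domain>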
  2. billington.mark

    QEMU PCIe Root Port Patch

     QEMU 4.0 RC0 has been released: https://www.qemu.org/download/#source There's also a nice specific mention in the changelog of the things discussed in this thread (https://wiki.qemu.org/ChangeLog/4.0). Now that these changes are standard with the Q35 machine type in 4.0, I think this could also be an additional argument against potentially forcing Windows-based VMs to the i440fx machine type, if it brings things into performance parity? If @limetech could throw this into the next RC for people to test out, that would be much appreciated!
  3. billington.mark

    QEMU PCIe Root Port Patch

     It was me. I think the current behaviour in the UI is perfect: pick an OS, and the sensible, least-hassle settings are there for you to use. I don't think options to change the machine type should be removed. At worst, they could possibly be hidden behind an "advanced" switch (which I think currently flips between the form and the XML), then having another tab to view the XML instead?... I know there's a balance to be found to accommodate all levels of Unraid users here, and I don't envy the UI decisions trying to keep everyone happy! It is worth pointing out that it's documented that the drivers DO behave differently based on what PCIe link speed they detect, and personally I get better performance numbers and prefer running a Q35-based VM... I think the long-term fix for this is either to allow the option to run modules such as QEMU, libvirt and Docker from the master branch, and allow them to be updated independently of the OS, or to have "bleeding edge" builds where these modules are compiled from master. Easier for me to say than it is to implement, though.
  4. billington.mark

    QEMU PCIe Root Port Patch

     @jonp I've been under the impression for a long time that the latency and performance improvements in QEMU needed the Q35 machine type to be taken advantage of. All the development I've seen, and all the tips to improve performance, seem to be built around the Q35 machine type. At the end of the day, I want to get as close to bare-metal performance as possible; that's my aim. I'm in no way preaching that we should all move to Q35. Now that I have my own performance numbers pre and post patch, I'll happily test the i440fx machine type too. I've also posted this over on the Level1Techs forum to ask them the same question, seeing as it's them who've pushed for the development on the Q35 machine type to get these PCIe fixes in the first place. As for removing the option in the GUI for Q35 for Windows... I think it would be more appropriate to show a warning if Q35 was selected, as opposed to removing the ability to choose it altogether.
  5. Thank you for this. This is a great baseline to compare my Xeon build to.
  6. billington.mark

    QEMU PCIe Root Port Patch

     I'm seeing around a 5-10% increase in performance on GPU tests with my RTX 2080.
  7. Yep, it looks like it's fixed the driver crippling memory scaling (in Windows, anyway). I'm seeing a 5-10% increase in GPU benchmarks after updating to RC4. I was hoping for more, but it looks like my bottleneck is my ageing CPUs now (2x E5-2670). I've been meaning to put my hand in my pocket and upgrade to a Threadripper build for a while now... I'm very interested to see what performance gains you guys are getting after this patch... Thank you @limetech
  8. Having a build with QEMU from master would benefit everyone, not just you guys with Threadripper builds.
  9. billington.mark

    QEMU PCIe Root Port Patch

     The original topic of this post was to highlight a particular problem I was having (and still am), but the main underlying point here is that over the last couple of years, development on QEMU, the introduction of new hardware from AMD, and the general love for virtualisation on workstation hardware have meant that development in this space is moving at quite a pace. Short term, a build which includes the virtualisation modules from master would make a lot of people happy, but the same is inevitably going to happen when 3rd-gen Ryzen, 3rd-gen Threadripper, PCIe 4.0, PCIe 5.0, etc. drop in the coming months. Personally, I think the long-term holy grail here is the ability to choose which branch we run key modules like QEMU, libvirt and Docker from... then be able to update and get the latest patches/performance improvements independently of an Unraid release. Short term though... a build to keep us all quiet would be lovely.
  10. I've been pushing for the changes detailed in that Level1Techs forum post for a while... https://forums.unraid.net/topic/77499-qemu-pcie-root-port-patch/ Feel free to post in there to push the issue. The next stable release of QEMU doesn't look like it's coming until April/May: https://wiki.qemu.org/Planning/4.0. So fingers crossed there's an Unraid release offering that soon after. The alternative is for the @limetech guys to be nice to us and include QEMU from the master branch rather than from a stable release in the next RC... Considering how many issues it would fix around Threadripper, as well as the PCIe passthrough performance increases, it would make a LOT of people happy...
  11. billington.mark

    SSD Write performance

     Hi guys, I've noticed for a while now that write speeds aren't great from inside Windows VMs. When I say 'aren't great', I mean 150MB/s out of a possible 550MB/s from my dedicated VM SSD. Obviously not slow, but I feel like I'm missing something in my config that's causing write speeds to take a hit. Read speeds seem to be fine. The same SSD was running at 550MB/s write when used as a 'normal' drive in my old workstation, so I'm pretty sure it isn't the drive and it's something to do with the VM setup. I get the same write results when using a raw img file for the HDD (as set up by the GUI) and also when passing through the entire drive (current setup). Here is a screenshot of a drive benchmark; not sure what's going on with the 4K reads/writes: I'm using the 109 driver set. XML is here:

     Stuff I've tried so far...
     - When the HDD was a raw img file, same write speeds with cache='writeback' and cache='native'
     - Passed through the entire SSD (same disk as used with the raw img file) and restored the same VM from a backup onto the new disk

     If anyone is running W10 in a VM on an SSD, could you run some benchmarks and let me know if this 'issue' is across the board or just me? Also, if anyone has any suggestions to tweak my setup to get better write speeds, I'm quite happy to be a guinea pig and test stuff. Thanks for your time guys, Mark
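     For anyone comparing configs, this is roughly the shape of the virtio disk definition involved when I mention those settings; a sketch only, with a placeholder device path, and the cache/io combination shown is just one more worth benchmarking rather than a known fix:

     <disk type='block' device='disk'>
       <!-- one combination to benchmark; cache='writeback' is another mentioned above -->
       <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
       <!-- placeholder path: substitute your own /dev/disk/by-id entry -->
       <source dev='/dev/disk/by-id/ata-EXAMPLE-SSD'/>
       <target dev='vdb' bus='virtio'/>
     </disk>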
  12. billington.mark

    QEMU PCIe Root Port Patch

     i440fx doesn't have any PCIe 'slots' as such; it presents the GPU to the OS on a PCI slot, again causing latency and a performance hit compared to bare metal. The CLI tool is to show that when you use Q35, the PCIe root ports are x1, not x16. The issue here is that the NVIDIA driver doesn't correctly initialise the card (on Windows anyway) unless it detects it's on an x8 or x16 slot. The comments on the patch do a good job of explaining what's going on and what's being changed here: https://patchwork.kernel.org/cover/10683043/ I'm by no means complaining, but if there's a way to improve performance and get as close to bare metal as possible, I think it's worth implementing. 👍
  13. billington.mark

    QEMU PCIe Root Port Patch

     Are you using Q35 or i440fx? The issue here is that the NVIDIA driver behaves differently if the bus reported is anything less than x8. Also, latency on the VM as a whole is greatly improved when using Q35 with the patches. It's a long read, but you can see the evolution of these changes on the Level1Techs forum thread I linked in the original post.
  14. billington.mark

    QEMU PCIe Root Port Patch

     Yep, and because of that, the NVIDIA driver is reining in performance. I don't use macOS, so I'm not sure if you're able to see this info on the driver... but in either case, x1 root ports will be presented to the VM guest regardless of the OS it's running. Depending on what checks the driver is doing on macOS, it might have different performance implications than on Windows.
  15. billington.mark

    QEMU PCIe Root Port Patch

     That's still not fixed (as much as I'd like for it to have been that easy!). Have a look in the NVIDIA control panel, under System Info, at the bus in use (I'd put money on it being x1!). (The image is from the Level1Techs forum, as I'm not at home and can't take a screenshot currently.) You can also do a speed test using the EVGA utility: https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx The patch that adds the ability to set PCIe root port speeds wasn't present in the 3.1 release (which is what we're on as of 6.7.0-rc2).
  16. billington.mark

    Unraid OS version 6.7.0-rc2 available

    Any chance of having QEMU from the master branch rather than 3.1 in the next release? Or, can these patches be applied: https://patchwork.kernel.org/cover/10683043/
  17. billington.mark

    Unraid OS version 6.7.0-rc1 available

     Looks like we're waiting for QEMU 4.0... https://wiki.qemu.org/Planning/4.0 I don't think the Unraid guys compile from source; they'll just grab the latest stable version, which is currently 3.1. The commits I'm interested in got pushed after the 3.1 release, now that I've cross-referenced all the dates! @jonp Is there any way we could get a 'bleeding edge' build of QEMU built from the master git branch in the next RC maybe? Being able to test the Threadripper and PCIe root port lane size fixes would be great. Looking at the current schedule for 4.0, it looks like we'll be waiting a few months before the next official release...
  18. billington.mark

    Unraid OS version 6.7.0-rc1 available

     Looks like the fun stuff to fix the Threadripper issues and make PCIe root ports x16 ports is coming in QEMU 4.0. I hope I'm wrong though! Did you update your XML to use the new machine type? I'm going to experiment with the PCIe port speed when I'm home this evening.
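     For anyone unsure where the machine type lives: it's the machine attribute on the <os><type> element in the VM's XML, something along these lines (treat 'pc-q35-3.1' as an example string; use whichever Q35 version your QEMU build actually offers):

     <os>
       <!-- example only: the available q35 versions depend on the bundled QEMU -->
       <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
     </os>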
  19. billington.mark

    Terrible gaming performance

     FYI, GPU-Z lies... if you really want to see what your PCIe lane situation is for your passed-through NVIDIA card, have a look in NVIDIA Control Panel > Help > System Information, then scroll down to Bus. This is because the PCIe root ports created on a Q35 machine are x1 ports by default. In QEMU 3.2 (I think), you can add some extra XML to force the root port to be x16, and in 4.0 all root ports will be x16 by default.
  20. billington.mark

    Terrible gaming performance

     I have a very similar setup to you and have diagnosed NUMA headaches for longer than I care to remember! A few things to try (which made my performance better):

     Switch to a Q35 VM. It might not yield any performance increase right now, but there are some changes in the pipeline for QEMU 3.2/4.0 which will increase the performance of passed-through PCIe devices (and which should be included in the next version of Unraid).

     After you've flipped to Q35, add an emulatorpin value to take the pressure off core 0 (which it will be using by default). Keeping it on the same NUMA node as your passed-through CPUs would most likely be best, so it'll look like this:

     <vcpu placement='static'>12</vcpu>
     <cputune>
       <vcpupin vcpu='0' cpuset='10'/>
       <vcpupin vcpu='1' cpuset='26'/>
       <vcpupin vcpu='2' cpuset='11'/>
       <vcpupin vcpu='3' cpuset='27'/>
       <vcpupin vcpu='4' cpuset='12'/>
       <vcpupin vcpu='5' cpuset='28'/>
       <vcpupin vcpu='6' cpuset='13'/>
       <vcpupin vcpu='7' cpuset='29'/>
       <vcpupin vcpu='8' cpuset='14'/>
       <vcpupin vcpu='9' cpuset='30'/>
       <vcpupin vcpu='10' cpuset='15'/>
       <vcpupin vcpu='11' cpuset='31'/>
       <emulatorpin cpuset='9,25'/>
     </cputune>

     Personally, I have my main workstation VM running off cores on NUMA node 0, so I have my emulatorpin there. With the QEMU service running on node 0 too, it might be worth testing your emulatorpin on that node as well, so 7,23 maybe. I also stub those CPU cores the same as the rest, to ensure nothing else is stealing cycles from my VM.

     Add some additional Hyper-V enlightenments (I can't remember if all of these are standard with Unraid, but here they are anyway):

     <hyperv>
       <relaxed state='on'/>
       <vapic state='on'/>
       <spinlocks state='on' retries='8191'/>
       <vpindex state='on'/>
       <synic state='on'/>
       <stimer state='on'/>
       <reset state='on'/>
       <vendor_id state='on' value='none'/>
       <frequencies state='on'/>
     </hyperv>

     The MSI fix will most likely need to be applied to your GPU and GPU audio device: https://forums.guru3d.com/threads/windows-line-based-vs-message-signaled-based-interrupts.378044/ (use the v2 utility).

     Last but by no means least, your storage is on NUMA node 0 and everything else is on node 1, so latency will be an issue here. Not sure how viable this will be, but if you can, flip your 1070 into a PCIe slot associated with NUMA node 0, change your CPUs to that node too (and your emulatorpin), and see how things are there. Another alternative, if you have a spare HDD controller with only the SSD you're using on it, is to pass that through if you're able to, as it'll cut out the QEMU middleman between Windows and the SSD.

     I think you'll notice the biggest difference with the emulatorpin change.
  21. billington.mark

    Oculus Rift performance in VM

     You're going to struggle with performance with only a 4-core CPU... even more so if you're assigning all 4 cores to your VM. You'll have a lot more luck upgrading to something like an i7 8700, which has a lot more CPU threads to play with. Before delving into your wallet, I'd post in the hardware forum and ask for some advice on where to go hardware-wise; AMD are offering compelling options at very good price points...

     In the short term, you could try:
     - isolating the CPU cores you're using for your VM (search the forum for isolcpus). I'd isolate cores 2 and 3, leaving 0 and 1 available for Unraid/Docker to use.
     - assigning cores 2 and 3 to the VM
     - setting the emulatorpin value in your VM XML to core 1 (see the sketch below)

     Post your VM XML and I'm sure people will chime in with some more suggestions... but you're going to struggle with such a small CPU core count to play with.
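     As a rough sketch of what that pinning could look like on a 4-core, non-hyperthreaded chip (the core numbering here is an assumption; adjust the cpuset values to your own topology):

     <vcpu placement='static'>2</vcpu>
     <cputune>
       <!-- guest vCPUs on the isolated cores 2 and 3 -->
       <vcpupin vcpu='0' cpuset='2'/>
       <vcpupin vcpu='1' cpuset='3'/>
       <!-- QEMU emulator threads kept off the guest cores, on core 1 -->
       <emulatorpin cpuset='1'/>
     </cputune>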
  22. billington.mark

    RTX2080 passthrough surprise

     So, I received an RTX 2080 today (took advantage of the EVGA Step-Up programme, as I got a 1080 in July). This is what hardware it presents:

     IOMMU group 20:
       [10de:1e87] 03:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
       [10de:10f8] 03:00.1 Audio device: NVIDIA Corporation Device 10f8 (rev a1)
       [10de:1ad8] 03:00.2 USB controller: NVIDIA Corporation Device 1ad8 (rev a1)
       [10de:1ad9] 03:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad9 (rev a1)

     So... previous NVIDIA cards I've had presented as two devices: one graphics device and one audio device. The 'new' extra two are the serial bus controller (which I assume is the RGB controller) and a USB controller. To my surprise, the USB Type-C port on the back of the card actually functions as a fully fledged USB port, so I'm able to connect a USB 3 hub to it using a 'Type-C to A' adapter and no longer need to pass through an additional PCIe USB card! The hub is being powered by the USB port, and has a keyboard, mouse and USB DAC connected with zero issues. Seeing as these cards are quite new and virtualisation is a bit niche, I thought I'd put this down in a post for people to see.
  23. billington.mark

    RTX2080 passthrough surprise

     Nothing special to get it to work... Stubbed the device like you usually would, and passed it through like any other device. No issues at all! No extra config in Windows needed either. Just plugged in and it worked instantly like any normal USB port.
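     For reference, once it's stubbed, the card's USB controller function goes into the VM like any other PCI hostdev; something along these lines, using the 03:00.2 address from my IOMMU listing above (libvirt fills in the guest-side address on its own):

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <!-- host address of the card's USB controller function (03:00.2) -->
         <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
       </source>
     </hostdev>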
  24. billington.mark

    RTX2080 passthrough surprise

     Having a look, it doesn't seem to be standard across the board on all 20-series cards; it might only be on the higher-end SKUs (2070/2080/2080 Ti). EVGA doesn't seem to have it on their 2060s.