QEMU PCIe Root Port Patch



On 2/25/2019 at 5:00 PM, jonp said:
On 2/24/2019 at 5:17 AM, GHunter said:

For migration purposes, keeping Q35 is about the only way to properly migrate a VM to ensure hardware compatibility with a new server.

Not sure where you heard that, but it's not true. i440fx works perfectly fine when moving a VM from one set of electronics to another.

 

I think you're misunderstanding what I'm trying to say. I currently use Q35 Windows VMs and need to migrate them to my new hardware. My concern is that the VM XML will be different if the XML is generated based on the hardware of the Unraid server. If the XML is the same regardless of hardware, then this is a non-issue. If not, I'll be in trouble if Q35 is removed, as I won't be able to properly generate the XML.

 

On 2/25/2019 at 5:00 PM, jonp said:
On 2/24/2019 at 5:17 AM, GHunter said:

My other thought is that migrating Windows-based VMs from Q35 to i440fx could invalidate the Windows license, although I don't know if this is a real problem or not.

If you change the motherboard of your computer, Windows wants you to call MS to reactivate the license (if you're not signed in with a Microsoft account). The same applies to virtual motherboards like i440fx and Q35. If you've associated your registered copy of Windows with your Microsoft account, you'll just need to re-sign in once you change the gear and all is good.

 

All my Windows 10 VMs were upgraded from Windows 7 or 8, and from what I understand they are OEM licenses that can only be activated on one machine. A server hardware upgrade or a change to i440fx would kill my activated licenses, with no recourse other than buying new licenses. So keeping Q35 would be helpful in this case.

Link to comment
6 hours ago, GHunter said:

 

I think you're misunderstanding what I'm trying to say. I currently use Q35 Windows VMs and need to migrate them to my new hardware. My concern is that the VM XML will be different if the XML is generated based on the hardware of the Unraid server. If the XML is the same regardless of hardware, then this is a non-issue. If not, I'll be in trouble if Q35 is removed, as I won't be able to properly generate the XML.

 

 

All my Windows 10 VMs were upgraded from Windows 7 or 8, and from what I understand they are OEM licenses that can only be activated on one machine. A server hardware upgrade or a change to i440fx would kill my activated licenses, with no recourse other than buying new licenses. So keeping Q35 would be helpful in this case.

 

The only thing in the XML that is based on your hardware is the PCI address of passed-through devices. So all you have to do when moving to new hardware is choose the right devices in the VM template. And of course the paths to your vdisks, if you change the folder they live in.
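
For example, the hardware-specific part is just the host address inside the <hostdev> block, roughly like this (the bus/slot/function values below are made up):

<!-- Hypothetical passed-through GPU: only the host-side <source> address
     is tied to the physical server and needs updating on new hardware. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
  </source>
</hostdev>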

 

As far as I know, you don't need to reactivate Windows just by changing from Q35 to i440fx. At least I didn't have to when I changed from i440fx to Q35. I think Windows uses the VM's UUID, so as long as it stays the same, you are safe.

Link to comment
On 2/24/2019 at 12:30 AM, jonp said:

Feedback is critical here.  We are seriously considering removing the option to select Q35 for Windows-based VMs from VM manager.  If you can provide a use-case for why we shouldn't, please do!!

I saw a post from an Unraid user over in the original thread for this patch asking the dev why (or why not) to use Q35 (and no, it wasn't me). Here is his answer.

 

https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443/176

 

Sure, whether you should use Q35 or i440fx depends on the use case. Some users here reported issues passing through a PCIe device like a GPU, and Q35 worked for them out of the box, so why would you want to remove it? i440fx has been used for years and should be stable, I get it, but

Quote

"The Idea that UnRaid will remove Q35 or even warn about using a newer platform topology that even the Qemu developers are trying to push is idiotic."

That's what gnif stated, and it's also kinda my opinion. Why remove something that's in heavy development and will be the way to go in the future? Another thing he stated is the following, and it might be the reason why 3DMark, for example, never worked for me on i440fx.

 

Quote

"Also we have evidence that shows that when the driver detects a link speed of 0x, or that it’s not on PCIe (ie, i440fx), it programs the SoC differently. Evidence both through benchmarks and the AMDGPU source code in the Linux Kernel which has an todo to implement the PCIe 3.0 specific configuration in: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/soc15.c#L446 "

At the start of 3DMark, a system profiler runs to collect system information. That step always froze for me and caused the VM to crash. After switching to Q35 and changing nothing else, 3DMark runs fine. This is only one example of a piece of software that might crash or not work to its full functionality, and I'm sure there will be other situations where the i440fx machine type causes issues by not reporting hardware-specific information correctly. The only thing that changed in this example is that the newer machine type lets the driver report the correct link speed of x16 instead of 0x. I can't prove that's the reason 3DMark works for me on Q35 and not on i440fx, but as long as it works, it's the option to go with.

 

As much as I love Unraid and trust @limetech to make the right decisions, I equally believe in gnif's opinion and appreciate what he does. He is, by the way, the guy who developed the Threadripper PCI reset patch that made it into the kernel.

 

https://patchwork.kernel.org/patch/10181903/

 

I would like to see no changes in Unraid on this specific topic. Keep Q35 as an option for the user and keep i440fx as the default, as it is right now.

 

Thanks

Edited by bastl
Link to comment
2 hours ago, saarg said:

The only thing in the XML that is based on your hardware is the PCI address of passed-through devices. So all you have to do when moving to new hardware is choose the right devices in the VM template. And of course the paths to your vdisks, if you change the folder they live in.

 

As far as I know, you don't need to reactivate Windows just by changing from Q35 to i440fx. At least I didn't have to when I changed from i440fx to Q35. I think Windows uses the VM's UUID, so as long as it stays the same, you are safe.

 

Thanks. This is exactly what I needed to know! I feel better now, and ditching Q35 should be a non-issue for me. I'll convert all my Windows VMs to i440fx to future-proof them.

Link to comment
15 hours ago, saarg said:

As far as I know, you don't need to reactivate Windows just by changing from Q35 to i440fx. At least I didn't have to when I changed from i440fx to Q35. I think Windows uses the VM's UUID, so as long as it stays the same, you are safe.

 

@jonp I was able to successfully switch a VM from Q35 to i440fx and activate my Windows 10 license. Note that all of my Windows licenses are OEM because they were OS upgrades from previous Windows versions. OEM licenses get one activation, so transferring to a new PC or VM invalidates them. These Windows licenses seem to be tied to the VM UUID, and that's how I was able to do a successful conversion. However, changing the UUID takes a bit of work, as it isn't very easy to do.
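
For reference, the UUID lives near the top of the VM's XML, roughly as sketched below (the value here is made up); as far as I can tell, keeping that line identical when rebuilding the template is what preserves the activation.

<domain type='kvm'>
  <name>Windows10</name>
  <!-- Hypothetical UUID: keep this exact value when recreating the VM so
       Windows sees the same machine identity (rest of the definition omitted). -->
  <uuid>1b4e28ba-2fa1-11d2-883f-b9a761bde3fb</uuid>
</domain>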

Link to comment
On 2/28/2019 at 2:57 AM, bastl said:

As much as I love Unraid and trust @limetech to make the right decisions, I equally believe in gnif's opinion and appreciate what he does. He is, by the way, the guy who developed the Threadripper PCI reset patch that made it into the kernel.

 

https://patchwork.kernel.org/patch/10181903/

Not to be pedantic, but this was resolved through an AGESA update; the patch never made it into the kernel, though it was helpful in identifying the issue.

Link to comment
On 2/28/2019 at 9:57 AM, bastl said:

I saw a post from an Unraid user over in the original thread for this patch asking the dev why (or why not) to use Q35 (and no, it wasn't me)

It was me :)

 

I think the current behaviour in the UI is perfect. Pick an OS, and the sensible, least-hassle settings are there for you to use. I don't think the options to change the machine type should be removed. At worst, they could be hidden behind an "advanced" switch (which I think currently flips between the form and the XML), with another tab to view the XML instead?

I know there's a balance to be found to accommodate all levels of Unraid users here, and I don't envy the UI decisions needed to try and keep everyone happy!

 

It is worth pointing out that it's documented that the drivers DO behave differently based on the PCIe link speed they detect, and personally I get better performance numbers and prefer running a Q35-based VM.

 

I think the long-term fix for this is either to allow the option to run modules such as QEMU, libvirt, and Docker from the master branch and let them be updated independently of the OS, or to have "bleeding edge" builds where these modules are compiled from master. Easier for me to say than it is to implement, though.

 

Edited by billington.mark
Link to comment
On 2/24/2019 at 12:30 AM, jonp said:

Feedback is critical here.  We are seriously considering removing the option to select Q35 for Windows-based VMs from VM manager.  If you can provide a use-case for why we shouldn't, please do!!

As far as I know, you can't install the newest AMD ReLive drivers on i440fx, only on Q35. The latest driver that worked on i440fx without crashing to a black screen is 18.2.1 (or 18.2.3?). Every driver after that has the same problem. Q35 just works out of the box. I had the problem with an RX 570 and a Vega 56. Another user here in the forum had the same issue, and changing to Q35 fixed it (Ozon was his name, as far as I remember).

Link to comment
15 hours ago, suRe said:

As far as I know, you can't install the newest AMD ReLive drivers on i440fx, only on Q35. The latest driver that worked on i440fx without crashing to a black screen is 18.2.1 (or 18.2.3?). Every driver after that has the same problem. Q35 just works out of the box. I had the problem with an RX 570 and a Vega 56. Another user here in the forum had the same issue, and changing to Q35 fixed it (Ozon was his name, as far as I remember).

I just signed up to confirm this. I had to create all my VMs with Q35 due to this issue when installing drivers on i440fx with an MSI RX 470 Gaming X. The only way to work around the problem was converting/creating my VMs as Q35.

Additionally, with Q35 I can have Hyper-V turned ON with a GTX 1070 without suffering the error 34 (or was it 43?). I don't know if it's just a coincidence (the combination of my mobo and GPU, some BIOS option, or whatever), but I can currently enjoy 30 or 40 more fps in games thanks to being able to enable Hyper-V in a gaming VM. I'm using the latest drivers without having to fix anything at all.

For me at least, being unable to use Q35 would be a serious NO to continuing to use Unraid right now.

 

Link to comment
On 3/8/2019 at 7:44 PM, GHunter said:

 

No. To take advantage of this patch, you'll have to continue to edit the XML manually.

 

This will become easier when QEMU 4.0 is released, as any VM using the new 4.0 machine version will automatically get its PCIe root ports upgraded. The upstream patch needs to maintain backwards compatibility for migration when older machine versions are used, thus requiring XML modding in the interim.

Link to comment
  • 2 weeks later...

QEMU 4.0 RC0 has been released - https://www.qemu.org/download/#source

And a nice specific mention in the changelog of things discussed in this thread (https://wiki.qemu.org/ChangeLog/4.0):

 

Quote

Generic PCIe root port link speed and width enhancements: Starting with the Q35 QEMU 4.0 machine type, generic pcie-root-port will default to the maximum PCIe link speed (16GT/s) and width (x32) provided by the PCIe 4.0 specification. Experimental options x-speed= and x-width= are provided for custom tuning, but it is expected that the default over-provisioning of bandwidth is optimal for the vast majority of use cases. Previous machine versions and ioh3420 root ports will continue to default to 2.5GT/x1 links.

 

Now that these changes are standard with the Q35 machine type in 4.0, I think this could also be an additional argument against potentially forcing Windows-based VMs to the i440fx machine type, if it brings things into performance parity?
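
If that makes it into Unraid, opting into the new defaults should just be a matter of the machine version in the VM's XML; a rough sketch (the exact version string will depend on the bundled QEMU) would be:

<os>
  <!-- Hypothetical: the pc-q35-4.0 machine version is what enables the new
       16GT/s x32 root-port defaults; older machine versions keep 2.5GT/x1. -->
  <type arch='x86_64' machine='pc-q35-4.0'>hvm</type>
</os>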

 

If @limetech could throw this into the next RC for people to test out, that would be much appreciated!

Edited by billington.mark
Link to comment
  • 3 weeks later...
On 3/20/2019 at 4:34 AM, billington.mark said:

QEMU 4.0 RC0 has been released - https://www.qemu.org/download/#source

And a nice specific mention in the changelog of things discussed in this thread (https://wiki.qemu.org/ChangeLog/4.0):

 

 

Now that these changes are standard with the Q35 machine type in 4.0, I think this could also be an additional argument against potentially forcing Windows-based VMs to the i440fx machine type, if it brings things into performance parity?

 

If @limetech could throw this into the next RC for people to test out, that would be much appreciated!

I second it. Even split it off if you'd like and just let people who want to take the risk take it. My latency is down substantially through the tons of tweaks I've made, but I still get latency spikes. Hoping to get it down even further.

 

Just to point out, I don't know what causes my latency spikes. Sometimes (usually after a fresh restart) the latency spikes are nowhere to be seen; the game runs smooth and everything is great. Like now, though: the server has been up for a few days and it's stutter city. 32 cores, 64 threads, and the system is barely being used. I wish I could figure out where the latency is coming from. I run memory tests and they come back nice and responsive. GPU tests are meh, but with no reason why that I can tell.

Edited by Jerky_san
Link to comment
  • 2 weeks later...
  • 2 months later...
On 4/10/2019 at 4:29 AM, Jerky_san said:

I'd like to report that I think I fixed my latency spikes. I still get one every once in a while when gaming, but not nearly as bad. I figured out that it was G-Sync being enabled. Don't know why it broke things, but it did.

Did you disable it completely? I get stutters like every 30 seconds in CPU-intensive games. It gets really bad if I play a YouTube video at the same time on a second monitor.

 

Did you switch to Q35 as well?

Link to comment

Update: this fixed the stutters for me, and games run much smoother. I also don't have any trouble playing YouTube videos on the other screen anymore while gaming. I use the HDMI output of the GPU for sound output, and that may also have had something to do with the issues I experienced beforehand.

 

I had some difficulties changing the VM to Q35 and eventually found that the easiest way for me was:

Back up the current configuration XML and switch to the GUI (form) view

Change GPU of VM to VNC > Update

Change Machine Type to Q35 > Update

Change GPU to your GPU > Update

Switch back to XML view and compare the original one with the new one (see the sketch below for the kind of differences to expect)

Apply performance patches to the new XML > Update
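
For anyone doing the comparison step above, the differences are roughly the ones sketched below (hypothetical machine versions, not a complete XML): the machine type changes, and Q35 brings its own PCIe root-port controllers that i440fx doesn't have.

<!-- i440fx template (before), hypothetical machine version -->
<type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
<controller type='pci' index='0' model='pci-root'/>

<!-- Q35 template (after), hypothetical machine version -->
<type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'/>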

 

Link to comment
  • 3 months later...
On 1/29/2019 at 9:02 AM, billington.mark said:

Please can the following patch be applied to QEMU (until QEMU 4.0 is bundled with Unraid, as this fix is already present in master)?

 

PCIe root ports are only exposed to VM guests as x1, which results in GPU pass-through performance degradation, and in some cases on higher-end NVIDIA cards the driver doesn't initialise some features of the card.

 

https://patchwork.kernel.org/cover/10683043/

 

Once applied, the following would be added to the VM's XML to modify the PCIe root ports to be x16 ports:

 


<qemu:commandline>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.speed=8'/>
  <qemu:arg value='-global'/>
  <qemu:arg value='pcie-root-port.width=16'/>
</qemu:commandline>
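
Note for anyone copying this: as far as I know, libvirt only honours <qemu:commandline> if the qemu XML namespace is declared on the root <domain> element, i.e. something like:

<!-- Without the xmlns:qemu declaration, libvirt will typically strip the
     <qemu:commandline> block when the definition is saved. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>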

Patch is well documented over here too: https://forum.level1techs.com/t/increasing-vfio-vga-performance/133443 

 

This would also increase the performance of any other passed-through PCIe device that needs more bandwidth than an x1 port provides (NVMe, 10Gb NICs, etc.).

 

If we could have QEMU compiled from master instead of the releases though... that would be even better!

Changing to Q35 and applying this patch fixed my code 43 error in my Windows 10 VM. THANK YOU SO MUCH!!! I've been dealing with this ever since I upgraded to a Ryzen 3900X. Now I can game again!!!! Please continue to support Q35!!!!

Link to comment
  • 4 weeks later...
On 2/23/2019 at 11:30 PM, jonp said:

Feedback is critical here.  We are seriously considering removing the option to select Q35 for Windows-based VMs from VM manager.  If you can provide a use-case for why we shouldn't, please do!!

@jonp: by "select Q35 for Windows-based VMs", did you mean as part of the Unraid pre-built VM templates?

As far as I know, the VM XML doesn't have any tag that says "expected OS" or anything like that. Picking PC type = Q35 is a generic QEMU option, so it would take effort on the Unraid devs' part to disallow it. I sincerely hope you are not talking about disallowing Q35 as a blanket ban, because that would be a catastrophic mistake. Putting effort into banning something that causes no harm to the majority of users while possibly helping some (even if a niche group) is nuts.

On a related note, I believe QEMU 4.1 (Unraid 6.8.0-rc) no longer requires the patch. I removed the custom XML tags and my PCIe devices run at full x16 speed as far as I can tell.

Link to comment
On 10/23/2019 at 7:37 AM, testdasi said:

On a related note, I believe QEMU 4.1 (Unraid 6.8.0-rc) no longer requires the patch. I removed the custom XML tags and my PCIe devices run at full x16 speed as far as I can tell.

Do you still need to have the extra root-port verbiage in the XML?

Link to comment
