GPU passthrough working with one PCI slot but not another



I'm working on passing through a GPU and USB controller to a Windows 10 VM. Below are the components I'm working with.

 

Ryzen 5 2600
Radeon HD 5670
Asrock X370 Taichi mobo (top two x16 slots run at PCIe 3.0 x8/x8 when both are populated. The bottom x16 slot runs at PCIe 2.0 x4) (http://www.asrock.com/mb/amd/x370%20taichi/index.asp#Specification)

 

To start off, I had 2 HBAs in the top two x16 slots (PCIe 3.0 x8/x8) and the GPU in the bottom x16 slot (PCIe 2.0 x4). With that configuration, the USB controller I needed to pass through was in the same IOMMU group as the GPU, along with a SATA controller, an ethernet controller, and a couple of other devices. So, I used ACS override (the Both setting) along with vfio-pci.ids=1022:43b9 (for the USB controller) in the syslinux config and was then able to pass through the USB controller and GPU. This worked fine and I was able to get the VM to display on the monitor.
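For reference, as far as I understand it the "Both" setting plus the USB controller stub ends up as an append line in syslinux.cfg roughly like this (paraphrased, so the exact line on my flash drive may differ slightly):

append pcie_acs_override=downstream,multifunction vfio-pci.ids=1022:43b9 initrd=/bzroot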

 

I was testing a game and noticed poor performance. Sure, that could have been due to the GPU being old, but I wanted to test it in one of the top two slots that run at PCIe 3.0 x8 instead of the current slot at PCIe 2.0 x4. So, I moved the HBAs to the lower two x16 slots and put the GPU in the top x16 slot. I also updated the VM template since the GPU was in a new slot. Now the VM doesn't display on the monitor. The monitor shows unRAID booting up like it did with the previous config, but just shows a black screen when the VM is started. I can see the VM is starting up since I'm able to RDP into it.

 

I tried each setting for ACS override (Downstream, Multifunction, Both) and none improved the situation.

 

I removed ACS override and vfio-pci.ids=1022:43b9 from the syslinux config to see what the stock grouping looked like. The USB controller was still in the same crowded IOMMU group as before, but the GPU was now in a group by itself. That seems good.
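In case it's useful, this is roughly how I've been checking the groupings from the console (a standard sysfs walk, adjust as needed):

for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done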

 

After each unsuccessful attempt to get passthrough working, I disabled ACS override, removed the vfio-pci.ids entry from the syslinux config, and removed the GPU and USB controller from the VM template. After doing so, the server fails to come back online or display anything when it's restarted, and I have to hard boot it to get it to respond again.

 

I tried the following changes in the UEFI, but nothing helped:
Enabled 'SR-IOV' in North Bridge config
Enabled 'IOMMU' in AMD CBS\NBIO Common Options\NB Configuration (was set to 'Auto')
Set 'PCIe x16/2x8 Switch' to 2x8 (was 'x16')

 

I also tried adding the IDs for the GPU and its associated audio device to vfio-pci.ids=, but this caused unRAID not to fully boot; it got stuck at the screen in the attached picture.
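I pulled those IDs from lspci, something like the following, where the bracketed [1002:xxxx] values are just placeholders for whatever my card actually reports:

lspci -nn | grep -iE 'vga|audio'
# 22:00.0 VGA compatible controller [0300]: ... [1002:xxxx]
# 22:00.1 Audio device [0403]: ... [1002:xxxx]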

 

I do plan to upgrade the GPU, so I want to make sure it will work in this top slot, both so it can perform its best and to leave room around the HBAs for effective cooling. Any ideas why GPU passthrough works when the GPU is in the bottom slot, but not when it's in the top slot and in its own IOMMU group?

IMG_20200426_003325.jpg

elysium-diagnostics-20200425-2255.zip


I'm not sure I understand what you are asking. My apologies.

49 minutes ago, Carbongrip said:

When you're moving the cards to different slots, are you making sure the vfio PCI addresses aren't changing when they get a new slot?

Where can I verify the vfio PCI addresses? When I add the GPU and GPU audio to the VM template, it has the same slot numbers (5 & 6) as it did when the card was physically installed in the other slot, if that's what you mean. I normally add multifunction='on' to the 6th line and then change the slot value and function in the 13th line to 0x05 and 0x1, respectively, but I didn't change that for the screenshot below.

[Screenshot: VM XML showing the GPU and GPU audio hostdev entries on slots 5 and 6]
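If it's what you meant, I did find that something like this shows what vfio-pci has actually claimed on the host (my GPU's address is just an example here):

ls /sys/bus/pci/drivers/vfio-pci/
lspci -nnk -s 22:00.0   # "Kernel driver in use:" should read vfio-pci if it's stubbed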

49 minutes ago, Carbongrip said:

You might need to update the syslinux file with new vfio addresses.

Are you saying to add the IDs (from System Devices) to the vfio-pci.ids= line in the syslinux file? If so, I tried that but the server wouldn't boot and got stuck at the point in the original screenshot. I also tried using the 'VFIO-PCI CFG' plugin to bind the device, but that resulted in the server getting stuck at the same point in boot-up.

 

32 minutes ago, Alabaster said:

Are you saying to add the IDs (from System Devices) to the vfio-pci.ids= line in the syslinux file? If so, I tried that but the server wouldn't boot and got stuck at the point in the original screenshot. I also tried using the 'VFIO-PCI CFG' plugin to bind the device, but that resulted in the server getting stuck at the same point in boot-up.

 

Yes, that's what I was trying to say. What device is at vfio-pci.ids=1022:43b9? When changing slots, the device you expect to be 1022:43b9 and its grouping could change, at least as far as I'm aware. Try this: remove every address you added to vfio-pci.ids so it boots, then revisit the IOMMU groups, find the address of each device you intend to pass through while making sure there aren't new devices conflicting in an IOMMU group, and add it to vfio-pci.ids=.
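Something like this should print the vendor:device ID for a given slot address (the address here is just an example, use whatever your GPU shows in System Devices):

lspci -nns 22:00.0
# the [xxxx:xxxx] in brackets at the end of the line is what goes into vfio-pci.ids=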

12 hours ago, Carbongrip said:

what device is at vfio-pci.ids=1022:43b9

It's the onboard USB controller I mentioned in my post. The onboard Bluetooth runs through this USB controller and I'm using it to connect an Xbox One controller for gaming input. This USB controller has had the same address through all the testing, and I have no issues booting or passing the USB controller through with just that ID on the line.

12 hours ago, Carbongrip said:

Try this: remove every address you added to vfio-pci.ids so it boots, then revisit the IOMMU groups, find the address of each device you intend to pass through while making sure there aren't new devices conflicting in an IOMMU group, and add it to vfio-pci.ids=.

I have tried this. After using ACS override, the USB controller, GPU video, and GPU audio are each in their own IOMMU group. Adding the IDs for the GPU and its associated audio to the vfio-pci.ids line causes the server not to boot, as mentioned.
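For what it's worth, when the server does boot I've also been sanity-checking what the kernel actually received and what vfio reported, with something along these lines:

cat /proc/cmdline          # confirm the vfio-pci.ids= entries made it onto the boot line
dmesg | grep -i vfio       # shows which devices vfio-pci added/claimed during boot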

12 hours ago, Carbongrip said:

Also open the VM configs, remove the PCI cards from the virsh XML file and re-add them. I think the <source><address></source> changes when the card moves slots, so when the VM starts and goes looking for the card it's not there anymore.

I have tried this. After each attempt I would remove the GPU/audio from the VM template and verify the XML reflected the change. The addresses for the GPU/audio did change when physically moving slots. It went from 21:00.0 & 21:00.1 to 22:00.0 & 22:00.1 and the <source><address></source> section of the VM XML also reflected the new address when re-adding the GPU/audio to the VM. 
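I confirmed the new source addresses from the command line as well, roughly like this (the VM name here is just an example):

virsh dumpxml "Windows 10" | grep -A6 "<hostdev"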


I'll add some screenshots to hopefully help visualize this. With ACS override set to Both, System Devices shows the following. The USB controller and the GPU/audio I want to pass through are in groups 15, 31, and 32...

[Screenshot: System Devices IOMMU listing, with the USB controller and GPU/audio in groups 15, 31, and 32]

 

I have this line in the syslinux config. 1022:43b9 is for the USB controller from group 15 in the above screenshot.

[Screenshot: syslinux config with vfio-pci.ids=1022:43b9 on the append line]

 

I add the GPU/audio and USB controller to the VM form as shown below and click Update.

[Screenshot: VM form with the GPU, GPU audio, and USB controller selected]

 

The resulting XML is below and looks to be correct...

[Screenshot: resulting VM XML with the GPU, GPU audio, and USB controller hostdev entries]

 

I then add multifunction='on' to the 6th line and update the slot and function on the 13th line for the audio. This is shown below...

[Screenshot: VM XML with multifunction='on' added and the audio device's slot/function updated]
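To double-check that the multifunction edit stuck, I looked at the guest-side addresses like this (VM name again just an example); the idea is that the GPU and its audio should share the same guest slot with function='0x0' and function='0x1':

virsh dumpxml "Windows 10" | grep -E "multifunction|slot=|function="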

 

I can start the VM, but nothing displays on the monitor. I then RDP'd into the VM and shut it down. If I then remove the GPU/audio and USB controller from the VM form, verify the devices are also removed from the VM XML, disable ACS override, and remove the vfio-pci.ids section from the syslinux config, the server fails to respond to a shutdown or reboot and I have to hard boot it to get it responding again.
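After one of these black-screen starts I also grabbed the host logs before hard booting, in case a reset or BAR error shows up (just a generic check on my part, nothing conclusive yet):

dmesg | grep -iE "vfio|BAR|reset"
tail -n 100 /var/log/syslog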

 

I also tried adding the GPU/audio IDs to the syslinux config as shown below, but this prevented the server from booting and it got stuck at the screenshot from my original post.

[Screenshot: syslinux config with the GPU and GPU audio IDs added to vfio-pci.ids]

 


Update: The good news is that the server is no longer failing to boot after adding the GPU PCI IDs to the vfio-pci.ids line in the syslinux config. I just needed to wait longer for the server to boot and come back online.

 

The bad news is I'm still having the same issue: the video output for the VM is not displayed on the monitor. Also, after starting the VM and later stopping it, the server will not respond to a shutdown/reboot and I need to hard boot it to get it responding again.

 

Anyone have any other ideas?


When you RDP into the VM after assigning the GPU and not getting a signal, is the GPU seen by the guest OS at all in Windows Device Manager?

Since this is a Ryzen machine, did you redo the steps to keep unRAID from grabbing the GPU after changing slots? I haven't done this so I may be way off base, but my understanding is that you have to blacklist the card so unRAID doesn't grab it for use as the only display adapter. Since it was working before, I'm sure you at least know what I'm attempting to ask haha.
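Again, I haven't done this myself, so treat it as a pointer rather than a recipe, but the examples I've seen add framebuffer-off flags to the syslinux append line so the host console lets go of the primary GPU, something along the lines of:

append pcie_acs_override=downstream,multifunction vfio-pci.ids=1022:43b9 video=efifb:off video=vesafb:off initrd=/bzroot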

 

Also, sometimes going back to the basics helps, so don't laugh at the suggestion: did you make sure the HDMI/DP cable is inserted fully into both the GPU and the monitor, and that the monitor is set to the right input? Sometimes the obvious things trip us up, so it's worth verifying. Even the best of us have overlooked these things before.

