ljm42

Guide: bind devices to vfio-pci for easy passthrough to VMs

Posted (edited)

At times you will want to "hide" devices from Unraid so that they can be passed through to a VM.

 

Unraid Prior to 6.7
In the past (pre Unraid 6.7) we would stub the device by adding a Vendor:Device code to the vfio-pci.ids parameter in Syslinux, something like this:

append vfio-pci.ids=8086:1533

This worked, but had several downsides:

  • If you have multiple devices with the same Vendor:Device code, all of them would be stubbed (hidden) from Unraid
  • It is a fairly technical process to find the right Vendor:Device code and modify the syslinux file. Make a mistake and your system won't boot!
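To illustrate both downsides, here is a minimal Python sketch (not part of Unraid) that reads the Vendor:Device code of every PCI device from the standard Linux sysfs tree and lists which addresses share a given code. The function names and the `sysfs_root` parameter are made up for this example; it assumes each device directory contains the usual `vendor` and `device` files.

```python
from pathlib import Path


def vendor_device_codes(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to its 'vendor:device' code, e.g. '8086:1533'."""
    codes = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        # sysfs stores the IDs as hex text such as '0x8086\n'
        vendor = int((dev / "vendor").read_text(), 16)
        device = int((dev / "device").read_text(), 16)
        codes[dev.name] = f"{vendor:04x}:{device:04x}"
    return codes


def addresses_sharing_code(codes, code):
    """Return every PCI address whose Vendor:Device code equals `code`."""
    return sorted(addr for addr, c in codes.items() if c == code.lower())
```

Calling `addresses_sharing_code(vendor_device_codes(), "8086:1533")` on the server would show every device that a single vfio-pci.ids=8086:1533 entry would stub.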


As an alternative, you could add the <Domain:Bus:Device.Function> string to the xen-pciback.hide parameter in Syslinux:

append xen-pciback.hide=0000:03:00.0

This had downsides too:

  • Still a technical / risky process
  • If you add/remove hardware after modifying syslinux, the pci address could change and the wrong device could end up being stubbed. This would cause problems if a critical disk controller or NIC were suddenly hidden from Unraid
  • This broke in Unraid 6.7. More details

 

Unraid 6.7
Starting with Unraid 6.7 we could bind devices to the vfio-pci driver based on the <Domain:Bus:Device.Function> string (aka pci address). You needed to manually modify the config/vfio-pci.cfg file and specify the <Domain:Bus:Device.Function> string, like this:

BIND=03:00.0

This worked, but still had several downsides:

  • It was a fairly technical process to find the right string to place in the file. But at least if anything went wrong you could simply delete the config file off the flash drive and reboot.
  • We still had the problem where if you add/remove hardware after modifying the file, the pci addresses could change and the wrong device could end up being bound to vfio-pci


Unraid 6.9

For Unraid 6.9, Skittals has incorporated the excellent "VFIO-PCI Config" plugin directly into the Unraid webgui. So now from the Tools -> System Devices page you can easily see all of your hardware and which IOMMU groups they are in. Rather than editing the config file by hand, simply tick the checkbox next to the devices that you want to bind to vfio-pci (aka hide from Unraid). If a device is being used by Unraid (such as a USB controller, disk controller, etc) then the web interface will prevent you from selecting it.

Additionally, we have a new version of the underlying vfio-pci script which can prevent the wrong devices from being bound when hardware is added or removed. When you click to bind a device on the System Devices page, it will write both the <Domain:Bus:Device.Function> and the <Vendor:Device> code to the config file, like this:

BIND=0000:03:00.0|8086:1533

In this example, the updated script will bind the device at pci address 0000:03:00.0, but only if the <Vendor:Device> code is 8086:1533. If a different <Vendor:Device> code is found at that address, it will not bind. This means we will never inadvertently bind a device that is important to Unraid! (However, since the desired device is not available to be bound, the VM expecting that device may not function correctly.)
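The check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual Unraid script: the function names are invented, and `ids_by_address` stands in for whatever the real script reads from the hardware. Treating an entry without a `|` part as "bind unconditionally" is an assumption based on the older 6.7 format.

```python
def parse_bind_entry(entry):
    """Split '0000:03:00.0|8086:1533' into (address, expected_id).

    expected_id is None for an address-only entry.
    """
    address, _, expected_id = entry.partition("|")
    return address, (expected_id or None)


def should_bind(entry, ids_by_address):
    """Decide whether the device named by a BIND entry should be bound.

    ids_by_address maps a PCI address to the Vendor:Device code
    actually found at that address.
    """
    address, expected_id = parse_bind_entry(entry)
    actual_id = ids_by_address.get(address)
    if actual_id is None:
        return False  # nothing is present at that address
    if expected_id is None:
        return True  # assumption: address-only entries skip the ID check
    return actual_id.lower() == expected_id.lower()
```

For example, `should_bind("0000:03:00.0|8086:1533", {"0000:03:00.0": "1000:0072"})` returns False, which corresponds to the case where a different device has moved to that address after a hardware change.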

Devices bound in this way can be passed through to your VMs by going to the VM tab, editing the template, and then selecting the appropriate device from one of the hardware dropdowns. Can't find it? Check under "Other PCI Devices".

If the System Devices page shows that multiple devices are in the same IOMMU group, it will automatically bind all the devices in that group to vfio-pci.  You should then pass all devices in that IOMMU group to the same VM.
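If you want to double check the grouping outside the webgui, a rough Python sketch like the following lists the IOMMU groups from the standard Linux sysfs tree. It is not what the System Devices page runs; the function names and the `root` parameter are hypothetical, and it assumes the usual /sys/kernel/iommu_groups/<group>/devices/<address> layout.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    for group_dir in Path(root).iterdir():
        devices = sorted(p.name for p in (group_dir / "devices").iterdir())
        groups[int(group_dir.name)] = devices
    return groups


def group_mates(groups, address):
    """Return every device that shares an IOMMU group with `address`."""
    for devices in groups.values():
        if address in devices:
            return devices
    return []
```

`group_mates(iommu_groups(), "0000:03:00.0")` would then list the devices that should all go to the same VM.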

Note: If you make hardware changes after setting this up, it would be a good idea to disable autostart on your VMs first. Then shut down, add/remove hardware as needed, and boot back into Unraid. Visit the Tools -> System Devices page and ensure the correct devices are still being bound to vfio-pci. Adjust as needed and reboot, then start your VMs.

Troubleshooting Tips

  • If you had the VFIO-PCI Config plugin installed, you should remove it as that functionality is now built into Unraid 6.9
  • General tip for Unraid - if you intend to try something that feels risky, go to Settings -> Disk Settings and disable Array Auto Start before you shut down. This will minimize the chance of data loss on boot. If all goes well you can start the array up after booting.
  • If you bind your only video card then Unraid probably won't boot. See the next point.
  • The System Devices page writes the device details to config/vfio-pci.cfg file on the flash drive. If you ever want to "start fresh" simply delete this file and reboot.
  • (New in beta24) If there was a vfio-pci.cfg file to process during boot, System Devices will include a "View VFIO-PCI Log" button that details each of the devices that were (un)successfully bound during boot, along with any available error messages.
  • Be sure to upload your diagnostics ( Tools -> Diagnostics ) when requesting help as both the config file and the log are included in it

 

 

Hopefully this is helpful :) Feel free to let me know in the comments if anything is unclear or wrong.

 

 

Edited by ljm42
Posted (edited)

Hi! Nice. I'm still on Unraid 6.8.1 right now, and I managed to pass through a Quadro card without stubbing it with pci-stub.ids= like I used to, and without using the VFIO-PCI Config plugin; it handled it gracefully with just pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1.

 

Though I'm facing an issue with onboard NIC and the VFIO-PCI Config plugin.


Dell PowerEdge R720 here. I use 2 other (better) network cards to actually connect the host to the network and for back-to-back links, so I would have liked to use all 4 'onboard' ports for some pfSense VM and routing tests.

 

So I went on and used the VFIO-PCI Config plugin to stub all 4 ports (because they each appear as their own subdevice):

image.png.2ff7adab4dbcc1d8e4cedc4eb0be251d.png

but as you can see on that screenshot, UNRAID for some reason keeps grabbing and using 2 of the 4 ports, for NO reason, since, atm, and at boot, no ethernet cables were even plugged in these, and they were all unconfigured and port-down interfaces.

 

 

In the network settings, the MAC address selection shows eth6 and eth7, the two "derping" ports 01:00.0 and 01:00.1, but for some reason only one of the two actually shows up as an available and configurable port (which it shouldn't at all; I don't want them grabbed by Unraid).

image.thumb.png.313a48ec58c02aa3d3f5b9a84e8f28e2.png

 

 

And check out that sweet iDrac, where I can see how the onboard management sees the derping. Please note the ports are in reverse order between IOMMU and Unraid/iDrac:
port one, used to be eth4, good, not grabbed:

image.thumb.png.f9e7f519d8b9ce1a563dfd606f1a3575.png

port two, used to be eth5, good, not grabbed:

image.thumb.png.ec74721ee02a2b41dbc95bfb4156a354.png

port three, corresponding eth6, half a$$ grabbed, but no difference seen on idrac:

image.png.4ecef2fc80412f78616b25d096c8cdae.png

port four, corresponding eth7, fully grabbed, and seen as functional

image.png.a684d419e69d0a65431cc6c0b097e668.png

 

 

I just don't get how it can fail, but since these devices are reset capable, it would be handy to know if there is a way to tell unraid "Bad! Bad hypervisor! Now sit!", and forcefully unload the devices without causing a kernel panic.

If there is, that could be a life-saving option in the upcoming 6.9: being able to tell Unraid to just auto-unload some devices after booting, when they are known to hook themselves up for no reason.

 

Please note, and I repeat, there are no cables plugged in these 4 ports, nothing configured to hook to them, and iDrac has its own dedicated port that isn't linked to the 'onboard' NIC (which in fact is on a mezzanine card.)

 

If you could light my lantern there, I have no explanation of why Unraid is acting stubborn with this NIC, while handling GPU passthrough so blissfully on the other hand.

Edited by Keexrean


 

 

2 hours ago, Keexrean said:

it handled it gracefully with just pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1.

I wouldn't call those settings "graceful". As I understand it, they are a last resort hack, and not ideal at all. I'd recommend Googling them and moving away from them if at all possible. It may be that your hardware isn't capable of doing all the passthroughs that you want to do.

 

2 hours ago, Keexrean said:

UNRAID for some reason keeps grabbing and using 2 of the 4 ports, for NO reason, since, atm, and at boot, no ethernet cables were even plugged in these, and they were all unconfigured and port-down interfaces.

The VFIO-PCI process happens before Unraid installs any drivers.  For some reason VFIO-PCI is failing to bind all the ports so Unraid goes ahead and installs the drivers.

 

In Unraid 6.9 the VFIO-PCI process logs everything it does, so hopefully there would be a clue as to why it is not binding as expected. I would try it in 6.9.
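On any Linux system you can also check which driver ended up claiming a given port by looking at the `driver` symlink in sysfs. Here is a minimal Python sketch of that check; it is not an Unraid tool, and the function name and `sysfs_root` parameter are just for illustration.

```python
import os


def bound_driver(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None.

    A bound device has a 'driver' symlink whose target ends in the
    driver name, e.g. '.../drivers/vfio-pci' or '.../drivers/igb'.
    """
    link = os.path.join(sysfs_root, address, "driver")
    if not os.path.islink(link):
        return None  # no driver has claimed this device
    return os.path.basename(os.readlink(link))
```

Running `bound_driver("0000:01:00.0")` for each of the four ports would show which ones were claimed by vfio-pci and which were picked up by the regular network driver.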

Posted (edited)

I get that pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1 are kind of old-school methods now. They are just settings that were ported over from when I was running the same Unraid install in another box, an HP Z600 workstation, that definitely needed them to be able to pass anything through.

(I also used to shuffle my PCIe devices a lot, so using that instead of targeted stubbing was just some kind of comfort method. Also, I never used 'vfio-pci.ids='; I always used 'pci-stub.ids='.)

 

I'll admit with no shame whatsoever that I just took my drives, HBA and NICs out of the Z600, slapped them in the R720, and booted it with little to no care in early 2020. I might be part of this year's curse theme.

 

I called that graceful as in 'it went smoothly'. Power down the server, slap the GPU in, boot, a little XML editing of the VM to set multifunction and bring the sound device onto the same virtual slot, add the vbios rom, and the VM boots right away, drivers install, CUDA acceleration works and passmark is in the range. And since it has been running stable for over a week, and through about 30 VM restarts, as long as it doesn't catch fire, I call that good enough.

 

 

As for the NICs! Saying they were unconfigured, I used them at some point for Unraid's main networking, as it can be seen in 

 

This 'onboard' stock NIC did show some unreliable behavior before, which I attributed most likely to heat: heavy usage plus a quite low airflow setting for my ears' sanity (running the R720's fans between 7 and 12% speed, managed through an ipmitool script so disgusting that it would probably make you burp blood).

And since no one seemed eager to even throw me a ball on that thread back then, I got fed up with the unreliability and decided to move (and upgrade) to a 2x10Gbps Base-T card with active cooling for the server's main networking, while I already had an SFP+ card that was dedicated to back-to-back connections only.

 

 

Eth 4, 5, 6 and 7 still have their custom names in the network settings panel if I unstub them, but since then they have only had stock settings dialed in, aren't linked to any network/bridge or anything, and all are "down".

And I'm starting to think that MAYBE part of the card's unreliability back when I was using it as main NIC wasn't all about heat, but lies deeper. It would indeed be interesting to see what insight the features of the 6.9 release would give on the issue.

 

But I feel like whenever it bothers me enough (which will probably happen before the 6.9 release comes out), I'll give a try to some sad-milking methods, like echo -n 0000:01:00.3 > /sys/bus/pci/drivers/igb/unbind, or just rmmod it.
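For reference, the sysfs unbind mentioned above boils down to writing the PCI address into the driver's `unbind` file. A small Python sketch of that write follows; the function name and the `drivers_root` parameter are made up for illustration, it needs root on a real system, and whether igb lets go of these ports cleanly on Unraid is not something this sketch can confirm.

```python
import os


def unbind_device(address, driver, drivers_root="/sys/bus/pci/drivers"):
    """Ask `driver` to release the PCI device at `address`.

    Writes the address into <drivers_root>/<driver>/unbind, which is
    the generic Linux sysfs interface for detaching a device.
    """
    unbind_path = os.path.join(drivers_root, driver, "unbind")
    with open(unbind_path, "w") as f:
        f.write(address)
    return unbind_path
```

So `unbind_device("0000:01:00.3", "igb")` would do the same thing as the echo one-liner.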

Edited by Keexrean
