Guide: bind devices to vfio-pci for easy passthrough to VMs


ljm42


TL;DR - skip to the second post if you just want to know how to convert from legacy PCI stubbing via Syslinux to the new point-and-click method in Unraid 6.9

---

At times you will want to "hide" devices from Unraid so that they can be passed through to a VM.

 

Unraid Prior to 6.7
In the past (pre-Unraid 6.7) we would stub a device by adding its Vendor:Device code to the vfio-pci.ids parameter in Syslinux, something like this:

append vfio-pci.ids=8086:1533

This worked, but had several downsides:

  • If you have multiple devices with the same Vendor:Device code, all of them would be stubbed (hidden) from Unraid
  • It is a fairly technical process to find the right Vendor:Device code and modify the syslinux file. Make a mistake and your system won't boot!
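
For reference, lspci -nn prints the Vendor:Device code in square brackets at the end of each line. Below is a minimal shell sketch of pulling that code out of a line of output; the helper name and the sample line are illustrative, not anything Unraid ships:

```shell
#!/bin/sh
# Hypothetical helper: extract the [vvvv:dddd] Vendor:Device code from one
# line of `lspci -nn` output. On a live system you would feed it e.g.:
#   lspci -nn | grep -i ethernet
vendor_device() {
    # The last [xxxx:xxxx] bracket on the line is the Vendor:Device ID;
    # class codes like [0200] don't match because they have no colon.
    printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Sample line (made up for illustration):
line='03:00.0 Ethernet controller [0200]: Intel Corporation I210 [8086:1533]'
vendor_device "$line"    # prints 8086:1533
```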


As an alternative, you could add the <Domain:Bus:Device.Function> string to the xen-pciback.hide parameter in Syslinux:

append xen-pciback.hide=0000:03:00.0

This had downsides too:

  • Still a technical / risky process
  • If you add/remove hardware after modifying syslinux, the pci address could change and the wrong device could end up being stubbed. This would cause problems if a critical disk controller or NIC were suddenly hidden from Unraid
  • This broke in Unraid 6.7

 

Unraid 6.7
Starting with Unraid 6.7 we could bind devices to the vfio-pci driver based on the <Domain:Bus:Device.Function> string (aka pci address). You needed to manually modify the config/vfio-pci.cfg file and specify the <Domain:Bus:Device.Function> string, like this:

BIND=03:00.0

This worked, but still had several downsides:

  • It was a fairly technical process to find the right string to place in the file. But at least if anything went wrong you could simply delete the config file off the flash drive and reboot.
  • We still had the problem where if you add/remove hardware after modifying the file, the pci addresses could change and the wrong device could end up being bound to vfio-pci


Unraid 6.9

For Unraid 6.9, Skittals has incorporated the excellent "VFIO-PCI Config" plugin directly into the Unraid webgui. Now, from the Tools -> System Devices page, you can easily see all of your hardware and which IOMMU groups the devices are in. Rather than editing the config file by hand, simply check the box next to the devices that you want to bind to vfio-pci (aka hide from Unraid). If a device is being used by Unraid (such as a USB controller, disk controller, etc.) then the web interface will prevent you from selecting it.

Additionally, we have a new version of the underlying vfio-pci script which can prevent the wrong devices from being bound when hardware is added or removed. When you click to bind a device on the System Devices page, it will write both the <Domain:Bus:Device.Function> and the <Vendor:Device> code to the config file, like this:

BIND=0000:03:00.0|8086:1533

In this example, the updated script will bind the device at pci address 0000:03:00.0, but only if the <Vendor:Device> code is 8086:1533. If a different <Vendor:Device> code is found at that address, it will not bind. This means we will never inadvertently bind a device that is important to Unraid! (However, since the desired device is not available to be bound, the VM expecting that device may not function correctly.)
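
To make the safety check concrete, here is a rough shell sketch of the idea. This is not Unraid's actual script (that lives at /usr/local/sbin/vfio-pci), and the function and variable names are illustrative; it just compares the vendor/device files that sysfs exposes at the address against the expected code before binding:

```shell
#!/bin/sh
# Sketch of the vendor-checked bind logic (assumption: a simplified
# illustration of what Unraid's vfio-pci script does, not its real code).
# A config entry looks like: 0000:03:00.0|8086:1533

SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"   # overridable so the sketch is testable

bind_entry() {
    addr=${1%%|*}          # PCI address, e.g. 0000:03:00.0
    want=${1##*|}          # expected Vendor:Device, e.g. 8086:1533
    dev="$SYSFS_PCI/devices/$addr"
    if [ ! -e "$dev" ]; then
        echo "Error: Device $addr does not exist, unable to bind device"
        return 1
    fi
    # The sysfs vendor/device files contain e.g. "0x8086" / "0x1533"
    have="$(cut -c3- "$dev/vendor"):$(cut -c3- "$dev/device")"
    if [ "$have" != "$want" ]; then
        echo "Error: $addr is $have, expected $want - refusing to bind"
        return 1
    fi
    # Tell the kernel to use vfio-pci for this device, then trigger a probe
    echo vfio-pci > "$dev/driver_override"
    echo "$addr" > "$SYSFS_PCI/drivers_probe"
    echo "Bound $addr ($want) to vfio-pci"
}
```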

Devices bound in this way can be passed through to your VMs by going to the VM tab, editing the template, and then selecting the appropriate device from one of the hardware dropdowns. Can't find it? Check under "Other PCI Devices".

If the System Devices page shows that multiple devices are in the same IOMMU group, it will automatically bind all the devices in that group to vfio-pci.  You should then pass all devices in that IOMMU group to the same VM.
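
If you want to double-check the grouping from a terminal, the kernel exposes it under /sys/kernel/iommu_groups. A small sketch (the root path is made overridable purely so the function is easy to test; on a real system just leave it at the default):

```shell
#!/bin/sh
# List each PCI device together with its IOMMU group - roughly the grouping
# information the System Devices page displays. The kernel's layout is
# /sys/kernel/iommu_groups/<group>/devices/<pci-address>.
IOMMU_ROOT="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"

list_iommu_groups() {
    for d in "$IOMMU_ROOT"/*/devices/*; do
        [ -e "$d" ] || continue           # skip if the glob matched nothing
        group=${d%/devices/*}             # .../iommu_groups/13
        group=${group##*/}                # 13
        printf 'IOMMU group %s: %s\n' "$group" "${d##*/}"
    done
}
```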

Note: If you make hardware changes after setting this up, it would be a good idea to disable autostart on your VMs first. Then shutdown, add/remove hardware as needed, and boot back into Unraid. Visit the Tools -> System Devices page and ensure the correct devices are still being bound to vfio-pci. Adjust as needed and reboot, then start your VMs.

Troubleshooting Tips

  • If you had the VFIO-PCI Config plugin installed, you should remove it as that functionality is now built-in to Unraid 6.9
  • General tip for Unraid - if you intend to try something that feels risky, go to Settings -> Disk Settings and disable Array Auto Start before you shutdown. This will minimize the chance of data loss on boot. If all goes well you can start the array up after booting.
  • If you bind your only video card then Unraid probably won't boot. See the next point.
  • The System Devices page writes the device details to config/vfio-pci.cfg file on the flash drive. If you ever want to "start fresh" simply delete this file and reboot.
  • If there was a vfio-pci.cfg file to process during boot, System Devices will include a "View VFIO-PCI Log" button that details each of the devices that were (un)successfully bound during boot, along with any available error messages.
  • Be sure to upload your diagnostics ( Tools -> Diagnostics ) when requesting help as both the config file and the log are included in it
---

Hi! Nice. Still on Unraid 6.8.1 right now, and I managed to pass through a Quadro card without stubbing via pci-stub.ids= like I used to, or using the VFIO-PCI Config plugin; it handled it gracefully with just pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1.

 

Though I'm facing an issue with onboard NIC and the VFIO-PCI Config plugin.


Dell PowerEdge R720 here, using 2 other (better) network cards to actually connect the host to the network and for back-to-back stuff; I would have liked to use all 4 'onboard' ports for some pfSense-VM and routing tests.

 

So I went on and used the VFIO-PCI Config plugin to stub all 4 ports (cause they each appear as their own subdevice): [screenshot]

but as you can see on that screenshot, UNRAID for some reason keeps grabbing and using 2 of the 4 ports, for NO reason, since, atm, and at boot, no ethernet cables were even plugged in these, and they were all unconfigured and port-down interfaces.

 

 

In network settings, showing in the MAC address selection are eth6 and eth7, the two "derping" ports 01:00.0 and 01:00.1, but for some reason only one of the two is actually showing up at all as an available and configurable port (which it shouldn't at all; I don't want them grabbed by Unraid)

[screenshot]

 

 

And checking that sweet iDRAC, I can see how the onboard management sees the derping; please note they are in reverse order between IOMMU and Unraid/iDRAC:

port one, used to be eth4, good, not grabbed: [iDRAC screenshot]

port two, used to be eth5, good, not grabbed: [iDRAC screenshot]

port three, corresponding to eth6, half a$$ grabbed, but no difference seen in iDRAC: [iDRAC screenshot]

port four, corresponding to eth7, fully grabbed, and seen as functional: [iDRAC screenshot]

 

 

I just don't get how it can fail, but since these devices are reset capable, it would be handy to know if there is a way to tell unraid "Bad! Bad hypervisor! Now sit!", and forcefully unload the devices without causing a kernel panic.

If there is, that could be a life-saving option in the upcoming 6.9: being able to tell Unraid to just auto-unload some devices after booting, when they are known to hook themselves up for no reason.

 

Please note, and I repeat, there are no cables plugged in these 4 ports, nothing configured to hook to them, and iDrac has its own dedicated port that isn't linked to the 'onboard' NIC (which in fact is on a mezzanine card.)

 

If you could light my lantern there, I have no explanation of why Unraid is acting stubborn with this NIC, while handling GPU passthrough so blissfully on the other hand.

---

 

 

2 hours ago, Keexrean said:

it handled it gracefully with just pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1.

I wouldn't call those settings "graceful". As I understand it, they are a last resort hack, and not ideal at all. I'd recommend Googling them and moving away from them if at all possible. It may be that your hardware isn't capable of doing all the passthroughs that you want to do.

 

2 hours ago, Keexrean said:

UNRAID for some reason keeps grabbing and using 2 of the 4 ports, for NO reason, since, atm, and at boot, no ethernet cables were even plugged in these, and they were all unconfigured and port-down interfaces.

The VFIO-PCI process happens before Unraid installs any drivers.  For some reason VFIO-PCI is failing to bind all the ports so Unraid goes ahead and installs the drivers.

 

In Unraid 6.9 the VFIO-PCI process logs everything it does, so hopefully there would be a clue as to why it is not binding as expected. I would try it in 6.9.

---

I get that pcie_acs_override=downstream and type1.allow_unsafe_interrupts=1 are kind of old-school methods now; they are just settings that were ported over from when I was running the same Unraid install in another box, an HP Z600 workstation, that definitely needed them to be able to pass anything through.

(I also used to shuffle my PCIe devices a lot, so using that instead of targeted stubbing was just some kind of comfort method; also, I never used 'vfio-pci.ids=', I always did 'pci-stub.ids=')

 

I'll admit with no shame whatsoever that I just took my drives, HBA and NICs out of the Z600, slapped them in the R720, and booted it with little to no care in early 2020. I might be part of this year's curse theme.

 

I called that graceful as in 'it went smoothly'. Power down the server, slap the GPU in, boot, a little XML editing of the VM to set multifunction and bring the sound device onto the same virtual slot, vbios rom, VM boots right away, drivers install, CUDA acceleration works and passmark is in the range. And since it has been running stable for over a week, and through about 30 VM restarts, as long as it doesn't catch fire, I call that good enough.

 

 

As for the NICs! Saying they were unconfigured: I did use them at some point for Unraid's main networking, as can be seen in an earlier thread.

 

This 'onboard' stock NIC did show some unreliable behavior before, which I attributed most likely to heat: heavy usage + quite a low airflow setting for my ears' sanity (running the R720's fans between 7 and 12% speed, managed through an ipmitool script so disgusting that it would probably make you burp blood)

And since no one seemed eager to even throw me a ball on that thread back then, I got fed up with the unreliability and decided to move (and upgrade) to a 2x10Gbps Base-T card with active cooling for the server's main networking, while I already had an SFP+ card dedicated to only back-to-back connections.

 

 

Eth 4, 5, 6 and 7 still have their custom names in the network settings panel if I unstub them, but since then have just stock-nothing settings dialed in, aren't linked to any network/bridge or anything, and are all "down".

And I'm starting to think that MAYBE part of the card's unreliability back when I was using it as main NIC isn't all about heat, but lies deeper. It would indeed be interesting to see what insight the features of the 6.9 release would give on the issue.

 

But I feel like whenever it's gonna bother me enough (which will probably happen before the 6.9 release comes out), I'll go try some sad-milking methods, like echo -n "0000:01:00.3" > /sys/bus/pci/drivers/igb/unbind or just rmmod-ing the driver.

---

I have two NICs: one is a ConnectX-3 with dual ports, the other is an RTL8125 2.5GbE on the motherboard. The Unraid system is connected to the network with a cable via one port of the ConnectX-3. I want to pass the RTL8125 2.5GbE through via vfio-pci, but the checkbox is uncheckable. How can I make it work?

[screenshot]

---
9 hours ago, Sanly said:

I want to passthrough the RTL8125 2.5GbE by vfio-pci, but the checkbox is uncheckable. How can I make it work?

I'm assuming you are on 6.9.0-rc2? If you hover over the disabled checkbox you should get a tooltip saying "In use by Unraid". Go to Settings -> Network Settings and remove the nic from any bonds (note: you will need to stop the array to make any changes to the network setup). Once it is no longer in use by Unraid, the checkbox will be active. If you get stuck, please upload your diagnostics (from Tools -> Diagnostics)

---
On 1/14/2021 at 12:25 AM, ljm42 said:

Go to Settings -> Network Settings and remove the nic from any bonds (note: you will need to stop the array to make any changes to the network setup). Once it is no longer in use by Unraid, the checkbox will be active.

Solved by disabling "Enable bridging", many thanks!

---
On 1/31/2021 at 2:45 AM, richiesebo said:

Legacy PCI Stubbing found, please help clear this warning, vfio-pci.ids or xen-pciback.hide found within syslinux.cfg. For best results on Unraid 6.9+, it is recommended to remove those methods of isolating devices for use within a VM and instead utilize the options within Tools - System Devices

nrs-diagnostics-20210131-1242.zip

 

At some point in the past you added this to your syslinux file:
  vfio-pci.ids=8086:150e
This is the old way of stubbing a device so that Unraid will not install a driver for it.

 

In 6.9 we do that using the webgui.

 

First go to Main -> Boot Device -> Flash and choose "Flash backup". This will give you a zip file of your settings "just in case"

 

Then go to the Syslinux tab on that page and remove "vfio-pci.ids=8086:150e" from this line:
  append initrd=/bzroot vfio-pci.ids=8086:150e
so it looks like this:

  append initrd=/bzroot 

 

Hit Apply, but don't reboot yet!

 

Then go to Tools -> System Devices and put a checkmark next to device "8086:150e" and click "Bind selected to VFIO at boot"

 

Now reboot. When it comes back up, the System Devices page will be in control of what devices are stubbed to VFIO-PCI. You can press "View VFIO-PCI Log" to see exactly what it did for that device while booting.

---

In the 6.9 interface, it’s offering me the chance to bind the USB port my Unraid boot flash is on. Is this normal and correct behavior? What will happen if I try it? Will I be able to see the flash device both in vfio vm’s and in Unraid, or will Unraid keep that one to itself?

 

(I have worked on Linux kernel internals this present century—just barely!—so I’m guessing if this has something to do with the vagaries of USB Storage, such that the flash device can remain mounted by Unraid while the underlying USB port appears present but unavailable for any functions to the VM, then this would seem to not cause any problems for the Unraid kernel’s VFS—except pulling the flash while Unraid is running wouldn’t allow it to be remounted by Unraid if it were plugged back in to that port and a VM could grab it. But I assume that removing the boot flash during operation isn’t safe anyway.)

---
1 hour ago, TreyH said:

In the 6.9 interface, it’s offering me the chance to bind the USB port my Unraid boot flash is on. Is this normal and correct behavior?

 

No, passing your boot drive to a VM would not be good.

 

I'm not clear which area of the interface you are looking at. Please provide screenshots showing the issue and upload your diagnostics (Tools -> Diagnostics). Thanks!

---

I've enabled SR-IOV on a network adapter via modprobe:

 

/boot/config/modprobe.d/mlx4_core.conf

options mlx4_core num_vfs=4

 

This creates four Virtual Function devices that can be assigned to multiple virtual machines for direct hardware access (without requiring a dedicated NIC for each VM).

 

This also makes them available to the vfio-pci passthrough function:

[screenshot]

 

However, this fails:

 

Loading config from /boot/config/vfio-pci.cfg
BIND=0000:03:00.1|15b3:1002 0000:03:00.2|15b3:1002 0000:03:00.3|15b3:1002 0000:03:00.4|15b3:1002
---
Processing 0000:03:00.1 15b3:1002
Error: Device 0000:03:00.1 does not exist, unable to bind device
---
Processing 0000:03:00.2 15b3:1002
Error: Device 0000:03:00.2 does not exist, unable to bind device
---
Processing 0000:03:00.3 15b3:1002
Error: Device 0000:03:00.3 does not exist, unable to bind device
---
Processing 0000:03:00.4 15b3:1002
Error: Device 0000:03:00.4 does not exist, unable to bind device
---
vfio-pci binding complete


This is entirely expected as the devices didn't exist when vfio-pci.cfg was processed.

 

Is there any interest in making this functionality work for SR-IOV - either by re-attempting the binding after a delay should it fail - or re-attempting failed bindings on array start, etc?

 

Edit: Or a checkbox "Reattempt bindings on array start for SR-IOV" ?

---
4 hours ago, ConnectivIT said:

Is there any interest in making this functionality work for SR-IOV - either by re-attempting the binding after a delay should it fail - or re-attempting failed bindings on array start, etc?

 

This doesn't work the way you expect because you're attempting to do separate actions through two separate abstraction layers - one using the pci bus, one using sysfs. 

 

In order to make the pci version of this work, you need to use the pci bus to create your VFs - the two options are mutually exclusive in this case. Sysfs is up at the OS level, which happens after the PCI bus is initialized, so unless you use PCI bus partitioning (which is sub-optimal, as it limits your ability to make changes without a reboot), you'll want to script this as noted in the guide.

---
9 hours ago, BVD said:

In order to make the pci version of this work, you need to use the pci bus to create your VFs - the two options are mutually exclusive in this case. Sysfs is up at the OS level, which happens after the PCI bus is initialized, so unless you use PCI bus partitioning (which is sub-optimal, as it limits your ability to make changes without a reboot), you'll want to script this as noted in the guide.

 

Thanks for clarifying. No issues using the script myself, but I guess my question would then become "is there any interest in providing this functionality in the unRAID System Devices GUI?" - even if only as a separate column of checkboxes for passing through SR-IOV virtual function devices.

 


 

edit: Then again, maybe this is better suited to a separate SR-IOV plugin that could handle some of the janky requirements for getting VFs enabled in the first place

---

If I had even just 5% as much time to dedicate to things I enjoy (like this) as I end up spending putting out fires at work, I'd be all over this - seems like it'd be fun to make a plugin. I have a few days off next week, maybe I'll read up on it if the kiddos let me and see what it'd take.

 

I'd temper any expectations though... UnRAID in and of itself is a niche within a niche: people who both want a home NAS AND are just crazy enough to go out and build their own after finding commercially available alternatives lacking (you know... like US!). Folks who'd also actually get something out of using a plugin that enables SR-IOV on top of that? It's a niche's niche within that niche. I'm honestly surprised more than 2-3 people have chimed in about it, but that must just go to show the breadth of Limetech's customer base.

 

The simple truth is, at least from what I've gathered lurking around for the last year or so, most people only have maybe 2 VMs going at a time max, usually with >8 or so containers running in the background, and they're not really going to see much benefit from this kind of technology... In fact, they'd just be adding to the complexity of their deployment, which without any tangible benefit, only makes matters worse should something else go sideways in their system down the line (more knobs to turn, dials and gauges to look at, etc).

 

Now... When Intel's Xe graphics cards become 'real things' (I mean, like the kind you can go out and buy I guess, this graphics market sucks SO bad), well, that might be an entirely different story. The technology used is different (GVT-s and GVT-g), but the principle is the same (referring to SR-IOV). Want to have your graphics card partitioned so plex gets a 60% share of your encode/decode engine while the rest is allocated to an emulation VM? Or maybe have that emulation VM running with 20% allocated, another 40% for jellyfin and plex to share, chip 10-20 over to handbrake for automated media conversion, and leave the rest for something like blueiris to do motion detection? All these applications require *just* enough GPU power to need some kind of actual GPU for their work efficiency to skyrocket. And these products seem born for their use.

Personally, I'm ready to plunk down for one right now, just to play with it and see how much life I can squeeze out of the thing. If they'd just quit delaying them another 6 months (every 4-6 months 😕 ), I'd be game for taking a week off to just go nuts with it.

---
1 hour ago, ConnectivIT said:

is there any interest in providing this functionality in the unRAID System Devices GUI?

 

This is exactly what Unraid does today.  The front-end writes to config/vfio-pci.cfg, and a modified version of vfio-pci-bind.sh reads from it and binds the devices during boot. Take a look at /usr/local/sbin/vfio-pci

 

@BVD - I've been reading over here:  https://forums.unraid.net/topic/103323-how-to-using-sr-iov-in-unraid-with-1gb10gb40gb-network-interface-cards-nics/

 

Unraid calls the script as early as possible so that devices are bound to vfio-pci before any drivers are loaded.  I guess your issue is that you need to call the script again after your SR-IOV commands have run.  

---
4 hours ago, BVD said:

and they're not really going to see much benefit from this kind of technology...

 

Plenty of posts around from people attempting to passthrough entire network adapters - though usually out of a desire to virtualise pfsense, which I would always advise against (talk about complicating your network!)

 

4 hours ago, BVD said:

In fact, they'd just be adding to the complexity of their deployment, which without any tangible benefit

 

My use-case is wanting to get the best possible 10Gb SMB performance out of a Windows KVM guest (on ZFS storage). This may be a fool's errand, but I'm sure I will learn some things along the way.

 

But yes, GPU SR-IOV is the killer use-case for all this.  It's so frustrating that this feature is still missing, even on professional (workstation) cards.

 

3 hours ago, ljm42 said:

I guess your issue is that you need to call the script again after your SR-IOV commands have run.  

 

Perfect, thank you.

 

/boot/config/go

# Relaunch vfio-pci script to bind virtual function adapters that didn't exist at boot time
/usr/local/sbin/vfio-pci >>/var/log/vfio-pci

 

---
7 hours ago, ConnectivIT said:

virtualise pfsense, which I would always advise against (talk about complicating your network!)

I've been running a pfsense VM as my primary router / firewall for what seems like forever; it's been at least 2 years. I have a PC ready to get spun up if I need internet while Unraid is down, but I can only remember doing that once in 2 years, and partially just to see if it was seamless (it was).

 

I agree it's not for everyone, but if you have a powerful CPU with cycles available 24/7 anyway, I see no reason to not use a VM router, as long as you have failover options. My server room went from 6 discrete computers down to 2, I virtualized the router, 2 home theater instances, and home automation. I now just have 2 Unraid boxes doing all the work. If I could figure out a way to automatically fail over a VM router from one Unraid to the other I'd be set, no manual intervention at all.

---
On 2/20/2021 at 12:15 PM, ljm42 said:

I'm not clear which area of the interface you are looking at. Please provide screenshots showing the issue and upload your diagnostics (Tools -> Diagnostics). Thanks!

Oops, sorry! I missed this message last month or I would have replied sooner. It turns out the checkbox is unclickable, but the difference between the colors used for a dimmed checkbox and an active checkbox on this device (Safari/iPadOS) wasn't clear. Sorry for the confusion.

---
8 hours ago, jonathanm said:

Not sure how to set that up with only a single static WAN IP.

 

It can be done, but not if you use DHCP for your WAN interface.  If you have a static WAN IP, you can assign RFC1918 addresses to both WAN interfaces and use your actual WAN IP as the CARP interface address.  Not much of a guide, but some discussion on this here:

 

https://www.reddit.com/r/PFSENSE/comments/cvmefu/pfsense_carp_with_one_wan_ip/

I haven't used pfsense CARP for many years; I had a lot of issues getting it working back then. I think it's improved since then though.
