HOW TO: Using SR-IOV in UnRAID with 1Gb/10Gb/40Gb network interface cards (NICs)


BVD


I've tried everything I can think of to get my markdown formatting to work here... but it just ain't happenin. So I apologize in advance for the lack of prettification!

_____

 

I've seen a lot of folks more recently asking Limetech to implement SR-IOV support in UnRAID - whether it's to better secure specific virtual machines, better utilize resources, or simplify device passthrough - and as I looked around, I didn't see that anyone had really gone into this much. As I've been using SR-IOV with UnRAID since I spooled up my first server, I thought I'd try my hand at writing what I hope will serve as at least some guidance to others wishing to either use it or learn more about it. This guide is specific to 6.9-RC2 as I just upgraded, but I had been running 6.8.2 and 6.8.3 earlier (so those *should* be fine), and I'll try to keep this up to date as newer versions are released. My hope, actually, is that in the longer run none of these instructions will be necessary - the methods to implement them aren't terribly difficult or cumbersome, they just take time.

 

For those who already know what SR-IOV is, what its benefits are, etc, you can skip to the next post - 

_____

 

In brief, you can think of an SR-IOV capable device as a 'hardware hypervisor' technology, much as the hypervisor itself (UnRAID being ours here, virtualizing operating systems) is a software hypervisor. We all know that basic VM performance is nearly bare metal when configured properly... 

With an SR-IOV device, the functions handed to a VM (you'd call them PCI cards in a standard PC) get direct, PCI-level access to the device. Network cards (Ethernet controllers), graphics cards, SCSI [SAS/SATA] controllers, HBAs/RAID controllers - anything that relies on specialized hardware performs much better when you actually let that specialized hardware do the work. Not only can the performance be orders of magnitude better, it's also considered far more secure than an emulated device, as there's far less attack surface area.

 

I've typed up some further information and reference material below for those interested:

 

Quote

Think of CPU pinning as a rough analogy: the CPU is a single component sitting at a single address, yet we can tell software 'only use this one part of yourself when doing something I ask'. SR-IOV brings that same idea - carving one physical device into separately addressable slices - to PCI hardware.

 

SR-IOV (or more specifically, the 'PCI-SIG SR-IOV' spec) is a technology originally developed by intel (now an industry standard which AMD, IBM, Microsoft, and many others support) to allow hardware to be 'partitioned' into multiple 'functions'. When you partition something for SR-IOV use, you're creating multiple PCI-addressable interfaces from a single physical function (a single-port NIC has 1 physical function). It requires hardware capable of advertising itself as SR-IOV capable (it must support the feature), as well as a driver component which allows the hypervisor to address those functions. 
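For a concrete feel for what that partitioning looks like, the Linux kernel exposes it as a pair of sysfs files per physical function - eth0 here is just an example interface name, and the files only exist if the device and its driver actually support SR-IOV:

cat /sys/class/net/eth0/device/sriov_totalvfs   # maximum VFs the hardware advertises
cat /sys/class/net/eth0/device/sriov_numvfs     # VFs currently created (0 until you partition it)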

There's a TON of great videos that go over background for this feature, so I won't go *too* deep here.

 

As to why you'd want to use it, you may've already put 2 and 2 together after reading the above, especially if you've ever watched/read a VM tuning guide for UnRAID, but here it is - hardware-addressable devices are seen as physical devices by virtual machines, requiring no secondary drivers beyond the original vendor's to work. What this means is that if you attach an intel NIC VF to a Windows/Ubuntu machine, no additional drivers are needed when that machine boots (no virtio installation), as long as Windows/Ubuntu/(etc) supports that hardware. If for whatever reason the OS you choose doesn't include the driver, just go to the vendor's website and download it, just like you would for any other card added to a machine.

 

If you want multiple hosts to talk to each other and only have one NIC, you're typically stuck with an emulated device (you might've seen these referred to as an 'e1000' or 'vmxnet3' NIC elsewhere), for which you have to deal with pesky drivers (especially problematic when dealing with BSD-based systems, at least in my experience), not to mention sub-par performance. If you instead set up a VF for each VM on that same NIC, the Ethernet traffic is handled directly by the NIC hardware.

 

The second, and likely most important, benefit for UnRAID users (as I somewhat alluded to above) is performance - since you're not using a bridge interface (which is a purely software construct), the host hypervisor isn't involved in sending/receiving data. With a bridged network, UnRAID has to inspect each packet in order to route it to the proper location, essentially acting as a software switch/router; once you get to multiple busy hosts, or especially once you start looking at 10Gb+ networking, you could be in for a bad time - or at the very least, a terribly inefficient use of CPU resources comparatively.

 

The hypervisor has to do all of the work in a network bridge (in software); anyone who uses plex/emby/jellyfin/handbrake and has gone from software transcoding to hardware transcoding with nVidia NVENC or Intel's QuickSync has experienced the difference between purely software performance and hardware-accelerated work. Latency with SR-IOV is near native, regardless of how many virtual functions are used (up to the maximum supported by the NIC chipset - more on that later), and host CPU usage stays low.

 

 

Further Reading:

 

Next, let's see if you've already got something that'll work...

 

Link to comment

Does my NIC support it? If not, what should I buy?


 

Quote

 My own setup (though everyone's will be unique) uses an onboard X722 chipset (i40e driver) for gigabit, with an 82599ES 10Gb add-in card (HP 560SFP+, using the ixgbe driver) - supermicro boards are pretty good about ensuring their onboard NICs support SR-IOV, though others vary by manufacturer. Checking whether your hardware supports SR-IOV is pretty easy - 

 

  1. Find the vendor and device ID; navigate to Tools -> System Devices, and scroll down until you see your NIC:

    [screenshot: System Devices entry for the 82599ES 10Gb controller]

    The vendor:device ID for the example here is 8086:10fb, which we'll use to check SR-IOV support.

     
  2. Pull up the terminal (either via SSH, or using the web shell), and query the device:
     
    
    lspci -vvv -d <vendorID:deviceID> | grep -A 9 SR-IOV


    I've highlighted the parts we're looking for; you can see it supports SR-IOV, allows up to 64 virtual functions on each of my NICs, and I have 8 vfs already running on the second one (yours should read 0). If you don't see any output, the device doesn't support SR-IOV; as an alternative, you can run "lspci -vvv | grep -A 9 SR-IOV" to check whether any devices in your system support it.

    [screenshot: lspci output with the SR-IOV capability section highlighted]
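    For reference, the section you're grepping for looks roughly like this (abridged - the capability offset and VF counts will vary by device):

    Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
            ...
            Initial VFs: 64, Total VFs: 64, Number of VFs: 0, ...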

 


_______

 

Shopping for a NIC

Quote

If your server doesn't have SR-IOV, there are a metric TON of options available out there. I won't spend a great deal of time on this as, again, there are tons of references out there, but briefly, here are some starting points/pointers when trying to seek out a NIC for this - 

 

A word of caution when shopping for these used/second-hand (eBay, Newegg/Amazon third-party sellers, etc) - there's a HUGE number of counterfeit NICs out there claiming to be Intel i350's, Silicom units, and others. Read up on the below prior to purchasing, and try to confirm that the images for what you're buying match what the manufacturer shows on their site wherever possible:

 

Great, but what should I get? Too many choices make my brain hurt.

 

I gotchoo. In brief, for those who don't really care about all the intricacies or don't want to do further research: if you want a card for SR-IOV in your UnRAID server, I'd recommend the Dell PRO/1000 ET (NOT the VT model); it'll give you up to 8 vf's per physical port (so 16 for dual port, 32 for quad models), which is likely more than most home users need, and you can find them for ~30 bucks or less.

 

And now, we're ready to start the show...

Link to comment

I've got the right stuff, where do I start?

 

Alright, on to the config/setup - I'm trying to make this as generic as I can so it covers as many setups as possible at once, since the implementation of virtual functions and their utilization depends on a combination of both the driver AND the hardware. Let's first gather some information so we know what driver we're using before we move forward with creating our vfs, and get the script set up so we only have to reboot the one time here:

 

  1. The first thing we need is a script to bind our vfs once they're created by the driver; vf creation happens AFTER the OS is booted, which means the built-in bind options just won't work for us in this instance. My script for doing this prior to 6.9 was such a friggin hack job, but worked fine... Fortunately, someone else has already done the work required for us here, clean and pretty, saving me the embarrassment; Andre Richter is a legend, and I highly recommend checking out and supporting his work:
    wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

     

  2. While you're here in the terminal, make sure to add the following kernel parameter to the append line of your syslinux config file if it's not already there (see the example append line after this list):
    intel_iommu=pt

    This is more restrictive than the 'intel_iommu=on' equivalent, as it explicitly looks for devices that support interrupt remapping, thereby also giving a performance benefit. However, do note that because it's more restrictive, should it impact your ability to utilize other (non-interrupt-remapping) devices that you require, set this to 'intel_iommu=on' instead. If using AMD, just replace 'intel' with 'amd' - the rest is the same for both.
     
  3. We now need to ensure that, if the card has its own BIOS options, SR-IOV is enabled there - not all cards'/boards' BIOS expose this, but we should check just in case. Reboot your host and get into the BIOS:

    [photo: BIOS screen listing a configuration entry for each NIC port]

    As you can see, each port has its own config for my card, so I'll need to check the options for each one I plan to use. Again, this part will be unique to your card, but the idea should be the same: go through the BIOS options for the NIC, look for anything related to either SR-IOV or virtualization, make sure those options are set to allow virtualization, then save+exit. For my NICs, it looks like this (sorry for the crappy pics!):

    [photo: the NIC's BIOS page with the SR-IOV/virtualization option enabled]
     
  4. We're finally ready for some config modification. We first need to know which driver we're using, to determine the method for creating our vf's. Using the vendor:device ID we noted earlier (here, my onboard X722's 8086:37d1):
    lspci -vv -d 8086:37d1 | grep -A 2 'Kernel driver'
    	Kernel driver in use: i40e
    	Kernel modules: i40e
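For reference, once step 2 is done the relevant stanza of /boot/syslinux/syslinux.cfg ends up looking something like this (a sketch - your file may already carry other parameters on the append line; just add intel_iommu=pt to whatever is there, or edit it from the webGUI via your flash device's Syslinux Configuration section):

label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot intel_iommu=pt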

 

There are several possibilities here for how to actually create the functions, and the methods/modules we'll need to use vary depending on which driver your chipset uses.

 

Finally, let's get cracking on creating our vf's...


 

 

Link to comment

Driver / Device specific steps

 

The following is the most generic option, and should work for most UnRAID deployments that contain SR-IOV supporting NICs, going back to around 6.4, but I would recommend no lower than 6.8.2 if you're working with any device using the i40e driver (save yourself the pain and upgrade!):

 

  1. Open your terminal and edit the go file
    nano /boot/config/go
  2. Add a line like the following to the bottom, specifying the number of vf's to create for the interface and replacing my device address (0000:17:00.3) with your own - I chose 4 per interface, with one echo line per physical port you want vfs on:
    echo 4 > /sys/bus/pci/devices/0000:17:00.3/sriov_numvfs
  3. Hit 'Ctrl+x' , then 'Y', then Enter (just following the on screen prompts to save the file), and it's time for a reboot (one of the joys of UnRAID!)
     
  4. Now that your system is back up and running, head to the system devices screen (Tools -> System Devices) - you should see something pretty:
    [screenshot: System Devices listing the newly created virtual functions]
     
  5. Now normally, with any other device showing here, you'd just check the box, save, and reboot. But if you try this, you'll notice they're not bound on reboot, and on checking the vfio log, it says the devices are not found/invalid. This is where we'll use the User Scripts plugin to automatically bind our vf's the first time we start our array, using the script we pulled down earlier.

    5.a - In Settings -> User Scripts, create a new script.

    5.b For each interface, we'll call the script, specifying the VF's vendor:device ID (note, this is different from the physical device's ID), the domain (always 0000 in our case), and the bus address - we'll choose to run this at first array start only, as it's only needed once per boot, with one line per vf (there's also a sample script after this list):
    sudo bash /boot/config/vfio-pci-bind.sh 8086:10ed 0000:17:10.0;

    I have 8 of them, so mine looks like the below:
    [screenshot: my User Scripts entry, with one vfio-pci-bind.sh line per vf]

     
  6.  Start the array, and you're done!
    [screenshot: System Devices showing the vfs now bound to vfio-pci]

    You can now add the vf's just like you would any other pci device under the VM edit page:
    [screenshot: the VM edit page with a vf selected like any other PCI device]
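For reference, the User Scripts entry itself can be as simple as a loop - a sketch assuming four VFs with device ID 8086:10ed at 0000:17:10.0 through 0000:17:10.3 (substitute the VF addresses and vendor:device ID shown on your own System Devices page, as VF numbering varies by card):

#!/bin/bash
# Bind each virtual function to vfio-pci before any VM that uses them starts
for vf in 0000:17:10.0 0000:17:10.1 0000:17:10.2 0000:17:10.3; do
  bash /boot/config/vfio-pci-bind.sh 8086:10ed "$vf"
done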

     

Next up, we'll talk about another (much simpler) method to do all of the above, which is now viable as of UnRAID 6.9 thanks to the later version of the Linux Kernel in use!

(not that you can't try it with earlier versions... it's just that it's not as sure-fire a way with them as it is with 6.9)

Link to comment

UnRAID 6.9, and new hotness chipsets/devices


***I have not yet fully tested this in UnRAID, please only attempt at your own risk until further notice***

 

***The following should be considered a work in progress***

 

Quote

 While I know this method works, I don't know what unforeseen issues might arise from UnRAID being a fully in-memory OS; as I've not tested/trialed all the various circumstances I can think of that one might have for their UnRAID server, I can't guarantee there will be no unforeseen consequences here. However, *should* you try this and it causes a problem, *AND* your device has function level reset (all intel NICs with SR-IOV that I'm aware of do), just issue a reset to the device by echoing to its reset file, using the NIC's address from System Devices:

 


echo "1" > /sys/bus/pci/devices/0000:<Your>:<Bus>.<Number>/reset

 

Additionally, while I know these commands also work with earlier versions of the Linux kernel, there are several bugs I'm aware of in the Linux 4.4 network drivers related to virtual functions that can cause all kinds of unexplained weirdness, so I just can't recommend it for the average home user who might not have the background to troubleshoot such things and sort them out. This is definitely an 'at your own risk' type operation when run on anything older than UnRAID 6.9. "Do what I say, not what I do" 🤣

 

_________

 

 

Quote

 Just for awareness, the earlier method we went through above is typically considered sub-optimal in the datacenter, where asking to reboot a server might as well be asking permission to murder someone, and doing so on a whim might find you filing for unemployment. Rebooting sucks, and UnRAID users aren't the only ones who feel that way. 

 

And so over the years, driver technology has improved to allow addressing these devices without ever having to exit the kernel (i.e. NO REBOOTS), combined with function level reset (the ability to reset a specific component rather than the entire card). With that in mind, let's look at option 2 - note that these two options shouldn't be used together (and do *not* run this yet!):

 

  • Via terminal (as root, which the UnRAID web terminal already is), just type the following command, substituting 'eth1' for whichever interface you're planning to set up:
    
    echo 4 > /sys/class/net/eth1/device/sriov_numvfs

    ... That's it... no reboots, no config file changes - just bind the vfs with the script and there's no downtime, which is pretty spectacular; you now have 4 vfs (virtual functions) and one pf (physical function).
     

Now if we remember from earlier, different chipsets and drivers behave differently when it comes to partitioning their physical function into virtual functions. Most NICs available in pro-sumer hardware or on the aftermarket (including all enterprise-grade hardware through 2014, at which point this was bleeding edge - it didn't really become common until ~2017, and even then at a premium) use chipsets which fully partition the Physical Function into the specified number of Virtual Functions. 

 

That's why, when you specify "sriov_numvfs=4", the hypervisor shows a total of 4 addressable NICs - the virtual functions have no abstraction layer from the physical device, so if you decide to use the virtual functions, the physical function can't be used as an addressable NIC because you've fully split the device. This means you must ensure the *physical* function isn't in use by the kernel prior to partitioning VFs (using the script previously mentioned). 
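A quick way to check which driver currently owns the PF before partitioning it (using my earlier example address - substitute your own):

lspci -k -s 17:00.3

If 'Kernel driver in use' shows the NIC's normal driver (ixgbe in my case), the kernel still owns it; once the script has bound it, it'll show vfio-pci instead.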

 

With the introduction of intel's 700 series chipsets (and later), if you specify that you'd like 4 vf's, you end up seeing 5 addressable NICs. There's an abstraction layer within the driver that allows the hypervisor to continue to address the PF even while VFs are being created/partitioned; you should, however, ensure that the port is down prior to changing this, so the abstraction layer isn't unexpectedly interrupted (i.e. from the Network Settings tab, down the port, send the command, up the port).
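Put together, the sequence for an i40e port looks like this - a minimal sketch assuming the PF is eth1 and you want 4 VFs (you can just as easily down/up the port from the Network Settings tab instead of using ip link):

ip link set eth1 down
echo 4 > /sys/class/net/eth1/device/sriov_numvfs
ip link set eth1 up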

 

Driver specifics, and device specific recommendations:

 

A. ixgbe - uses the 'ixgbevf' driver for virtual functions

  • MUST be bound (vfio script utilized prior configuring VF's); see step 5.b in above post.
  • When setting up vfs on any device which utilizes the ixgbe driver, the number of vfs you specify will be the total number of interfaces available on that NIC
  • Used for the 82599 and the X520/X540/X550 series. Commonly found in many OEM 10Gb cards under various names, including the HP 560SFP+, Dell mezzanine cards, etc

B. igb - uses the 'igbvf' driver for virtual functions

  • MUST be bound (vfio script utilized prior configuring VF's); see step 5.b in above post.
  • Used in various chipsets and cards, with the most common that support SR-IOV being the i350 series and the PRO/1000 ET (and EF) cards, which utilize the 82576
  • The steps here are pretty much the same as for the ixgbe driver as far as we're concerned in UnRAID, with the exception of: 
    • You'll be calling 'igb' instead of 'ixgbe', and 'igbvf' instead of 'ixgbevf'
    • In place of 'sriov_numvfs', you *may* need to use the module parameter 'max_vfs' (see the sketch below); if I had access to my i350 I could verify this, but thanks to the pandemic I can't get to it to validate when/where this change occurred with UnRAID, so I'm basing this purely off previous experience with the generic Linux kernel.
      • Which to use just depends on what version of UnRAID you're running, as the driver must be able to interpret it, and the naming convention has changed over the years. If it's anything recent though (6.8.2 or higher, at least, maybe further back), you shouldn't have to change it. Feel free to ping me if you have one of these cards and encounter issues with the guide above so I can get the information updated.


C. i40e - uses the 'iavf' driver for virtual functions - in older versions of UnRAID you may see it as 'i40evf', as it was renamed to iavf in later kernel builds

  • DOES NOT require being bound like either of the above (though you still can bind it if you prefer); however, the port cannot be active when setting up VF's:
    • Navigate to Settings -> Network Settings
    • Browse to the interface, and verify that the port is in a Down state - If it says 'Up', click to change it and take down the link
    • After VF's created, you must 'Up' the port in order to utilize the VFs tied to it
  • Used by intel 700 series chipsets, also typically found on newer X11 era supermicro boards which often utilize the X722.
  • Unlike igb and ixgbe NICs, even with vfs created, the physical device is still addressable by the hypervisor
  • This means that if you specify 3 vfs during creation for 1 nic, you'll end up with 4 total interfaces. Benefit of newer technology's progress! Furthermore, you don't have to bind the device to create the virtual functions... But I'll get into that in a bit.
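If you do find yourself needing the module-parameter route for an igb card, a sketch of what that looks like using 6.9's /boot/config/modprobe.d support (treat the 'max_vfs' parameter name as something to verify against your UnRAID version, per the caveat above) - create /boot/config/modprobe.d/igb.conf containing the line below, then reboot:

options igb max_vfs=4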

 

_____

 

Some potential upcoming guides/topics subsequent to this SR-IOV walkthrough could include:

  • Making MAC addresses persistent (surviving reboots) for VF's when utilizing method 2 (the one in this post)
  • Installing a NIC's VF drivers in 'unsupported' versions of operating systems, which some OEM card vendors try to block because they're greedy jerks (such as Windows 10 Home/Pro)

 

... And anything else that catches my fancy. I've done quite a bit with my own UnRAID setup that is completely outside the UI to bend it to my will, and I honestly just didn't think much of it as it's been 'Linux stuff, sometimes with some UnRAID quirks thrown in'. After spending some more time reading up on the forum recently though, I understood how many folks simply didn't realize how much magic they had under the hood with their servers, and it seemed like the most common request/question was about enabling SR-IOV support for UnRAID... So I thought maybe I could help?

 

If there are other topics of interest, I'm open to suggestions - who knows, maybe it's something I've already done and was just not engaged enough with the forum to see there was a need to be filled. I'll help where I can 👍

Link to comment
1 hour ago, Ford Prefect said:

...really nice, thanks for sharing!

 

I appreciate it! I've got chipset-specific guidance already typed up (i.e. i350, i250, 82xxx series, 7xx, etc), I just need to find the time to format it - if you try it out, I'd appreciate you letting me know about any issues so I can update the guide to cover those edge cases. With the OS (UnRAID) being all in memory, with certain files only held in bzroot and others pulled from config files with non-matching names in other directories on flash, there's always the possibility that certain niches will need some tweaks.

Honestly it's just nice to finally use some hard-earned 'work' related knowledge for something *other* than work lol. I actually feel a little bit bad for not writing this up sooner after going through all the forum requests for SR-IOV support and passthrough related help :-\ 

Link to comment

...I don't think that there is right or wrong in terms of timing....You're ready, when you're ready.

 

The topic will help a lot of people as soon as the word is out, I think, and others can/will chime in.

In terms of hardware support and features available, this is great progress for the community and for unraid.

 

I am using an i350 and will try it for my new VM projects... also running a dedicated router, for Dockers and for when I open up to external services, especially with IPv6 and integrating external VPS instances.

There are so many opportunities as soon as you have a nearly unlimited number of "real" NICs.

Speaking of opportunities, I must admit that I do not know enough to elaborate on the risks involved (is a virtio NIC considered a higher security risk than an SR-IOV based one, which in turn is higher than a real one?).

 

Anyway...thanks again and keep up the good work!

 

Link to comment

@BVD Thanks for this!

 

Mellanox ConnectX-2 Firmware Upgrade / unRAID config for SR-IOV

 

Don't buy these cards! (see post below).  I already had a handful of these so I wanted to make use of them.  If you're in the same boat, or you can get them very cheap, then below are my notes for getting them to work.  SR-IOV is not supported by these cards as sold and requires reconfiguration/firmware update to a version beyond anything supported by Mellanox.  Your mileage may vary.

 

Some of what's below repeats @BVD's post, but I thought it might be useful to have a complete set of steps to get this working specifically for Mellanox adapters.  I believe these should be the same steps for ConnectX-3 cards, though they may not need the firmware update.  If you do want to upgrade firmware on those, make sure you use fw-ConnectX3-rel.mlx instead!

 

Mellanox firmware updating is a bit janky and Mellanox/nVidia sure as hell don't make it easy to find firmware downloads for unsupported adapters.  These are my notes from doing this on a Windows machine for MNPA19-XTR ConnectX-2 adapters.

 

Grab latest firmware fw-ConnectX2-rel-2_10_0720.zip from: https://drive.google.com/open?id=1Vdaup5hDYW9XItEaVqDDeJDMxlk0dp-B

(not my link, PM me if it stops working)

 

Extract the zip to a folder, open a command prompt as Administrator, and cd to that folder.

 

mst status

 

This should return something like:

 

MST devices:
------------
  mt26448_pciconf0
  mt26448_pci_cr0

 

We're interested in the second device name, mt26448_pci_cr0.  If you mess something up, you may be able to recover the card by flashing your firmware backup.  You may have to restore the backup firmware to the first device instead of the second (that's what worked for me).

 

Backup current firmware to backup.bin:

mstflint -d mt26448_pci_cr0 ri backup.bin

 

Read the current configuration of the card and store it in backup.ini

mstflint -d mt26448_pci_cr0 dc > backup.ini

 

Make a copy of backup.ini called sriov.ini.  In the copy, insert this at the bottom of the [HCA] section:

num_pfs = 1
total_vfs = 64
sriov_en = true

 

Now create firmware image firmware-sriov.bin based on the latest firmware file + your modified configuration:

mlxburn -fw fw-ConnectX2-rel.mlx -conf sriov.ini -wrimage firmware-sriov.bin

 

Write this firmware to the device ID that you identified previously - in my case this is mt26448_pci_cr0

mstflint -d mt26448_pci_cr0 -i firmware-sriov.bin b

 

Reboot

 

You should see FW version 2.10.720 with this command:

flint -d mt26448_pci_cr0 query

 

The card should now support SR-IOV (as well as RDMA, RSS, etc) 

 

[screenshots: mstflint/flint query output confirming FW version 2.10.720 with SR-IOV enabled]

 

unRAID configuration

 

I couldn't get either of @BVD's methods for enabling VFs to work.  As of unRAID 6.9 we can now pass options to kernel modules via files in /boot/config/modprobe.d

 

In unRAID Tools / System Devices, check the vendor/device ID and confirm the kernel driver used is mlx4_core:

 

lspci -vv -d 15b3:6750 | grep -A 2 'Kernel driver'

 

        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core

 

Create this file: /boot/config/modprobe.d/mlx4_core.conf

 

with:

options mlx4_core num_vfs=8

 

Reboot.
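After the reboot, a quick sanity check that the VFs were created (15b3 is Mellanox's PCI vendor ID):

lspci -d 15b3: | grep -i "virtual function"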

 

Bind to vfio on startup

 

Option 1 - unRAID GUI

 

In unRAID Tools \ System Devices, select all the "Virtual Function" devices and click Bind Selected to vfio at boot

 

[screenshot: System Devices with all the ConnectX-2 "Virtual Function" devices selected for binding to vfio at boot]

 

If we do nothing else, this will fail, because the vfio-pci script runs before device drivers are loaded, as noted here

 

Tell unRAID to re-run the vfio-pci script again (but after device drivers have loaded) by calling it in the /boot/config/go file:

 

# Relaunch vfio-pci script to bind virtual function adapters that didn't exist at boot time
/usr/local/sbin/vfio-pci >>/var/log/vfio-pci

 

Reboot.

 

If you check Tools \ System Devices \ View vfio logs, you should see the first run where the bindings failed, and then a second run where the bindings succeeded.

 

Option 2 - Manual

 

Check the PCIe addresses of the Virtual Function devices:

 

lspci | grep Mellanox

 

03:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
03:00.1 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.2 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.3 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.4 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.5 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.6 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.7 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:01.0 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)

 

Install the script to bind the virtual functions to vfio-pci:

wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

 

BVD refers to using "User Scripts" to run the bind commands.  There may be a good reason for that, but I ended up just adding these to /boot/config/go

 

sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.1
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.2
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.3
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.4
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.5
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.6
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.7
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:01.0


Permanent MAC addresses

 

Assuming your SR-IOV-enabled network interface is eth0:

 

ip link show dev eth0

 

13: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:55:ba:e6 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto

 

If the virtual function has a MAC address of 00:00:00:00:00:00, it will be assigned a random MAC address that will change on subsequent reboots.

 

 Add something like this to assign MAC addresses in /boot/config/go

 

ip link set eth0 vf 0 mac 00:25:8b:ff:01:00
ip link set eth0 vf 1 mac 00:25:8b:ff:01:01
ip link set eth0 vf 2 mac 00:25:8b:ff:01:02
ip link set eth0 vf 3 mac 00:25:8b:ff:01:03
ip link set eth0 vf 4 mac 00:25:8b:ff:01:04
ip link set eth0 vf 5 mac 00:25:8b:ff:01:05
ip link set eth0 vf 6 mac 00:25:8b:ff:01:06
ip link set eth0 vf 7 mac 00:25:8b:ff:01:07

 

As far as I'm aware, it doesn't matter too much what MAC addresses you choose (as long as they don't conflict with other devices on your network).  To satisfy my OCD, I looked up Mellanox here, and assigned MACs with one of those prefixes.  This way the VMs assigned a virtual function device will show up as a Mellanox adapter to other devices on the network:

 

[screenshot: the VM's network adapter showing up with a Mellanox MAC prefix on the network]

 

It seems like Windows 10 / Server 2019 have built-in drivers for Mellanox virtual function devices, though you can install the latest Win-OF (not Win-OF2) drivers.

https://www.mellanox.com/products/adapter-software/ethernet/windows/winof-2

 

i.e. the current version is 5.50.54000 for Server 2019, or 5.50.53000 for Windows 10/2016/2012R2, etc.  As with the updated firmware, these drivers aren't advertised as supporting ConnectX-2 cards, but they work (for now).  Your mileage may vary!

 

 

Link to comment
13 minutes ago, ConnectivIT said:

@BVD

 

lspci | grep Mellanox

[screenshot: lspci output listing the new ConnectX-2 Virtual Functions]

 

Success!

 

I can't actually test this on network yet, will be running some fibre over the next week or so.

 

 

Yeah, it works with MLNX devices as well - the problem there is that the nvidia guys (who now own Mellanox) behind the later drivers have started doing some... well, typical nvidia-like crap that *can* make for some real difficulties down the line, depending on the configuration and whether or not nvidia considers it "supported". There's a lot I could say about the whole mess, but the short of it is this:

 

If you want to use virtual function mellanox drivers in a VM that nvidia doesn't want you to, be prepared for the possibility of long nights beating your head against a wall. It's the reason I didn't really make a guide including them, as I know enough of the pain points that I didn't feel comfortable doing so as I don't think I'd have the time necessary to support it properly.

 

Now with intel based boards though? ... No problem. If for whatever reason your OS doesn't come with the VF drivers for the intel VF NIC you're using (like windows 10 for instance), it's super easy to get them installed. Just a few clicks and you're done.

Link to comment

Thanks for the heads-up.  I have a dual 82599 on the way for my primary unRAID server, so it sounds like that will make things a lot less painful.

 

ps: I think this:

Quote

wget 'https://github.com/andre-richter/vfio-pci-bind/blob/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

 

should be this?

 

wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

 

Link to comment
On 3/5/2021 at 3:40 AM, ConnectivIT said:

Thanks for the heads-up.  I have a dual 82599 on the way for my primary unRAID server, so it sounds like that will make things a lot less painful.

 

ps: I think this:

 

should be this?

 


wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

 



Shoot.... You're right lol. I guess I didn't realize it as I'd just been using my own (admittedly janky) script instead. Should be corrected now, thanks for double-checking me!!

Link to comment

 

On 3/1/2021 at 6:00 AM, BVD said:

Now normally, with any other device showing here, you'd just check the box, save, and reboot. But if you try this, you'll notice they're not bound on reboot, and on checking the vfio log, it says the devices are not found/invalid. This is where we'll use the User Scripts plugin to automatically bind our vf's the first time we start our array, using the script we pulled down earlier.

 

Interesting, this only seems to be required when you first assign a VF to a KVM guest.  If the VFs are not bound to vfio and you start a VM that already has the VF assigned, KVM or unRAID seems to bind that specific VF for you:

 

[screenshot: the VF being bound automatically when the VM starts]

 

That said, it would be really nice if this feature "just worked" for SR-IOV - it seems to just be an issue of timing, so presumably not that difficult to fix?

 

I've posted about this in the new vfio-pci thread for 6.9:

 

edit: Added option for passing Virtual Function devices to vfio via unRAID GUI in my post above.

Link to comment
4 hours ago, ConnectivIT said:

 

 

Interesting, this only seems to be required when you first assign a VF to a KVM guest.  if the VFs are not bound to vfio and you start a VM that already has the VF assigned, KVM or unRAID seem to bind that specific VF for you:

 

[screenshot: the VF being bound automatically when the VM starts]

 

That said it would be really nice if this feature "just worked" for SR-IOV and just seems to be an issue of timing, so presumably not that difficult to fix?

 

I've posted about this in the new vfio-pci thread for 6.9:

 

 

Yeah, as I mentioned in the first post, it wouldn't be *super* difficult to add this into the OS - the problem comes when OEMs do janky crap with their firmware or don't keep their drivers up to date 😕 

On the up side... Works great with intel iGPUs as well :D For people to get the most out of this, they're also going to want to set up some filters so they can allocate bandwidth to whichever machines need it the most, but that's a tutorial for another day lol

Link to comment

Yeah, that'd actually work, come to think of it. The "right" way would be to use the network-rules.cfg file - you'd actually use that file both to specify the numvfs as well as the MACs for those VFs, as it's considered the single "source of truth" for persistent network changes.

 

If you decide to go that route, lemme know if you have any issues with the formatting or anything and I'd be happy to help - the most important part really is that you create your 'ACTION' adding the sriov_numvfs for the eth interface above/before you set the VF interface parameters (for probably obvious reasons lol). Something like the sketch below:
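Purely as an illustration of the shape that takes - a sketch only, untested on UnRAID and using a made-up MAC address, so check the syntax your existing network-rules.cfg already uses before copying anything:

# /boot/config/network-rules.cfg (sketch)
# Match the PF by its MAC, create 4 VFs on it, and keep its interface name stable:
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:01", ATTR{device/sriov_numvfs}="4", NAME="eth1"

The per-VF MAC assignments would then follow in their own rules (or in the go file, as ConnectivIT did above).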

Link to comment
On 3/4/2021 at 6:56 AM, ConnectivIT said:

Tell unRAID to re-run the vfio-pci script again (but after device drivers have loaded) by calling it in the /boot/config/go file:

 


# Relaunch vfio-pci script to bind virtual function adapters that didn't exist at boot time
/usr/local/sbin/vfio-pci >>/var/log/vfio-pci

 

Reboot.

 

If you check Tools \ System Devices \ View vfio logs, you should see the first run where the bindings failed, and then a second run where the bindings succeeded.

 

Woot! Very cool to get confirmation that simply re-running this command after the drivers have loaded allows you to use the webGUI to pick which devices to bind to vfio-pci.

Link to comment

@ConnectivIT looking for some feedback whenever you have a moment -

 

- VF driver installation in MS OS's is pretty much always possible, the process just varies a bit depending on which OS (8, 10, Server 2019, etc), which edition (Home, Pro / Pro for Workstations / Pro Education, IoT, Enterprise, Enterprise IoT), and which release (1809, 1903, etc). I was planning on doing a separate guide for this, but then also figured most people using this one would probably be doing Windows drivers as well... What're your thoughts on putting it in a separate guide and linking to it from here as an "optional next step", vs just adding it to this one directly?

 

- I'd intentionally left off MLNX from the initial list here for the reasons I'd noted above, but if you're comfortable supporting it / providing guidance on it, I can add it to the main doc and reference you in it (as both credit to the additions and a contact for issues)? Idk how familiar you are with SR-IOV in the field, but if not quite there yet, I could always just reference your comments and link to them instead for now?

 

- Performance is a HUGE subject with all of this, and I'm working on another topic related to performance/tuning with SR-IOV. For instance, RDMA (SMB Direct in MS terminology), which provides wire-level latency for storage over the network, is only available to Pro for Workstations and Enterprise builds; there are many other nuances like that. But unlike VF creation, the parameters used to implement any filters/tuning are completely different between the Intel and MLNX drivers... Have you any interest in creating/supporting the MLNX side of that? Lemme know 👍

Link to comment
Quote

what're your thoughts on putting it in a separate guide

 

Happy to follow your lead and contribute where I can.

 

Quote

I'd intentionally left off MLNX from the initial list here for the reasons I'd noted above, but if you're comfortable supporting it / providing guidance on it, I can add it to the main doc and reference you in it (as both credit to the additions and a contact for issues)? Idk how familiar you are with SR-IOV in the field

 

9 hours ago, BVD said:

Performance is a HUGE subject with all of this, and I'm working on another topic related to performance/tuning with SR-IOV ... you've any interest in creating/supporting the MLNX side of that?

 

I haven't worked with SR-IOV previously.  I have an 82599EB card on the way for my main system so I'm mostly interested in eking out the best performance I can from that.   But I'll continue to have a test server with a MLNX ConnectX-2 adapter - I'm happy to continue doing testing/benchmarking/assisting where I can.

 

edit:

 

9 hours ago, BVD said:

I can add it to the main doc and reference you in it

 

Feel free to incorporate anything from my post.  I'm not sure about the best way to organise this.  It certainly makes sense to have a single document that outlines all the different options / possible methods for each step - but it can be a little difficult to follow.   It would probably be easier for readers if we had a set of end-to-end instructions for each kernel driver (I'm assuming the variations in requirements are common to the kernel driver being used?)  That was my thought in making the Mellanox post (and repeating a lot of your work)

Link to comment
14 hours ago, BVD said:

But unlike VF creation, the parameters used to implement any filters/tuning are completely different between Intel and MLNX drivers... you've any interest in creating/supporting the MLNX side of that?

 

Unfortunately it looks like SR-IOV/RDMA can't be used together on ConnectX-2/3 adapters, so I'll probably be using them on the Windows client side, but not for unRAID/SR-IOV/RDMA.

 


edit: 

 

This is a lot more complicated than I thought.... https://www.starwindsoftware.com/blog/smb-direct-the-state-of-rdma-for-use-with-smb-3-traffic-part-i

 

Link to comment
On 3/20/2021 at 1:31 PM, trott said:

thanks for the how-to,  is it possible to assign the VF to docker container?

 

From doing some quick reading there may be some benefit to that, but would probably require far more customisation of unRAID internals than is the case for VMs.

 

On that note, moving a Windows VM from virtio-net to SR-IOV, my SMB read speeds across network have gone from ~70MB/sec to ~600MB/sec.  Wow!

 

[screenshot: ~600MB/sec SMB transfer speed from the VM]

Link to comment
8 hours ago, ConnectivIT said:

 

From doing some quick reading there may be some benefit to that, but would probably require far more customisation of unRAID internals than is the case for VMs.

 

On that note, moving a Windows VM from virtio-net to SR-IOV, my SMB read speeds across network have gone from ~70MB/sec to ~600MB/sec.  Wow!

 

[screenshot: ~600MB/sec SMB transfer speed from the VM]


And you're only just scratching the surface! :D Wait till you start looking at latency numbers; I think you'll be pleasantly surprised when you start seeing nanoseconds instead of milliseconds :D 

 

My work life has kind of run over all my other projects the last few weeks so I've not been able to really dig back in as of yet - but I'm certainly glad to see you experiencing some of the benefits already! Let me know if you encounter any issues or have any questions as you're working on it - I'm happy to help 👍

 

@trott there's no direct way to do so via the unraid UI - I'm not sure why one would want to do so though honestly. If it's for security purposes, then it'd be best suited to being in a VM anyway, at least as far as overall security scope is concerned. A VM, at least IMO, is inherently more 'secure' than a docker container as there's far less attack surface area from the VM to the hypervisor than from a docker container to the hypervisor.

Link to comment
