BVD Posted February 28, 2021 (edited)

I've tried everything I can think of to get my markdown formatting to work here... but it just ain't happenin. So I apologize in advance for the lack of prettification!

_____

I've seen a lot of folks recently asking Limetech to implement SR-IOV support in UnRAID, whether it's so they can better secure specific virtual machines, better utilize resources, or simplify device passthrough. As I looked around, I didn't see that anyone had really gone into this much; since I've been using SR-IOV with UnRAID from the time I spooled up my first server, I thought I'd try my hand at making what I hope will serve as at least some guidance to others wishing to either use it, or know more about it.

This guide is specific to 6.9-RC2 as I just upgraded, but I'd been running 6.8.2 and 6.8.3 earlier (so those *should* be fine), and I'll try to keep this up to date as newer versions are released. My hope, actually, is that in the longer run none of these instructions will be necessary, as the methods to implement them aren't terribly difficult or cumbersome - they just take time.

For those who already know what SR-IOV is, what its benefits are, etc., you can skip to the next post.

_____

In brief, you can think of an SR-IOV device as a 'hardware hypervisor' technology, much as the hypervisor itself (UnRAID being ours here, virtualizing operating systems) is an application/software hypervisor. We all know that basic VM performance is nearly bare metal when configured properly... With an SR-IOV device, the functions within that VM (you'd call them PCI cards in a standard PC) get direct PCI-level access to the device. Network cards (Ethernet controllers), graphics cards, SCSI [SAS/SATA] controllers, HBAs/RAID controllers - ANYTHING that takes specialized hardware performs much better when you can actually let that specialized hardware do the work.
Not only can the performance be orders of magnitude better, but it's also considered far more secure than an emulated device, as there's far less attack surface area. I've typed up some further information and reference material below for those interested:

Quote

You can think of CPU pinning as a loose analogy here: the CPU is a single component with a single PCI address, yet we're able to say 'only use this one part of yourself when doing something I ask'. SR-IOV brings that same kind of partitioning to peripheral devices.

SR-IOV (or more specifically, the spec 'PCI-SIG SR-IOV') is a technology originally developed by Intel (now an industry standard which AMD, IBM, Microsoft, and many others support) to allow hardware to be 'partitioned' into multiple 'functions'. When you partition something for SR-IOV use, you're creating multiple PCI-addressable interfaces from a single physical function (e.g. a single-port NIC has 1 physical function). It requires hardware capable of advertising itself as SR-IOV capable (the device must support the feature), as well as a driver component which allows the hypervisor to address those functions. There are a TON of great videos that go over the background for this feature, so I won't go *too* deep here.

As to why you'd want to use it - you may've already put 2 and 2 together after reading the above, especially if you've ever watched/read a VM tuning guide for UnRAID, but here it is: hardware-addressable devices are seen as physical devices by virtual machines, requiring no secondary drivers other than those of the original vendor to work. What this means is that, if you attach an Intel NIC VF to a Windows/Ubuntu machine, when you boot that machine no additional drivers are needed (no virtio installation) as long as Windows/Ubuntu/(etc.) is supported for that hardware.
If for whatever reason the OS you choose doesn't include the driver, just go to the vendor's website and download it, just like you would for any other card added to a machine. If you want multiple hosts to talk to each other and only have one NIC, you're typically stuck with an emulated device (you might've seen these referred to as an 'e1000' or 'vmxnet3' NIC elsewhere), for which you have to deal with pesky drivers (especially problematic when dealing with BSD-based systems, at least in my experience), not to mention sub-par performance. If you instead set up a VF for each VM on that same NIC, the Ethernet traffic is handled directly by the NIC hardware.

The second, and likely most important benefit for UnRAID users, as I somewhat alluded to above, is performance - since you're not using a bridge interface (which is a completely software feature), the host hypervisor isn't involved in sending/receiving data. With a bridged network, UnRAID has to inspect each packet in order to route it to the proper location, essentially acting as a software router; once you get to multiple busy hosts, or especially once you start looking at 10Gb+ networking, you could be in for a bad time - or at the very least, a terribly inefficient use of CPU resources comparatively. The hypervisor has to do all of the work in a network bridge (in software); anyone who uses plex/emby/jellyfin/handbrake and has gone from software transcoding to hardware transcoding with nVidia NVENC or Intel's QuickSync has experienced the difference between purely software performance and hardware-accelerated work. Latency with SR-IOV is near native, regardless of how many virtual functions are used (up to the maximum supported by the NIC chipset - more on that later), and host CPU usage is limited.
Further Reading:

The gist of it in written form from Juniper Networks: https://www.juniper.net/documentation/en_US/junos/topics/concept/disaggregated-junos-sr-iov.html

In video format from Level 1 Techs - Wendell is my friggin hero (he's talking about graphics here, but the SR-IOV explanation of resource utilization and benefits holds the same for all SR-IOV devices): https://www.youtube.com/watch?v=IXUS1W7Ifys&ab_channel=Level1Linux

And for the truly nerdy, a research paper (really, if you geek out over technology, you might actually like this, I promise): https://www.researchgate.net/publication/220267372_High_performance_network_virtualization_with_SR-IOV

Next, let's see if you've already got something that'll work...

Edited March 1, 2021 by BVD
BVD Posted February 28, 2021 (edited)

Does my NIC support it? If not, what should I buy?

My own setup (though everyone's will be unique) uses an onboard X722 chipset (i40e driver) for gigabit, with an 82599ES 10Gb add-in card (HP 560SFP+, using the ixgbe driver) - Supermicro boards are pretty good about ensuring their onboard NICs support SR-IOV, though others vary by manufacturer.

Checking if your hardware supports SR-IOV is pretty easy. First, find the vendor:device ID: navigate to Tools -> System Devices and scroll down until you see your NIC. The ID for the example here is 8086:10fb, which we'll use to check SR-IOV support. Pull up the terminal (either via SSH, or using the web shell), and query the device:

lspci -vvv -d <vendor:device> | grep -A 9 SR-IOV

I've highlighted the parts we're looking for; you can see it supports SR-IOV, allows up to 64 virtual functions on each of my NICs, and I have 8 VFs already running on the second one (yours should read 0). If you don't see any output, the device doesn't support SR-IOV; alternatively, you can run "lspci -vvv | grep -A 9 SR-IOV" to check whether any device in your system supports it.

_______

Shopping for a NIC

If your server doesn't have SR-IOV, there are a metric TON of options available out there. I won't spend a great deal of time on this as, again, there are tons of references out there, but briefly, here are some starting points/pointers when trying to seek out a NIC for this:

Different OEM (Dell, HP, etc) model numbers for NICs, and what chipsets they use: https://forums.servethehome.com/index.php?threads/list-of-nics-and-their-equivalent-oem-parts.20974/

Decent reference for which chipsets support what functions (whether SR-IOV is supported, etc) for 1Gb NICs: https://forums.serverbuilds.net/t/demystifying-intel-pro-1000-quad-port-nics/2401

Got too many VMs but not enough slots?
Check this bad boy out (couldn't help myself, I love Silicom's hardware engineering): https://www.silicom-usa.com/pr/server-adapters/networking-adapters/gigabit-ethernet-networking-server-adapters/pe2g6i35-server-adapter/

A word of caution when shopping for these used/second-hand (ebay, newegg/amazon third party sellers, etc) - there's a HUGE number of counterfeit NICs out there, claiming to be Intel I350-T4s, Silicom units, and others. Read up on the below prior to purchasing, and try to confirm that the images for what you're buying match what the manufacturer shows on their site wherever possible:

https://forums.servethehome.com/index.php?threads/comparison-intel-i350-t4-genuine-vs-fake.6917/

https://www.servethehome.com/investigating-fake-intel-i350-network-adapters/

Great, but what should I get? Too many choices makes my brain hurt.

I gotchoo. In brief, for those who don't really care about all the intricacies or don't want to do further research: if you want a card for SR-IOV in your UnRAID server, I'd recommend the Dell PRO/1000 ET (NOT the VT model); it'll give you up to 8 VFs per physical port (so 16 for dual-port, 32 for quad models), which is likely more than most home users need, and you can find them for ~30 bucks or less.

And now, we're ready to start the show...

Edited February 28, 2021 by BVD Updating last line
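A quick addendum to the support check from the post above: the kernel also exposes SR-IOV capability directly in sysfs, so you can scan every PCI device at once without grepping lspci output. A small sketch (the sysfs root is a parameter only so it's easy to test; on a live box, call it with no argument):

```shell
# list_sriov_devices: print every PCI device that advertises SR-IOV support,
# by checking for the sriov_totalvfs attribute sysfs exposes on capable PFs.
list_sriov_devices() {
  local base="${1:-/sys/bus/pci/devices}" f
  for f in "$base"/*/sriov_totalvfs; do
    [ -e "$f" ] || continue   # glob didn't match anything -> skip
    printf '%s supports up to %s VFs\n' \
      "$(basename "$(dirname "$f")")" "$(cat "$f")"
  done
}
```

On my box this lists both 82599ES ports with their 64-VF limit; no output means nothing in the system advertises SR-IOV.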
BVD Posted February 28, 2021 (edited)

I've got the right stuff, where do I start?

Alright, on to the config/setup - I'm trying to make this as generic as possible to cover as many possibilities as I can at once, as the implementation of virtual functions and their utilization depends on a combination of both the driver AND the hardware. Let's first gather some information so we know what drivers we're using before we move forward with creating our VFs, and get the script set up so we only have to reboot the one time here.

The first thing we need is a script to bind our VFs once they're created by the driver; VF creation happens AFTER the OS is booted, which means the built-in bind options just won't work for us in this instance. My script for doing this prior to 6.9 was such a friggin hack job, but worked fine... Fortunately, someone else has already done the work required for us here that's clean and pretty, saving me the embarrassment; Andre Richter is a legend, and I highly recommend checking out and supporting his work:

wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

While you're here in the terminal, make sure to add the following kernel parameter if it's not already in your syslinux config file:

intel_iommu=pt

This puts the IOMMU into passthrough mode for devices the host itself uses, skipping DMA remapping for them (which gives a performance benefit) while passed-through devices still get full isolation. However, should this impact your ability to utilize other devices that you require, set 'intel_iommu=on' instead. If using AMD, just replace 'intel' with 'amd' - the rest is the same for both.

We now need to ensure that, if the card has its own BIOS options, it's set to enable SR-IOV - not all boards' BIOS' expose this, but we should check, just in case.
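For reference, that parameter lives on the 'append' line of /boot/syslinux/syslinux.cfg (also editable from the webUI by clicking the flash device on the Main tab -> Syslinux Configuration). A sketch of what a typical default entry looks like with it added - your existing flags may differ, so add to the line rather than replacing it:

```
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot intel_iommu=pt
```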
Reboot your host and get into the BIOS. As you can see, each port has its own config for my card, so I'll need to check the options for each one I plan to use. Again, this part will be unique to your card, but the idea should be the same: go through the BIOS options for the NIC, looking for anything related to either SR-IOV or virtualization, make sure the options are set to allow virtualization, then save+exit. For my NICs, it looks like this (sorry for the crappy pics!):

We're finally ready for some config modification. We first need to know what driver we're using to determine the method for creating our VFs. Using the same kind of vendor:device ID we noted earlier:

lspci -vv -d 8086:37d1 | grep -A 2 'Kernel driver'
Kernel driver in use: i40e
Kernel modules: i40e

There are several possibilities here on how to actually create the functions, and the methods/modules we'll need vary depending on which chipset you're using and which functions it requires.

Finally, let's get cracking on creating our VFs...

Edited March 6, 2021 by BVD fixing wget to point to raw instead of blob
BVD Posted February 28, 2021 (edited)

Driver / Device specific steps

The following is the most generic option, and should work for most UnRAID deployments that contain SR-IOV-supporting NICs, going back to around 6.4 - though I'd recommend no lower than 6.8.2 if you're working with any device using the i40e driver (save yourself the pain and upgrade!):

1. Open your terminal and edit the go file:

nano /boot/config/go

2. Add the following line to the bottom, specifying the number of VFs to create for this interface and replacing my device address (0000:17:00.3) with your own - I chose 4 per interface (add one such line for each physical function you want VFs on):

echo 4 > /sys/bus/pci/devices/0000:17:00.3/sriov_numvfs

3. Hit 'Ctrl+x', then 'Y', then Enter (just following the on-screen prompts to save the file), and it's time for a reboot (one of the joys of UnRAID!)

4. Now that your system is back up and running, head to the system devices screen (Tools -> System Devices) - you should see something pretty.

Now normally, with any other device showing here, you'd just check the box, save, and reboot. But if you try this, you'll notice they're not bound on reboot, and on checking the vfio log, it says the devices are not found/invalid. This is where we'll use the User Scripts plugin to automatically bind our VFs the first time we start our array, using the script we pulled down earlier.

5.a - In Settings -> User Scripts, create a new script.

5.b - For each interface, we'll call the script, specifying the VF's vendor:device ID (note, this is different from the physical device's ID), domain (always 0000 in our case), and bus ID. We'll choose to run this at first array start only, as it's only needed once per boot, one line per VF:

sudo bash /boot/config/vfio-pci-bind.sh 8086:10ed 0000:17:10.0

I have 8 of them, so mine looks like the below:

Start the array, and you're done!
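If you'd rather not hand-type one bind line per VF, note that the kernel exposes each PF's VFs as virtfn0, virtfn1, ... symlinks under the PF's sysfs directory, so the invocations can be generated. A sketch (dry-run: it only prints the commands so you can review them; the VF vendor:device ID and PF address in the example are from my system - substitute your own):

```shell
# gen_vf_bind_cmds: print one vfio-pci-bind.sh call per VF of a given PF,
# by following the virtfn* symlinks sysfs creates under the parent device.
# Usage: gen_vf_bind_cmds <vf vendor:device id> <pf sysfs dir>
gen_vf_bind_cmds() {
  local vf_id="$1" pf_dir="$2" link
  for link in "$pf_dir"/virtfn*; do
    [ -L "$link" ] || continue   # no VFs created yet -> nothing to print
    printf 'bash /boot/config/vfio-pci-bind.sh %s %s\n' \
      "$vf_id" "$(basename "$(readlink "$link")")"
  done
}

# Example (my addresses) - review the output, then pipe to `sh`:
#   gen_vf_bind_cmds 8086:10ed /sys/bus/pci/devices/0000:17:00.3
```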
You can now add the VFs just like you would any other PCI device under the VM edit page.

Next up, we'll talk about another (much simpler) method to do all of the above, which is now viable as of UnRAID 6.9 thanks to the later version of the Linux kernel in use! (Not that you can't try it with earlier versions... it's just not as sure-fire with them as it is with 6.9.)

Edited February 28, 2021 by BVD
BVD Posted February 28, 2021 (edited)

UnRAID 6.9, and new hotness chipsets/devices

***I have not yet fully tested this in UnRAID, please only attempt at your own risk until further notice***
***The following should be considered a work in progress***

While I know this method works, I don't know what unforeseen circumstances might arise from UnRAID being a fully in-memory OS; as I've not tested/trialed all the various configurations one might have for their UnRAID server, I can't guarantee there will be no unforeseen consequences here. However, *should* you try this and it causes a problem, *AND* your device supports function level reset (all Intel NICs with SR-IOV that I'm aware of do), just issue a reset to the device with the following, using the address of the NIC (from System Devices):

echo 1 > /sys/bus/pci/devices/0000:<bus>:<slot>.<function>/reset

Additionally, while I know these commands also work with earlier versions of the Linux kernel, there are several bugs I'm aware of for network drivers in Linux 4.4 related to virtual functions that can cause all kinds of unexplained weirdness, so I just can't recommend it for the average home user who might not have the background to troubleshoot such things. This is definitely an 'at your own risk' type operation when run on anything older than UnRAID 6.9. "Do what I say, not what I do" 🤣

_________

Just for awareness, the earlier method we went through above is typically considered sub-optimal in the datacenter, where asking to reboot a server might as well be asking permission to murder someone, and doing so on a whim might find you filing for unemployment. (Note: the two options shouldn't be used together!) Rebooting sucks, and UnRAID users aren't the only ones that feel that way.
And so over the years, driver technology has improved to allow addressing these devices without ever having to exit the kernel (i.e. NO REBOOTS), combined with function level reset (the ability to reset a specific component rather than the entire card). Let's look at option 2 (do *not* run this yet!). Via terminal, just type the following command as root, substituting 'eth1' for whichever interface you're planning to set up:

echo 4 > /sys/class/net/eth1/device/sriov_numvfs

... That's it... no reboots, no config file changes - just bind the VFs with the script, and no downtime, which is pretty spectacular; you now have 4 VFs (virtual functions) and one PF (physical function).

Now, if we remember from earlier, different chipsets and drivers behave differently when it comes to partitioning their physical function into virtual functions. Most NICs available in prosumer hardware, or even in the aftermarket (including all enterprise-grade hardware through 2014, at which point it was bleeding edge - it didn't really become common until ~2017, and even then at a premium), utilize chipsets which fully partition the physical function into the specified number of virtual functions. That's why, when you specify "sriov_numvfs=4", the hypervisor shows a total of 4 addressable NICs - the virtual functions have no abstraction layer from the physical device, so if you decide to use the virtual functions, the physical function can't be used as an addressable NIC, because you've fully split the device. This means you must ensure the *physical* function isn't in use by the kernel prior to partitioning VFs (using the script previously mentioned). With the introduction of Intel's 700 series chipsets (and later), if you specify you'd like 4 VFs, you end up seeing 5 addressable NICs.
There's an abstraction layer within the driver that allows the hypervisor to continue to address the PF even while VFs are being created/partitioned; you should ensure the port is down, however, prior to changing this, so the abstraction layer isn't unexpectedly interrupted (i.e. from the Network Settings tab: down the port, send the command, up the port).

Driver specifics, and device specific recommendations:

A. ixgbe - uses the 'ixgbevf' driver for virtual functions
- MUST be bound (vfio script utilized prior to configuring VFs); see step 5.b in the post above.
- When setting up VFs on any device which utilizes the ixgbe driver, the number of VFs you specify will be the total number of interfaces available on that NIC.
- Used for the 82599 and X520/X540 series; commonly found in many OEM 10Gb cards under various names, including the HP 560SFP+, Dell mezzanine cards, etc.

B. igb - uses the 'igbvf' driver for virtual functions
- MUST be bound (vfio script utilized prior to configuring VFs); see step 5.b in the post above.
- Used in various chipsets and cards, with the most common that support SR-IOV being the I350 series and PRO/1000 ET (and EF) cards, utilizing the 82576.
- The steps here are pretty much the same as for the ixgbe driver as far as we're concerned in UnRAID, with the exceptions that:
  - You'll be calling 'igb' instead of 'ixgbe', and 'igbvf' instead of 'ixgbevf'
  - In place of 'sriov_numvfs', you *may* need to use the term 'max_vfs'; if I had access to my I350 I could verify this, but thanks to the pandemic I can't get to it to validate when/where this change occurred with UnRAID, so I'm basing this purely off previous experience with the generic Linux kernel. Which to use just depends on what version of UnRAID you're running, as the driver must be able to interpret it, and the naming convention's changed over the years.
If it's anything recent though (6.8.2 or higher, at least, maybe further back), you shouldn't have to change it. Feel free to ping me if you have one of these cards and encounter issues with the guide above so I can get the information updated.

C. i40e - uses the 'iavf' driver for virtual functions (in older versions of UnRAID you may see 'i40evf', which was renamed to iavf in later builds)
- DOES NOT require being bound like either of the above (though you still can if you prefer); however, the port cannot be active when setting up VFs:
  - Navigate to Settings -> Network Settings
  - Browse to the interface and verify that the port is in a Down state - if it says 'Up', click to change it and take down the link
  - After the VFs are created, you must 'Up' the port in order to utilize the VFs tied to it
- Used by Intel 700 series chipsets; also typically found on newer X11-era Supermicro boards, which often utilize the X722.
- Unlike igb and ixgbe NICs, even with VFs created, the physical device is still addressable by the hypervisor - this means that if you specify 3 VFs during creation for 1 NIC, you'll end up with 4 total interfaces. Benefit of newer technology's progress! Furthermore, you don't have to bind the device to create the virtual functions... but I'll get into that in a bit.

_____

Pondering some potential upcoming guides/topics subsequent to this SR-IOV walkthrough:
- Making MAC addresses persistent (surviving reboots) for VFs when utilizing method 2 (the one in this post)
- Installing a NIC's VF drivers in 'unsupported' versions of operating systems which some OEM card vendors try to block because they're greedy jerks (such as Windows 10 Home/Pro)
- ... and anything else that catches my fancy.

I've done quite a bit with my own UnRAID setup that is completely outside the UI to bend it to my will, and I honestly just didn't think much of it, as it's been 'Linux stuff, sometimes with some UnRAID quirks thrown in'.
After spending some more time reading the forum recently though, I came to understand how many folks simply didn't realize how much magic they had under the hood with their servers, and it seemed like the most common request/question was about enabling SR-IOV support for UnRAID... so I thought maybe I could help? If there are other topics of interest, I'm open to suggestions - who knows, maybe it's something I've already done and was just not engaged enough with the forum to see there was a need to be filled. I'll help where I can 👍

Edited March 1, 2021 by BVD
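One quick addendum to the driver specifics above: the per-driver notes boil down to a small lookup you could drop into a script. A sketch (verify the names against your UnRAID release, per the max_vfs caveat above):

```shell
# vf_driver: given the PF's kernel driver name (as reported by
# `lspci -vv -d <id> | grep 'Kernel driver'`), report the matching VF driver
# and whether the VFs must be bound to vfio-pci, per the notes above.
vf_driver() {
  case "$1" in
    ixgbe) echo "VF driver: ixgbevf; bind VFs with the vfio script" ;;
    igb)   echo "VF driver: igbvf; bind VFs with the vfio script" ;;
    i40e)  echo "VF driver: iavf; no bind needed, but down the port first" ;;
    *)     echo "unknown driver: $1" ;;
  esac
}
```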
Ford Prefect Posted February 28, 2021

...really nice, thanks for sharing!
BVD Posted February 28, 2021

1 hour ago, Ford Prefect said:
...really nice, thanks for sharing!

I appreciate it! I've got chipset-specific guidance already typed up (i.e. i350, i250, 82xxx series, 7xx, etc), I just need to find the time to format it. If you try it out, I'd appreciate you letting me know about any issues so I can get the guide updated to fix those edge cases. With the OS (UnRAID) being all in memory, with certain files only held in bzroot and others pulled from config files with non-matching names in other directories on flash, there's always the possibility that certain specific niches will need some tweaks.

Honestly, it's just nice to finally use some hard-earned 'work'-related knowledge for something *other* than work lol. I actually feel a little bit bad for not writing this up sooner after going through all the forum requests for SR-IOV support and passthrough-related help.
Ford Prefect Posted March 1, 2021

...I don't think there's a right or wrong in terms of timing... you're ready when you're ready. The topic will help a lot of people as soon as the word is out, I think, and others can/will chime in. In terms of hardware support and features available, this is great progress for the community and for UnRAID.

I am using an i350 and will try it for my new VM projects... also running a dedicated router, for Dockers and when opening up to external services, especially with IPv6 and integrating external VPS instances. There are so many opportunities as soon as you have a nearly unlimited number of "real" NICs.

Speaking of opportunities, I must admit that I don't know enough to elaborate on the risks involved (is a virtio NIC considered a higher security risk than an SR-IOV based one, which in turn is higher-risk than a real one?).

Anyway... thanks again and keep up the good work!
jortan Posted March 4, 2021 (edited)

@BVD Thanks for this!

Mellanox ConnectX-2 Firmware Upgrade / unRAID config for SR-IOV

Don't buy these cards! (see post below). I already had a handful of these, so I wanted to make use of them. If you're in the same boat, or you can get them very cheap, then below are my notes for getting them to work. SR-IOV is not supported by these cards as sold, and enabling it requires reconfiguration/a firmware update to a version beyond anything supported by Mellanox. Your mileage may vary.

Some of what's below repeats @BVD's post, but I thought it might be useful to have a complete set of steps to get this working specifically for Mellanox adapters. I believe these should be the same steps for ConnectX-3 cards, though they may not need the firmware update. If you do want to upgrade firmware on those, make sure you use fw-ConnectX3-rel.mlx instead!

Mellanox firmware updating is a bit janky, and Mellanox/nVidia sure as hell don't make it easy to find firmware downloads for unsupported adapters. These are my notes from doing this on a Windows machine for MNPA19-XTR ConnectX-2 adapters.

Grab the latest firmware, fw-ConnectX2-rel-2_10_0720.zip, from: https://drive.google.com/open?id=1Vdaup5hDYW9XItEaVqDDeJDMxlk0dp-B (not my link, PM me if it stops working)

Extract the zip to a folder, open a command prompt as Administrator, and cd to that folder.

mst status

This should return something like:

MST devices:
------------
mt26448_pciconf0
mt26448_pci_cr0

We're interested in the second device name, mt26448_pci_cr0. If you mess something up, you may be able to recover the card by flashing your firmware backup.
(You may have to restore your backup firmware to the first device instead of the second device - that's what worked for me.)

Backup the current firmware to backup.bin:

mstflint -d mt26448_pci_cr0 ri backup.bin

Read the current configuration of the card and store it in backup.ini:

mstflint -d mt26448_pci_cr0 dc > backup.ini

Make a copy of backup.ini called sriov.ini. In the copy, insert this at the bottom of the [HCA] section:

num_pfs = 1
total_vfs = 64
sriov_en = true

Now create a firmware image, firmware-sriov.bin, based on the latest firmware file plus your modified configuration:

mlxburn -fw fw-ConnectX2-rel.mlx -conf sriov.ini -wrimage firmware-sriov.bin

Write this firmware to the device ID that you identified previously - in my case this is mt26448_pci_cr0:

mstflint -d mt26448_pci_cr0 -i firmware-sriov.bin b

Reboot. You should see FW version 2.10.720 with this command:

flint -d mt26448_pci_cr0 query

The card should now support SR-IOV (as well as RDMA, RSS, etc).

unRAID configuration

I couldn't get either of @BVD's methods for enabling VFs to work. As of unRAID 6.9, we can now pass options to kernel modules via files in /boot/config/modprobe.d.

In unRAID Tools / System Devices, check the vendor/device ID and confirm the kernel driver used is mlx4_core:

lspci -vv -d 15b3:6750 | grep -A 2 'Kernel driver'
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core

Create this file: /boot/config/modprobe.d/mlx4_core.conf with:

options mlx4_core num_vfs=8

Reboot.
Bind to vfio on startup

Option 1 - unRAID GUI

In unRAID Tools \ System Devices, select all the "Virtual Function" devices and click "Bind Selected to vfio at boot".

If we do nothing else, this will fail, because the vfio-pci script runs before device drivers are loaded, as noted here. Tell unRAID to re-run the vfio-pci script (after device drivers have loaded) by calling it in the /boot/config/go file:

# Relaunch vfio-pci script to bind virtual function adapters that didn't exist at boot time
/usr/local/sbin/vfio-pci >>/var/log/vfio-pci

Reboot. If you check Tools \ System Devices \ View vfio logs, you should see the first run where the bindings failed, and then a second run where the bindings succeeded.

Option 2 - Manual

Check the PCIe addresses of the Virtual Function devices:

lspci | grep Mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
03:00.1 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.2 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.3 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.4 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.5 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.6 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:00.7 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)
03:01.0 Ethernet controller: Mellanox Technologies MT25400 Family [ConnectX-2 Virtual Function] (rev b0)

Install the script to bind the virtual functions to vfio-pci:

wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

BVD refers to using "User Scripts" to run the bind
commands. There may be a good reason for that, but I ended up just adding these to /boot/config/go:

sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.1
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.2
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.3
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.4
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.5
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.6
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:00.7
sudo bash /boot/config/vfio-pci-bind.sh 15b3:1002 0000:03:01.0

Permanent MAC addresses

Assuming your SR-IOV-enabled network interface is eth0:

ip link show dev eth0
13: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether 00:02:c9:55:ba:e6 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 3 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 4 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 5 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 6 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto
vf 7 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, vlan 4095, spoof checking off, link-state auto

If a virtual function has a MAC address of 00:00:00:00:00:00, it will be assigned a random MAC address that will change on subsequent reboots.
Add something like this to /boot/config/go to assign MAC addresses:

ip link set eth0 vf 0 mac 00:25:8b:ff:01:00
ip link set eth0 vf 1 mac 00:25:8b:ff:01:01
ip link set eth0 vf 2 mac 00:25:8b:ff:01:02
ip link set eth0 vf 3 mac 00:25:8b:ff:01:03
ip link set eth0 vf 4 mac 00:25:8b:ff:01:04
ip link set eth0 vf 5 mac 00:25:8b:ff:01:05
ip link set eth0 vf 6 mac 00:25:8b:ff:01:06
ip link set eth0 vf 7 mac 00:25:8b:ff:01:07

As far as I'm aware, it doesn't matter too much which MAC addresses you choose (as long as they don't conflict with other devices on your network). To satisfy my OCD, I looked up Mellanox here and assigned MACs with one of their prefixes. This way, VMs assigned a virtual function device will show up as a Mellanox adapter to other devices on the network.

It seems Windows 10 / Server 2019 have built-in drivers for Mellanox virtual function devices, though you can also install the latest WinOF (not WinOF-2) drivers:

https://www.mellanox.com/products/adapter-software/ethernet/windows/winof-2

i.e. the current version is 5.50.54000 for Server 2019, or 5.50.53000 for Windows 10/2016/2012R2, etc. As with the updated firmware, these drivers aren't advertised as supporting ConnectX-2 cards, but they work (for now). Your mileage may vary!

Edited March 8, 2021 by ConnectivIT
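One possible refinement (not from the original posts): since the eight "ip link" lines above differ only in the VF index, a small shell loop can generate them. This is just a sketch - the eth0 interface, the 8-VF count, and the 00:25:8b Mellanox prefix are assumptions carried over from above - and it only prints the commands, so you can review the output before pasting it into /boot/config/go:

```shell
#!/bin/sh
# Sketch: generate per-VF MAC assignment commands for review.
# Assumptions (adjust for your hardware): interface eth0, VFs 0-7,
# and the hypothetical Mellanox-style prefix 00:25:8b:ff:01.
cmds=""
for i in 0 1 2 3 4 5 6 7; do
  # Append one command per VF, encoding the VF index in the last MAC octet.
  cmds="${cmds}ip link set eth0 vf ${i} mac $(printf '00:25:8b:ff:01:%02x' "$i")
"
done
printf '%s' "$cmds"
```

The output is equivalent to the eight hand-written lines above; changing the interface name or the loop range in one place updates every command.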
BVD Posted March 4, 2021 Author Share Posted March 4, 2021 (edited)

13 minutes ago, ConnectivIT said: "@BVD lspci | grep Mellanox - Success! I can't actually test this on the network yet; I'll be running some fibre over the next week or so."

Yeah, it works with MLNX devices as well. The problem there is that the nvidia guys (nvidia now owns Mellanox) behind the later drivers have started doing some... well, typical nvidia-like crap that *can* make for some real difficulties down the line, depending on the configuration and whether or not nvidia considers it "supported". There's a lot I could say about the whole mess, but the short of it is this: if you want to use Mellanox virtual function drivers in a VM that nvidia doesn't want you to, be prepared for the possibility of long nights beating your head against a wall. It's the reason I didn't include them in this guide - I know enough of the pain points that I didn't feel comfortable doing so, as I don't think I'd have the time necessary to support it properly.

Now, with intel-based boards? ... No problem. If for whatever reason your OS doesn't come with the VF drivers for the intel VF NIC you're using (like Windows 10, for instance), it's super easy to get them installed. Just a few clicks and you're done.

Edited March 4, 2021 by BVD
jortan Posted March 5, 2021 Share Posted March 5, 2021

Thanks for the heads-up. I have a dual 82599 on the way for my primary unRAID server, so it sounds like that will make things a lot less painful.

ps: I think this:

wget 'https://github.com/andre-richter/vfio-pci-bind/blob/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/

should be this?

wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/
BVD Posted March 6, 2021 Author Share Posted March 6, 2021

On 3/5/2021 at 3:40 AM, ConnectivIT said: "I think this ... should be this? wget 'https://raw.githubusercontent.com/andre-richter/vfio-pci-bind/master/vfio-pci-bind.sh'; mv vfio-pci-bind.sh /boot/config/"

Shoot... you're right, lol. I guess I didn't realize it, as I'd just been using my own (admittedly janky) script instead. Should be corrected now - thanks for double-checking me!
jortan Posted March 7, 2021 Share Posted March 7, 2021 (edited)

On 3/1/2021 at 6:00 AM, BVD said: "Now normally, with any other device showing here, you'd just check the box, save, and reboot. But if you try this, you'll notice they're not bound on reboot, and on checking the vfio log, it says the devices are not found/invalid. This is where we'll use the User Scripts plugin to automatically bind our vf's the first time we start our array, using the script we pulled down earlier."

Interesting - this only seems to be required when you first assign a VF to a KVM guest. If the VFs are not bound to vfio and you start a VM that already has the VF assigned, KVM or unRAID seems to bind that specific VF for you.

That said, it would be really nice if this feature "just worked" for SR-IOV. It just seems to be an issue of timing, so presumably not that difficult to fix? I've posted about this in the new vfio-pci thread for 6.9.

edit: Added an option for passing Virtual Function devices to vfio via the unRAID GUI in my post above.

Edited March 8, 2021 by ConnectivIT
BVD Posted March 7, 2021 Author Share Posted March 7, 2021

4 hours ago, ConnectivIT said: "Interesting, this only seems to be required when you first assign a VF to a KVM guest... it would be really nice if this feature 'just worked' for SR-IOV..."

Yeah - as I mentioned in the first post, it wouldn't be *super* difficult to add this into the OS. The problem comes when OEMs do janky crap with their firmware, or don't keep their drivers up to date 😕

On the up side... it works great with intel iGPUs as well!

To get the most out of this, people will also want to set up some filters so they can allocate bandwidth to whichever machines need it the most - but that's a tutorial for another day lol
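As a taste of the bandwidth filters mentioned above: on drivers that support it, per-VF rate limits can be set through the same "ip link" interface used for the MAC assignments earlier in the thread. This is a hypothetical sketch only - the interface name, VF numbers, and rates are placeholders, and not every NIC driver honours max_tx_rate / min_tx_rate - shown as lines one might add to /boot/config/go:

```
# Hypothetical per-VF bandwidth settings (rates in Mbps); driver support varies.
# Cap VF 0 at 1 Gbps:
ip link set eth0 vf 0 max_tx_rate 1000
# Guarantee VF 1 at least 500 Mbps, capped at 2.5 Gbps:
ip link set eth0 vf 1 min_tx_rate 500 max_tx_rate 2500
```

Check "ip link show dev eth0" afterwards to confirm the rates were accepted by your driver.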
jortan Posted March 8, 2021 Share Posted March 8, 2021

ps: There may be a nicer way of doing this, but I added some notes re: permanent MAC address assignment to my post above.
BVD Posted March 8, 2021 Author Share Posted March 8, 2021 (edited)

Yeah, that'd actually work, come to think of it. The "right" way would be to use the network-rules.cfg file - you'd use that file both to specify the numvfs and the MACs for those VFs, as it's considered the single "source of truth" for persistent network changes. If you decide to go that route and run into formatting issues, lemme know and I'd be happy to help. The most important part is that you put the 'ACTION' line that sets sriov_numvfs for the eth interface above/before the lines that set the per-VF parameters (for probably obvious reasons lol).

Edited March 8, 2021 by BVD
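To sketch the shape of what BVD describes - this is an illustration, not his actual config; the interface name and VF count are assumptions, and udev's handling of nested ATTR assignment paths like device/sriov_numvfs should be verified on your release - the VF-creation rule would sit above any per-VF rules:

```
# Hypothetical addition to /boot/config/network-rules.cfg:
# create 8 VFs as soon as the physical function's net device (eth0) appears.
# Any per-VF rules (MAC assignments, etc.) must come after this line.
ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth0", ATTR{device/sriov_numvfs}="8"
```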
ljm42 Posted March 9, 2021 Share Posted March 9, 2021

On 3/4/2021 at 6:56 AM, ConnectivIT said: "Tell unRAID to re-run the vfio-pci script again (but after device drivers have loaded) by calling it in the /boot/config/go file... you should see the first run where the bindings failed, and then a second run where the bindings succeeded."

Woot! Very cool to get confirmation that simply re-running this command after setting up the drivers allows you to use the webgui to pick which devices to bind to vfio-pci.
BVD Posted March 9, 2021 Author Share Posted March 9, 2021

@ConnectivIT - looking for some feedback whenever you have a moment:

- VF driver installation in MS OSes is pretty much always possible; the process just varies a bit depending on the OS (8, 10, Server 2019, etc), the edition (Home, Pro / Pro for Workstations / Pro Education, IoT, Enterprise, Enterprise IoT), and the release (1809, 1903, etc). I was planning on doing a separate guide for this, but then figured most people using this one would probably be doing Windows drivers as well... what are your thoughts on putting it in a separate guide linked from here as an "optional next step", versus just adding it to this one directly?

- I'd intentionally left MLNX off the initial list for the reasons I noted above, but if you're comfortable supporting it / providing guidance on it, I can add it to the main doc and reference you in it (as both credit for the additions and a contact for issues). I don't know how familiar you are with SR-IOV in the field, but if you're not quite there yet, I could always just reference your comments and link to them instead for now?

- Performance is a HUGE subject in all of this, and I'm working on another topic on performance/tuning with SR-IOV. For instance, RDMA (SMB Direct in MS terminology), which provides wire-level latency for storage over the network, is only available on Pro for Workstations and Enterprise builds - and there are many other nuances. But unlike VF creation, the parameters used to implement any filters/tuning are completely different between the intel and MLNX drivers... have you any interest in creating/supporting the MLNX side of that?

Lemme know 👍
jortan Posted March 10, 2021 Share Posted March 10, 2021 (edited)

"what're your thoughts on putting it in a separate guide"

Happy to follow your lead and contribute where I can.

"I'd intentionally left off MLNX from the initial list here... Idk how familiar you are with SR-IOV in the field"

9 hours ago, BVD said: "Performance is a HUGE subject with all of this, and I'm working on another topic related to performance/tuning with SR-IOV... you've any interest in creating/supporting the MLNX side of that?"

I haven't worked with SR-IOV previously. I have an 82599eb card on the way for my main system, so I'm mostly interested in eking out the best performance I can from that. But I'll continue to have a test server with a MLNX ConnectX-2 adapter, and I'm happy to keep doing testing/benchmarking/assisting where I can.

edit:

9 hours ago, BVD said: "I can add it to the main doc and reference you in it"

Feel free to incorporate anything from my post. I'm not sure about the best way to organise this. It certainly makes sense to have a single document that outlines all the different options / possible methods for each step - but it can be a little difficult to follow. It would probably be easier for readers if we had a set of end-to-end instructions for each kernel driver (I'm assuming the variations in requirements are common to the kernel driver being used?). That was my thought in making the Mellanox post (and repeating a lot of your work).

Edited March 10, 2021 by ConnectivIT
jortan Posted March 10, 2021 Share Posted March 10, 2021 (edited)

14 hours ago, BVD said: "But unlike VF creation, the parameters used to implement any filters/tuning are completely different between Intel and MLNX drivers... you've any interest in creating/supporting the MLNX side of that?"

Unfortunately, it looks like SR-IOV and RDMA can't be used together on ConnectX-2/3 adapters, so I'll probably be using RDMA on the Windows client side, but not for unRAID/SR-IOV.

"Note: In case SR-IOV is enabled on the adapter, RDMA is disabled."
https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-3-with-hyper-v--ethernet-x

edit: This is a lot more complicated than I thought...
https://www.starwindsoftware.com/blog/smb-direct-the-state-of-rdma-for-use-with-smb-3-traffic-part-i

Edited March 10, 2021 by ConnectivIT
BVD Posted March 10, 2021 Author Share Posted March 10, 2021

We can take this to DMs - I should've done so in the first place, come to think of it. In short though: "complicated" might be an understatement lol.
trott Posted March 20, 2021 Share Posted March 20, 2021

Thanks for the how-to! Is it possible to assign a VF to a docker container?
jortan Posted March 22, 2021 Share Posted March 22, 2021

On 3/20/2021 at 1:31 PM, trott said: "thanks for the how-to, is it possible to assign the VF to docker container?"

From some quick reading, there may be some benefit to that, but it would probably require far more customisation of unRAID internals than is the case for VMs.

On that note: after moving a Windows VM from virtio-net to SR-IOV, my SMB read speeds across the network have gone from ~70MB/sec to ~600MB/sec. Wow!
BVD Posted March 22, 2021 Author Share Posted March 22, 2021 (edited)

8 hours ago, ConnectivIT said: "...moving a Windows VM from virtio-net to SR-IOV, my SMB read speeds across network have gone from ~70MB/sec to ~600MB/sec. Wow!"

And you're only just scratching the surface! Wait till you start looking at latency numbers - I think you'll be pleasantly surprised when you start seeing nano instead of milliseconds.

My work life has kind of run over all my other projects the last few weeks, so I haven't been able to really dig back in yet - but I'm certainly glad to see you experiencing some of the benefits already! Let me know if you run into any issues or have any questions as you work on it - I'm happy to help 👍

@trott - there's no direct way to do so via the unRAID UI, and honestly, I'm not sure why one would want to. If it's for security purposes, the workload would be better suited to a VM anyway, at least as far as overall security scope is concerned: a VM, at least IMO, is inherently more 'secure' than a docker container, as there's far less attack surface from the VM to the hypervisor than from a docker container to the hypervisor.

Edited March 22, 2021 by BVD
trott Posted March 27, 2021 Share Posted March 27, 2021

Guys, when I use the VF on a VM, it can't talk to the VMs on br0 - do you guys have this issue?