• 6.9 beta 25 - Issue passing through 1 of 2 identical NICs to a VM


    dnoyeb
    • Minor

    I'm trying to pass a Mellanox ConnectX-3 card through to a VM and having some trouble. To set the stage: I have two of these cards in my Unraid server, and one of them is used by the OS. I used the Tools / System Devices / Bind Selected to VFIO at boot method and have verified that the card was added:

    cat vfio-pci.cfg
    BIND=0000:03:00.0|15b3:1003

     

    Here is the log showing that it was successfully bound when Unraid booted:

     

    Loading config from /boot/config/vfio-pci.cfg
    BIND=0000:03:00.0|15b3:1003
    ---
    Processing 0000:03:00.0 15b3:1003
    Vendor:Device 15b3:1003 found at 0000:03:00.0
    
    IOMMU group members (sans bridges):
    /sys/bus/pci/devices/0000:03:00.0/iommu_group/devices/0000:03:00.0
    
    Binding...
    Successfully bound the device 15b3:1003 at 0000:03:00.0 to vfio-pci
    ---
    vfio-pci binding complete
    
    Devices listed in /sys/bus/pci/drivers/vfio-pci:
    lrwxrwxrwx 1 root root 0 Aug 3 18:56 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:1c.4/0000:03:00.0
    
    ls -l /dev/vfio/
    total 0
    crw------- 1 root root 249, 0 Aug 3 18:56 12
    crw-rw-rw- 1 root root 10, 196 Aug 3 18:56 vfio
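
The binding shown in the log above can also be confirmed at any later time from sysfs, where each PCI device directory carries a `driver` symlink pointing at the driver that currently owns it. A minimal sketch, assuming the standard Linux sysfs layout; the function name `bound_driver` and the `sysfs_root` parameter are illustrative, not part of Unraid:

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to pci_addr, or None.

    pci_addr is a full address such as "0000:03:00.0".
    sysfs_root is a parameter only so the function can be tried against
    a copy of the tree; on a live system the default is the real path.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    # An unbound device has no "driver" symlink at all.
    if not os.path.islink(link):
        return None
    # The link target ends in the driver's directory name, e.g. ".../vfio-pci".
    return os.path.basename(os.readlink(link))
```

For the card in this report, `bound_driver("0000:03:00.0")` would be expected to return `"vfio-pci"` if the boot-time binding is still in place. A result such as `"mlx4_core"` would suggest the host driver had reclaimed the card.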

    The card shows up when setting up the VM:

     

    Other PCI Devices:	
     Mellanox Technologies MT27500 Family [ConnectX-3] | Ethernet controller (03:00.0)

     

    When that box is checked and the VM is started, this error shows up in the log:

    2020-08-04T00:24:59.369033Z qemu-system-x86_64: -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pci.0,addr=0x8: vfio 0000:03:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
    2020-08-04 00:25:00.476+0000: shutting down, reason=failed
    
    

    I'm confused as to how / why it is in use since it is allocated to VFIO at boot.  

     

      

    I have tried enabling "PCIe ACS override" and, for the fun of it, "VFIO allow unsafe interrupts". Neither helped (I didn't think they would, but tried anyway).

     

    Does anyone have any thoughts on other things to try? I am working to set up a RockNSM VM and need to pass through this NIC so the VM can capture the 10G mirror of the uplink to my router.

     

    Thanks in advance for any assistance.




    User Feedback

    Recommended Comments

    A bit more info: I see these lines in the logs:

     

    Aug  3 19:04:09 Tower kernel: vfio-pci 0000:03:00.0: enabling device (0100 -> 0102)
    Aug  3 19:04:10 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x18c
    Aug  3 19:04:10 Tower kernel: genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:03:00.0)) vs. 00000080 (ehci_hcd:usb1)
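
The `genirq: Flags mismatch` line can be decoded a little further. In the kernel's message format, the first flags/name pair belongs to the new request and the second to the handler already installed, and bit `0x80` is `IRQF_SHARED`. The sketch below is a hedged reading aid, not a diagnosis: the function name and the returned keys are my own, and only the message layout and the `IRQF_SHARED` value come from the kernel.

```python
import re

IRQF_SHARED = 0x80  # value from the Linux kernel's include/linux/interrupt.h

# Layout: "genirq: Flags mismatch irq N. <new flags> (<new name>) vs. <old flags> (<old name>)"
GENIRQ_RE = re.compile(
    r"genirq: Flags mismatch irq (\d+)\. "
    r"([0-9a-fA-F]{8}) \((.+?)\) vs\. ([0-9a-fA-F]{8}) \((.+?)\)\s*$"
)


def decode_flags_mismatch(line):
    """Parse a genirq 'Flags mismatch' log line; return None if it does not match."""
    m = GENIRQ_RE.search(line)
    if m is None:
        return None
    irq, new_flags, new_name, old_flags, old_name = m.groups()
    return {
        "irq": int(irq),
        "requester": new_name,
        "requester_shared": bool(int(new_flags, 16) & IRQF_SHARED),
        "holder": old_name,
        "holder_shared": bool(int(old_flags, 16) & IRQF_SHARED),
    }
```

Applied to the line above, this reports that `vfio-intx(0000:03:00.0)` requested IRQ 16 without the shared flag while `ehci_hcd:usb1` already holds it as shared. That would be consistent with the "Device or resource busy" error from QEMU. My understanding is that vfio-pci asks for an exclusive INTx line when it cannot mask INTx on the device, but I have not confirmed that this is the cause for this card.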

    Hopefully this helps with the troubleshooting.

    Link to comment

    Anonymized version attached; if you need the other version, let me know and I'll DM it.

     

    Other testing done last night: I tried disabling the USB 2.0 ports and had the same issue. I also tried disabling the USB 3.0 ports and moved the key over to a 2.0 port, just to make sure it wasn't something funny like that. Neither helped.
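
After a test like disabling the USB ports, one way to see whether anything is still attached to the IRQ line is to read `/proc/interrupts` and list the handlers registered on it. A rough sketch, assuming the common x86 layout of that file (a header row of CPU columns, then per line: the IRQ label, one counter per CPU, a chip name, a hwirq/trigger column, and a comma-separated handler list). Other layouts exist, so treat the column skipping as an assumption; the function name is illustrative.

```python
def handlers_on_irq(interrupts_text, irq):
    """Return the handler names registered on a numeric IRQ line.

    interrupts_text is the full contents of /proc/interrupts.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    label = f"{irq}:"
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[0] != label:
            continue
        # Skip the label, the per-CPU counters, the chip name and the
        # hwirq/trigger column; what remains is the handler list.
        rest = fields[1 + ncpus + 2:]
        return [name.strip() for name in " ".join(rest).split(",") if name.strip()]
    return []
```

On the server this could be called as `handlers_on_irq(open("/proc/interrupts").read(), 16)`. If `ehci_hcd:usb1` or any other host driver still appears on IRQ 16 after the BIOS change, the line is still shared.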

     

    Thanks for any guidance.  

    tower-diagnostics-20200804-0854.zip

    Link to comment

    Retested on beta 29; the issue is still there. Based on further testing and Radek's comment, it occurs in 6.8.3 as well.

     

    I tried both cards that were in the box, same issue on each address.

     

    Here's the current error that pops up and the diag is attached:

     

    Execution error

    internal error: qemu unexpectedly closed the monitor: 2020-09-29T17:39:26.418978Z qemu-system-x86_64: -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pci.4,addr=0x0: vfio 0000:03:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy

     

     

    One side note I wanted to bring up: based on Mellanox's website, they're up to driver version 5.x, whereas the diagnostic files I'm looking through seem to show this build using driver version 4.0. Any reason to think that could contribute? Or, when these cards are passed through, are they completely transparent to Unraid?

     

     

    tower-diagnostics-20200929-1351.zip

    Edited by dnoyeb
    Link to comment

    I am also having this issue passing the Mellanox card to a VM.

     

    Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)

    Link to comment

    I tested passing through my Mellanox ConnectX-3 Pro on Unraid 6.9.1 stable, and this bug is still there.

    Link to comment
    On 8/4/2020 at 3:56 PM, dnoyeb said:

    Anonymized version attached; if you need the other version let me know and i'll dm it.  

     

    Other testing done last night:  I tried disabling the usb2.0 ports and had same issue, also tried disabling the usb3.0 ports and moved the key over to the 2.0 to just make sure it wasn't something funny like that.  Neither helped.  

     

    Thanks for any guidance.  

    tower-diagnostics-20200804-0854.zip

    Can you post a screenshot from your PCI Devices and IOMMU Groups?

     

    I don't have these NICs to test with, but I assume these two links would get you sorted out:

     

    Preparing PCI pass-through devices / unloading drivers (note 0000:03:00 instead of 0000:03:00.0)

    https://www.ibm.com/docs/en/linux-on-systems?topic=through-pci

     

    Single Root IO Virtualization (SR-IOV) / Configuring SR-IOV for ConnectX-3/ConnectX-3 Pro
    https://docs.mellanox.com/pages/viewpage.action?pageId=12013542

    Edited by Hakabe
    Link to comment

    I tested Mellanox Technologies MT27500/MCX311A in unRAID 6.9.2 and had the same problem as #1

    Link to comment



