• [6.10.3] SR-IOV not working with Mellanox ConnectX-4+ cards (mlx5_core)


    bbqdt
    • Solved Urgent

    Reproduce -

    Enable SR-IOV and add VFs in the driver, following the instructions.
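For readers unfamiliar with this step, VFs are commonly created through the standard PCI sysfs attributes `sriov_totalvfs` and `sriov_numvfs`. The following is a minimal sketch under that assumption, not necessarily the exact procedure used in this report. The function name `create_vfs` and the `SYSFS_ROOT` override are hypothetical and exist only so the function can be dry-run against a fake directory tree.

```shell
# create_vfs (hypothetical helper): create <count> VFs on the PF <pf>.
# SYSFS_ROOT is an illustrative override; on a real host it defaults to /sys.
create_vfs() {
    pf="$1"      # PF interface name, e.g. eth1 as in this report
    count="$2"   # number of VFs to create
    dev="${SYSFS_ROOT:-/sys}/class/net/$pf/device"

    if [ ! -e "$dev/sriov_totalvfs" ]; then
        echo "$pf: no sriov_totalvfs attribute, SR-IOV not available" >&2
        return 1
    fi

    total=$(cat "$dev/sriov_totalvfs")
    if [ "$count" -gt "$total" ]; then
        echo "$pf: requested $count VFs, device supports $total" >&2
        return 1
    fi

    # The kernel refuses to change a non-zero VF count directly,
    # so reset to 0 before writing the new value.
    echo 0 > "$dev/sriov_numvfs"
    echo "$count" > "$dev/sriov_numvfs"
}

# Example on a live system (as root):
#   create_vfs eth1 1
```

Checking `sriov_totalvfs` first gives a readable error instead of a failed write when the requested count exceeds what the firmware allows.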

     

    Note that no `ip` commands work on the virtual function devices: they do not show correctly in `ip link show`, and attaching a VM to them results in errors or inoperable interfaces.

     

    Example -
     

    # lspci | grep X-4
    83:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    83:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
    # ip link show eth1
    15: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 98:03:9b:94:8d:30 brd ff:ff:ff:ff:ff:ff


    Note that the VF does not show under eth1 as a VF; it shows as a separate interface, eth2:
     

    # ip link show eth2
    79: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
        link/ether 56:17:3a:18:05:d6 brd ff:ff:ff:ff:ff:ff
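For comparison, when a PF driver exposes its VFs, `ip link show` on the PF normally prints one indented `vf <index> ...` line per VF below the `link/ether` line. A minimal sketch for counting those lines follows. `count_vf_lines` is a hypothetical helper name, and the pattern assumes the usual iproute2 layout.

```shell
# count_vf_lines (hypothetical helper): read `ip link show <pf>` output on
# stdin and print how many indented "vf <index>" lines it contains.
count_vf_lines() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -Ec '^[[:space:]]+vf [0-9]+' || true
}

# Example on a live system (eth1 is the PF name used in this report):
#   ip link show eth1 | count_vf_lines
```

A count of 0 on the PF while a VF is present on the PCI bus matches the symptom described here.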

     

    If I unbind the VF via `/sys/bus/pci/drivers/mlx5_core/unbind`, it does not show at all under `ip link`.

    With the VF bound _or_ unbound, I get the following when trying to do _any_ `ip link set` operation on it -
     

    # ip link set eth2 vf 0 mac 52:54:00:96:19:c1
    RTNETLINK answers: Operation not supported


    Another interesting point: if I try to use it in KVM with an `<interface>` element, I get this error when trying to start the VM - `error: internal error: missing IFLA_VF_INFO in netlink response`.

    If I use a `<hostdev>` element, I can start the VM and it seems to recognize the device, but network traffic does not work.

     

    This all works fine in Ubuntu -

    [Screenshot taken 2022-06-15, 10:48:07 AM]

     

    If you need a diagnostics file, I can send it by DM. I am not comfortable uploading all that information publicly.

     




    User Feedback

    Recommended Comments

    @limetech

     

    Please enable the following kernel build configs to fix this -

    Required -
    CONFIG_MLX5_ESWITCH=y

    Nice to have -
    CONFIG_MLX5_EN_TLS=y

     

    Edited by bbqdt
    Link to comment

    Here are the main additions to the .config for the kernel build needed for this -

     

    CONFIG_TLS=y
    CONFIG_TLS_DEVICE=y
    CONFIG_NET_SWITCHDEV=y
    CONFIG_MLX5_ACCEL=y
    CONFIG_MLX5_ESWITCH=y
    CONFIG_MLX5_BRIDGE=y
    CONFIG_MLX5_CLS_ACT=y
    CONFIG_MLX5_TC_SAMPLE=y
    CONFIG_MLX5_TLS=y
    CONFIG_MLX5_EN_TLS=y
    CONFIG_MLXSW_SPECTRUM=m
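To confirm which of these options are active in a given build, the kernel config can be checked directly rather than relying on release notes. The sketch below uses `missing_options`, a hypothetical helper that reads a kernel `.config` on stdin and prints the named options that are not set to `y` or `m`. The usage comment assumes the running kernel exposes `/proc/config.gz`, which requires `CONFIG_IKCONFIG_PROC` and is not confirmed anywhere in this thread.

```shell
# missing_options (hypothetical helper): read a kernel .config on stdin and
# print each option named in the arguments that is not set to y or m.
missing_options() {
    cfg=$(cat)
    for opt in "$@"; do
        # A line of the form OPTION=y or OPTION=m counts as enabled.
        if ! printf '%s\n' "$cfg" | grep -Eq "^${opt}=(y|m)\$"; then
            echo "$opt"
        fi
    done
}

# Example (assumes /proc/config.gz is available on the running kernel):
#   zcat /proc/config.gz | missing_options CONFIG_MLX5_ESWITCH CONFIG_MLX5_EN_TLS
```

Empty output means every listed option is enabled, including options that were switched on implicitly by selecting others.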

     

    Edited by bbqdt
    Link to comment

    @limetech

     

    You seem to have missed

     

    CONFIG_MLX5_ACCEL=y
    CONFIG_MLX5_ESWITCH=y
    CONFIG_MLX5_BRIDGE=y
    CONFIG_MLX5_CLS_ACT=y
    CONFIG_MLX5_TC_SAMPLE=y
    CONFIG_MLX5_EN_TLS=y
    CONFIG_MLXSW_SPECTRUM=m

     

    Was that intentional?

    Edited by bbqdt
    Link to comment

    Those are all turned on. Did you try it?

    Those options are turned on as a result of turning other options on. I only document in the change log the options we actually had to change.

    Edit: double-checking, I did leave out a few change log entries, but all of those options should be on.

    Link to comment

    Sorry, I was going off the release notes. I just updated and checked, and you are correct: they are included.

    Also, this now works with the stock kernel.

    Link to comment



