• 6.10 RC2 Mellanox ConnectX3 Not Functional


    trypowercycle
    • Closed

    Upgraded to 6.10 RC2 from the latest stable release and lost my Mellanox ConnectX3 card as a network interface. It is on the firmware recommended for Linux kernel 5.14: version 2.42.5000.

     

    https://docs.mellanox.com/display/kernelupstreamv514/Linux+Kernel+Upstream+Release+Notes+v5.14

     

    I forgot to grab the log file before I reverted back to stable because my wife was bugging me to have Plex up and running again. But I can grab one tomorrow.

     

    Best I have for now is a picture I took from the syslog in the local GUI while I was doing some initial googling for the error. 
     

    The lines that seemed relevant were:

    kernel: mlx4_core: Mellanox ConnectX core driver v4.0-0
    Nov 16 21:32:57 Unraid kernel: mlx4_core: Initializing 0000:01:00.0
    Nov 16 21:32:57 Unraid kernel: mlx4_core 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
    Nov 16 21:32:57 Unraid kernel: mlx4_core 0000:01:00.0: Multiple PFs not yet supported - Skipping PF
    Nov 16 21:32:57 Unraid kernel: mlx4_core: probe of 0000:01:00.0 failed with error -22
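
For anyone who wants to pull the same failure lines out of a full syslog rather than a photo, here is a rough sketch. It assumes the lines read as in the cleaned-up sample inside the code (the photo was hard to read), and the function and pattern names are made up for illustration. It also decodes the probe error: errno 22 is EINVAL.

```python
import errno
import re

# Assumed transcription of the relevant syslog lines (cleaned up from the photo).
SAMPLE = """\
Nov 16 21:32:57 Unraid kernel: mlx4_core: Initializing 0000:01:00.0
Nov 16 21:32:57 Unraid kernel: mlx4_core 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Nov 16 21:32:57 Unraid kernel: mlx4_core: probe of 0000:01:00.0 failed with error -22
"""

# "<driver> <pci address>: can't change power state from <state> to <state>"
POWER_RE = re.compile(
    r"(\S+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"can't change power state from (\S+) to (\S+)"
)
# "<driver>: probe of <pci address> failed with error <negative errno>"
PROBE_RE = re.compile(r"(\S+): probe of (\S+) failed with error (-?\d+)", re.IGNORECASE)


def summarize(log_text):
    """Return (kind, driver, pci_address, detail) tuples for failure lines."""
    findings = []
    for line in log_text.splitlines():
        m = POWER_RE.search(line)
        if m:
            driver, addr, src, dst = m.groups()
            findings.append(("power", driver, addr, f"{src}->{dst}"))
            continue
        m = PROBE_RE.search(line)
        if m:
            driver, addr, code = m.groups()
            # Kernel probe errors are negative errno values; map back to the name.
            name = errno.errorcode.get(abs(int(code)), "unknown")
            findings.append(("probe", driver, addr, name))
    return findings


for finding in summarize(SAMPLE):
    print(finding)
```

Grouping the results by PCI address rather than by driver should make it easier to tell whether a failure follows the card or the slot.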

     

    Let me know if you’d like that full log or if this is a known issue already, thanks!

     






    No problem on 6.10 RC2. Just note that Nvidia acquired Mellanox 😂

     

    [   28.914105] mlx4_core: Mellanox ConnectX core driver v4.0-0
    [   28.954868] mlx4_core: Initializing 0000:b3:00.0
    [   28.964990] mlx4_core 0000:b3:00.0: enabling device (0100 -> 0102)
    [   35.173294] mlx4_core 0000:b3:00.0: DMFS high rate steer mode is: disabled performance optimized steering
    [   35.211751] mlx4_core 0000:b3:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
    [   35.358556] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
    [   35.370670] mlx4_en 0000:b3:00.0: Activating port:1
    [   35.384860] mlx4_en: 0000:b3:00.0: Port 1: Using 16 TX rings
    [   35.396721] mlx4_en: 0000:b3:00.0: Port 1: Using 16 RX rings
    [   35.408895] mlx4_en: 0000:b3:00.0: Port 1: Initializing port
    [   35.433318] mlx4_en 0000:b3:00.0: registered PHC clock
    [   38.042378] mlx4_en: eth1: Link Up
    [   40.788582] mlx4_en 0000:b3:00.0: removed PHC
    [   42.162369] mlx4_core: Mellanox ConnectX core driver v4.0-0
    [   42.162419] mlx4_core: Initializing 0000:b3:00.0
    [   48.296705] mlx4_core 0000:b3:00.0: DMFS high rate steer mode is: disabled performance optimized steering
    [   48.296950] mlx4_core 0000:b3:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
    [   48.363072] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
    [   48.363220] mlx4_en 0000:b3:00.0: Activating port:1
    [   48.365702] mlx4_en: 0000:b3:00.0: Port 1: Using 16 TX rings
    [   48.365706] mlx4_en: 0000:b3:00.0: Port 1: Using 16 RX rings
    [   48.365942] mlx4_en: 0000:b3:00.0: Port 1: Initializing port
    [   48.366224] mlx4_en 0000:b3:00.0: registered PHC clock
    [   48.442886] mlx4_en: eth0: Link Up
    [   48.668016] mlx4_en: eth0: Steering Mode 1
    [   48.683720] mlx4_en: eth0: Link Up
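
As a side note, the "63.008 Gb/s" figure in that log can be reproduced with a bit of integer arithmetic. This is a minimal sketch, assuming the kernel takes the raw 8.0 GT/s rate, applies the 128b/130b line encoding used at that speed with integer division, and multiplies by the lane count. The function names are invented for illustration.

```python
def pcie_bandwidth_mbps(rate_mtps, lanes, enc_num=128, enc_den=130):
    """Usable link bandwidth in Mb/s for a given per-lane rate (MT/s) and width.

    Assumption: per-lane throughput is truncated to a whole Mb/s before
    multiplying by the lane count, which is what makes the result land on
    63008 rather than 63015.
    """
    per_lane = rate_mtps * enc_num // enc_den  # 8000 * 128 // 130 = 7876
    return per_lane * lanes


def format_gbps(mbps):
    """Format Mb/s the way the log line shows it, e.g. 63008 -> '63.008 Gb/s'."""
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"


print(format_gbps(pcie_bandwidth_mbps(8000, 8)))  # x8 link
print(format_gbps(pcie_bandwidth_mbps(8000, 4)))  # x4 link
```

The same arithmetic with 4 lanes gives 31.504 Gb/s, so a card reporting that number has negotiated an x4 link.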

     

    2 hours ago, Vr2Io said:

    No problem on 6.10 RC2. Just note that Nvidia acquired Mellanox 😂

     


     

    Hmm, interesting. So I guess the driver is functional if it’s working on yours. Is that a ConnectX3 card? I wonder if it doesn’t like my flavor of the card for some reason…


    I'm starting to think this is an issue with the firmware having SR-IOV enabled and my motherboard not supporting SR-IOV. I'm going to see if I can disable it in the firmware and go from there.


    Also no problem here on 6.10.0-rc2:

    mlx4_core: Mellanox ConnectX core driver v4.0-0
    mlx4_core: Initializing 0000:01:00.0
    mlx4_core 0000:01:00.0: DMFS high rate steer mode is: disabled performance optimized steering
    mlx4_core 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x4 link)
    mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
    mlx4_en 0000:01:00.0: Activating port:1
    mlx4_en: 0000:01:00.0: Port 1: Using 12 TX rings
    mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
    mlx4_en: 0000:01:00.0: Port 1: Initializing port
    mlx4_en 0000:01:00.0: registered PHC clock
    mlx4_en: eth0: Link Up
    mlx4_en: eth0: Steering Mode 1
    mlx4_en: eth0: Link Up

     

    3 minutes ago, ich777 said:

    Also no problem here on 6.10.0-rc2:


    Any chance you could query your card with mstconfig -d 'device-id' query so I can compare it to mine? I wonder if I have some funky config. Mine is below:

     

    Device type:    ConnectX3       
    Device:         01:00.0         
    
    Configurations:                              Next Boot
             SRIOV_EN                            False(0)        
             NUM_OF_VFS                          8               
             LINK_TYPE_P1                        ETH(2)          
             LINK_TYPE_P2                        ETH(2)          
             LOG_BAR_SIZE                        3               
             BOOT_PKEY_P1                        0               
             BOOT_PKEY_P2                        0               
             BOOT_OPTION_ROM_EN_P1               True(1)         
             BOOT_VLAN_EN_P1                     False(0)        
             BOOT_RETRY_CNT_P1                   0               
             LEGACY_BOOT_PROTOCOL_P1             PXE(1)          
             BOOT_VLAN_P1                        1               
             BOOT_OPTION_ROM_EN_P2               True(1)         
             BOOT_VLAN_EN_P2                     False(0)        
             BOOT_RETRY_CNT_P2                   0               
             LEGACY_BOOT_PROTOCOL_P2             PXE(1)          
             BOOT_VLAN_P2                        1               
             IP_VER_P1                           IPv4(0)         
             IP_VER_P2                           IPv4(0)         
             CQ_TIMESTAMP                        True(1)
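
If someone does post their own dump, comparing the two by eye is error-prone, so here is a rough sketch for diffing them. It assumes the query output keeps the two-column layout shown above. The second config in the example is entirely hypothetical, and the function names are just for illustration.

```python
def parse_mstconfig(text):
    """Return {setting: value} from the 'Configurations:' table of a query dump."""
    settings = {}
    in_table = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Configurations:":
            in_table = True  # everything after this header is "NAME VALUE"
            continue
        if in_table and len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def diff_configs(mine, theirs):
    """Return {setting: (my_value, their_value)} for settings that differ."""
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k)) for k in keys if mine.get(k) != theirs.get(k)}


MINE = """\
Device type:    ConnectX3
Device:         01:00.0

Configurations:                              Next Boot
         SRIOV_EN                            False(0)
         NUM_OF_VFS                          8
         LINK_TYPE_P1                        ETH(2)
"""

# Hypothetical second card, made up purely to show the diff output.
OTHER = MINE.replace("False(0)", "True(1)")

print(diff_configs(parse_mstconfig(MINE), parse_mstconfig(OTHER)))
```

Lines that do not fit the two-column shape, such as an "-E-" error message, are simply skipped, so a failed query parses to an empty dict rather than raising.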

     

    5 minutes ago, trypowercycle said:

    Any chance you could query your card with mstconfig -d 'device-id' query so I can compare it to mine? I wonder if I have some funky config. Mine is below:

    If I'm not completely wrong, I can't query the configuration from my card because I deleted the BIOS ROM from it, both for faster boot speeds and because I simply don't need it. These are only the settings for "Next Boot".

     

    But I can try.

     

    EDIT: Yes, can't query the card.

    root@Server:~# mstconfig -d 01:00.0 query
    
    Device #1:
    ----------
    
    Device type:    ConnectX3       
    Device:         01:00.0         
    
    Configurations:                              Next Boot
    -E- Failed to query device current configuration

     

    5 hours ago, trypowercycle said:

    Is that a ConnectX3 card?

    Yes, the firmware should be the latest, 2.42.5000 (2017 release), FYR.

    Just now, Vr2Io said:

    Yes, the firmware should be the latest, 2.42.5000 (2017 release), FYR.

    Yeah, I updated to that firmware version last night to see if that was the issue... No dice unfortunately. 


    Nov 16 21:32:57 Unraid kernel: mlx4_core 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)

    Suggest you try a different slot.

    6 minutes ago, Vr2Io said:

    Nov 16 21:32:57 Unraid kernel: mlx4_core 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)

    Suggest you try a different slot.

    That's worth a shot... I suppose I could try a BIOS update as well for the hell of it. Googling that error turns up a bunch of Nvidia driver issues (which kinda makes sense, since they own Mellanox now) and talk about PCIe power settings.

    10 minutes ago, trypowercycle said:

    That's worth a shot... I suppose I could try a BIOS update as well for the hell of it. Googling that error turns up a bunch of Nvidia driver issues (which kinda makes sense, since they own Mellanox now) and talk about PCIe power settings.

    I got similar findings; maybe those are generic error messages.

    Is your NIC dual-port, with one port passed through via VFIO?

    8 hours ago, Vr2Io said:

    I got similar findings; maybe those are generic error messages.

    Is your NIC dual-port, with one port passed through via VFIO?

    Well, the plot thickens... I guess this is solved. I swapped it with my SAS controller and now the Mellanox works but the SAS controller doesn't... So something is preventing devices in that x16 slot from working.

    4 minutes ago, trypowercycle said:

    Well, the plot thickens... I guess this is solved. I swapped it with my SAS controller and now the Mellanox works but the SAS controller doesn't... So something is preventing devices in that x16 slot from working.

     

    Nice 👍

     

    8 hours ago, Vr2Io said:

    Suggest you try a different slot.

     


    I'm going to close out this thread and file a new, more general bug report for the "can't change power state from D3cold to D0 (config space inaccessible)" error occurring on any PCIe device plugged into the x16 slot on my board.


