Mellanox MCX311A-XCAT CX311A No longer working


BBergle

As the title says, I have a direct connection from my PC to my server using Mellanox ConnectX-3 10G cards. On first setup it worked fine and I got the expected transfer speeds. Now, a couple of days later, the network adapter keeps dropping the link and reconnecting roughly every 5 seconds. I bought a new cable, and that did not fix it. I also moved the cards to different PCIe slots in both computers, and nothing changed. Has anyone encountered this issue before? Here is the Unraid log. Thanks for any help.

 

 

Jan  3 12:12:51 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:12:53 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:12:53 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:12:53 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:12:56 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:12:56 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:12:57 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:12:57 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:12:57 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:00 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:00 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:13:02 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:02 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:13:02 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:05 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:05 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:13:07 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:07 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:13:07 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:10 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:10 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:13:11 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:11 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:13:11 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:14 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:14 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:13:16 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:16 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:13:16 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:19 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:19 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:13:21 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:21 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:13:21 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:13:24 UnraidServer kernel: mlx4_en: eth1: Link Down
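
For anyone who wants to put numbers on a flapping link like the one in the log above, here is a minimal Python sketch that pulls the `mlx4_en` "Link Up" / "Link Down" lines out of a syslog excerpt and measures how long the link stays up and how often it comes back. The function names, the `SAMPLE_LOG` excerpt, and the placeholder year are my own assumptions for illustration (syslog timestamps carry no year), and it does not handle a log that crosses a year boundary.

```python
import re
from datetime import datetime

# Matches only the driver's link state lines, e.g.
# "Jan  3 12:12:53 UnraidServer kernel: mlx4_en: eth1: Link Up"
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"mlx4_en:\s+(?P<iface>\S+):\s+Link\s+(?P<state>Up|Down)\s*$"
)

# A short excerpt copied from the log in this thread.
SAMPLE_LOG = """\
Jan  3 12:12:51 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:12:53 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:12:53 UnraidServer kernel: br1: port 1(eth1) entered blocking state
Jan  3 12:12:53 UnraidServer kernel: br1: port 1(eth1) entered forwarding state
Jan  3 12:12:56 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:12:56 UnraidServer kernel: br1: port 1(eth1) entered disabled state
Jan  3 12:12:57 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:00 UnraidServer kernel: mlx4_en: eth1: Link Down
Jan  3 12:13:02 UnraidServer kernel: mlx4_en: eth1: Link Up
Jan  3 12:13:05 UnraidServer kernel: mlx4_en: eth1: Link Down
"""


def parse_link_events(lines, year=2000):
    """Return (timestamp, interface, state) tuples for link state lines.

    The year is a placeholder assumption; only time differences are used.
    """
    events = []
    for line in lines:
        match = LINK_EVENT.match(line.strip())
        if not match:
            continue  # bridge messages and anything else are ignored
        stamp = " ".join(match["ts"].split())  # collapse the padded day field
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, match["iface"], match["state"]))
    return events


def summarize_flaps(events):
    """Return (seconds the link stayed up, seconds between successive Link Up)."""
    up_durations = []
    cycle_periods = []
    up_since = None   # start of the current "up" period, if any
    prev_up = None    # timestamp of the previous Link Up
    for when, _iface, state in events:
        if state == "Up":
            if prev_up is not None:
                cycle_periods.append((when - prev_up).total_seconds())
            prev_up = when
            up_since = when
        elif up_since is not None:
            up_durations.append((when - up_since).total_seconds())
            up_since = None
    return up_durations, cycle_periods


if __name__ == "__main__":
    found = parse_link_events(SAMPLE_LOG.splitlines())
    ups, cycles = summarize_flaps(found)
    print("seconds up before each drop:", ups)
    print("seconds between link-ups:", cycles)
```

On the excerpt above, the link stays up for about 3 seconds each time and comes back every 4 to 5 seconds. A pattern that regular only describes the symptom; it does not by itself say whether the cable, a card, or something else is at fault.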

 


You did not tell us which kind of connection your cards are using: twisted pair, fiber, or direct attach?

So you get 3 answers, pick the right one.

 

"Twisted Pair" :

"You've got the wrong cable." Although most cables today can support 10GbE, most plugs still can't. So it is often pure luck to get cables that are fully compliant with the specs. There is a small chance that you can "feel" the difference: the old plugs are a bit shorter than the new ones. When you slip them into the card, they can still be moved back and forth very slightly. The new ones sit tight.

It took me some months of testing to find out what can be bought safely. It turned out that price is not a good basis for guessing; in the end, the quite cheap ones from Amazon worked well (I'm not really a fan of Amazon, but they stock some things that they also use internally, and the quality is excellent): https://www.amazon.de/gp/product/B07S8QR4YW/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1

Note: the length of the cable is very important! If you come close to 30 m, you should seriously consider switching to fiber instead.

 

"Direct Attach":

Many DA cables seem to work but have stability problems, such as link losses, when used with certain equipment. Also, cables longer than 2 m are more or less pure luck. All 10 cables that I have used here (different brands and sources) failed immediately or after some months. You trade money for stress. The very last one that is still working here is just 20 cm long...

 

"Fiber":

Sounds complicated, delicate and expensive, but it is not anymore. The modules are rather cheap, the cables can be bought ready-made in any length, runs of up to 300 m are no problem, and power consumption is low. I have only had one faulty module so far (and it got replaced, as it was still under warranty). In the end, after a lot of stress, all computers and switches here are using fiber on all ports. Old 1GbE twisted pair devices are on separate switches with a fiber uplink. This has been running for a year now without any lost or dead connection.

 

 

 


I'm using direct attach copper. It's a 1.5 m cable; my original cable was 3 m, which worked for a while. I bought both cards from the same seller on eBay, and it's been less than a month, so I'll be able to return them.

Edited by BBergle
