Mellanox ConnectX-2 Issues



2 minutes ago, Hannibal said:

The distance between the PCs is less than, I'd say, 2 ft. The PCs that are connected with the Cisco twinax cable are both in the same rack.

If they are that close, a passive DAC should work just fine; you just need to find one that works :)

6 minutes ago, Hannibal said:

...benefit to me switching over to fibre vs. using the twinax cable?

None at the moment, I think, but with fibre you can later run much longer cable distances if something changes in your setup, e.g. you move the server to the basement.

Link to comment

Alright, I've double-checked the card I was having issues with, the one I pulled from my Unraid server, and everything on it matches the card that was in my Windows machine, so I don't know why the call traces were occurring. I should have pulled diagnostics when it happened, but I was locked out of the web GUI every time. If I remember correctly, the message said something to the effect of "dump stack" followed by some code. I'm not sure what to do. I'm kind of afraid to put the card back in my Unraid machine, because another call trace would force me into another parity check.

Link to comment

I have reinstalled the Mellanox card in the Unraid system. I'm keeping my array stopped while I monitor for another call trace, so I won't be forced into another parity check. Both cards look identical in mlxup, so I honestly have no idea what could have caused the issue. I went over the BIOS settings and made sure no power-saving features were enabled, so I guess I'll just monitor the Unraid PC and see what happens. Fingers crossed that everything works this time around. Here is a screenshot so anyone can double-check that I have this set up correctly:

 

 

Screenshot (1).png

Link to comment
6 minutes ago, Hannibal said:

I have reinstalled the Mellanox card in the Unraid system. I'm keeping my array stopped while I monitor for another call trace, so I won't be forced into another parity check. Both cards look identical in mlxup, so I honestly have no idea what could have caused the issue. I went over the BIOS settings and made sure no power-saving features were enabled, so I guess I'll just monitor the Unraid PC and see what happens. Fingers crossed that everything works this time around. Here is a screenshot so anyone can double-check that I have this set up correctly:

 

 

Screenshot (1).png

You might want to force an MTU of 1500 just to make sure. Jumbo frames and 10G can have interesting results!

 

Tim
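
For anyone following along, here is a rough sketch of how the MTU suggestion above could be checked from either machine. It works out the largest ping payload that fits in one unfragmented IPv4 packet (MTU minus the 20-byte IPv4 header and the 8-byte ICMP header) and builds a don't-fragment ping command for Linux or Windows. The address `10.0.0.2` is a made-up placeholder, and the header sizes assume plain IPv4 with no options.

```python
# Sketch only: compute the don't-fragment ping size for a given MTU.
# Assumes IPv4 without header options; 10.0.0.2 is a hypothetical peer address.

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an ICMP echo over IPv4")
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu, windows=False):
    """Build a ping command that forbids fragmentation at the given MTU."""
    size = str(max_ping_payload(mtu))
    if windows:
        # Windows ping: -f sets don't-fragment, -l sets the payload size
        return ["ping", "-f", "-l", size, host]
    # Linux iputils ping: -M do forbids fragmentation, -s sets the payload size
    return ["ping", "-M", "do", "-s", size, host]


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, " ".join(ping_command("10.0.0.2", mtu)))
```

If the 1472-byte ping gets through but a larger one fails with a fragmentation error, the path is effectively running at MTU 1500. If a jumbo-sized ping fails on only one side, the two ends probably disagree about the MTU.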

 

Link to comment
1 minute ago, Hannibal said:

The connection has disabled itself again. I don't know what to do. Is there any way to check the card's power-saving settings through Unraid like you can in Windows?

You can try copying the utility program over to Unraid, but you will probably run into missing-library issues. You could always live boot Ubuntu and test it that way. I'd say it's 20% that you've got bad cards, 50% a bad cable, and the rest a weird config in the BIOS.

 

If I were you, I'd live boot Ubuntu and run a long ping test or iperf between the machines.

 

Tim
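
A minimal sketch of the long ping test suggested above, for a Linux live boot. It sends one ping per second and then reports the loss percentage and the longest run of consecutive misses, which helps tell a single dropped packet from a link that goes away for several seconds. The function names and the peer address are made up for illustration, and the `-c`/`-W` flags are the Linux iputils ones.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request; return True if a reply came back (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def summarize(results):
    """Return (loss_percent, longest_run_of_consecutive_losses)."""
    if not results:
        return 0.0, 0
    lost = sum(1 for ok in results if not ok)
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return 100.0 * lost / len(results), longest


def run_test(host, count, interval_s=1.0):
    """Ping host `count` times, one attempt per interval, and summarize."""
    results = []
    for _ in range(count):
        results.append(ping_once(host))
        time.sleep(interval_s)
    return summarize(results)
```

Something like `run_test("10.0.0.2", 3600)` (a hypothetical address) would cover roughly an hour. A long run of consecutive losses points more toward the link dropping than toward ordinary packet loss.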

 

Link to comment

I was honestly thinking of just converting my Windows machine over to Ubuntu, because of all the privacy leaks I hear about with Windows 10, though I'm not sure how much truth there is to that. Anyway, I noticed a BIOS setting for SR-IOV. Should I disable it on the card and in my BIOS, or does it have no effect on the issue I'm having? Also, the Windows system uses a Ryzen CPU. I know there were some launch issues with it, but I'm pretty sure they all got hammered out. Using Ryzen over Intel wouldn't have any effect on this specific issue, would it?

Link to comment
1 minute ago, Hannibal said:

I was honestly thinking of just converting my Windows machine over to Ubuntu, because of all the privacy leaks I hear about with Windows 10, though I'm not sure how much truth there is to that. Anyway, I noticed a BIOS setting for SR-IOV. Should I disable it on the card and in my BIOS, or does it have no effect on the issue I'm having? Also, the Windows system uses a Ryzen CPU. I know there were some launch issues with it, but I'm pretty sure they all got hammered out. Using Ryzen over Intel wouldn't have any effect on this specific issue, would it?

I think Ryzen support is only hammered out in the RC releases. I had issues with the 6.3 releases, so I haven't been paying attention to the newer stuff; I could be totally wrong. SR-IOV is only a system BIOS thing and should be disabled. You can live boot both your Unraid and Windows 10 systems to have a level playing field, install iperf, and let it rip for a while.

 

Tim

 

Link to comment

This is where I found it in the card settings in Windows, and then I noticed the BIOS setting on my Unraid server, which uses a Gigabyte Gaming K7 motherboard and an i7-7700K. Alright, I will try iperf. I was also thinking about buying a new DAC cable. Should I just buy another Cisco cable, or should I buy a Mellanox LinkX cable?

Screenshot (2).png

Link to comment
Just now, Hannibal said:

This is where I found it in the card settings in Windows, and then I noticed the BIOS setting on my Unraid server, which uses a Gigabyte Gaming K7 motherboard and an i7-7700K. Alright, I will try iperf. I was also thinking about buying a new DAC cable. Should I just buy another Cisco cable, or should I buy a Mellanox LinkX cable?

Screenshot (2).png

If you disable it in the BIOS, it doesn't matter what the card has enabled. You might want to get some fibre GBICs and do away with the twinax; I'd assume it would be cheaper!

 

Tim

 

Link to comment
7 minutes ago, Hannibal said:

OK, if I may ask, which fibre transceivers and cable are compatible with my Mellanox cards?

I used some old QLogic GBICs I had lying around. These *SHOULD* work (I use them in my D-Link switch):

https://www.amazon.com/gp/product/B00U8Q7946/ref=oh_aui_search_detailpage?ie=UTF8&psc=1

 

This one is supposed to be for Mellanox cards:

 

https://www.amazon.com/gp/product/B01D84KUBS/ref=oh_aui_search_detailpage?ie=UTF8&th=1

 

Cable:

 

https://www.amazon.com/gp/product/B00X7TE91M/ref=oh_aui_search_detailpage?ie=UTF8&psc=1

Edited by klamath
new linkage
Link to comment
2 minutes ago, Hannibal said:

OK, thank you for those links. I'm sitting here watching the network connections manager, and literally every few seconds the connection says "cable unplugged" and then quickly goes back to being OK. I'm starting to think I have a bad cable.

If you live in a populated area, there are probably a few stores you can pop into and pick up a new twinax or fibre cable today, or you could use Prime Now / Prime same-day. SHI/Fry's/Altex (TX)
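
The "cable unplugged" flapping described in the quote above can also be counted from the Linux side instead of watched by eye. This is only a sketch: it polls the interface's `operstate` file in sysfs and counts how often the link leaves the `up` state. The interface name is whatever your system assigns (something like `eth1`), and a flap shorter than the polling interval can be missed.

```python
import time
from pathlib import Path


def read_operstate(iface):
    """Read the kernel's link state for iface, e.g. 'up' or 'down' (Linux sysfs)."""
    return Path(f"/sys/class/net/{iface}/operstate").read_text().strip()


def count_flaps(states):
    """Count transitions from 'up' to any other state in a list of samples."""
    flaps = 0
    for prev, cur in zip(states, states[1:]):
        if prev == "up" and cur != "up":
            flaps += 1
    return flaps


def watch(iface, seconds, interval_s=0.5):
    """Sample the link state for `seconds` and return the number of flaps seen."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append(read_operstate(iface))
        time.sleep(interval_s)
    return count_flaps(samples)
```

Running `watch("eth1", 600)` with a known-good cable and again with the suspect one would give a rough before/after comparison.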

Link to comment

Nope, I live in a small crappy town in PA, lol. It's only 52 dollars plus shipping for me to order from Amazon, so I think I'm just going to go that route and convert over to fiber. I think my Cisco cable is bad. But then again, my call trace issue also happened when the two machines weren't physically connected.

Edited by Hannibal
Link to comment

So then you think the cards themselves are bad? Even when the link between the two machines was working, the max speed I ever saw was around 300 or 400 MB/s, if I recall correctly. Also, the cards pictured in the link you sent me look different from my ConnectX-2 cards; mine have fatter silver heatsinks on them.

Edited by Hannibal
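
To put the 300 to 400 MB/s figure in context, here is a quick bit of arithmetic, assuming these are 10 Gb/s links and decimal megabytes. It ignores protocol overhead, so real-world ceilings are somewhat lower than the raw line rate.

```python
def line_rate_mb_per_s(gbit_per_s):
    """Raw line rate in MB/s (decimal), ignoring Ethernet/IP/TCP overhead."""
    return gbit_per_s * 1000 / 8


def utilisation(observed_mb_per_s, gbit_per_s):
    """Fraction of the raw line rate that an observed transfer speed represents."""
    return observed_mb_per_s / line_rate_mb_per_s(gbit_per_s)


if __name__ == "__main__":
    print(line_rate_mb_per_s(10))           # raw 10 Gb/s line rate in MB/s
    for observed in (300, 400):
        print(observed, round(utilisation(observed, 10), 2))
```

So 300 to 400 MB/s is roughly a quarter to a third of the raw line rate. File copies are often limited by the disks on either end rather than the network, which is one reason an iperf run is a better test of the link itself.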
Link to comment
16 minutes ago, Hannibal said:

So then you think the cards themselves are bad? Even when the link between the two machines was working, the max speed I ever saw was around 300 or 400 MB/s, if I recall correctly. Also, the cards pictured in the link you sent me look different from my ConnectX-2 cards; mine have fatter silver heatsinks on them.

Either the cards are bad or something in your BIOS is mucking things up. You could try resetting the BIOS to factory defaults, then live boot Ubuntu and test again. The cards linked are single-port SFP+, not dual-port like yours (I'm assuming dual because your output shows two MAC addresses).

 

Tim

 

Link to comment

Mine looks to have the same output:

 

C:\WINDOWS\system32>flint -d /dev/mst/mt26448_pci_cr0 query
Image type:          FS2
FW Version:          2.9.1200
Rom Info:            type=PXE version=3.3.400 devid=26448 proto=VPI
Device ID:           26448
Description:    Port1            Port2
MACs:                    0002c952b5be     0002c952b5bf
VSD:
PSID:                MT_0F60110010
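
If you want to compare two cards' `flint ... query` output without eyeballing it, a small parser does the job. This is a sketch that assumes the simple `Key: value` layout shown above and expects only the output lines, not the command prompt line. The function names are made up, and `MACs` is ignored by default because it is expected to differ between cards.

```python
SAMPLE = """\
Image type:          FS2
FW Version:          2.9.1200
Rom Info:            type=PXE version=3.3.400 devid=26448 proto=VPI
Device ID:           26448
Description:    Port1            Port2
MACs:                    0002c952b5be     0002c952b5bf
VSD:
PSID:                MT_0F60110010
"""


def parse_flint_query(text):
    """Parse 'Key: value' lines of flint query output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that have no 'Key:' part
        fields[key.strip()] = value.strip()
    return fields


def differing_fields(a, b, ignore=("MACs",)):
    """Return the sorted field names whose values differ between two cards."""
    keys = (set(a) | set(b)) - set(ignore)
    return sorted(k for k in keys if a.get(k) != b.get(k))


if __name__ == "__main__":
    card = parse_flint_query(SAMPLE)
    print(card["FW Version"], card["PSID"])
```

Parse each card's output with `parse_flint_query` and pass both dicts to `differing_fields`; an empty list means firmware version, PSID and the rest all match.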

Link to comment
