(FYI) Mellanox 10GbE with VLAN tagging


I realize this may be somewhat of a niche use case, but I thought I'd document it here in case anyone else encounters similar issues.

 

I've recently been encountering stack traces, network connectivity issues, and the occasional hard crash on my unRAID box. I'm running the latest version of unRAID (6.5.3 at the time of writing) on an Asus P8B-X mobo with a Xeon E3-1240 v2 and 2 on-board Intel 82574L NICs. I've also added a Mellanox ConnectX-2 10GbE NIC. Originally I had 2 VLANs configured on eth0 (the Mellanox card), and the on-board Intel NICs were unconnected.

 

I began noticing the stack traces shortly after upgrading to 6.5.3, though I can't say for sure whether they're related to the software version or whether that's just when I happened to check the syslog. I'm not terribly familiar with diagnosing stack traces, but I did notice that each of the traces specified "last unloaded: mlx4_core" (the driver for my 10GbE NIC), and that the following sequence always appeared in the trace:

Aug 19 16:29:23 nas kernel: do_softirq+0x46/0x52
Aug 19 16:29:23 nas kernel: netif_rx_ni+0x1a/0x20
Aug 19 16:29:23 nas kernel: macvlan_broadcast+0x117/0x14f [macvlan]
Aug 19 16:29:23 nas kernel: macvlan_process_broadcast+0xc5/0x10c [macvlan]
Aug 19 16:29:23 nas kernel: process_one_work+0x155/0x237

The simplistic interpretation of this combination of messages is that there's a bug in the way the Mellanox driver interacts with the macvlan driver, and that it causes the kernel to throw these traces (and occasionally crash) when it encounters some particular broadcast traffic. So as a test I removed all VLAN configuration from the Mellanox card, configured an unused port on my switch as an access port for the second VLAN, and connected it to one of the on-board Intel NICs. I haven't seen any stack traces or other issues since completing this change on August 19th (just over a week ago). Previously I'd see traces/crashes about once a day.
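If you want to check whether your own box is hitting the same thing, here's a rough Python sketch that scans a syslog for the macvlan frames shown above and counts how many times per day they turn up. It isn't part of my original troubleshooting. The function name, the regex, and the idea of treating all matching lines with the same timestamp as one trace are my own assumptions, and the script only handles the classic "Mon DD HH:MM:SS host kernel: ..." line format seen in the excerpt.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the excerpt in this post:
#   "Aug 19 16:29:23 nas kernel: macvlan_broadcast+0x117/0x14f [macvlan]"
SYSLOG_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2}) (\d{2}:\d{2}:\d{2}) (\S+) kernel: (.*)$"
)

# Function names that appeared in every trace I saw.
MARKERS = ("macvlan_broadcast", "macvlan_process_broadcast")


def count_macvlan_traces(lines):
    """Count traces per day that contain one of the macvlan frames.

    Heuristic (an assumption): all matching lines sharing one
    day + time stamp are treated as belonging to a single trace.
    """
    seen = set()
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue
        day, time, _host, message = match.groups()
        if any(marker in message for marker in MARKERS):
            # Normalise "Aug  9" to "Aug 9" so single-digit days group cleanly.
            seen.add((" ".join(day.split()), time))
    return Counter(day for day, _time in seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_traces.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for day, count in sorted(count_macvlan_traces(handle).items()):
            print(f"{day}: {count} trace(s) with macvlan frames")
```

Point it at a copy of your syslog (the path is whatever your system uses). If the counts drop to zero after moving the VLAN off the Mellanox card, that lines up with what I saw. It's only a pattern match on the log text, so it tells you the frames were present, not what actually caused the trace.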

 

So, I'm pretty satisfied that I've addressed my problem. I'm not sure how common this scenario is... how many of us are using 10GbE Mellanox cards, and how many of that subset of us have VLANs configured? All the same, I figured I'd post my experience here sort of as a PSA in case anyone else with a similar setup is trying to troubleshoot.

 

Cheers,

-A
