4 Port LAG not behaving as expected


I've recently built a new NAS server and am testing out unRAID on it. The server has an onboard 2.5 Gb NIC, and I added a 4 x 1 Gb PCIe expansion card. The card supports 802.3ad LAG (LACP). My old server also has 4 x 1 Gb ports configured with LAG. I know the LAG configuration on the old server is working because I can initiate transfers from multiple clients and max out multiple 1 Gb ports.

 

The unRAID server, however, is not behaving as I'd expect. I fired off multiple rsync jobs (I know each one will be limited to a single 1 Gb port), but rather than using multiple ports, they all loaded up onto eth1 and shared its bandwidth. I'm not sure why that is. I have confirmed on my switch that LACP is negotiated on the 4 ports, so that part seems to be configured correctly.

 

I then tried leaving each port unbonded and assigning them all different static IPs. That seemed to work once for 2 rsyncs, but the 3rd still went onto the same Ethernet port as one of the 2 already running.

 

Can someone help me understand what is going on here? I should be able to get at least 3 x 1 Gb streams going on this thing (I know the 4th port on the old server is used for failover, so I won't get 4). But I can't figure out why it picks the port it does, or, when LACP is on, why it doesn't use more than one port.
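From what I've been reading since setting this up, if unRAID uses the stock Linux bonding driver defaults, 802.3ad never splits a single flow across links: each flow is hashed to one link on both ends (the bonding driver picks the egress link for traffic the server sends, and the switch's own LAG hash picks the link for traffic coming in), and with the default MAC-based "layer2" hash policy every stream between the same two machines lands on the same port. A rough sketch of that selection logic (the MACs and slave count below are made up, just to show the idea):

```python
# Rough illustration of the Linux bonding driver's default "layer2"
# transmit hash: XOR of the source and destination MAC addresses,
# modulo the number of slaves.  MACs and slave count are made up.

def slave_for_flow(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_slaves

old_server = "aa:bb:cc:00:00:01"   # hypothetical MAC of the old server
new_bond   = "aa:bb:cc:00:00:02"   # hypothetical MAC of the unRAID bond

# Every rsync between the same two hosts produces the same hash, so all
# of them get queued on the same slave no matter how many are running.
for stream in range(1, 4):
    print(f"rsync #{stream} -> slave {slave_for_flow(old_server, new_bond, 4)}")
```

If that's right, a layer3+4 hash policy (which also mixes in the TCP/UDP ports) would at least give separate rsync connections a chance to land on different links, though I'm not sure where, or whether, unRAID exposes that setting.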

 

EDIT:

Just to add to this, I know the NIC grouping isn't an issue on the old server. If I run just 2 rsync processes and look at the NICs on the old server, it's using 2 different NICs (each just limited to around half the bandwidth, since the new server is only using 1 NIC for some reason).


Ugh, even more annoying: now I've observed the server switching streams that were running on separate NICs onto the same NIC mid-stream...

 

At the moment I had 3 rsyncs running, and they were all on one of the four bond1 ports (still no idea why it won't use more than one port). I had only been able to get them to even use bond1 by unplugging the onboard 2.5 Gb NIC that is currently bond0 (even though it's not actually bonded to anything; I read a post somewhere saying that the standalone eth0 had to be set as a bond for 802.3ad to work on the others).

 

So anyway, since the server isn't properly utilizing all the ports, I figured I'd start a copy through another PC with a 2.5 Gb port to the 2.5 Gb port on the server, even though I'd only get part of the traffic, since that computer has to both pull the data from the old server and send it to the new server over its one port. So there it was, maxing out a single 1 Gb port on the bond1 set and leaving 3 ports idle. Then, as soon as I started copying data from another device through the 2.5 Gb port, all the transfers jumped over to the 2.5 Gb port... and since my 2.5 Gb switch feeds into my 1 Gb switch through a single 1 Gb port, that is limited to 1 Gb as well.

 

Ugh... this networking behavior just does not feel right, and I have no idea why it's behaving this way or how to fix it. It should not be leaving 3 ports idle in a LAG setup with multiple connections, much less all 4 ports idle as it is now. *sigh*


And another odd piece of this puzzle...

 

I'm just trying different things to see if I can maximize the data transfer to this new server. I turned off all bonding, and I have 2 transfers successfully going, but again not in the way I would expect based on the configuration settings.

 

So currently I have eth0 set with an IP ending in .160 (yes, there's more to the address, I'm just giving the last octet for uniqueness), eth1 ending in .161, and eth2 ending in .162. eth3 and eth4 are both unconfigured. I have one rsync stream running on the new server pulling data from the old server, and one running on the old server pushing data to .161, and it's successfully maxing out 2 x 1 Gb ports. Now, based on my configuration, I would think one stream would be going to eth1, and one to either eth0 or eth2, depending on which one the pull decided to use.

 

But no... if I look at the network traffic on the new server, eth0 is running at 1 Gb and eth2 is running at 1 Gb. A few minutes ago it was eth2 and eth3. Remember, eth3 isn't even configured, yet it was still somehow deciding to use it.
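For what it's worth, the way I've been watching which NIC actually carries the bytes is just the kernel's per-interface counters; a minimal sketch that reads /proc/net/dev twice and prints the rate for each ethX/bond interface (interface names assume the eth0-eth4 layout above):

```python
import time

def read_counters():
    # /proc/net/dev lists per-interface receive/transmit counters;
    # rx bytes is the 1st field after the colon, tx bytes the 9th.
    counters = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:          # skip the two header lines
            name, data = line.split(":", 1)
            fields = data.split()
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

before = read_counters()
time.sleep(5)
after = read_counters()

for iface in sorted(after):
    if not iface.startswith(("eth", "bond")):
        continue
    rx = (after[iface][0] - before[iface][0]) * 8 / 5 / 1e6   # Mbit/s
    tx = (after[iface][1] - before[iface][1]) * 8 / 5 / 1e6
    print(f"{iface}: rx {rx:7.1f} Mbit/s  tx {tx:7.1f} Mbit/s")
```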

 

All of this just adds to my confusion about what the heck the networking is even doing here. =/

4 hours ago, Menaan said:

I turned off all bonding.

 

4 hours ago, Menaan said:

eth0 set with an IP ending in .160 (yes, there's more to the address, I'm just giving the last octet for uniqueness), eth1 ending in .161, and eth2 ending in .162.

You should set them in different subnets, so the traffic will be routed correctly to the corresponding interface.
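With several interfaces in the same subnet, the kernel just picks one route for that whole subnet, so replies don't necessarily leave on the NIC that owns the destination IP. A quick way to see which local address (and therefore which NIC) the routing table would choose for a given peer is a connected UDP socket; a small sketch, using a hypothetical address for the old server:

```python
import socket

def egress_address(peer_ip: str) -> str:
    # connect() on a UDP socket sends nothing, but it forces the kernel
    # to run route selection, so getsockname() shows which local address
    # (and therefore which interface) outgoing traffic would use.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((peer_ip, 9))   # port number is irrelevant for routing
        return s.getsockname()[0]
    finally:
        s.close()

# Hypothetical address of the old server.  With .160/.161/.162 all in one
# subnet this prints the same local address no matter which of the three
# IPs the peer targeted, which is why separate subnets behave predictably.
print(egress_address("192.168.1.50"))
```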


Yes, I could do that, but I'd still like to understand why it does what it does as it is. Why doesn't it balance when using LACP? The old server does that just fine; if I watch it with multiple connections, they are spread across 3 ports. And even when LACP is off, I don't understand how it can decide to route traffic through a port that isn't even configured. I'd like to understand these things so I can be sure the networking on unRAID is going to be sufficient for my needs before I put this server into a real use case.

4 hours ago, Menaan said:

how it can decide to route traffic through a port that isn't even configured.

It's quite weird that traffic routes to a non-member port; does your switch correctly recognize which ports are the LACP members? I haven't applied LACP with unRAID myself, I only tried the technique briefly a long time ago.

 

Actually, I was trying to suggest why traffic won't route correctly with the individual-interface setup, not anything about LACP.

 

4 hours ago, Menaan said:

I'd like to understand these things so I can be sure the networking on unRAID is going to be sufficient for my needs before I put this server into a real use case.

Sure, I also spend a lot of time changing my network design whenever the software / hardware / applications change.


Does anyone else have any information on this for me? Is this networking behavior expected for unRAID? I ran into another issue with it today, and I'm trying to figure out whether these are unRAID issues, whether there might be an issue with the 4-port NIC card I have, or what.

 

The issue I ran into today came after setting my 4-port bond back up with LACP configured. I could tell the unRAID server was responding slowly while I was trying to install Plex; it kept erroring out with "upstream timed out" errors. After a lot of troubleshooting and searching, I figured out it was due to LACP. As soon as I disabled bonding on those 4 ports, everything started responding much more quickly and I was able to install Plex without a problem.

 

I know my switch is working correctly and has LACP configured properly (as mentioned before, I have other devices, including my old server, working with LACP on this same switch). So these problems are either due to the NIC card I got, or to how unRAID is handling the LACP.
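Before I write the NIC card off, the next thing I want to look at is what the bonding driver itself reports. If I understand it correctly, /proc/net/bonding/bondX lists each slave's MII status and aggregator ID, and slaves that ended up in different aggregators would explain links sitting idle. A quick sketch for summarizing that (I'm assuming the 4-port bond shows up as bond1, which may not be the exact name unRAID uses):

```python
# Minimal sketch: summarise per-slave LACP state from the bonding driver.
# Assumes the 4-port bond appears as /proc/net/bonding/bond1.

def bond_summary(path="/proc/net/bonding/bond1"):
    slave, info = None, {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "Slave Interface":
                slave = value
                info[slave] = {}
            elif slave and key in ("MII Status", "Aggregator ID"):
                info[slave][key] = value
    return info

for slave, state in bond_summary().items():
    # All slaves should be "up" and share one Aggregator ID; a slave in
    # its own aggregator is not actually part of the LACP bundle.
    print(slave, state)
```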
