UPDATED: Network drops on 10g switch. Now unknown issue again.



Getting an odd issue with my Unraid 6.6.2 (happened on 6.6.1 also) network connection dropping when connected to my Ubiquiti US-16-XG switch. First sign of issue was that booting the server wouldn't pull an IP address. When I booted with the server plugged into my US-24 switch all was fine. Hot swapping the RJ45 from the US-24 to the XG also was fine for a while. But the network connection would ultimately drop after anywhere from 10 min to a couple hours. And then the server was unreachable. My Unraid server has a mainboard with two NICs, both Intel 10g, and I am only using one of them. I know that Unraid 6.6+ now bridges multiple physical NICs into br0. Not sure if that is related, but it's odd that this only happens when the connection is on the XG at 10g and doesn't happen on the US-24 at 1g. Pretty frustrated at this point. I've escalated this issue with Ubiquiti support to get some help from them. I will also note that my ESXi server - same exact hardware config - is solid as a rock when connected to the XG via a single NIC. So there is something about the Unraid networking that is contributing or even causing this issue.

 

I'm not a networking expert so hoping someone here can give me some insights or suggestions. Is the default NIC bridging a requirement? I plan to only use one of the NICs - eth0 - and wonder if I can just turn off bridging eth1 into br0 to see if that results in the connection holding. The XG switch defaults to Spanning Tree RSTP if that matters. STP is also an option but not selected. But it's also RSTP on the US-24 and there is no issue with connection stability when that is the termination point.

 

Any guidance or suggestions for resolving this?
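For anyone wanting to see what the bridge actually contains before changing settings, here is a minimal Python sketch that reads the member ports and STP state of a Linux bridge from sysfs. It assumes Python 3 is available on the server (it may need to be installed separately on Unraid) and that the bridge is named `br0` as described above. The function name `bridge_info` is made up for illustration.

```python
from pathlib import Path

# Kernel values found in /sys/class/net/<bridge>/bridge/stp_state
STP_STATES = {"0": "disabled", "1": "kernel STP", "2": "userspace STP"}


def bridge_info(bridge="br0", sys_net="/sys/class/net"):
    """Return the member ports and STP state of a Linux bridge, read from sysfs."""
    base = Path(sys_net) / bridge
    brif = base / "brif"  # one entry per interface enslaved to the bridge
    if not brif.is_dir():
        raise FileNotFoundError(f"{bridge} does not look like a bridge (no {brif})")
    members = sorted(p.name for p in brif.iterdir())
    raw = (base / "bridge" / "stp_state").read_text().strip()
    return {"members": members, "stp": STP_STATES.get(raw, f"unknown ({raw})")}
```

Calling `bridge_info("br0")` on the server should show whether the uncabled eth1 is still a member of the bridge and whether STP is active on it.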

Edited by misterwiggles
mispelling
Link to comment
  • misterwiggles changed the title to Network drops on Ubiquiti switch. STP related?

I have a 16-XG that links to my main 1G network. All 10G links run over SFP+ optics, on their own VLAN to keep them separate from the 1G network.

My two Unraid servers each have a 1G and a 10G NIC; they are Intel and Emulex.

No issues (no VMs, no bridge, just simple networking across 2 subnets).

Edited by Benson
Link to comment

@Benson I'm running the default Unraid 6.6 networking, which bridges multiple NICs, when they exist, into br0. My mainboard has two onboard NICs but I am only using one of them, as eth0; eth1 is still bridged into br0 as I described above. So can I remove one of the NICs from this bridge and just run Unraid off the single eth0 connection? My understanding is that the bridge is required for Docker and VM networking. I am also running my 10g connection off the copper/RJ45 Port 13 on the XG and not the SFP+ ports.

 

I wonder if I was ONLY using the one NIC without a bridge whether this would solve my connection issue. It would seem that your not having issues narrows the problem to either Unraid bridging setup and/or Unraid's interaction with that RJ45 copper port on the XG. I suspect it has something to do with the bridging. As I said above, I have EXACT same hardware in another server running ESXi with same single NIC being used with the 2nd one unattached. That server has NO issues on the XG via the copper ports. So this is why I am seeking help from the Unraid networking gurus as I know that 6.6 changes a bunch of networking code and this bridging setup may be the root cause of my issue when on the XG.

 

 

Link to comment

I haven't set up a bridge in Unraid. If you aren't running Docker or VMs, please try without a bridge.

On early 16-XG hardware revisions, the RJ45 ports have issues with many NICs when running at 10G; mine is one of those. But since you have identical hardware with no issue on RJ45 at 10G, I think yours is a newer hardware revision.

Or could you first try setting the port speed to 1G for the problem Unraid server and check whether the same problem occurs?
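When testing a forced port speed, it helps to confirm from the server side what the NIC actually negotiated. A minimal Python sketch, assuming Python 3 is installed and the interface is `eth0`; the helper name `link_status` is hypothetical, but the sysfs files it reads (`operstate`, `carrier`, `speed`, `duplex`) are standard on Linux.

```python
from pathlib import Path


def _read(path):
    """Read a sysfs attribute; some raise OSError while the link is down."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def link_status(iface="eth0", sys_net="/sys/class/net"):
    """Return operstate, carrier, negotiated speed (Mb/s) and duplex for a NIC."""
    base = Path(sys_net) / iface
    try:
        speed = int(_read(base / "speed"))
    except (TypeError, ValueError):
        speed = None
    if speed is not None and speed < 0:  # the kernel reports -1 for "unknown"
        speed = None
    return {
        "operstate": _read(base / "operstate"),
        "carrier": _read(base / "carrier") == "1",
        "speed_mbps": speed,
        "duplex": _read(base / "duplex"),
    }
```

A result with `speed_mbps` of 1000 after setting the switch port to 1G would confirm the change took effect on both ends.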

Edited by Benson
Link to comment

@Benson I am using Docker containers and VMs but nothing specific to the bridge that I am aware of. I don’t understand how bridging works in Unraid. Can I just delete my unused eth1 NIC connection from the br0 bridge? So that would mean that I still have Bridging=YES set but it would only be using the eth0 connection for the single NIC I have cabled. I ask this in case I need to have bridging turned on for VMs and Docker containers to use as needed. Or do I just turn bridging off completely? But again, I thought Docker and VMs needed that.

Link to comment

UPDATE: So to continue making progress on this issue I decided to remove eth1 from the default bridge (br0) since there is no cable attached and I'm not using it. I then shut down the Unraid server and plugged the RJ45 back into the XG port that was giving me problems. This time booting the server all went well and the server pulled the IP address as expected. I've also had NONE of the network drops - so far - that I was experiencing on this switch prior when eth1 was bridged with eth0 (NIC with cable). So I'm concluding that for some reason the XG doesn't like Unraid's bridge with two NICs when only one is connected. This is NOT an issue when plugging the same exact setup into a 1g UBNT switch. I have no idea why this presents as a problem at 10g on the XG but seems resolved for now.

 

My next step is to ultimately connect my 2nd NIC to the XG and set up bonding with balance-alb. Hoping that doesn't cause any issues to return.
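Once a bond is configured, the Linux bonding driver reports its mode and per-NIC link state in `/proc/net/bonding/<bond>`. Below is a small Python sketch that parses that text. The bond name `bond0` in the usage note and the function name `parse_bonding` are assumptions, not something confirmed in this thread.

```python
def parse_bonding(text):
    """Parse the contents of /proc/net/bonding/<bond>.

    Returns (mode, slaves) where slaves maps each member NIC to its MII status.
    """
    mode, slaves, current = None, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave section
            # and is skipped because current is still None at that point.
            slaves[current] = value
    return mode, slaves
```

Usage would be along the lines of `parse_bonding(open("/proc/net/bonding/bond0").read())`; a slave whose status flips between up and down would point at the same link instability seen without bonding.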

Link to comment
  • misterwiggles changed the title to UPDATED: Network drops on 10g switch. Appears to be Bridge issue w. 2 NICs
4 minutes ago, misterwiggles said:

So I'm concluding that for some reason the XG doesn't like Unraid's bridge with two NICs

When two or more interfaces are members of a bridge, spanning tree protocol (STP) is enabled to prevent layer 2 loops.

Perhaps the Ubiquiti switch has issues with STP?

 

Ps. Setting up a bonded interface with two or more interfaces does not involve STP

Link to comment

The XG switch is set up for RSTP, as is the other US-24 switch that doesn't see this problem. So totally unclear on what is going on there.

Link to comment

I'm having a similar issue to yours with one of my servers. I can connect to my 48-port switch with two Ethernet cables and aggregate them with 802.3ad no problem, but on the XG it starts to give issues.

I recently tried it again: with the cables still connected to the 48, I aggregated two ports on the XG and then plugged those cables in. Watching the switch for a few seconds, you could tell something wasn't right: the light was going off and on slowly instead of staying solid or showing normal network blinks. Sure enough, when I went back to check, I couldn't connect.

I'm going to try your way and see if that fixes it. 

Link to comment

Yeah, the whole thing is weird. I escalated this issue to UBNT and they are supposed to be looking at it through their escalation team. I'm happy for now to just have a stable connection for the single eth0 NIC on the XG. But like I said, I will connect the 2nd NIC to the XG and turn on bonding=balance-alb as most of what I've read suggests this will work fine. For some reason the XG just doesn't like Unraid bridging when there is no connection to one of the NICs.

Link to comment

NEW UPDATE:

 

New eth0 only setup as per above held all day to the XG but the connection was dropped again by approx. 8pm this evening. Server is inaccessible yet 10g LED light is lit on the XG. But Unifi Controller shows the server as gone from the client list. So I don't know what to do at this point. I've simplified the networking as much as possible short of just turning off bridging altogether, but I assume that would make using Docker and VMs a non-starter.

 

Anyone have any other insights to share on what might be going on here? Pretty frustrated. Going back to the US-24 and the 1g connection. But I just don't get why this 10g connection won't hold up. Seems like others are having similar issues, such as @slimshizn.
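Since the link LED stays lit while the server becomes unreachable, it may help to log when the physical link and the gateway stop agreeing. The following is a rough Python sketch of such a watchdog, not a tested tool: it assumes Python 3 on the server, the Linux `ping` from iputils, and a hypothetical gateway address that you would replace with your own. All function names are made up for illustration.

```python
import subprocess
import time
from pathlib import Path


def carrier_up(iface="eth0", sys_net="/sys/class/net"):
    """True if the kernel reports physical link on the interface."""
    try:
        return (Path(sys_net) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        return False


def ping_ok(host, timeout_s=2):
    """Send one ICMP echo using the system ping (iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(carrier, reachable):
    """Separate a dead physical link from a live link that passes no traffic."""
    if carrier and reachable:
        return "ok"
    if carrier:
        return "link up, gateway unreachable"
    return "link down"


def watch(gateway, iface="eth0", interval_s=30, log_path="netwatch.log",
          rounds=None, probe_carrier=carrier_up, probe_ping=ping_ok,
          sleep=time.sleep):
    """Append a timestamped line to log_path whenever the state changes."""
    last, n = None, 0
    while rounds is None or n < rounds:
        state = classify(probe_carrier(iface), probe_ping(gateway))
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as fh:
                fh.write(f"{stamp} {iface}: {state}\n")
            last = state
        n += 1
        sleep(interval_s)
```

Running something like `watch("192.168.1.1")` (placeholder address) from the local console would leave a record of whether a drop shows up as "link down" or as "link up, gateway unreachable", which is the distinction the symptoms here hinge on. The log should go somewhere that survives a reboot.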

Link to comment

NEW NEW UPDATE:

 

A complete reboot of the Unraid server DID NOT result in an accessible server after reboot. Something going on with the switch and its interaction with Unraid. Just not clear on the root cause. But I would have expected a reboot to reset the networking issues whatever they are and start the whole process all over again. My guess is that I will need to shutdown the server again, move the network connection back to the 1g switch, boot and things will be normal again. Just not at 10g, which is the whole reason I bought this server and switch. Ugh.

Link to comment

I'm probably going to send mine back and just do direct connections. It would have been nice for future expansion but if there's going to be issues it's obviously not worth it.

When I switched from 10G to 1G on the XG on the RJ45 ports, it held a connection, so from my findings it's just unstable at 10G on those ports. I'm holding a steady connection on the SFP+ ports ( Although there are other problems with that as well ).

Edited by slimshizn
Link to comment
2 hours ago, Benson said:

Good to know the 16-XG has this issue.

If I'm correct, STP/RSTP can be turned off on the XG?

So I only need spanning tree (RSTP or STP) when bridging the two physical NICs? In other words, I can turn OFF any spanning tree on this switch if I am not bridging two or more NICs. Still using Docker and VMs on Unraid so not clear if I need STP or not. Ready to try anything to further isolate issue and pass on to UBNT team.

Link to comment

Getting desperate now to narrow issue. I had my Unraid server set to static IP and Gateway in Unraid itself. I noticed on the UBNT side that when I tried to set a static address on the router/USG for this server the Unifi Controller would give me an error. And it seems odd to me that my network drops appear to be the router losing the IP and route to my server on the XG but not the network link connection itself. So last night I set Unraid to dynamic IP and gateway and it pulled a new IP address while still connected to the XG. I then went back to the Unifi side and this time it let me set the 'fixed' IP address for the Unraid server on the new IP. I rebooted the server and it cleanly connected to the network and was assigned the correct IP address by the USG. That connection has been up on the XG all night now. Will see how it goes today. I'm speculating that the XG is somehow losing the route to the Unraid server over time when the IP is being set on the Unraid side. But if I let the network/router manage setting the address and gateway addressing I'm wondering if this will change. Could be totally off the mark here but just trying to narrow things down.
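To check the server's side of the routing question, the kernel's own view of the default route can be read from `/proc/net/route`. This Python sketch decodes that table. It assumes a little-endian machine such as x86, where the hex gateway field is byte-reversed, and the function name `default_routes` is hypothetical.

```python
import socket
import struct

RTF_GATEWAY = 0x2  # route flag: traffic is forwarded via a gateway


def default_routes(route_text):
    """Return [(iface, gateway_ip)] for default routes in /proc/net/route text."""
    routes = []
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gateway, flags = fields[0], fields[1], fields[2], int(fields[3], 16)
        if dest == "00000000" and flags & RTF_GATEWAY:
            # Hex value is in host byte order; pack little-endian to recover
            # the address bytes in network order (x86 assumption).
            ip = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            routes.append((iface, ip))
    return routes
```

If `default_routes(open("/proc/net/route").read())` still lists the expected gateway on br0 while the server is unreachable, the server's routing table is intact and the loss is happening elsewhere on the path.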

Link to comment
5 hours ago, misterwiggles said:

So I only need spanning tree (RSTP or STP) when bridging the two physical NICs?

You don't need RSTP or STP for bridging, and Unraid has no setting related to RSTP/STP either.

I turned it off because I haven't implemented any RSTP/STP in my network.

 

1.png.7d46f5c6d2f94f5dbbcde2544212678d.png

 

Going by your last result, it doesn't seem related to STP/RSTP or to DHCP vs. static IP. You are most likely hitting the XG's instability on RJ45 ports at 10G, like @slimshizn's case.

 

https://community.ubnt.com/t5/UniFi-Routing-Switching-Beta/stacking-2-SW-16-XG-via-copper-ports-getting-many-errors-Still/m-p/2522048#M27597

 

https://community.ubnt.com/t5/UniFi-Routing-Switching-Beta/UniFi-Switch-16-XG-Xeon-D-1541-10GBase-T/m-p/1777358/highlight/true#M1572

 

I have set up the case below for testing (Unraid, 10G over SFP+ optics). No issue so far; I will update with the result after a 1-hour test.

- Static IP on 10G br0

- br0 formed from eth1 + eth2 (eth2 has no SFP+ installed)

- STP turned on at the XG switch

 

2.png.107e6967f5417d5c5b87279c5acc9587.png

 

** Update **

No issue found in the above test after 1 hour.

3.png.809823581278eabf32c8c7eb773d2fb5.png

 

Edited by Benson
Link to comment
4 hours ago, slimshizn said:

I'm probably going to send mine back and just do direct connections. It would have been nice for future expansion but if there's going to be issues it's obviously not worth it.

When I switched from 10G to 1G on the XG on the RJ45 ports, it held a connection, so from my findings it's just unstable at 10G on those ports. I'm holding a steady connection on the SFP+ ports ( Although there are other problems with that as well ).

Returning the XG should be a good choice. I bought mine during the beta phase (half price). I don't live in the US, so shipping it back would cost too much.

Link to comment

@Benson Do you have any updates in your testing? My update is that after switching IP addresses to set the static IP from the Unifi side of things, I managed to have my network connections stay up for 12 hours straight until I had to leave for the weekend. I also left my Unraid log open on screen to see if I could see anything happening once I got back. Unfortunately, some time over Friday-Sunday my connection dropped again and there was nothing new in the log file.

 

My next step is to set Port 13 on the XG to 1g and isolate whether it is the XG itself that is losing the IP/routing or something to do with the 10g link speed itself on those ports.
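With nothing showing in the syslog, the NIC's own error counters may still reveal trouble at 10g. Here is a minimal Python sketch that snapshots a few counters from sysfs and reports which ones moved between two readings. It assumes Python 3 and interface `eth0`. The counter files are standard Linux ones, though not every driver populates all of them, and the function names are made up.

```python
from pathlib import Path

COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped", "rx_crc_errors")


def read_counters(iface="eth0", sys_net="/sys/class/net"):
    """Snapshot selected error counters from /sys/class/net/<iface>/statistics."""
    stats = Path(sys_net) / iface / "statistics"
    snapshot = {}
    for name in COUNTERS:
        try:
            snapshot[name] = int((stats / name).read_text())
        except (OSError, ValueError):
            pass  # counter not exposed by this driver
    return snapshot


def changed(before, after):
    """Return only the counters that increased or otherwise changed."""
    return {k: after[k] - before[k]
            for k in after if k in before and after[k] != before[k]}
```

Taking one snapshot at 10g and another at 1g after similar uptime, then comparing with `changed(...)`, would show whether CRC or receive errors climb only at the higher link speed.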

Link to comment
  • misterwiggles changed the title to UPDATED: Network drops on 10g switch. Now unknown issue again.
19 hours ago, misterwiggles said:

My next step is to set Port 13 on the XG to 1g and isolate whether it is the XG itself that is losing the IP/routing or something to do with the 10g link speed itself on those ports. 

The 1-hour test showed no issue, and I have already changed all the settings back.

Setting the port to 1G should be a good troubleshooting step.

Edited by Benson
Link to comment

Yes @Benson. I've had no issues since setting Port 13 on the XG to 1g. What can we conclude from this? That this is DEFINITELY an XG issue, i.e. it won't hold the network connection when the port is set for 10g? There were a LOT of these reported issues with this switch during its beta phase. Apparently UBNT worked on the issues with firmware updates. A lot of the issues were specific to Intel NICs which is what I have. However, I have another server running ESXi which is the exact same hardware and it doesn't lose the connection to the XG at 10g. So I keep concluding that there is something related to Unraid networking at 10g connected to this switch. I'm completely out of ideas. But running the XG ports at 1g for Unraid is not why I bought the switch.

 

I'm wondering if getting a 10g-T transceiver for one of the SFP+ ports would fix this. But those cost around $130 so not exactly what I want as the fix. Still waiting on UBNT support for a response on my escalated issue.

Link to comment
11 minutes ago, misterwiggles said:

That this is DEFINITELY a XG issue

Yes, firmware can't fix this.

 

11 minutes ago, misterwiggles said:

I'm wondering if getting a 10g-T transceiver for one of the SFP+ ports would fix this. 

I don't recommend doing this. If possible, please contact UBNT to replace the whole switch.

When I first set up 10G, I knew about those issues from the UBNT forum, so I planned all my 10G over SFP+ optics.

Edited by Benson
Link to comment
