Network Connection Keeps Disconnecting


I've had my new UnRAID server running for about a month now, but in the last few days the network connection to my server suddenly disconnects and I have no network connectivity. Internal communication between the server & the VM I have running on it works fine, but the VM cannot communicate with the rest of the network either. The only thing that resolves the issue is rebooting the server. The connection will then be fine for a couple of days or more & then drop out again.

 

It is a 10Gb connection to a Chelsio T520-LL-CR 2-port SFP+ PCIE card I installed. That is connected via a 5-meter 10G Twinax cable to a MikroTik CRS305-1G-4S switch. I have a Windows machine with the same Chelsio card connected to the MikroTik switch as well & haven't had any issues with it. I tried disconnecting & reconnecting the cable, using another cable, using the other port on the Chelsio card, and using a different interface on the MikroTik switch. My motherboard is an ASUS Z10PE-D8 WS with dual Gigabit Ethernet interfaces which I don't use, but since the server is configured to bond all interfaces in active-backup mode, I tried connecting to the motherboard's Ethernet interfaces. Neither of those got the connection to come back up, even though the server logs listed the following:

 

Jul 26 15:43:32 UnRAIDSVR kernel: igb 0000:09:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Jul 26 15:43:32 UnRAIDSVR kernel: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex

Jul 26 16:07:58 UnRAIDSVR kernel: igb 0000:09:00.0 eth1: igb: eth1 NIC Link is Down
Jul 26 16:07:58 UnRAIDSVR kernel: bond0: link status definitely down for interface eth1, disabling it

Jul 26 16:08:04 UnRAIDSVR kernel: igb 0000:08:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Jul 26 16:08:05 UnRAIDSVR kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex
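
A minimal sketch for anyone who wants to line these link events up against the time of a drop, assuming the syslog from the diagnostics has been saved as text and Python 3 is available. The function name `bond_events` and the `year` parameter are illustrative (syslog timestamps carry no year), and the pattern only covers the bond0 message format quoted above.

```python
import re
from datetime import datetime

# Matches bonding messages in the format quoted above, for example:
# "Jul 26 16:07:58 UnRAIDSVR kernel: bond0: link status definitely down for interface eth1, disabling it"
BOND_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<bond>\w+): link status definitely (?P<state>up|down) "
    r"for interface (?P<iface>\w+)"
)


def bond_events(lines, year=2020):
    """Return (timestamp, bond, interface, state) for each bonding link event."""
    events = []
    for line in lines:
        m = BOND_EVENT.match(line.strip())
        if m is None:
            continue  # not a bonding link-status line (e.g. the igb driver lines)
        # The year is an assumption supplied by the caller, not read from the log.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["bond"], m["iface"], m["state"]))
    return events
```

Feeding it the lines of a saved syslog (for example `bond_events(open("syslog.txt"))`, where the file name is hypothetical) gives a list that can be sorted or filtered around the time the connection dropped.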

 

The 10Gb interfaces are eth2 & eth3 on my server. I've attached the diagnostics from the server. The most recent network connection drop occurred around 2:17 PM today. Any help would be appreciated.

 

Thanks

 

 

unraidsvr-diagnostics-20200726-1704.zip

Link to comment

Hi there,

 

One major question before we can start diagnosing the issue concerns this:

 

On 7/26/2020 at 5:25 PM, Derrikdj said:

but in the last few days the network connection to my server suddenly disconnects

 

What happened in the last few days? I'm asking because it's very unusual for a working system to all of a sudden start exhibiting this kind of behavior without any changes to the software or hardware. Did you recently update Unraid and this started occurring after that? What about updates to your router or switch? Any hardware changes on the server or network recently? Also, from looking at the logs, it appears the behavior doesn't start until a little over 7 hours after you've booted it up. Does that sound right?

 

Another thing you can try is to disable the use of eth0 and eth1 in the bonding group.  Stop the array and navigate to the Network Settings page and try taking those unused devices out of the bond configuration.

Link to comment
I did replace my NetApp X2065A-R6 Controller with an LSI 9200-8e controller on 7/23 per a recommendation due to multiple drive errors & missing disks after a reboot or shutdown. Here is the thread topic I started concerning that issue:

 

That seemed to resolve the disk errors & missing disks issue so far.

 

I can't say for sure if the network disconnects only started after the controller was replaced, but I believe one had occurred at least once before the controller was replaced. They have definitely become more frequent in the last few days since the controller was replaced, with disconnects occurring at least once a day now. Other than that, I've been continuously adding drives to my array, but I've been doing that since I first set up my UnRAID server.

 

I started by converting the Windows OS running on the server into a VM. I then installed UnRAID on the server & passed the SATA controllers on the motherboard through to the Windows VM. From there I would transfer data from the disks connected to the motherboard to disks in my UnRAID array, which are installed in a NetApp DS4246 shelf, then move the emptied disk to the disk shelf & add it to my array. I've been repeating this process, starting with three 16TB drives (1 for the data array & 2 for parity), and have moved 7 disks from the motherboard to the disk shelf & array so far.

I did notice that the disconnects seem to occur during large, long data transfers that had been running for several hours. However, I've been almost constantly transferring data from the local motherboard disks to the UnRAID array since I got my UnRAID server up & running, with almost 100TB transferred & counting, so that might just be because I'm transferring data more often than not. I've started large data transfers to run overnight in the past without any issues. Also, the data transfers between the VM & array don't appear to traverse the NIC, as the transfers continue even after the network connectivity drops out. I often wait for the transfers to complete before I reboot the server.

 

I did add a QNAP QNA-UC5G1T USB 5GbE adapter to a laptop on my network back on 7/8. To connect it via a Cat6 cable, I plugged an ipolex 10G SFP+ RJ45 Copper Transceiver into the same MikroTik switch as the UnRAID server. I've had no connectivity issues with the laptop or the other Windows machine with the same Chelsio NIC that is connected to that switch.

 

I've had two more disconnects since my original post, one on 7/27 at 6:46am & another this morning, 7/28, at 4:46am. The 7/27 disconnect occurred during a large file transfer from the Windows VM that I had started the previous night; the transfer was still running after the disconnect. The 7/28 disconnect occurred about an hour after a file transfer that I started the previous night had completed. I've attached the diagnostic file I grabbed after the 7/27 disconnect.

 

I've probably provided a lot of information that may or may not be useful, but I wanted to provide any & all information that may be helpful in diagnosing this issue.

 

I will try disabling eth0 and eth1 in the bonding group as you recommended. I currently have another large file transfer running that should finish in the next 2-3 hours. I will disable those interfaces once the transfer is complete.

unraidsvr-diagnostics-20200727-1420.zip

Link to comment

So I tried removing eth0 & eth1 from the bonding group, but obviously you can't remove eth0 if the bonding group is configured on eth0. So I tried moving the static IP configuration to eth2 with no bonding group, but for some reason I couldn't configure DNS for any interface other than eth0. The UnRAID server apps & dockers were unable to resolve hostnames with this configuration.

 

I ran into some issues trying to revert to the original eth0 network configuration & completely lost network access to the server. After spending a couple of hours late last night trying to re-establish network connectivity, I was able to get things working again by deleting the network config file on the USB flash drive, letting UnRAID regenerate it during boot with a DHCP configuration, & then reapplying my original static IP address on eth0 with the other interfaces included in the bonding group. I noticed that after UnRAID regenerated the network config file, interfaces eth1, eth2, & eth3 looked different on the Network Settings page. Before, they just said unconfigured, but now the page says they are part of a bonding group & to see eth0. I thought maybe a problem in the previous network config had been causing the disconnects. I was tired at this point & didn't want to continue tinkering with the network settings last night, so I left the config as is & started another large file transfer to run overnight.

 

The network connection dropped out again at around 8:51am this morning (diagnostic file attached). This time I modified the network-rules config file so that the 10GbE interfaces on the PCIE card are now eth0 & eth1 and the motherboard NICs are now eth2 & eth3. I was then able to disable the bonding group for eth0. We'll see if removing the bonding group makes a difference.

unraidsvr-diagnostics-20200729-1008.zip

Link to comment
  • 1 month later...

@jonp

After setting the PCIE card's 10GbE interfaces to eth0 & eth1 and disabling the bonding group for eth0 following my last post on 7/29, I didn't experience any drops for almost two weeks. During that period I continued adding HDDs to my UnRAID array without any issues. On 8/10 or 8/11 I added an M.2 NVMe to PCIE adapter card along with a 2TB NVMe SSD. On 8/13 I experienced another drop and had to reboot the server to restore connectivity. I went almost another two weeks before experiencing another drop on 8/24. Since then I've begun experiencing drops more frequently again, roughly every 1 or 2 days, & in one case on 8/29 the connection dropped less than 20 minutes after I had rebooted because of a previous drop. Below is a list of times & dates that I received loss of connectivity alerts from software I have running on a VM within the server:

 

Aug 13, 3:54 PM
Aug 24, 8:54 AM
Aug 27, 2:42 AM
Aug 29, 8:47 AM
Aug 29, 12:18 PM (Server was only online for about 20 min after previous reboot)
Aug 31, 12:36 PM
Sep 1, 6:55 AM
Sep 3, 2:00 AM
Sep 4, 3:07 AM
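
To make the spacing between these alerts easier to see, here is a small sketch that computes the gap in hours between consecutive entries. The year 2020 is an assumption taken from the diagnostics file names, and `parse` and `gaps_in_hours` are illustrative helper names. The gaps are between alert times, not server uptime, since reboot times are not in the list.

```python
from datetime import datetime

# Alert times copied from the list above.
DROPS = [
    "Aug 13, 3:54 PM",
    "Aug 24, 8:54 AM",
    "Aug 27, 2:42 AM",
    "Aug 29, 8:47 AM",
    "Aug 29, 12:18 PM",
    "Aug 31, 12:36 PM",
    "Sep 1, 6:55 AM",
    "Sep 3, 2:00 AM",
    "Sep 4, 3:07 AM",
]


def parse(stamp, year=2020):
    """Turn 'Aug 13, 3:54 PM' into a datetime; the year is assumed."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d, %I:%M %p")


def gaps_in_hours(stamps):
    """Hours between each alert and the next one, rounded to 0.1 h."""
    times = [parse(s) for s in stamps]
    return [
        round((later - earlier).total_seconds() / 3600, 1)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    for stamp, gap in zip(DROPS[1:], gaps_in_hours(DROPS)):
        print(f"{stamp}: {gap} h since previous alert")
```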

 

I've attached diagnostic files that I pulled after each drop listed above, while the network connection was still down before I rebooted the server. I may have pulled the 8/13 diagnostic file after I had rebooted.

 

I'd appreciate your help in identifying the root cause of these network connection drops.

unraidsvr-diagnostics-20200904-1051.zip unraidsvr-diagnostics-20200903-1120.zip unraidsvr-diagnostics-20200901-0920.zip unraidsvr-diagnostics-20200831-1237.zip unraidsvr-diagnostics-20200829-1239.zip unraidsvr-diagnostics-20200829-1157.zip unraidsvr-diagnostics-20200827-1008.zip unraidsvr-diagnostics-20200824-1007.zip unraidsvr-diagnostics-20200813-1621.zip

Link to comment
  • 2 weeks later...
21 hours ago, akamemmnon said:

I have been having this same problem recently. I'm not sure what prompted it. I haven't made any hardware changes. I rebooted, I guess the next time this happens I will pull the diagnostics on my system too.

 

 

OK, so it happened again. The only way I notice it is that things like Nextcloud and Home Assistant, which I have accessible from outside my network, are no longer accessible, but when I am on the local network and I navigate to the IP of my Unraid server, it's still running fine. However, when I look at my Google Wifi router (which I have used without this problem for years), I notice the device is grayed out and "not connected." When I restart the Unraid server, it all works again until the next time...

 

Here are the diagnostics. Any help would be greatly appreciated.

tower-diagnostics-20200913-1632.zip

Link to comment
On 7/29/2020 at 8:34 AM, Derrikdj said:

 I plugged an ipolex 10G SFP+ RJ45 Copper Transceiver into the same MikroTik switch

Has this been in use for a long time without problems?
If it's a new setup, then I would suggest avoiding a media converter. It's best to connect both ends with an optical cable or a DAC cable. (Of course, you also need an SFP+ NIC.)

Edited by Benson
Link to comment
The ipolex transceiver was installed back in early July & is only being used to connect a laptop to the network. I am indeed using a DAC cable to connect my Unraid server to the MikroTik switch. I've had no connectivity issues with the laptop & the transceiver, nor with the Windows machine that is also connected to the same MikroTik switch via DAC cable. As I mentioned previously, the Windows machine has the same model Chelsio T520-LL-CR 2-port SFP+ PCIE card as the Unraid server. The MikroTik switch isn't showing any errors for any interfaces or connections that may have indirectly affected the Unraid server's connection. I also checked that the ipolex transceiver was compatible with the MikroTik switch before I purchased & installed it. I only mentioned the ipolex transceiver to be thorough in describing my network environment. I highly doubt it is playing ANY part in the connectivity issues I've been experiencing with my Unraid server's network connection.

 

Fortunately I haven't experienced any drops since my last post on 9/4, but as I mentioned previously, there was about a two-week period when I didn't experience drops before I suddenly started experiencing frequent drops again. I will continue to monitor & update this thread if I start experiencing drops again in the near future.

Edited by Derrikdj
Link to comment
12 hours ago, Benson said:

Note: I had a drop issue (link going up and down frequently) on one of my Intel NICs. The problem was fixed by swapping in another optical SFP+ module. I use a MikroTik switch too.

My connection doesn't go up & down. Once it goes down, it won't come back up unless I reboot the server. The interface shows up on the server but it's not forwarding any network traffic. This was happening for the onboard motherboard NIC as well as the PCIE SFP+ card.
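
One rough way to catch this state (link reported up, no traffic moving) from the server console is to compare the kernel's receive counter before and after a short wait. This is only a sketch: it assumes the standard Linux sysfs files `/sys/class/net/<iface>/operstate` and `statistics/rx_packets`, the function names are illustrative, and on a very quiet network an idle interface could be flagged by mistake.

```python
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def link_state(iface, sysfs=SYSFS_NET):
    """Return the kernel's view of the link, e.g. 'up' or 'down'."""
    return (sysfs / iface / "operstate").read_text().strip()


def read_rx_packets(iface, sysfs=SYSFS_NET):
    """Return the cumulative count of packets received on iface."""
    return int((sysfs / iface / "statistics" / "rx_packets").read_text())


def looks_stalled(iface, wait_s=10.0, sysfs=SYSFS_NET):
    """True if the link reports 'up' but no packets arrived during wait_s seconds."""
    before = read_rx_packets(iface, sysfs)
    time.sleep(wait_s)
    after = read_rx_packets(iface, sysfs)
    return link_state(iface, sysfs) == "up" and after == before


if __name__ == "__main__":
    # "eth0" is an assumption; use whichever interface carries the traffic.
    print("eth0 stalled:", looks_stalled("eth0"))
```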

Link to comment
  • 1 year later...

@Derrikdj Did you ever find a solution for this? I've been having the same issue for over a year where Unraid will randomly disconnect from my network and isn't accessible on LAN either. It's driving me nuts and I cannot find the source of the issue. Reboot fixes things for some arbitrary period of time, sometimes a few hours, sometimes a few days, sometimes a few months.

Link to comment
I figured out my NIC was overheating. I cleaned the dust filters & fans in my server case which was way overdue & added some additional fans to improve airflow through the case. Haven't had any issues since.
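
For anyone wanting to check temperatures before a drop happens, here is a minimal sketch that reads whatever sensors the kernel exposes under the standard Linux hwmon sysfs tree. Whether a given NIC (including this Chelsio card) shows up there depends on the driver, so that part is an assumption, and `read_hwmon_temps` is an illustrative name.

```python
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def read_hwmon_temps(root=HWMON_ROOT):
    """Return {(chip_name, sensor_file): degrees_celsius} for every readable sensor."""
    temps = {}
    for chip in sorted(root.glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip does not report a name.
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[(name, sensor.name)] = int(sensor.read_text()) / 1000.0
            except (OSError, ValueError):
                continue  # some sensors cannot be read; skip them
    return temps


if __name__ == "__main__":
    for (chip, sensor), celsius in read_hwmon_temps().items():
        print(f"{chip} {sensor}: {celsius:.1f} C")
```

Running it periodically and keeping the output next to the syslog would show whether any sensor climbs in the hours before a disconnect.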
Link to comment