
(6.12.6) Network connection dropouts & inconsistent performance


Solved by Sakura_Nohana


Hi All,

 

I built a new server running UNRAID yesterday and have been running into a major issue with my network connection. After anywhere from one to five hours, the server drops off the network completely, and I have to unplug and replug the Ethernet cable at the server to get it back. On top of that, performance is all over the place while it is connected: speeds are fine for a while, and then out of nowhere they bounce wildly between ~40 Mbps and 650 Mbps for an hour while transferring large files.

image.png.924f16bbc05dc357fbc4c3bb2c71a691.png

I've installed the RTL8125 drivers, rebooted multiple times, made sure the r8169 driver is blacklisted, switched between 1 Gbps and 2.5 Gbps link speeds, tried different Ethernet cables and entirely different switches, and connected directly to my router. The only change from any of this is possibly a longer time between dropouts. When a dropout happens, the server stays in my router's device list for a minute or two before disappearing, yet the lights on the Ethernet port stay lit and keep flashing, and the port on my router still shows as connected at the right speed.
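To double-check that the blacklist actually took effect, the driver bound to each interface can be read from sysfs. This is only a minimal sketch: it assumes the standard Linux `/sys/class/net/<iface>/device/driver` symlink layout, and the expectation that the out-of-tree Realtek module shows up as `r8125` is an assumption rather than something confirmed in this thread.

```python
import os


def bound_driver(iface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to iface, or None.

    Virtual interfaces such as lo have no device/driver symlink.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The symlink points at .../drivers/<name>; keep only <name>.
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


def list_drivers(sysfs_root="/sys/class/net"):
    """Map every interface under sysfs_root to its bound driver (or None)."""
    return {
        iface: bound_driver(iface, sysfs_root)
        for iface in sorted(os.listdir(sysfs_root))
    }
```

Calling `list_drivers()` on the server and seeing `r8169` next to the onboard port would mean the blacklist is not being applied.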

image.png.efb9a1d722999d05db437dd76300593f.png

 

Any help on solving this would be greatly appreciated, thanks!

 

Hardware/Specs

CPU: AMD R5 5600G
MOBO: Gigabyte B550I Aorus Pro AX
RAM: 1x 8GB DDR4 2400

HBA Card: LSI SAS2008 PCI-Express Fusion-MPT SAS-2

HDD: 3x ST20000NM007D

Boot Drive: SAMSUNG FIT Plus

unraidserver-diagnostics-20231230-1618.zip

Edited by Sakura_Nohana
Had some information that I got wrong that I updated as the problem happened again

There can be several reasons for this, and you have to check them one by one. I'll list them in order from most to least likely:

 

1) Bad cable or plug. Try to wobble the plugs; if they can be moved, you have an outdated cable. (No, "Cat7" does not mean a thing if it only describes the raw cable. The cable itself is OK, but the plugs are still Cat 5 type. "Real Cat7" plugs are slightly longer and sit tight in the card/switch.)

2) Missing flow control. In my view, 2.5G does not really exist; it's a marketing gag to sell "not so good" production runs of PHYs.

3) Many switches and cards do not know about 2.5G. They treat the connection as 10G, and without proper flow control they try to send data in time slots that are "not good" for 2.5G. That data is lost and has to be resent later. The more there is to send, the more goes wrong, and soon there is a buffer jam. This looks like the ups and downs in your diagram: the transfer WILL finish someday, but with ridiculous throughput.
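To see whether flow control is actually negotiated on the server side, the pause settings reported by `ethtool -a` can be checked. The sketch below is only an illustration: it assumes `ethtool` is installed, and that its output uses the usual `Autonegotiate:`, `RX:` and `TX:` lines; the interface name passed in is whatever your NIC is called.

```python
import subprocess


def parse_pause(text):
    """Parse `ethtool -a <iface>` output into a dict of booleans.

    Only the Autonegotiate, RX and TX lines are read; everything
    else (header line, negotiated-state lines) is ignored.
    """
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in ("autonegotiate", "rx", "tx"):
            settings[key] = value.strip().lower() == "on"
    return settings


def pause_settings(iface):
    """Run `ethtool -a iface` and return the parsed pause settings."""
    result = subprocess.run(
        ["ethtool", "-a", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_pause(result.stdout)
```

If `rx` or `tx` comes back `False`, pause frames are disabled in that direction on the NIC, which would fit the flow control theory above.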

 

Errors like this are hard to track down and fix. Every time you think you have fixed it, it might show up again sooner or later.

 

The only real fix I know of is to throw away all that RJ45 stuff, get cards and switches with SFP+ cages and use fiber. I did this 2 years ago, no problems since then...

 

(Update: there is something else that might help. I did this 2 months ago and have had no problems since: there are now really cheap 5- or 8-port 2.5G switches with one or two SFP+ cages on the market. They do the 10G pacing internally quite well. The normal LAN runs at 10G, and the clients of these switches run at a constant 2.5G. Here the 60€ was well invested to speed up some NUCs that cannot take new LAN cards.)

 

Edited by MAM59
The thing is, the issues also happened when I had it connected over a 1 Gbps link. I also looked at the server logs, and nothing appears to be logged until I unplug the Ethernet cable. It's as if the connection is still up and running, but my router just loses the device listing.
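Since the logs stay quiet until the cable is pulled, a small watcher on the server could timestamp the moment the gateway stops answering while the link still reports carrier. This is a rough sketch under some assumptions: the interface name and gateway address are placeholders you pass in, `netwatch.log` is a made-up file name, and `ping -c 1 -W 2` assumes the Linux iputils version of `ping`.

```python
import subprocess
import time
from datetime import datetime


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports link carrier on iface."""
    try:
        with open(f"{sysfs_root}/{iface}/carrier") as f:
            return f.read().strip() == "1"
    except OSError:
        # Missing interface, or interface administratively down.
        return False


def gateway_reachable(gateway):
    """Send one ping with a 2 second timeout; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", gateway],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(carrier_up, reachable):
    """Turn the two checks into a short state label."""
    if not carrier_up:
        return "link down"
    if not reachable:
        return "link up, gateway unreachable"
    return "ok"


def watch(iface, gateway, interval=10, logfile="netwatch.log"):
    """Append a timestamped line whenever the state changes. Runs forever."""
    last = None
    while True:
        state = classify(read_carrier(iface), gateway_reachable(gateway))
        if state != last:
            with open(logfile, "a") as log:
                log.write(f"{datetime.now().isoformat()} {iface}: {state}\n")
            last = state
        time.sleep(interval)
```

A "link up, gateway unreachable" entry would match what I'm seeing: the port looks alive, but traffic has stopped.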

 

And sadly, getting a different NIC isn't really on the table. My motherboard is an ITX one, and its PCIe slot is taken up by my HBA card. The only way to fit a NIC in is an M.2 NVMe to PCIe adapter, and I haven't been able to find a reputable one that I can trust to work while looking around today.

 

 


I didn't think to mention it originally, since the issues persisted even when the server was hooked directly into my router. With every attempt failing, I've come to the conclusion that the chip must simply be faulty in some way, and I'm just going to RMA the board. Hopefully that solves the issues, since any other course of action is going to be more expensive than what I originally budgeted.

Edited by Sakura_Nohana
Additional Info

hmmm... strange...

Since your problem looks to be "inbound" to UNRAID (judging from your diagram), I wonder who the sending side is and why it might be panicking.

 

You are unlikely to find anything in UNRAID's logs, because it is the innocent victim.
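One way to confirm the direction without relying on the logs is to sample the interface byte counters on the server and compare receive and transmit rates during a transfer. A minimal sketch, assuming the standard Linux `statistics/rx_bytes` and `statistics/tx_bytes` counters under `/sys/class/net/<iface>`; the interface name is whatever your NIC is called.

```python
import time


def read_counter(iface, name, sysfs_root="/sys/class/net"):
    """Read one cumulative counter (e.g. rx_bytes) for an interface."""
    with open(f"{sysfs_root}/{iface}/statistics/{name}") as f:
        return int(f.read())


def mbps(bytes_before, bytes_after, seconds):
    """Convert a byte-counter delta over `seconds` into megabits per second."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def sample_rates(iface, seconds=1.0, sysfs_root="/sys/class/net"):
    """Return (rx_mbps, tx_mbps) measured over one sampling window."""
    rx0 = read_counter(iface, "rx_bytes", sysfs_root)
    tx0 = read_counter(iface, "tx_bytes", sysfs_root)
    start = time.monotonic()
    time.sleep(seconds)
    elapsed = time.monotonic() - start
    rx1 = read_counter(iface, "rx_bytes", sysfs_root)
    tx1 = read_counter(iface, "tx_bytes", sysfs_root)
    return mbps(rx0, rx1, elapsed), mbps(tx0, tx1, elapsed)
```

Printing `sample_rates(...)` in a loop while a large copy runs should show whether it is the receive side that swings up and down.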

Edited by MAM59