December 30, 2023

Hi All,

I just built a new server running UNRAID yesterday, and I've been running into a major issue with my network connection. After anywhere from one to five hours, the server's connection to my network drops out completely, and I have to unplug and replug the Ethernet cable at the server to get it back. On top of that, even while it is connected, performance is all over the place. Speeds will be perfectly fine for a while, and then out of nowhere they bounce wildly between ~40 Mbps and 650 Mbps for an hour while transferring large files.

So far I've installed the RTL8125 drivers, rebooted multiple times, made sure the r8169 driver is blacklisted, switched between a 1 Gbps and a 2.5 Gbps connection, tried different Ethernet cables and entirely different switches, and connected directly to my router. The only change from any of this is possibly a longer time between dropouts.

When the connection drops, the server stays in my router's device list for a minute or two before disappearing. Meanwhile the lights on the Ethernet ports stay lit and keep flashing, and the port on my router still shows as connected at the right speed.

Any help on solving this would be greatly appreciated, thanks!

Hardware/Specs
CPU: AMD R5 5600G
MOBO: Gigabyte B550I Aorus Pro AX
RAM: 1x 8GB DDR4 2400
HBA Card: LSI SAS2008 PCI-Express Fusion-MPT SAS-2
HDD: 3x ST20000NM007D
Boot Drive: SAMSUNG FIT Plus

unraidserver-diagnostics-20231230-1618.zip

Edited December 30, 2023 by Sakura_Nohana: Had some information that I got wrong that I updated as the problem happened again
December 31, 2023 Community Expert

There can be several reasons for this, and you have to check them one by one. I'll list them in "most likely" order:

1) Bad cable / plug. Try to wobble the plugs; if they can be moved, you have an outdated cable. (No, "Cat7" does not mean a thing if it only describes the raw cable. The cable itself is OK, but the plugs are still Cat 5 type. "Real Cat7" plugs are minimally longer and sit tight in the card/switch.)

2) Missing flow control. 2.5G does not really exist; it's a marketing gag to sell "not so good" production runs of PHYs.

3) Many switches and cards do not know about it. They treat the connection as 10G, and without proper flow control they try to send data in time slots that are "not good" for 2.5G. That data is lost and has to be resent later. The more there is to send, the more goes wrong, and there will soon be a buffer jam. This looks like the ups and downs in your diagram: the transfer WILL finish someday, but with ridiculous throughput.

Errors like this are hard to track down and fix. Every time you think you have done it, it might show up again sooner or later.

The only real fix I know of is to throw away all that RJ45 stuff, get cards and switches with SFP+ cages, and use fiber. I did this 2 years ago and have had no problems since...

(Update: there is something else that might help. I did this 2 months ago and have had no problems since: there are now really cheap 5- or 8-port 2.5G switches with one or two SFP+ cages on the market. They do the 10G pacing internally quite well. The normal LAN runs at 10G and the clients of these switches run at a constant 2.5G. Here the 60€ was well invested to speed up some NUCs that cannot take new LAN cards.)

Edited December 31, 2023 by MAM59
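For anyone wanting to check the flow-control point above on their own box, here is a minimal Python sketch that reads the pause-frame settings reported by `ethtool -a`. The interface name `eth0` and the `SAMPLE` text are assumptions for illustration only (they are not taken from the poster's diagnostics), and the helper names are made up for this example.

```python
import subprocess

def parse_pause_params(text: str) -> dict[str, bool]:
    """Parse the 'Name: on/off' lines printed by `ethtool -a`."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip().lower()
        # Skip the header line and anything that is not a plain on/off value.
        if sep and value in ("on", "off"):
            params[name.strip()] = (value == "on")
    return params

def read_pause_params(iface: str = "eth0") -> dict[str, bool]:
    """Run `ethtool -a <iface>` and parse it (needs ethtool installed)."""
    out = subprocess.run(
        ["ethtool", "-a", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_pause_params(out)

# Illustrative output only, not captured from the server in this thread.
SAMPLE = """Pause parameters for eth0:
Autonegotiate:\ton
RX:\t\toff
TX:\t\toff
"""

if __name__ == "__main__":
    print(parse_pause_params(SAMPLE))
```

If RX and TX both come back as off, flow control is not active on that port. Whether turning it on helps depends on the switch at the other end, so treat this as a diagnostic rather than a fix.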
December 31, 2023 Author

10 hours ago, MAM59 said: There can be several reasons for this, you have to check them out one by one. [...]

The thing is, the issues also happened when I had it connected via a 1 Gbps link. I also looked into the server logs, and no issues appear to be logged until I unplug the Ethernet cable. It's as if the connection is still up and running, but my router just loses the device listing.

And sadly, getting a different NIC isn't exactly on the table. My MOBO is an ITX one and its PCIe slot is taken up by my HBA card. The only way to fit a NIC in is with an M.2 NVMe to PCIe adapter, and while poking around today I haven't been able to find a reputable one that I can trust to work.
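Since nothing shows up in the logs while the link still looks connected, one more place to look is the kernel's per-interface counters. The following is a rough Python sketch that snapshots the link state and error counters from Linux sysfs so two snapshots can be compared across a dropout. The interface name `eth0` is an assumption, and `read_link_state` and `counter_deltas` are hypothetical helper names.

```python
from pathlib import Path

# Standard Linux counters under /sys/class/net/<iface>/statistics/
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped", "rx_crc_errors")

def read_link_state(iface: str = "eth0", sysfs_root: str = "/sys/class/net") -> dict:
    """Snapshot carrier/operstate/speed plus error counters for one interface."""
    base = Path(sysfs_root) / iface
    state = {}
    for name in ("carrier", "operstate", "speed"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            # File missing, or the kernel refuses the read (e.g. interface down).
            state[name] = None
    for name in COUNTERS:
        try:
            state[name] = int((base / "statistics" / name).read_text())
        except (OSError, ValueError):
            state[name] = None
    return state

def counter_deltas(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in COUNTERS
        if isinstance(before.get(name), int) and isinstance(after.get(name), int)
    }

if __name__ == "__main__":
    print(read_link_state())
```

Taking one snapshot while things work and another right after a dropout shows whether the carrier ever went down and whether errors piled up, even if the syslog stays quiet.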
December 31, 2023 Author

Okay, so I've now tried four different cables across four different ports between my MoCA adapter and the router itself, along with forcing the link into 1 Gbps mode. All of it has failed. It also seems to get worse the longer the server is on, with the time between network crashes getting shorter.
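To put numbers on "the crashes are getting closer together", a small watchdog can ping the router at a fixed interval and record when each outage starts. This is only a sketch under assumptions: it relies on the Linux `ping` flags `-c` and `-W`, the router address is whatever you pass in as `host`, and all function names are invented for this example.

```python
import subprocess
import time

def gateway_reachable(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system `ping` (Linux flags -c and -W)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def outage_starts(samples):
    """Timestamps at which reachability flips from up to down."""
    starts = []
    previous = True  # assume the link was up before the first sample
    for timestamp, reachable in samples:
        if previous and not reachable:
            starts.append(timestamp)
        previous = reachable
    return starts

def gaps_between_outages(samples):
    """Seconds from the start of each outage to the start of the next one."""
    starts = outage_starts(samples)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]

def watch(host: str, interval_s: float = 10.0, rounds: int = 6):
    """Collect (timestamp, reachable) samples; not called automatically."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), gateway_reachable(host)))
        time.sleep(interval_s)
    return samples
```

Running `watch` from the server against the router's address, and then passing the samples to `gaps_between_outages`, gives a list of gaps in seconds. A list that keeps shrinking would back up the observation above.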
January 1, 2024 Community Expert

You did not mention the MoCA adapter before! This is a serious source of problems.
January 1, 2024 Author

I didn't think to mention it originally, since the issues still persisted when the server was hooked directly into my router. With every attempt failing, I've come to the conclusion that the chip has to be just straight faulty in some way, and I'm just going to RMA the board. Hopefully that solves the issues, since any other course of action is going to be more expensive than what I originally budgeted.

Edited January 1, 2024 by Sakura_Nohana: Additional Info
January 1, 2024 Community Expert

Hmmm... strange... Since your problem looks to be "inbound" to UNRAID (from your diagram), I wonder who the sending side is and why it might be panicking. You are unlikely to find anything in UNRAID's logs, because it is the innocent victim.

Edited January 1, 2024 by MAM59
January 1, 2024 Author

The sending side is my old server; I've been moving stuff over from it in preparation for shutting it down. And the issue has to be on the new server's side, as the crash happens even when it is completely unloaded. It just happens a bit quicker under load.
January 4, 2024 Author Solution

Well, I got a replacement motherboard and swapped it in. I've been stress testing it for 5 hours now and not a single issue from before has cropped up, so it looks like I just had a bad network chip.