Sakura_Nohana Posted December 30, 2023 (edited)

Hi all,

I built a new server running UNRAID yesterday, and I've been running into a major issue with its network connection. After anywhere from one to five hours, the connection to my network drops out completely, and I have to unplug and replug the server's Ethernet cable to get it back. On top of that, performance is all over the place even while it is connected. Speeds will be perfectly fine for a while, and then out of nowhere they bounce between ~40 Mbps and 650 Mbps for an hour while transferring large files.

I've installed the RTL8125 drivers, rebooted multiple times, made sure the r8169 driver is blacklisted, switched between 1 Gbps and 2.5 Gbps link speeds, tried different Ethernet cables and entirely different switches, and connected directly to my router. The only change from any of this is possibly a longer time between dropouts.

When a dropout happens, the server stays in my router's device list for a minute or two before disappearing. Meanwhile, the lights on the Ethernet ports stay lit and keep flashing, and the port on my router still shows as connected at the right speed.

Any help on solving this would be greatly appreciated, thanks!

Hardware/Specs
CPU: AMD R5 5600G
MOBO: Gigabyte B550I Aorus Pro AX
RAM: 1x 8GB DDR4 2400
HBA Card: LSI SAS2008 PCI-Express Fusion-MPT SAS-2
HDD: 3x ST20000NM007D
Boot Drive: SAMSUNG FIT Plus

Attachment: unraidserver-diagnostics-20231230-1618.zip

Edited December 30, 2023 by Sakura_Nohana: Had some information that I got wrong, which I updated as the problem happened again.
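Since the post says the r8169 driver was blacklisted in favor of the RTL8125 driver, it can help to confirm which driver the kernel actually bound to the port. The following is only a minimal sketch, assuming Python 3 is available on the server and that the interface is called `eth0` (both are assumptions, not details from the thread). It reads the standard Linux sysfs entries under `/sys/class/net`; the helper names `bound_driver` and `link_state` are made up for illustration.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def bound_driver(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the name of the kernel driver bound to iface, or None if unknown."""
    link = Path(sysfs_root) / iface / "device" / "driver"
    if not link.is_symlink():
        return None
    # The symlink points at the driver directory; its last component is the driver name.
    return os.path.basename(os.readlink(link))


def link_state(iface: str, sysfs_root: str = "/sys/class/net") -> Dict[str, Optional[str]]:
    """Read operstate, carrier and speed for iface; unreadable entries become None."""
    base = Path(sysfs_root) / iface
    state: Dict[str, Optional[str]] = {}
    for name in ("operstate", "carrier", "speed"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            # Some entries cannot be read while the interface is down.
            state[name] = None
    return state


if __name__ == "__main__":
    iface = "eth0"  # assumed interface name; adjust to match the server
    print("driver:", bound_driver(iface))
    print("link:", link_state(iface))
```

If the printed driver is still `r8169`, the blacklist did not take effect. Running the script again during a dropout would show whether the kernel still believes the link is up, which would match the port lights staying lit.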
MAM59 Posted December 31, 2023 (edited)

There can be several reasons for this, and you have to check them one by one. I'll list them in "most likely" order:

1) Bad cable or plug. Try to wobble the plugs: if they can be moved, you have an outdated cable. (No, "Cat7" does not mean a thing if it only describes the raw cable. The cable itself is OK, but the plugs are still Cat 5 type. "Real Cat7" plugs are slightly longer and sit tight in the card or switch.)
2) Missing flow control. 2.5G does not really exist; it's a marketing gag to sell "not so good" production runs of PHYs.
3) Many switches and cards do not know about it. They treat the connection as 10G and, without proper flow control, try to send data in time slots that are "not good" for 2.5G. This data is lost and has to be resent later. The more there is to send, the more goes wrong, and a buffer jam soon follows. This looks like the ups and downs in your diagram: the transfer WILL finish someday, but with ridiculous throughput.

Errors like this are hard to track down and fix. Every time you think you have fixed it, it may show up again sooner or later.

The only real fix I know of is to throw away all that RJ45 stuff, get cards and switches with SFP+ cages, and use fiber. I did this 2 years ago and have had no problems since.

(Update: there is something else that might help; I did this 2 months ago and have had no problems since. There are now really cheap 5- or 8-port 2.5G switches with one or two SFP+ cages on the market. They do the 10G pacing internally quite well. The normal LAN runs at 10G, and clients of these switches run at a constant 2.5G. Here, the 60€ were well invested to speed up some NUCs that cannot take new LAN cards.)
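If frames really are being lost the way this post describes, some of that loss may show up in the interface error and drop counters on the server. As a rough sketch (assuming Python 3 and the standard Linux `/proc/net/dev` layout; the helper names `parse_net_dev` and `error_delta` are hypothetical), two snapshots taken before and after a transfer can be compared like this:

```python
from typing import Dict

# Column order of /proc/net/dev after the "iface:" prefix.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")


def parse_net_dev(text: str) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Parse the contents of /proc/net/dev into {iface: {"rx": {...}, "tx": {...}}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        nums = [int(v) for v in values.split()]
        if len(nums) < 16:
            continue
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, nums[:8])),
            "tx": dict(zip(TX_FIELDS, nums[8:16])),
        }
    return stats


def error_delta(before: dict, after: dict, iface: str) -> Dict[str, int]:
    """Return how much the error, drop and fifo counters grew between two snapshots."""
    return {
        f"{direction}_{key}": after[iface][direction][key] - before[iface][direction][key]
        for direction in ("rx", "tx")
        for key in ("errs", "drop", "fifo")
    }


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        snapshot = parse_net_dev(f.read())
    for iface, counters in snapshot.items():
        print(iface, counters["rx"]["errs"], counters["rx"]["drop"])
```

Counters that stay flat do not rule the theory out, because frames dropped inside a switch never reach the server's NIC. The negotiated pause (flow control) settings themselves can be inspected with `ethtool -a <interface>`.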
Sakura_Nohana Posted December 31, 2023 (Author)

10 hours ago, MAM59 said: "There can be several reasons for this, you have to check them out one by one. [...]"

The thing is, the issues also happened when I had it connected via a 1 Gbps link. I also looked into the server logs, and no issues appear to be logged until I unplug the Ethernet cable. It's as if the connection is still there and running, but my router just loses the device listing.

Sadly, getting a different NIC isn't exactly on the table. My motherboard is an ITX one, and its PCIe slot is taken by my HBA card. The only way to fit a NIC would be an M.2 NVMe to PCIe adapter, and I haven't been able to find a reputable one that I can trust to work while looking around for one today.
Sakura_Nohana Posted December 31, 2023 (Author)

Okay, so I've now tried four different cables across four different ports between my MoCA adapter and the router itself, along with forcing the link into 1 Gbps mode, and it has all failed. It also seems to get worse the longer the server is on, with the network crashes coming closer together.
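The impression that the crashes come closer together over time can be checked by noting when each dropout happens and comparing the gaps. This is just an illustrative sketch in Python 3; the function names and the timestamps in the usage example are invented, not data from this thread.

```python
from datetime import datetime
from typing import List, Sequence


def dropout_intervals(timestamps: Sequence[str]) -> List[float]:
    """Return the minutes between consecutive dropouts, given ISO 8601 timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def is_shrinking(intervals: Sequence[float]) -> bool:
    """True if every gap is shorter than the one before it (needs at least two gaps)."""
    return len(intervals) >= 2 and all(b < a for a, b in zip(intervals, intervals[1:]))


if __name__ == "__main__":
    # Hypothetical dropout times, for illustration only.
    observed = [
        "2023-12-31T10:00:00",
        "2023-12-31T14:00:00",
        "2023-12-31T16:00:00",
        "2023-12-31T17:00:00",
    ]
    gaps = dropout_intervals(observed)
    print(gaps, is_shrinking(gaps))
```

A steadily shrinking gap would be consistent with something degrading as the machine stays powered on, although a handful of samples is not enough to prove a trend.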
MAM59 Posted January 1

You did not mention the MoCA adapter before! This is a serious source of problems.
Sakura_Nohana Posted January 1 (Author, edited)

I didn't think to mention it originally, since the issues still persisted when the server was hooked directly into my router. With every attempt failing, I've come to the conclusion that the chip has to be faulty in some way, and I'm just going to RMA the board. Hopefully that solves the issues, since any other course of action is going to be more expensive than what I originally budgeted.

Edited January 1 by Sakura_Nohana: Additional info
MAM59 Posted January 1 (edited)

Hmm, strange. Since your problem looks to be "inbound" to UNRAID (from your diagram), I wonder who the sending side is and why it might be panicking. You are unlikely to find anything in the UNRAID logs, because it is the innocent victim.
Sakura_Nohana Posted January 1 (Author)

The sending side is my old server; I've been moving data over from it in preparation for shutting it down. The issue has to be on the new server's side, though, because the crash also happens when the server is completely idle. It just happens a bit sooner under load.
Sakura_Nohana Posted January 4 (Author, Solution)

Well, I got a replacement motherboard and swapped it in. I've been stress testing it for 5 hours now and not a single one of the earlier issues has cropped up, so it looks like I just had a bad network chip.