Ned Posted March 28, 2016
Over the last few days the Ethernet connection on my unRAID box has been dropping every few hours. The lights on the Ethernet port and the switch are still flashing, but I cannot ping unRAID. I need to hard reset the link (physically unplug and then reconnect the Ethernet cable) to fix it, and then it comes right back up. I've tried moving to another switch, but the issue persists. This is all I see in the log. Does anyone know what could be causing this? I'm on the latest version of unRAID. Thanks!
Mar 28 11:21:07 Tower kernel: r8169 0000:03:00.0 eth0: link down
Mar 28 11:21:07 Tower kernel: br0: port 1(eth0) entered disabled state
Mar 28 11:21:17 Tower kernel: r8169 0000:03:00.0 eth0: link up
Mar 28 11:21:17 Tower kernel: br0: port 1(eth0) entered forwarding state
Mar 28 11:21:17 Tower kernel: br0: port 1(eth0) entered forwarding state
Mar 28 11:51:00 Tower kernel: hrtimer: interrupt took 20937 ns
Mar 28 13:24:17 Tower kernel: r8169 0000:03:00.0 eth0: link down
Mar 28 13:24:17 Tower kernel: br0: port 1(eth0) entered disabled state
Mar 28 13:24:21 Tower kernel: r8169 0000:03:00.0 eth0: link up
Mar 28 13:24:21 Tower kernel: br0: port 1(eth0) entered forwarding state
Mar 28 13:24:21 Tower kernel: br0: port 1(eth0) entered forwarding state
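To see how long each outage in a log like the one above lasts, the link down/up lines can be paired up with a short script. This is a minimal Python sketch, not something posted in the thread: the function name `link_flaps`, the regular expression, and the assumed year (syslog timestamps omit it) are illustrative choices, and it has only been written against the two driver message formats quoted in this thread (r8169 and e1000e).

```python
import re
from datetime import datetime

# Matches the two message styles quoted in this thread:
#   "r8169 0000:03:00.0 eth0: link down"
#   "e1000e: eth0 NIC Link is Down"
# Other drivers may word this differently (assumption: untested elsewhere).
LINK_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:.*?"
    r"(?P<iface>eth\d+):?\s+(?:NIC\s+)?link\s+(?:is\s+)?(?P<state>up|down)",
    re.IGNORECASE,
)


def link_flaps(lines, year=2016):
    """Return (down_time, up_time, seconds_down) for each down/up pair.

    `year` is an assumption, because classic syslog timestamps carry no year.
    """
    flaps = []
    down_at = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # bridge and hrtimer lines are ignored
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["state"].lower() == "down":
            down_at = ts
        elif down_at is not None:
            flaps.append((down_at, ts, (ts - down_at).total_seconds()))
            down_at = None
    return flaps


# The log excerpt from the first post.
SAMPLE = """\
Mar 28 11:21:07 Tower kernel: r8169 0000:03:00.0 eth0: link down
Mar 28 11:21:07 Tower kernel: br0: port 1(eth0) entered disabled state
Mar 28 11:21:17 Tower kernel: r8169 0000:03:00.0 eth0: link up
Mar 28 11:21:17 Tower kernel: br0: port 1(eth0) entered forwarding state
Mar 28 11:51:00 Tower kernel: hrtimer: interrupt took 20937 ns
Mar 28 13:24:17 Tower kernel: r8169 0000:03:00.0 eth0: link down
Mar 28 13:24:17 Tower kernel: br0: port 1(eth0) entered disabled state
Mar 28 13:24:21 Tower kernel: r8169 0000:03:00.0 eth0: link up
"""

for down, up, secs in link_flaps(SAMPLE.splitlines()):
    print(f"{down:%H:%M:%S} -> {up:%H:%M:%S}  link down for {secs:.0f} s")
```

On the excerpt above this reports two outages, of 10 and 4 seconds. Those are the gaps between unplugging and replugging the cable, so the script shows when the link was reset, not when the box first stopped answering pings.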
RobJ Posted March 28, 2016
Without diagnostics, it's hard to give a complete answer, but just from the above, you will probably be better off installing an Intel NIC and disabling the onboard Realtek. That's a really long interrupt wait (the hrtimer line in your log). The Intel will offload some of the CPU workload.
Ned Posted March 28, 2016
Thanks... I was doing some CPU intensive tasks on the box during the time that this has been happening... you think that could have something to do with it given that this is an onboard NIC?
MSattler Posted March 28, 2016
Is irqbalance enabled? Years ago I ran into issues like this on XenServer because irqbalance was disabled: a single-threaded application would make core 0 very busy and then cause network issues. Enabling irqbalance then let any core deal with network requests.
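One way to check for the situation MSattler describes (all NIC interrupts landing on a single busy core) is to look at the per-CPU counters in /proc/interrupts. The sketch below is a hedged Python illustration, not from the thread: `nic_irq_counts` is a hypothetical helper, the sample text is made up purely to show the file's layout, and whether irqbalance is present or enabled on unRAID is not established anywhere in this discussion.

```python
def nic_irq_counts(text, needle="eth0"):
    """Map IRQ label -> per-CPU interrupt counts for rows mentioning `needle`.

    `text` is the content of /proc/interrupts: a header row naming the CPUs,
    then one row per IRQ with one counter per CPU and a trailing description.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    counts = {}
    for line in lines[1:]:
        if needle not in line:
            continue
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts[irq] = [int(value) for value in fields[1:1 + ncpu]]
    return counts


# Made-up sample showing the layout only; these are not real numbers.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1       CPU2       CPU3
  0:         41          0          0          0   IO-APIC   2-edge      timer
 27:    1843201          0          0          0   PCI-MSI 1572864-edge      eth0
"""

for irq, per_cpu in nic_irq_counts(SAMPLE_INTERRUPTS).items():
    total = sum(per_cpu)
    share = per_cpu[0] / total if total else 0.0
    print(f"IRQ {irq}: {per_cpu} (CPU0 handled {share:.0%})")

# On the server itself, the real file could be read with:
#   nic_irq_counts(open("/proc/interrupts").read())
```

If nearly every count sits in the CPU0 column while a single-threaded job is also pinning core 0, that would be consistent with the XenServer experience described above, though it does not by itself prove that interrupt handling is the cause of the drops.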
RobJ Posted March 28, 2016
MSattler wrote: "Is irqbalance enabled? Years ago I ran into issues like this on XenServer because irqbalance was disabled, and a single threaded application would make core 0 very busy, and then cause network issues. Enabling irqbalance would then let any cores deal with network requests."
You would have to direct this to Tom. I haven't seen any clues as to how it's set.
Ned Posted March 28, 2016
I was just going to ask how I check/set that! It just happened again, even with the system running normal now (CPU utilization that is). It's only been a couple of hours since my last hard reset of the link. I've been running the server for years off this onboard NIC and never had any issues. Could it have gone bad? If I were to put in a separate NIC to test, do I just install it, power up, connect to network and it should automatically work?
RobJ Posted March 28, 2016
Ned wrote: "Thanks... I was doing some CPU intensive tasks on the box during the time that this has been happening... you think that could have something to do with it given that this is an onboard NIC?"
I do. Onboard NICs require the CPU's constant attention; the Intel needs some attention, but not nearly as much (that's my understanding anyway).
Ned wrote: "It just happened again, even with the system running normal now (CPU utilization that is). It's only been a couple of hours since my last hard reset of the link. I've been running the server for years off this onboard NIC and never had any issues. Could it have gone bad?"
Realtek chipsets do go bad. But about the only way I know to diagnose it for sure is to replace it and see if there are any more issues. Many users here have done that and not seen any more issues.
Ned wrote: "If I were to put in a separate NIC to test, do I just install it, power up, connect to network and it should automatically work?"
Basically yes, but make sure you first disable the onboard NIC in the BIOS settings.
Ned Posted March 28, 2016
Ok thanks, I will do that! Appreciate your help.
Ned Posted March 28, 2016
Should I go with the Intel PRO/1000 CT adapter? The CT model is the PCIe version. The server models are very expensive and I don't need dual ports, etc.
Frank1940 Posted March 28, 2016
Ned wrote: "Should I go with the Intel PRO/1000 CT adapter? The CT model is the PCIe version. The server models are very expensive and I don't need dual ports, etc."
This is the one that I use. (The only complaint I have with it is that it reports a lot of 'receive dropped packets', BUT those don't cause any problem! They are only generated when I am streaming Blu-ray ISOs to my Netgear 550 Media Players.)
http://www.amazon.com/Intel-Gigabit-Network-Adapter-EXPI9301CTBLK/dp/B001CY0P7G?ie=UTF8&psc=1&redirect=true&ref_=oh_aui_detailpage_o00_s00
Ned Posted March 28, 2016
Yep, that's the one. Thanks!
Ned Posted March 31, 2016
Well, I disabled the onboard NIC and replaced it with the Intel NIC, and it happened again just now. The link goes down and I cannot even ping the box. All I see in the log is the following:
Mar 31 10:01:28 Tower kernel: e1000e: eth0 NIC Link is Down
Mar 31 10:01:28 Tower kernel: br0: port 1(eth0) entered disabled state
Mar 31 10:01:32 Tower kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Mar 31 10:01:32 Tower kernel: br0: port 1(eth0) entered forwarding state
Mar 31 10:01:32 Tower kernel: br0: port 1(eth0) entered forwarding state
Disconnecting and reconnecting the Ethernet cable fixed the problem as usual. How do I troubleshoot this problem further?
Frank1940 Posted March 31, 2016
Try googling 'port 1(eth0) entered forwarding state'. When I did that, I got a number of results that appear to be related to your problem. You should also get a diagnostics file and upload it in a new post. If you can't get into the server any other way, hook up a monitor and keyboard and use them to log in to your server. From the command line, type diagnostics and the file will be somewhere on your flash drive. (You can also shut down your server by typing powerdown while logged in.)
Ned Posted March 31, 2016
Diagnostics file attached... I took a quick look through it, but it doesn't seem to show anything. Hopefully someone else might be able to find something in it?
towerlog.zip
Ned Posted March 31, 2016
Now that I think of it, these issues started to happen after I recently added a second (Windows 10) VM to my system. The other VM is Windows 7 and has been running fine for a long time. I'm not having any issues with either VM, so I'm not sure if the addition of the new VM has anything to do with this. Are there some commands I can execute locally on the command line when the connection drops to try and figure out what's going on?
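Nobody in the thread answers the question about what to run locally while the connection is down. As one hedged possibility, the kernel exposes per-interface link state and error counters under /sys/class/net, which can be read from the local console even when the network is unreachable. The Python sketch below assumes a standard Linux sysfs layout and that a Python interpreter is available on the server, which the thread does not confirm; `link_snapshot` is a hypothetical helper name.

```python
from pathlib import Path

# Standard sysfs attributes for a network interface on Linux.
ATTRIBUTES = (
    "operstate",
    "carrier",
    "speed",
    "duplex",
    "statistics/rx_errors",
    "statistics/rx_dropped",
    "statistics/tx_errors",
    "statistics/tx_dropped",
)


def link_snapshot(iface="eth0", root="/sys/class/net"):
    """Return a dict of link attributes; unreadable ones are reported as None.

    Some attributes (for example carrier or speed) cannot be read while the
    interface is administratively down, so read errors are expected and are
    recorded as None instead of raising.
    """
    base = Path(root) / iface
    snapshot = {}
    for name in ATTRIBUTES:
        try:
            snapshot[name] = (base / name).read_text().strip()
        except OSError:
            snapshot[name] = None
    return snapshot


for name, value in link_snapshot("eth0").items():
    print(f"{name:24} {value}")
```

Taking one snapshot while everything works and another during an outage gives two sets of values to compare. If the kernel still reports the carrier as present while pings fail, that would suggest looking above the physical link (the br0 bridge or the VMs attached to it) as well as at the NIC and cable.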
morgish Posted July 25, 2016
Having the exact same issue. Did you find a solution, Ned?