
Network interfaces keep going down


cinereus


I have two interfaces eth0 and eth1.

 

eth0 goes directly to the router and provides an internet connection

eth1 goes directly to my PC

 

Last night I noticed that eth0 had dropped. I unplugged and replugged the ethernet cable and all was fine.

I had to do this one more time later in the evening.

When I woke up this morning I found that eth0 had dropped to 100 Mbps.

I unplugged and replugged and it reconnected instantly at 1000 Mbps.

 

A couple of hours later I noticed that eth1 wasn't working. I unplugged and replugged which did nothing.

Now eth1 is flickering between connecting and disconnecting. It stays connected for long enough to go to "unidentified network" but doesn't have time to resolve an IP before it says "interface down" and my PC says "not connected".

 

As I was trying to diagnose this it seems eth0 has gone down completely and won't connect at all. This means I can't even get diagnostics.

 

On the outside it seems that the onboard ethernet is just dying! Is this possible? Could I see anything in diagnostics to show whether this is the case?

 

The hardware is Supermicro SuperChassis CSE-826 with a X9DRH-7TF V1.02 motherboard.
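One way to see what the kernel currently reports for each port, without relying on the dashboard, is to read the standard Linux sysfs attributes under `/sys/class/net`. The following is a minimal sketch, assuming a Linux host with interfaces named eth0 and eth1; the function name `read_link_state` is made up for illustration, and the `sysfs_root` parameter only exists so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return operstate, carrier and speed for one interface.

    Each value is the raw string from sysfs, or None when the attribute
    cannot be read (for example, the kernel refuses to report a speed
    while the link is down, or the interface does not exist).
    """
    base = Path(sysfs_root) / iface
    state = {}
    for name in ("operstate", "carrier", "speed"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state
```

Calling `read_link_state("eth0")` and `read_link_state("eth1")` every few seconds while reseating the cable would show whether the port negotiates 1000 or 100 and whether the carrier flag keeps toggling.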

Link to comment

Clearer diagnostics after reboot:

 

And here's the system log:

 

Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:46 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:47 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:56 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:56 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:20 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:20 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:35 fs emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/showLog.php dropbox and drive sync
Sep 29 14:30:44 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:45 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:57 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:57 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:31:14 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
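To get a feel for how often a port is flapping and at which speeds it comes back, a log like the one above can be tallied with a short script. This is only a sketch: it assumes the ixgbe message wording shown in this syslog ("NIC Link is Up <speed>" / "NIC Link is Down"), and the names `LINK_RE` and `summarize_link_events` are invented for the example.

```python
import re
from collections import Counter

# Matches the ixgbe link messages as they appear in this thread's syslog.
LINK_RE = re.compile(
    r"kernel: ixgbe (?P<pci>\S+) (?P<iface>\w+): NIC Link is "
    r"(?P<state>Up|Down)(?: (?P<speed>\d+ [MG]bps))?"
)


def summarize_link_events(lines):
    """Count link-up and link-down events per interface.

    Returns {iface: {"up": int, "down": int, "speeds": Counter}} where
    "speeds" counts how often each negotiated speed was reported.
    """
    summary = {}
    for line in lines:
        match = LINK_RE.search(line)
        if match is None:
            continue  # not a link message (e.g. emhttpd or ntpd lines)
        entry = summary.setdefault(
            match["iface"], {"up": 0, "down": 0, "speeds": Counter()}
        )
        if match["state"] == "Up":
            entry["up"] += 1
            if match["speed"]:
                entry["speeds"][match["speed"]] += 1
        else:
            entry["down"] += 1
    return summary
```

Feeding it the lines of a syslog file (for instance `summarize_link_events(open(path))`, where `path` is wherever the log was saved) gives one entry per interface. Many up/down pairs within a couple of minutes, all at 100 Mbps on a gigabit link, would be consistent with a marginal cable or port.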

 

fs-diagnostics-20220929-1434.zip

Link to comment

For eth0 looks more like a connection problem, try replacing the cable or using a different switch/router

eth1 crashed, only a reboot will fix that:

Sep 29 05:36:42 fs kernel: DMAR: DRHD: handling fault status reg 2
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f4a19000 [fault reason 06] PTE Read access is not set
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f9d54000 [fault reason 06] PTE Read access is not set

 

Unrelated but the server is detecting RAM errors, should fix that.
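The DMAR lines name a PCI device ([05:00.1]) rather than an interface, while the ixgbe messages elsewhere in the syslog show both the PCI address and the interface name together. A rough sketch of how the two could be matched up automatically follows; the regular expressions are written against the message formats seen in this thread only, and the names `DMAR_RE`, `IFACE_RE` and `dmar_faults_by_interface` are hypothetical.

```python
import re

# DMAR fault line: captures the bus:device.function of the faulting device.
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?:Read|Write)\] Request device \[(?P<bdf>[0-9a-f:.]+)\]"
    r".*\[fault reason (?P<reason>\d+)\]"
)
# ixgbe line: "ixgbe 0000:05:00.1 eth1: ..." (PCI domain prefix is optional).
IFACE_RE = re.compile(
    r"ixgbe (?:[0-9a-f]{4}:)?(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) "
    r"(?P<iface>\w+):"
)


def dmar_faults_by_interface(lines):
    """Count DMAR faults per interface name.

    Devices that never appear in an ixgbe message are reported under
    their raw PCI address instead.
    """
    lines = list(lines)
    bdf_to_iface = {}
    for line in lines:
        match = IFACE_RE.search(line)
        if match:
            bdf_to_iface[match["bdf"]] = match["iface"]
    faults = {}
    for line in lines:
        match = DMAR_RE.search(line)
        if match:
            name = bdf_to_iface.get(match["bdf"], match["bdf"])
            faults[name] = faults.get(name, 0) + 1
    return faults
```

Run over the full syslog, this would attribute the two faults quoted above to whichever interface the driver registered at 05:00.1.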

Link to comment
12 hours ago, JorgeB said:

For eth0 looks more like a connection problem, try replacing the cable or using a different switch/router

eth1 crashed, only a reboot will fix that:

Sep 29 05:36:42 fs kernel: DMAR: DRHD: handling fault status reg 2
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f4a19000 [fault reason 06] PTE Read access is not set
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f9d54000 [fault reason 06] PTE Read access is not set

 

Unrelated but the server is detecting RAM errors, should fix that.

Thanks. I have now rebooted and will see how it goes. Where do you see the eth1 crash?

Link to comment
8 minutes ago, JorgeB said:

It crashed again:

 

Oct  1 13:44:26 fs kernel: DMAR: DRHD: handling fault status reg 2

 

Try updating to v6.10.3 or v6.11.0 since from v6.10.3 DMA remapping is no longer used, and that appears to be what's causing the problem.

I don't get why it would have worked for years with no issue before though?

 

syslog keeps repeating this. What does it mean?

 

Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang
Oct 3 17:56:14 fs kernel: Tx Queue <20>
Oct 3 17:56:14 fs kernel: TDH, TDT <0>, <2>
Oct 3 17:56:14 fs kernel: next_to_use <2>
Oct 3 17:56:14 fs kernel: next_to_clean <0>
Oct 3 17:56:14 fs kernel: tx_buffer_info[next_to_clean]
Oct 3 17:56:14 fs kernel: time_stamp <115869ae9>
Oct 3 17:56:14 fs kernel: jiffies <11586a9c0>
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: tx hang 17692 detected on queue 20, resetting adapter
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: initiating reset due to tx timeout
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Reset adapter
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: RXDCTL.ENABLE for one or more queues not cleared within the polling period
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: TXDCTL.ENABLE for one or more queues not cleared within the polling period
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1: master disable timed out
Oct 3 17:56:18 fs kernel: ixgbe 0000:05:00.1 eth1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct 3 17:56:24 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang

 

Link to comment
3 minutes ago, cinereus said:

I don't get why it would have worked for years with no issue before though?

NICs might be going bad, if one goes it's expected that the other goes at the same time, but it won't hurt to upgrade to see if there's any difference, you should upgrade anyway since v6.9.32 is quite old now.

 

4 minutes ago, cinereus said:

syslog keeps repeating this. What does it mean?

That's because of the earlier crash.

Link to comment
7 minutes ago, JorgeB said:

NICs might be going bad, if one goes it's expected that the other goes at the same time, but it won't hurt to upgrade to see if there's any difference, you should upgrade anyway since v6.9.32 is quite old now.

 

That's because of the earlier crash.

Thanks. If my NICs are "going bad" is there anything I can do? I think I'd need to replace the whole motherboard?!

Link to comment
8 hours ago, JorgeB said:

It crashed again:

 

Oct  1 13:44:26 fs kernel: DMAR: DRHD: handling fault status reg 2

 

Try updating to v6.10.3 or v6.11.0 since from v6.10.3 DMA remapping is no longer used, and that appears to be what's causing the problem.

After installing the update I'm getting this on eth0:

Oct  4 01:57:51 fs kernel: tun: Universal TUN/TAP device driver, 1.6
Oct  4 01:57:54 fs  ntpd[1467]: Listen normally on 4 eth0 192.168.0.250:123
Oct  4 01:57:54 fs  ntpd[1467]: Listen normally on 5 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct  4 01:57:54 fs  ntpd[1467]: new interface(s) found: waking up resolver
Oct  4 02:04:12 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct  4 02:04:13 fs  ntpd[1467]: Deleting interface #4 eth0, 192.168.0.250#123, interface stats: received=40, sent=40, dropped=0, active_time=379 secs
Oct  4 02:04:13 fs  ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct  4 02:04:13 fs  ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct  4 02:04:13 fs  ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct  4 02:04:13 fs  ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct  4 02:04:13 fs  ntpd[1467]: Deleting interface #5 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=379 secs
Oct  4 02:04:15 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct  4 02:04:16 fs  ntpd[1467]: Listen normally on 6 eth0 192.168.0.250:123
Oct  4 02:04:16 fs  ntpd[1467]: Listen normally on 7 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct  4 02:04:16 fs  ntpd[1467]: new interface(s) found: waking up resolver
Oct  4 02:04:22 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct  4 02:04:24 fs  ntpd[1467]: Deleting interface #6 eth0, 192.168.0.250#123, interface stats: received=4, sent=4, dropped=0, active_time=8 secs
Oct  4 02:04:24 fs  ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct  4 02:04:24 fs  ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct  4 02:04:24 fs  ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct  4 02:04:24 fs  ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct  4 02:04:24 fs  ntpd[1467]: Deleting interface #7 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=8 secs
Oct  4 02:04:25 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct  4 02:04:27 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct  4 02:04:35 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Oct  4 02:04:36 fs  ntpd[1467]: Listen normally on 8 eth0 192.168.0.250:123
Oct  4 02:04:36 fs  ntpd[1467]: Listen normally on 9 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct  4 02:04:36 fs  ntpd[1467]: new interface(s) found: waking up resolver
Oct  4 02:05:30 fs  vnstatd[6518]: Detected bandwidth limit for "eth0" changed from 1000 Mbit to 100 Mbit.

 

Link to comment
On 10/4/2022 at 8:19 AM, JorgeB said:

Looks like the NICs really have a problem, assuming you replaced/swapped cables before.

eth0 has been stuck at 100 Mbps since my reboot. Just bought a new cable for eth0 and now get this:

 

 

Oct  7 13:55:33 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct  7 13:55:35 fs  ntpd[1467]: Deleting interface #36 eth0, 192.168.0.250#123, interface stats: received=696, sent=696, dropped=0, active_time=176867 secs
Oct  7 13:55:35 fs  ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct  7 13:55:35 fs  ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct  7 13:55:35 fs  ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct  7 13:55:35 fs  ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct  7 13:55:35 fs  ntpd[1467]: Deleting interface #37 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=176867 secs
Oct  7 14:07:37 fs  ntpd[1467]: Listen normally on 38 eth0 192.168.0.250:123
Oct  7 14:07:37 fs  ntpd[1467]: Listen normally on 39 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct  7 14:07:37 fs  ntpd[1467]: new interface(s) found: waking up resolver
Oct  7 14:07:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct  7 14:07:42 fs  ntpd[1467]: Deleting interface #38 eth0, 192.168.0.250#123, interface stats: received=2, sent=8, dropped=0, active_time=5 secs
Oct  7 14:07:42 fs  ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct  7 14:07:42 fs  ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct  7 14:07:42 fs  ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct  7 14:07:42 fs  ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct  7 14:07:42 fs  ntpd[1467]: Deleting interface #39 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=5 secs
Oct  7 14:12:39 fs  ntpd[1467]: no peer for too long, server running free now

 

Dashboard says "interface down" with the brand new cable. Swapping back to the old cable also gives "interface down", not even the 100 Mbps I had before.

 

What gives?!

Link to comment

Curiouser and curiouser.

 

eth0 refused to connect to my router after trying multiple cables. However, after moving the router connection to eth1, eth0 now works fine with other connections. Several hours of testing later, the cable that was previously limited to 100 Mbps is now very happy at 1 Gbps.

 

Not sure whether diagnostics say anything sensible about this?

fs-diagnostics-20221007-1603.zip

Edited by cinereus
Link to comment
