cinereus Posted September 29, 2022

I have two interfaces, eth0 and eth1.

eth0 goes directly to the router and provides an internet connection.
eth1 goes directly to my PC.

Last night I noticed that eth0 had dropped. I unplugged and replugged the ethernet cable and all was fine. I had to do this one more time later in the evening.

When I woke up this morning I found that eth0 had dropped to 100 Mbps. I unplugged and replugged and it reconnected instantly at 1000 Mbps.

A couple of hours later I noticed that eth1 wasn't working. I unplugged and replugged, which did nothing. Now eth1 is flickering between connecting and disconnecting. It stays connected for long enough to go to "unidentified network" but doesn't have time to resolve an IP before it says "interface down" and my PC says "not connected".

As I was trying to diagnose this, it seems eth0 has gone down completely and won't connect at all. This means I can't even get diagnostics.

On the outside it seems that the onboard ethernet is just dying! Is this possible? Could I see anything in diagnostics to show whether this is the case?

The hardware is a Supermicro SuperChassis CSE-826 with an X9DRH-7TF V1.02 motherboard.
cinereus Posted September 29, 2022

Managed to log in long enough to get diagnostics:

fs-diagnostics-20220929-1325.zip
cinereus Posted September 29, 2022

Clearer diagnostics after reboot:

And here's the system log:

Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:46 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:47 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:29:56 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:29:56 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:20 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:20 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:35 fs emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/showLog.php dropbox and drive sync
Sep 29 14:30:44 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:45 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:30:57 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Sep 29 14:30:57 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Sep 29 14:31:14 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX

fs-diagnostics-20220929-1434.zip
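For anyone wanting to quantify this kind of flapping rather than eyeball it, here is a minimal Python sketch that counts ixgbe link Up/Down events per interface. The regular expression is only an assumption based on the message format visible in this log, and the function name `summarize_link_events` is made up for illustration; it is not part of any Unraid tooling.

```python
import re
from collections import Counter

# Pattern inferred from the ixgbe messages quoted in this thread (an assumption,
# not an official format definition).
LINK_RE = re.compile(
    r"ixgbe\s+(?P<pci>[0-9a-f:.]+)\s+(?P<iface>\w+):\s+NIC Link is (?P<state>Up|Down)"
    r"(?:\s+(?P<speed>\d+\s*[MG]bps))?"
)

def summarize_link_events(lines):
    """Count (interface, state, speed) tuples; speed is None for Down events."""
    counts = Counter()
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            counts[(m["iface"], m["state"], m["speed"])] += 1
    return counts

# Sample lines copied from the syslog excerpt in this thread.
sample = [
    "Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX",
    "Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down",
    "Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX",
    "Sep 29 14:29:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down",
    "Sep 29 14:30:35 fs emhttpd: cmd: /usr/local/emhttp/plugins/user.scripts/showLog.php dropbox and drive sync",
]

if __name__ == "__main__":
    for key, n in sorted(summarize_link_events(sample).items(), key=str):
        print(key, n)
```

In practice the `lines` argument could be an open syslog file; many Up/Down pairs within a minute at a reduced speed is the pattern shown above.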
JorgeB Posted September 29, 2022

For eth0 it looks more like a connection problem. Try replacing the cable or using a different switch/router.

eth1 crashed, and only a reboot will fix that:

Sep 29 05:36:42 fs kernel: DMAR: DRHD: handling fault status reg 2
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f4a19000 [fault reason 06] PTE Read access is not set
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f9d54000 [fault reason 06] PTE Read access is not set

Unrelated, but the server is detecting RAM errors; you should fix that.
cinereus Posted September 30, 2022

12 hours ago, JorgeB said:
For eth0 looks more like a connection problem, try replacing the cable or using a different switch/router eth1 crashed, only a reboot will fix that:
Sep 29 05:36:42 fs kernel: DMAR: DRHD: handling fault status reg 2
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f4a19000 [fault reason 06] PTE Read access is not set
Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f9d54000 [fault reason 06] PTE Read access is not set
Unrelated but the server is detecting RAM errors, should fix that.

Thanks. I have now rebooted and will see how it goes. Where do you see the eth1 crash?
JorgeB Posted September 30, 2022

It starts with the log snippet posted above; device 05:00.1 is eth1.
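The link between the DMAR fault and eth1 can be read straight from the log: the DMAR line names PCI device 05:00.1, and the ixgbe lines print the same address next to the interface name. Below is a rough Python sketch of that cross-reference. Both regular expressions are assumptions derived from the lines quoted in this thread, and `faulting_interfaces` is a hypothetical helper name.

```python
import re

# Patterns inferred from the log lines in this thread (assumptions).
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?:Read|Write)\] Request device "
    r"\[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\]"
)
NIC_RE = re.compile(
    r"ixgbe [0-9a-f]{4}:(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) (?P<iface>\w+):"
)

def faulting_interfaces(lines):
    """Map each PCI address seen in a DMAR fault to the interface name
    that the ixgbe driver logs for the same address, if any."""
    bdf_to_iface = {}
    faulted = []
    for line in lines:
        nic = NIC_RE.search(line)
        if nic:
            bdf_to_iface[nic["bdf"]] = nic["iface"]
        dmar = DMAR_RE.search(line)
        if dmar:
            faulted.append(dmar["bdf"])
    return {bdf: bdf_to_iface.get(bdf, "unknown") for bdf in faulted}

# Sample lines copied from this thread.
sample = [
    "Sep 29 05:36:42 fs kernel: DMAR: DRHD: handling fault status reg 2",
    "Sep 29 05:36:42 fs kernel: DMAR: [DMA Read] Request device [05:00.1] PASID ffffffff fault addr f4a19000 [fault reason 06] PTE Read access is not set",
    "Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang",
    "Sep 29 14:29:34 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down",
]

if __name__ == "__main__":
    print(faulting_interfaces(sample))
```

The sketch only uses what the log itself prints, so it works on a saved diagnostics syslog without access to the server.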
cinereus Posted October 3, 2022

eth1 is going up and down constantly again even after a reboot. Any idea what the issue is that's causing it to crash every other day?

fs-diagnostics-20221003-1735.zip
JorgeB Posted October 3, 2022

It crashed again:

Oct 1 13:44:26 fs kernel: DMAR: DRHD: handling fault status reg 2

Try updating to v6.10.3 or v6.11.0. From v6.10.3 onward DMA remapping is no longer used, and that appears to be what's causing the problem.
cinereus Posted October 3, 2022

8 minutes ago, JorgeB said:
It crashed again:
Oct 1 13:44:26 fs kernel: DMAR: DRHD: handling fault status reg 2
Try updating to v6.10.3 or v6.11.0 since from v6.10.3 DMA remapping is no longer used, and that appears to be what's causing the problem.

I don't get why it would have worked for years with no issue before though?

syslog keeps repeating this. What does it mean?

Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang
Oct 3 17:56:14 fs kernel: Tx Queue <20>
Oct 3 17:56:14 fs kernel: TDH, TDT <0>, <2>
Oct 3 17:56:14 fs kernel: next_to_use <2>
Oct 3 17:56:14 fs kernel: next_to_clean <0>
Oct 3 17:56:14 fs kernel: tx_buffer_info[next_to_clean]
Oct 3 17:56:14 fs kernel: time_stamp <115869ae9>
Oct 3 17:56:14 fs kernel: jiffies <11586a9c0>
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: tx hang 17692 detected on queue 20, resetting adapter
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: initiating reset due to tx timeout
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Reset adapter
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: RXDCTL.ENABLE for one or more queues not cleared within the polling period
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: TXDCTL.ENABLE for one or more queues not cleared within the polling period
Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1: master disable timed out
Oct 3 17:56:18 fs kernel: ixgbe 0000:05:00.1 eth1: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct 3 17:56:24 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang
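One way to read the "Tx Unit Hang" dump: the `time_stamp` and `jiffies` values are printed in hexadecimal, and their difference is how many kernel ticks the oldest uncompleted transmit buffer has been waiting. The Python sketch below does that arithmetic on the values from the log above. Converting ticks to seconds needs the kernel's tick rate, which is not shown anywhere in this thread, so `ASSUMED_HZ` is purely a placeholder assumption; `pending_jiffies` is a made-up helper name.

```python
import re

# Matches the two hex fields printed in the hang dump quoted in this thread.
HANG_FIELD_RE = re.compile(r"\b(time_stamp|jiffies)\s+<([0-9a-f]+)>")

# Assumption: the tick rate is not stated in the thread; 1000 is only a guess.
ASSUMED_HZ = 1000

def pending_jiffies(lines):
    """Return jiffies - time_stamp from a single Tx hang dump."""
    fields = {}
    for line in lines:
        m = HANG_FIELD_RE.search(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields["jiffies"] - fields["time_stamp"]

# Lines copied from the syslog excerpt in this thread.
sample = [
    "Oct 3 17:56:14 fs kernel: ixgbe 0000:05:00.1 eth1: Detected Tx Unit Hang",
    "Oct 3 17:56:14 fs kernel: time_stamp <115869ae9>",
    "Oct 3 17:56:14 fs kernel: jiffies <11586a9c0>",
]

if __name__ == "__main__":
    ticks = pending_jiffies(sample)
    print(f"{ticks} ticks pending, about {ticks / ASSUMED_HZ:.1f} s if HZ={ASSUMED_HZ}")
```

Whatever the real tick rate is, a transmit buffer that never completes is consistent with the adapter no longer being able to do DMA after the earlier fault, so the driver keeps resetting it.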
JorgeB Posted October 3, 2022

3 minutes ago, cinereus said:
I don't get why it would have worked for years with no issue before though?

NICs might be going bad. If one goes it's expected that the other goes at the same time, but it won't hurt to upgrade to see if there's any difference. You should upgrade anyway since v6.9.32 is quite old now.

4 minutes ago, cinereus said:
syslog keeps repeating this. What does it mean?

That's because of the earlier crash.
cinereus Posted October 3, 2022

7 minutes ago, JorgeB said:
NICs might be going bad, if one goes it's expected that the other goes at the same time, but it won't hurt to upgrade to see if there's any difference, you should upgrade anyway since v6.9.32 is quite old now.
That's because of the earlier crash.

Thanks. If my NICs are "going bad" is there anything I can do? I think I'd need to replace the whole motherboard?!
JorgeB Posted October 3, 2022

You can install add-on NICs.
cinereus Posted October 4, 2022

8 hours ago, JorgeB said:
It crashed again:
Oct 1 13:44:26 fs kernel: DMAR: DRHD: handling fault status reg 2
Try updating to v6.10.3 or v6.11.0 since from v6.10.3 DMA remapping is no longer used, and that appears to be what's causing the problem.

After installing the update I'm getting this on eth0:

Oct 4 01:57:51 fs kernel: tun: Universal TUN/TAP device driver, 1.6
Oct 4 01:57:54 fs ntpd[1467]: Listen normally on 4 eth0 192.168.0.250:123
Oct 4 01:57:54 fs ntpd[1467]: Listen normally on 5 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct 4 01:57:54 fs ntpd[1467]: new interface(s) found: waking up resolver
Oct 4 02:04:12 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct 4 02:04:13 fs ntpd[1467]: Deleting interface #4 eth0, 192.168.0.250#123, interface stats: received=40, sent=40, dropped=0, active_time=379 secs
Oct 4 02:04:13 fs ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct 4 02:04:13 fs ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct 4 02:04:13 fs ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct 4 02:04:13 fs ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct 4 02:04:13 fs ntpd[1467]: Deleting interface #5 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=379 secs
Oct 4 02:04:15 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct 4 02:04:16 fs ntpd[1467]: Listen normally on 6 eth0 192.168.0.250:123
Oct 4 02:04:16 fs ntpd[1467]: Listen normally on 7 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct 4 02:04:16 fs ntpd[1467]: new interface(s) found: waking up resolver
Oct 4 02:04:22 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct 4 02:04:24 fs ntpd[1467]: Deleting interface #6 eth0, 192.168.0.250#123, interface stats: received=4, sent=4, dropped=0, active_time=8 secs
Oct 4 02:04:24 fs ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct 4 02:04:24 fs ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct 4 02:04:24 fs ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct 4 02:04:24 fs ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct 4 02:04:24 fs ntpd[1467]: Deleting interface #7 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=8 secs
Oct 4 02:04:25 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Oct 4 02:04:27 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct 4 02:04:35 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX
Oct 4 02:04:36 fs ntpd[1467]: Listen normally on 8 eth0 192.168.0.250:123
Oct 4 02:04:36 fs ntpd[1467]: Listen normally on 9 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct 4 02:04:36 fs ntpd[1467]: new interface(s) found: waking up resolver
Oct 4 02:05:30 fs vnstatd[6518]: Detected bandwidth limit for "eth0" changed from 1000 Mbit to 100 Mbit.
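The drop from 1 Gbps to 100 Mbps in the log above is easy to miss among the ntpd noise. A small Python sketch like the following could flag each change in negotiated speed per interface. The pattern is an assumption based on the ixgbe "NIC Link is Up" lines in this thread, and `speed_changes` is a hypothetical helper name.

```python
import re

# Pattern inferred from the ixgbe "NIC Link is Up" lines in this thread (assumption).
UP_RE = re.compile(r"(?P<iface>\w+): NIC Link is Up (?P<value>\d+) (?P<unit>[MG])bps")

def speed_changes(lines):
    """Return (interface, old_mbps, new_mbps) for every change in negotiated speed."""
    last = {}
    changes = []
    for line in lines:
        m = UP_RE.search(line)
        if not m:
            continue
        mbps = int(m["value"]) * (1000 if m["unit"] == "G" else 1)
        iface = m["iface"]
        if iface in last and last[iface] != mbps:
            changes.append((iface, last[iface], mbps))
        last[iface] = mbps
    return changes

# Sample lines copied from the syslog excerpt in this thread.
sample = [
    "Oct 4 02:04:15 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX",
    "Oct 4 02:04:22 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down",
    "Oct 4 02:04:25 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX",
    "Oct 4 02:04:27 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down",
    "Oct 4 02:04:35 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Up 100 Mbps, Flow Control: RX/TX",
]

if __name__ == "__main__":
    for iface, old, new in speed_changes(sample):
        print(f"{iface}: {old} Mbps -> {new} Mbps")
```

A gigabit link renegotiating at 100 Mbps after several quick flaps is commonly associated with a marginal cable, port, or PHY, which fits the cable-swapping advice given in this thread.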
JorgeB Posted October 4, 2022

Looks like the NICs really have a problem, assuming you replaced/swapped cables before.
cinereus Posted October 7, 2022

On 10/4/2022 at 8:19 AM, JorgeB said:
Looks like the NICs really have a problem, assuming you replaced/swapped cables before.

eth0 has been stuck at 100 Mbps since my reboot. I just bought a new cable for eth0 and now get this:

Oct 7 13:55:33 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct 7 13:55:35 fs ntpd[1467]: Deleting interface #36 eth0, 192.168.0.250#123, interface stats: received=696, sent=696, dropped=0, active_time=176867 secs
Oct 7 13:55:35 fs ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct 7 13:55:35 fs ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct 7 13:55:35 fs ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct 7 13:55:35 fs ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct 7 13:55:35 fs ntpd[1467]: Deleting interface #37 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=176867 secs
Oct 7 14:07:37 fs ntpd[1467]: Listen normally on 38 eth0 192.168.0.250:123
Oct 7 14:07:37 fs ntpd[1467]: Listen normally on 39 eth0 [fe80::ec4:7aff:fe59:76ee%5]:123
Oct 7 14:07:37 fs ntpd[1467]: new interface(s) found: waking up resolver
Oct 7 14:07:40 fs kernel: ixgbe 0000:05:00.0 eth0: NIC Link is Down
Oct 7 14:07:42 fs ntpd[1467]: Deleting interface #38 eth0, 192.168.0.250#123, interface stats: received=2, sent=8, dropped=0, active_time=5 secs
Oct 7 14:07:42 fs ntpd[1467]: 216.239.35.0 local addr 192.168.0.250 -> <null>
Oct 7 14:07:42 fs ntpd[1467]: 216.239.35.4 local addr 192.168.0.250 -> <null>
Oct 7 14:07:42 fs ntpd[1467]: 216.239.35.8 local addr 192.168.0.250 -> <null>
Oct 7 14:07:42 fs ntpd[1467]: 216.239.35.12 local addr 192.168.0.250 -> <null>
Oct 7 14:07:42 fs ntpd[1467]: Deleting interface #39 eth0, fe80::ec4:7aff:fe59:76ee%5#123, interface stats: received=0, sent=0, dropped=0, active_time=5 secs
Oct 7 14:12:39 fs ntpd[1467]: no peer for too long, server running free now

The dashboard says "interface down" with the brand new cable. Swapping back to the old cable also gives "interface down", not even the 100 Mbps I had before. What gives?!
cinereus Posted October 7, 2022

Here are diagnostics after a fresh reboot where eth0 is still not working.

fs-diagnostics-20221007-1425.zip
JorgeB Posted October 7, 2022

Didn't we already believe the problem was the NICs?
cinereus Posted October 7, 2022

16 minutes ago, JorgeB said:
Didn't we already believe the problem was the NICs?

It's hard to tell. They have been working solidly for the last couple of days. I don't understand these errors in the syslog.
cinereus Posted October 7, 2022

Curiouser and curiouser. eth0 refused to connect to my router after trying multiple cables. However, after moving the router connection to eth1, eth0 now works fine with other connections.

Several hours of testing later, the cable that was previously limited to 100 Mbps is now very happy at 1 Gbps.

Not sure whether the diagnostics say anything sensible about this?

fs-diagnostics-20221007-1603.zip

Edited October 7, 2022 by cinereus