e1000e Detected Hardware Unit Hang


sota


Apr 21 23:14:44 Tigger kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Apr 21 23:14:44 Tigger kernel:  TDH                  <2d>
Apr 21 23:14:44 Tigger kernel:  TDT                  <44>
Apr 21 23:14:44 Tigger kernel:  next_to_use          <44>
Apr 21 23:14:44 Tigger kernel:  next_to_clean        <2c>
Apr 21 23:14:44 Tigger kernel: buffer_info[next_to_clean]:
Apr 21 23:14:44 Tigger kernel:  time_stamp           <13a8c3651>
Apr 21 23:14:44 Tigger kernel:  next_to_watch        <2d>
Apr 21 23:14:44 Tigger kernel:  jiffies              <13a8c3f00>
Apr 21 23:14:44 Tigger kernel:  next_to_watch.status <0>
Apr 21 23:14:44 Tigger kernel: MAC Status             <80083>
Apr 21 23:14:44 Tigger kernel: PHY Status             <796d>
Apr 21 23:14:44 Tigger kernel: PHY 1000BASE-T Status  <3800>
Apr 21 23:14:44 Tigger kernel: PHY Extended Status    <3000>
Apr 21 23:14:44 Tigger kernel: PCI Status             <10>
Apr 21 23:14:46 Tigger kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Apr 21 23:14:46 Tigger kernel:  TDH                  <2d>
Apr 21 23:14:46 Tigger kernel:  TDT                  <44>
Apr 21 23:14:46 Tigger kernel:  next_to_use          <44>
Apr 21 23:14:46 Tigger kernel:  next_to_clean        <2c>
Apr 21 23:14:46 Tigger kernel: buffer_info[next_to_clean]:
Apr 21 23:14:46 Tigger kernel:  time_stamp           <13a8c3651>
Apr 21 23:14:46 Tigger kernel:  next_to_watch        <2d>
Apr 21 23:14:46 Tigger kernel:  jiffies              <13a8c46c0>
Apr 21 23:14:46 Tigger kernel:  next_to_watch.status <0>
Apr 21 23:14:46 Tigger kernel: MAC Status             <80083>
Apr 21 23:14:46 Tigger kernel: PHY Status             <796d>
Apr 21 23:14:46 Tigger kernel: PHY 1000BASE-T Status  <3800>
Apr 21 23:14:46 Tigger kernel: PHY Extended Status    <3000>
Apr 21 23:14:46 Tigger kernel: PCI Status             <10>
Apr 21 23:14:48 Tigger kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Apr 21 23:14:48 Tigger kernel:  TDH                  <2d>
Apr 21 23:14:48 Tigger kernel:  TDT                  <44>
Apr 21 23:14:48 Tigger kernel:  next_to_use          <44>
Apr 21 23:14:48 Tigger kernel:  next_to_clean        <2c>
Apr 21 23:14:48 Tigger kernel: buffer_info[next_to_clean]:
Apr 21 23:14:48 Tigger kernel:  time_stamp           <13a8c3651>
Apr 21 23:14:48 Tigger kernel:  next_to_watch        <2d>
Apr 21 23:14:48 Tigger kernel:  jiffies              <13a8c4ec0>
Apr 21 23:14:48 Tigger kernel:  next_to_watch.status <0>
Apr 21 23:14:48 Tigger kernel: MAC Status             <80083>
Apr 21 23:14:48 Tigger kernel: PHY Status             <796d>
Apr 21 23:14:48 Tigger kernel: PHY 1000BASE-T Status  <3800>
Apr 21 23:14:48 Tigger kernel: PHY Extended Status    <3000>
Apr 21 23:14:48 Tigger kernel: PCI Status             <10>
Apr 21 23:14:49 Tigger kernel: e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
Apr 21 23:14:50 Tigger kernel: bond0: (slave eth0): link status definitely down, disabling slave
Apr 21 23:14:50 Tigger kernel: device eth0 left promiscuous mode
Apr 21 23:14:50 Tigger kernel: bond0: now running without any active interface!
Apr 21 23:14:50 Tigger kernel: br0: port 1(bond0) entered disabled state
Apr 21 23:14:53 Tigger kernel: e1000e 0000:00:19.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Apr 21 23:14:53 Tigger kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
Apr 21 23:14:53 Tigger kernel: bond0: (slave eth0): making interface the new active one
Apr 21 23:14:53 Tigger kernel: device eth0 entered promiscuous mode
Apr 21 23:14:53 Tigger kernel: bond0: active interface up!
Apr 21 23:14:53 Tigger kernel: br0: port 1(bond0) entered blocking state
Apr 21 23:14:53 Tigger kernel: br0: port 1(bond0) entered forwarding state

 

Started having this problem recently. I'm not physically at the machine right now, but does this look like a faulty card, a bad port on the switch, or a bad cable? Or is it something else, software-related?

 

Machine is basically a glorified file cabinet, running a single Windows 7 x64 VM for SageTV (I haven't gotten the Docker container to work to my liking, and I'm under a time crunch with a failing physical SageTV server and a 4/25 "hard" cutover date for Cablevision switching to encrypted channels).

Everything seemed to be working fine until a couple of days ago, when I kept (and keep) getting disconnected while remoted into the VM via AnyDesk. I finally tried to watch the log, only to discover /var/log was full. I caught the above after the most recent disconnect.

 

Diagnostics and syslog.1 are attached.

syslog.zip tigger-diagnostics-20220421-2322.zip
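
In case it's useful to anyone hitting the same thing, this is roughly how I was checking (a rough sketch; the syslog path may differ on other setups):

 df -h /var/log
 grep -i "Detected Hardware Unit Hang" /var/log/syslog | tail -n 20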

  • 1 month later...
  • 1 month later...
  • 2 weeks later...
On 8/10/2022 at 2:57 AM, smeehrrr said:

I started hitting this today after upgrading to 6.10.3.  Attaching diagnostics just in case that's helpful.

nova-diagnostics-20220809-1753.zip (193.51 kB)

I have exactly the same issue... it worked great for almost a year on Unraid 6.9.2. Then I upgraded to Unraid 6.10.3, and now my server's four Ethernet ports (Intel Pro/1000 NICs) suddenly go down one after another until bond0 is completely gone and SSH connections drop. Not even the local terminal responds properly afterwards (I have a keyboard and monitor attached because of this error). Typing reboot -f doesn't work either... I have to force a shutdown by long-pressing the power button.


For what it's worth, running

 ethtool -K eth0 tso off

made the problem go away for me. I don't know what I'm giving up in terms of performance, but it was a good enough temporary workaround to get all my files copied over.

I did not have any of your additional issues with the terminal, etc.; just a momentary hitch that was long enough to interrupt file copies but not long enough to make SSH drop.
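
If you want to confirm what the command changes, checking the offload flags before and after should show it (a rough sketch; exact feature names can vary by driver):

 ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
 ethtool -K eth0 tso off
 ethtool -k eth0 | grep tcp-segmentation-offload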

 


Just a quick update from my side. After many more hours of trial and error, I had no ideas left about what could possibly be causing this issue. I tried everything from rolling back to doing a fresh installation of Unraid (I even bought a new flash stick in case mine was faulty), and I ran a memtest for about three and a half days, with every pass completing without a single error.
After trying almost everything I felt really lost, because I had no further ideas about what could possibly cause this. Then I googled a bit for similar issues, not specifically for Unraid but for computer systems in general (I searched for the symptoms). At some point I found several interesting articles about "half faulty" PSUs that still power the computer/server but no longer deliver a constant or correct voltage. I tested this theory by letting the server (Unraid 6.10.3) run in safe mode without any Docker containers or other services, just plain Unraid. The server then stayed up for 3 days without any crash. As soon as I rebooted back to normal mode and started some load-intensive Docker containers, the server crashed again. Then I borrowed an unused power supply from a friend and temporarily installed it in the server... and voila, it worked like it had all the months before. So I bought a new power supply, installed it, and the server has now been up and running again for 4 days. On Monday and yesterday I was intensively stress testing it: no more crashes, everything is back to normal. Since the server was still powering on and booting normally, the PSU was actually the last thing I thought of. Anyway, a really strange issue that almost drove me to the madhouse, but I could finally solve it.

PS: Sorry for my English; it is not my mother tongue.

  • 1 year later...
On 8/18/2022 at 5:15 PM, smeehrrr said:

For what it's worth, running

 ethtool -K eth0 tso off

made the problem go away for me. I don't know what I'm giving up in terms of performance, but it was a good enough temporary workaround to get all my files copied over.

I did not have any of your additional issues with the terminal, etc.; just a momentary hitch that was long enough to interrupt file copies but not long enough to make SSH drop.

 

Thank you immensely to @smeehrrr for sharing this crucial workaround. For several years I've been grappling with persistent network issues on my Unraid server, to the point of real frustration, and implementing the suggested command has finally brought relief. The problem began around 2018-2019; my system had been functioning smoothly before that. The resolution provided here has been a game changer, and I'm deeply grateful for the shared knowledge and support.

I'm curious about the underlying mechanics of this fix. Is there a more permanent solution that can be implemented? I'm keen to understand why this particular command was effective, especially considering the issue's persistence from 2018-2019 to the present day (Unraid Version: 6.12.6).

Here are some specifics of my setup for context:

Server Model: LENOVO ThinkServer TS440

BIOS Version: FBKTDIAUS (Dated: Thu 16 Sep 2021)

Processor: Intel® Xeon® CPU E3-1275L v3 @ 2.70GHz

Any additional insights or suggestions for a long-term resolution would be greatly appreciated!
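
In case it helps anyone landing here later: one way to keep the workaround across reboots would be to run the command at startup. A minimal sketch, assuming Unraid's standard /boot/config/go startup script and that the affected NIC is eth0 (adjust to your interface name):

 #!/bin/bash
 # Start the Management Utility (default contents of the go script)
 /usr/local/sbin/emhttp &
 # Workaround for e1000e "Detected Hardware Unit Hang": disable TCP segmentation offload
 ethtool -K eth0 tso off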

