Server suddenly stops responding and reboots at random, help :(

February 17, 2025

I started having this issue 4-5 days ago and thought it must be the PSU, so I ordered a new one, but I'm still having the issue. So I changed all the SATA cables too, with no luck. I have no idea how to make this stop, and I feel I need to fix it soon because I'm trying to rebuild a disk that got the "disabled" tag after this started. The disk has been tested and has no faults, so it's not the issue either. I would really like to have this sorted and the disk rebuilt before something else fails and kills my array completely.. I attached what I think is the diagnostics file that's supposed to help. I read through it but can't find the fault; it just keeps saying something about something having no pulse.. please help

tower-diagnostics-20250217-1854.zip

February 17, 2025

Community Expert

A server rebooting by itself is almost always a hardware issue. If you have multiple RAM sticks, try running the server with just one; if it still happens, try a different one. That will basically rule out bad RAM.

February 17, 2025

Author

10 minutes ago, JorgeB said:

A server rebooting by itself is almost always a hardware issue. If you have multiple RAM sticks, try running the server with just one; if it still happens, try a different one. That will basically rule out bad RAM.

I have 8 x 16GB DDR4 multi-bit ECC, so this would take forever, since it can take days between reboots. From the little I understood of the logs, it did reboot gracefully, and it should also put ECC RAM errors in the log if I'm correct? As a last resort I will just order new RAM, but I was hoping it wasn't that, and that someone with log knowledge could have a look at it first 😇 I was crossing my fingers for it being a software/network or flash device error .. 🤞

February 17, 2025

Community Expert

26 minutes ago, gloory91 said:

From the little I understood of the logs, it did reboot gracefully

That doesn't fit with the random reboots you mentioned in the title. Was it you who initiated the reboots? Unraid doesn't do that on its own.

February 17, 2025

Author

1 minute ago, JorgeB said:

That doesn't fit with the random reboots you mentioned in the title. Was it you who initiated the reboots? Unraid doesn't do that on its own.

No, I didn't touch it; I'm trying to leave it in peace for the rebuild. Here are some of the lines in the logs that I mean:

Feb 17 18:22:05 Tower rc.local_shutdown: Stopping emhttpd
Feb 17 18:22:05 Tower rc.local_shutdown: /usr/local/sbin/emhttp stop
Feb 17 18:22:05 Tower emhttp: Stopping web services...
Feb 17 18:22:05 Tower rc.nginx: Stopping Nginx server daemon gracefully...
Feb 17 18:22:05 Tower dhcpcd[2362]: br0: carrier lost
Feb 17 18:22:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0x217) != mcp_pulse (0x7fff)

February 17, 2025

Community Expert

Something or someone initiated that; it could be a bad power button, or a cat pressing it.

February 17, 2025

Author

I have it standing on a table next to my computer desk, and there are no people or animals in my apartment. I was looking at the box when it happened, so it's 100% not that. I'm totally confused, which is why I think it must be something else.. It's weird that it does this with nothing happening on the server; only the disk rebuild is running, and nobody and nothing is even near it ..

February 17, 2025

Author

This is what it spams before the reboot, as far as I can see:

Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0xb7) != mcp_pulse (0x7fff)
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_acquire_hw_lock:2022(eth1)]lock_status 0xffffffff resource_bit 0x1
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.0 eth1: MDC/MDIO access timeout
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth2)]MFW seems hanged: drv_pulse (0xb4) != mcp_pulse (0x7fff)
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.0 eth1: MDC/MDIO access timeout
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_acquire_hw_lock:2022(eth2)]lock_status 0xffffffff resource_bit 0x1
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.1 eth2: MDC/MDIO access timeout
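A quick way to see which ports these messages come from is to tally them per interface. This is a minimal sketch that assumes the syslog line format shown above and only looks for the message texts quoted in this thread; `count_nic_errors` is a hypothetical helper name, not an Unraid tool.

```shell
# Hypothetical helper: tally the NIC error messages quoted in this thread
# per interface (eth0, eth1, ...). Assumes syslog lines such as
#   "... kernel: bnx2x 0000:03:00.0 eth1: MDC/MDIO access timeout"
# Step 1: keep only lines carrying one of the known error texts.
# Step 2: pull out the interface name (eth<N>) each line refers to.
# Step 3: count how many lines each interface produced.
count_nic_errors() {
  log_file="$1"
  grep -E 'MFW seems hanged|MDC/MDIO access timeout|lock_status 0xffffffff|Detected Hardware Unit Hang' "$log_file" \
    | grep -oE 'eth[0-9]+' \
    | sort \
    | uniq -c
}

# Example, against the syslog from the diagnostics zip:
# count_nic_errors syslog.txt
```

On the excerpt above this reports 4 lines for eth1 and 3 for eth2, i.e. both ports of the card at 0000:03:00.x that the bnx2x driver is handling.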

February 17, 2025

Community Expert

That looks like a NIC related problem, but that would initiate a shutdown or reboot.

February 17, 2025

Author

3 minutes ago, JorgeB said:

That looks like a NIC related problem, but that would initiate a shutdown or reboot.

I do have a dual 10G PCIe card in the server that I'm currently not using for anything, but it has been there forever. Could this card suddenly start causing this problem, is that what you're saying? Or did you mean it would not initiate a shutdown or reboot?

February 17, 2025

Community Expert

It should not, but if you are not using them, blacklist the driver and reboot:

echo "blacklist bnx2x" > /boot/config/modprobe.d/bnx2x.conf
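If you go the blacklist route, a small wrapper can also create the folder in case it doesn't exist yet on the flash drive, and after the reboot you can confirm the driver stayed unloaded. This is a sketch only: the function names are hypothetical, the default path is the one from the command above, and the check assumes `lsmod` is available.

```shell
# Hypothetical helper: write a modprobe blacklist entry for one module.
# The default directory is the one used in the command above.
blacklist_module() {
  module="$1"
  conf_dir="${2:-/boot/config/modprobe.d}"
  mkdir -p "$conf_dir"                                  # create the folder if missing
  echo "blacklist $module" > "$conf_dir/$module.conf"   # same content as the one-liner
}

# Hypothetical helper: exit 0 if the module is currently loaded.
# lsmod prints a header line, then one module per line with its name first.
module_loaded() {
  lsmod | awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Example:
# blacklist_module bnx2x
# (reboot)
# module_loaded bnx2x && echo "bnx2x still loaded" || echo "bnx2x not loaded"
```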

February 17, 2025

Author

Just now, JorgeB said:
It should not, but if you are not using them, blacklist the driver and reboot:
echo "blacklist bnx2x" > /boot/config/modprobe.d/bnx2x.conf

I could just remove the card next time it reboots, or reboot it myself if I manage to finish rebuilding the disk this time. Also, here is what looks like a problem: it's eth0, which is the port on my motherboard that is in use.. Again, it does this all by itself at a random time:

### [PREVIOUS LINE REPEATED 1 TIMES] ###
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Feb 17 18:20:44 Tower kernel: TDH <17>
Feb 17 18:20:44 Tower kernel: TDT <90>
Feb 17 18:20:44 Tower kernel: next_to_use <90>
Feb 17 18:20:44 Tower kernel: next_to_clean <17>
Feb 17 18:20:44 Tower kernel: buffer_info[next_to_clean]:
Feb 17 18:20:44 Tower kernel: time_stamp <101e941e7>
Feb 17 18:20:44 Tower kernel: next_to_watch <18>
Feb 17 18:20:44 Tower kernel: jiffies <10204bbc4>
Feb 17 18:20:44 Tower kernel: next_to_watch.status <0>
Feb 17 18:20:44 Tower kernel: MAC Status <80083>
Feb 17 18:20:44 Tower kernel: PHY Status <796d>
Feb 17 18:20:44 Tower kernel: PHY 1000BASE-T Status <3c00>
Feb 17 18:20:44 Tower kernel: PHY Extended Status <3000>
Feb 17 18:20:44 Tower kernel: PCI Status <10>
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: NIC Link is Down
Feb 17 18:20:44 Tower kernel: BTRFS info (device loop3): last unmount of filesystem 8e17c87d-08a7-4c10-8d21-50858b08015b
Feb 17 18:20:44 Tower kernel: bond0: (slave eth0): link status definitely down, disabling slave
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: left promiscuous mode
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: left allmulticast mode
Feb 17 18:20:44 Tower kernel: bond0: now running without any active interface!
Feb 17 18:20:44 Tower kernel: br0: port 1(bond0) entered disabled state
Feb 17 18:20:45 Tower rc.docker: Unraid managed containers stopped.
Feb 17 18:20:45 Tower rc.docker: Stopping network...
Feb 17 18:20:45 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0x1c8) != mcp_pulse (0x7fff)
Feb 17 18:20:45 Tower rc.docker: Network stopped.
Feb 17 18:20:45 Tower rc.docker: Stopping Docker daemon...
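Since the open question is what started the shutdown, it helps to look at the lines just before the first shutdown message rather than at the shutdown itself. A minimal sketch, assuming you choose the pattern yourself: the thread doesn't show which message opens Unraid's shutdown sequence, so `rc.local_shutdown` and `Stopping Docker` below are only examples taken from the excerpts above, and `context_before_first` is a hypothetical name.

```shell
# Hypothetical helper: print the N lines before the first line matching an
# extended regex, plus that line. Exits 1 if nothing matches.
context_before_first() {
  pattern="$1"
  count="$2"
  log_file="$3"
  awk -v pat="$pattern" -v n="$count" '
    # Keep the last n+1 lines in a ring buffer indexed by NR mod (n+1).
    { buf[NR % (n + 1)] = $0 }
    $0 ~ pat {
      found = 1
      start = (NR > n) ? NR - n : 1
      for (i = start; i <= NR; i++) print buf[i % (n + 1)]
      exit
    }
    END { exit !found }
  ' "$log_file"
}

# Example:
# context_before_first 'rc.local_shutdown|Stopping Docker' 50 syslog.txt
```

The ring buffer keeps memory flat even on a large syslog; with GNU grep, `grep -m 1 -B 50 -E 'pattern' file` gives roughly the same view.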

February 17, 2025

Community Expert

You could try removing the S3 Sleep plugin to see if that helps. The plugin has been known to kick in when it should not.

February 17, 2025

Author

1 minute ago, itimpi said:

You could try removing the S3 Sleep plugin to see if that helps. The plugin has been known to kick in when it should not.

Oh, I didn't know I had it, but I've uninstalled it now and will see if it helps!
