February 17, 2025 I started having this issue 4-5 days ago and thought it must be the PSU, so I ordered a new one, but I am still having the issue. I then changed all the SATA cables too, with no luck. I have no idea how to make this stop, and I feel I need to fix it soon because I am trying to rebuild a disk that got tagged "disabled" after this started. The disk has been tested and has no faults, so it is not the issue either. I would really like to have this sorted and the disk rebuilt before something else fails and kills my array completely. I attached what I think are the diagnostics that are supposed to help. I read through them but can't find the fault; the log only keeps saying that something has no pulse. Please help. tower-diagnostics-20250217-1854.zip
February 17, 2025 Community Expert A server rebooting by itself is almost always a hardware issue. If you have multiple RAM sticks, try running the server with just one; if it still reboots, try a different one. That will basically rule out bad RAM.
February 17, 2025 Author 10 minutes ago, JorgeB said: A server rebooting by itself is almost always a hardware issue. If you have multiple RAM sticks, try running the server with just one; if it still reboots, try a different one. That will basically rule out bad RAM.
I have 8 x 16 GB DDR4 multi-bit ECC, so this would take forever, as it can take days between reboots. From the little I understood in the logs, it did reboot gracefully, and ECC RAM errors should also show up in the log, if I'm correct? As a last resort I will just order new RAM, but I was hoping that was not it and that someone with log knowledge could have a look first 😇 I was crossing my fingers for it being a software/network or flash device error 🤞
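One rough way to check the ECC question is to search the syslog for memory error reports. The sketch below assumes the kernel's EDAC or machine-check reporting is active on this board (if it is not, ECC events may never reach the log), and `memory_error_lines` is a hypothetical helper name, not an Unraid tool.

```shell
# Hypothetical helper: print syslog lines that look like memory or
# machine-check error reports. An empty result does not prove the RAM
# is good; it only means nothing matching these patterns was logged.
memory_error_lines() {
    grep -i -E 'EDAC|mce:|machine check|Hardware Error' "$1"
}
```

It could be run as `memory_error_lines /var/log/syslog`, assuming that is where the live syslog is kept, or against the syslog file inside the diagnostics zip.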
February 17, 2025 Community Expert 26 minutes ago, gloory91 said: From the little I understood in the logs, it did reboot gracefully
This doesn't make sense with the random reboots that you mentioned in the title. Was it you that initiated the reboots? Unraid doesn't do it on its own.
February 17, 2025 Author 1 minute ago, JorgeB said: This doesn't make sense with the random reboots that you mentioned in the title. Was it you that initiated the reboots? Unraid doesn't do it on its own.
No, I did not touch it; I try to leave it in peace for the rebuild. Here are some of the log lines that I mean:
Feb 17 18:22:05 Tower rc.local_shutdown: Stopping emhttpd
Feb 17 18:22:05 Tower rc.local_shutdown: /usr/local/sbin/emhttp stop
Feb 17 18:22:05 Tower emhttp: Stopping web services...
Feb 17 18:22:05 Tower rc.nginx: Stopping Nginx server daemon gracefully...
Feb 17 18:22:05 Tower dhcpcd[2362]: br0: carrier lost
Feb 17 18:22:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0x217) != mcp_pulse (0x7fff)
February 17, 2025 Community Expert Something or someone initiated that. It can be a bad power button, or a cat pressing it.
February 17, 2025 Author I have it standing on a table next to my computer desk, and there are no other people or animals in my apartment. I am looking at the box when it happens, so it is 100% not that. I am totally confused, which is why I think it must be something else. It would be weird for it to do this with nothing happening on the server: only the disk rebuild is running, and nobody and nothing is even near it.
February 17, 2025 Author This is what it spams before the reboot, as far as I can see:
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0xb7) != mcp_pulse (0x7fff)
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_acquire_hw_lock:2022(eth1)]lock_status 0xffffffff resource_bit 0x1
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.0 eth1: MDC/MDIO access timeout
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_timer:5810(eth2)]MFW seems hanged: drv_pulse (0xb4) != mcp_pulse (0x7fff)
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.0 eth1: MDC/MDIO access timeout
Feb 17 18:16:05 Tower kernel: bnx2x: [bnx2x_acquire_hw_lock:2022(eth2)]lock_status 0xffffffff resource_bit 0x1
Feb 17 18:16:05 Tower kernel: bnx2x 0000:03:00.1 eth2: MDC/MDIO access timeout
February 17, 2025 Community Expert That looks like a NIC-related problem, but that would initiate a shutdown or reboot.
February 17, 2025 Author 3 minutes ago, JorgeB said: That looks like a NIC-related problem, but that would initiate a shutdown or reboot.
I do have a dual 10G PCIe card in the server that I am currently not using for anything, but it has been there forever. Could this card suddenly start causing this problem, is that what you are saying? Or did you mean it would not initiate a shutdown or reboot?
February 17, 2025 Community Expert It should not, but if you are not using them, blacklist the driver and reboot:
echo "blacklist bnx2x" > /boot/config/modprobe.d/bnx2x.conf
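A minimal sketch of the same step wrapped in two helpers, in case it is useful to confirm after the reboot that the driver really stayed unloaded. The function names `blacklist_module` and `module_loaded` are hypothetical; the default directory is the one from the command above, and the `lsmod` check is an added assumption about how to verify the result.

```shell
# Hypothetical helper: write a modprobe blacklist entry for a module.
# $1 = module name, $2 = target directory (defaults to the path from the post).
blacklist_module() {
    bl_dir="${2:-/boot/config/modprobe.d}"
    mkdir -p "$bl_dir" || return 1
    echo "blacklist $1" > "$bl_dir/$1.conf"
}

# Hypothetical helper: succeeds (exit 0) if the named module is currently loaded.
module_loaded() {
    lsmod | awk 'NR > 1 { print $1 }' | grep -qx "$1"
}
```

For example: run `blacklist_module bnx2x`, reboot, then `module_loaded bnx2x && echo "bnx2x is still loaded"`. If nothing is printed, the blacklist took effect.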
February 17, 2025 Author Just now, JorgeB said: It should not, but if you are not using them, blacklist the driver and reboot: echo "blacklist bnx2x" > /boot/config/modprobe.d/bnx2x.conf
I could just remove the card the next time it reboots, or I will reboot it myself if I manage to finish the disk rebuild this time. Also, here is what looks like a problem: it is eth0, which is the port on my motherboard that is in use. Again, it does this all by itself at a random time:
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Feb 17 18:20:44 Tower kernel: TDH <17>
Feb 17 18:20:44 Tower kernel: TDT <90>
Feb 17 18:20:44 Tower kernel: next_to_use <90>
Feb 17 18:20:44 Tower kernel: next_to_clean <17>
Feb 17 18:20:44 Tower kernel: buffer_info[next_to_clean]:
Feb 17 18:20:44 Tower kernel: time_stamp <101e941e7>
Feb 17 18:20:44 Tower kernel: next_to_watch <18>
Feb 17 18:20:44 Tower kernel: jiffies <10204bbc4>
Feb 17 18:20:44 Tower kernel: next_to_watch.status <0>
Feb 17 18:20:44 Tower kernel: MAC Status <80083>
Feb 17 18:20:44 Tower kernel: PHY Status <796d>
Feb 17 18:20:44 Tower kernel: PHY 1000BASE-T Status <3c00>
Feb 17 18:20:44 Tower kernel: PHY Extended Status <3000>
Feb 17 18:20:44 Tower kernel: PCI Status <10>
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: NIC Link is Down
Feb 17 18:20:44 Tower kernel: BTRFS info (device loop3): last unmount of filesystem 8e17c87d-08a7-4c10-8d21-50858b08015b
Feb 17 18:20:44 Tower kernel: bond0: (slave eth0): link status definitely down, disabling slave
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: left promiscuous mode
Feb 17 18:20:44 Tower kernel: e1000e 0000:00:19.0 eth0: left allmulticast mode
Feb 17 18:20:44 Tower kernel: bond0: now running without any active interface!
Feb 17 18:20:44 Tower kernel: br0: port 1(bond0) entered disabled state
Feb 17 18:20:45 Tower rc.docker: Unraid managed containers stopped.
Feb 17 18:20:45 Tower rc.docker: Stopping network...
Feb 17 18:20:45 Tower kernel: bnx2x: [bnx2x_timer:5810(eth1)]MFW seems hanged: drv_pulse (0x1c8) != mcp_pulse (0x7fff)
Feb 17 18:20:45 Tower rc.docker: Network stopped.
Feb 17 18:20:45 Tower rc.docker: Stopping Docker daemon...
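To see whether the NIC hangs come before or after the shutdown sequence starts, it may help to pull just those lines out of the syslog in order. This is only a sketch: `shutdown_timeline` is a hypothetical helper, and the patterns are simply the message fragments that appear in the excerpts in this thread, so other relevant messages may be missed.

```shell
# Hypothetical helper: print shutdown- and NIC-related lines from a syslog
# file in their original order, so their relative timing is easy to compare.
shutdown_timeline() {
    grep -E 'rc\.local_shutdown|Detected Hardware Unit Hang|NIC Link is Down|MFW seems hanged' "$1"
}
```

It could be run as `shutdown_timeline /var/log/syslog`, assuming that is where the live syslog is kept, or against the syslog inside the diagnostics zip.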
February 17, 2025 Community Expert You could try removing the S3 Sleep plugin to see if that helps. The plugin has been known to kick in when it should not.
February 17, 2025 Author 1 minute ago, itimpi said: You could try removing the S3 Sleep plugin to see if that helps. The plugin has been known to kick in when it should not.
Oh, I didn't know I had it, but I have uninstalled it now and will see if it makes a difference!