Kaldek Posted April 17, 2023 (edited)

So folks, my unRAID server has an Intel server NIC in it with dual XFP ports and runs the ixgbe driver. Over a few unRAID revisions now, there have been random instances where the NIC driver dies (and yes, I uploaded the diagnostics files when it happened). There's really been no solution to this issue, and of course it usually happens when I'm overseas for work and it's 12+ hours before I can remote in to PikVM and bounce the server to get the network back up.

I got a little tired of this, so here's the result of me and ChatGPT4 having a bit of a discussion about how to deal with it automatically. The solution documented here is a pair of User Scripts: one a "Ping Watchdog" and the other a supervisor for the watchdog (in case the watchdog dies).

Here is the watchdog script, called "ping_watchdog". It pings a pair of IP addresses (my core switch and my gateway) so that a single IP being down doesn't trigger the reboot. Sometimes my gateway is off the air for a while as I do some arcane Mikrotik things on it. This script is set to run at the first array start only and stays running forever (unless it dies for some reason; see the supervisor script below).

    #!/bin/bash

    TARGET_IP_1="192.168.0.254"  # Replace with your gateway router IP address
    TARGET_IP_2="192.168.0.240"  # Replace with your core switch IP address
    PING_COUNT=4                 # Number of pings to send
    PING_TIMEOUT=5               # Timeout for each ping in seconds
    FAIL_THRESHOLD=30            # Number of consecutive failed ping checks before restarting
    CHECK_INTERVAL=60            # Time in seconds between ping checks

    failed_pings=0

    # Returns ping's exit status: 0 if the target answered, non-zero otherwise
    ping_check() {
        local target_ip=$1
        ping -c "$PING_COUNT" -W "$PING_TIMEOUT" "$target_ip" >/dev/null 2>&1
    }

    while true; do
        ping_check "$TARGET_IP_1"
        result1=$?
        ping_check "$TARGET_IP_2"
        result2=$?

        # Count a failure only when BOTH targets are unreachable
        if [ "$result1" -ne 0 ] && [ "$result2" -ne 0 ]; then
            failed_pings=$((failed_pings + 1))
            echo "$(date) - Pings to $TARGET_IP_1 and $TARGET_IP_2 failed. Consecutive failed ping checks: $failed_pings"
        else
            failed_pings=0
        fi

        if [ "$failed_pings" -ge "$FAIL_THRESHOLD" ]; then
            echo "$(date) - Restarting unRAID server due to $FAIL_THRESHOLD consecutive failed ping checks"
            /usr/local/sbin/powerdown -r
            exit 0
        fi

        sleep "$CHECK_INTERVAL"  # Wait for the specified time before the next iteration
    done

Next is the "ping_watchdog_supervisor", which is set to run every hour. If the first script is not running, it kicks it off again.

    #!/bin/bash

    PING_WATCHDOG_SCRIPT="ping_watchdog"

    # Look for a bash process running the watchdog from the User Scripts temp directory
    pid=$(pgrep -f "^/bin/bash.*/tmp/user.scripts/tmpScripts/$PING_WATCHDOG_SCRIPT")

    if [ -z "$pid" ]; then
        echo "$(date) - Ping watchdog script not running. Restarting..."
        /usr/local/emhttp/plugins/user.scripts/start_script.sh "$PING_WATCHDOG_SCRIPT"
    else
        echo "$(date) - Ping watchdog script running with PID $pid"
    fi

Coupled together, these two scripts ensure that if my NIC ever dies, unRAID performs a clean reboot without hurting the array.

Edited April 17, 2023 by Kaldek
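If you want to sanity-check the watchdog's decision logic before trusting it with a real reboot, something like the dry-run sketch below may help. It is not part of the scripts above: the function names `next_fail_count` and `simulate` are made up for illustration, the threshold is lowered to 3 just for the demo, and the ping results are faked as exit-status pairs (0 means the target answered). Nothing in it pings anything or calls powerdown.

```shell
#!/bin/bash
# Dry-run sketch of the watchdog's counting logic (hypothetical helper names).
# FAIL_THRESHOLD is lowered to 3 purely for the demonstration.
FAIL_THRESHOLD=3

# next_fail_count PREV RESULT1 RESULT2
# Prints the new consecutive-failure count: it increments only when BOTH
# targets failed (non-zero status) and resets to 0 otherwise.
next_fail_count() {
    local prev=$1 result1=$2 result2=$3
    if [ "$result1" -ne 0 ] && [ "$result2" -ne 0 ]; then
        echo $((prev + 1))
    else
        echo 0
    fi
}

# simulate "R1,R2" ...
# Feeds a sequence of fake exit-status pairs (gateway,switch) through the
# counter. Prints "reboot" if the threshold is reached, otherwise "ok".
simulate() {
    local failed=0 pair
    for pair in "$@"; do
        failed=$(next_fail_count "$failed" "${pair%,*}" "${pair#*,}")
        if [ "$failed" -ge "$FAIL_THRESHOLD" ]; then
            echo "reboot"
            return 0
        fi
    done
    echo "ok"
}

simulate 1,1 1,1 0,1 1,1   # gateway answers on the third check, counter resets
simulate 1,0 1,0 1,0       # switch always answers, so no reboot
simulate 1,1 1,1 1,1       # both down three checks in a row
```

Run as-is, the three demo calls should print ok, ok and reboot. With the real settings in the post (30 checks, 60 seconds apart) both targets would need to be unreachable for roughly half an hour before the reboot fires.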
MrGrey Posted April 17, 2023

1 hour ago, Kaldek said: "here's the result of me and ChatGPT4 having a bit of a discussion about"

I've heard a lot about ChatGPT. Is it true?

MrGrey.
Kaldek (Author) Posted April 17, 2023

17 minutes ago, MrGrey said: "I've heard a lot about ChatGPT. Is it true?"

Depends what the question is, but yes, it's amazing for turning ideas into code. I don't trust it 100% of course, and I'm using it to give me ideas and examples. I get to bypass all the grief I'd get by asking a human.

In my view, these generative AI models are necessary. The amount of time we all burn on questions where the person answering has their own emotions about the question, and about how they want to answer it, is utterly insane. ChatGPT in particular gives a very simple, concise and objective response to everything asked of it.

The trick is knowing how to phrase your questions, hence the term "prompt engineering". I'm much better at this than I ever was at "Google-Fu".