-
Pikvm monitoring and powercycling script
I have a PiKVM attached to my Unraid server, and I wanted to see if I could make the PiKVM power-cycle the server if it becomes unresponsive. I asked ChatGPT to help me make a script. I've tested it a little bit, and it seems to get the job done, so I wanted to share it in case anyone else is interested. It is running directly on my PiKVM, but in theory it could run on a different Raspberry Pi or similar. It uses ping, so it will work for anything, not just Unraid. I have very little knowledge of shell scripts, so I am not really able to review the script properly, but as I stated previously, it seems to get the job done.

Here is the post from ChatGPT:

Disclaimer: This script and service were generated and refined with help from ChatGPT. Always review and test scripts before using them in production to ensure they suit your specific environment.

🖥 PiKVM Server Monitor – Auto Power Cycle via systemd

This solution allows your server to be automatically power-cycled via PiKVM if it becomes unresponsive to ping. It's ideal for headless or remote setups where physical access isn't always possible.

✅ Features

- Monitors server responsiveness via ping
- Tries soft power-off via PiKVM, with fallback to hard-off if needed
- Confirms power states before continuing
- Includes grace period logic to avoid false power cycles (e.g. during reboots)
- Waits for power state transitions instead of using fixed sleep times
- All settings configurable through systemd Environment variables

📂 Systemd Service File

Save this in: /etc/systemd/system/pikvm-monitor.service

    [Unit]
    Description=PiKVM Server Monitor Script
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    ExecStart=/usr/local/bin/manage_server.sh
    Restart=always
    RestartSec=10
    User=root
    Environment=PIKVM_USER=your_kvm_username
    Environment=PIKVM_PASS=your_kvm_password
    Environment=PIKVM_URL=https://<your-kvm-ip>/api
    Environment=TARGET_IP=<your-server-ip>
    Environment=MAX_RETRIES=5
    Environment=RETRY_DELAY=10
    Environment=TIMEOUT=5
    Environment=WAIT_AFTER_ON_TIME=90
    Environment=WAIT_AFTER_OFF_BEFORE_ON=30
    Environment=SOFT_OFF_TIMEOUT=60
    Environment=GRACE_PERIOD=180

    [Install]
    WantedBy=multi-user.target

🔧 Shell Script

Save this as: /usr/local/bin/manage_server.sh

Make it executable:

    chmod +x /usr/local/bin/manage_server.sh

    #!/bin/bash

    : "${PIKVM_URL:?Missing PIKVM_URL}"
    : "${PIKVM_USER:?Missing PIKVM_USER}"
    : "${PIKVM_PASS:?Missing PIKVM_PASS}"
    : "${TARGET_IP:?Missing TARGET_IP}"
    : "${MAX_RETRIES:?Missing MAX_RETRIES}"
    : "${RETRY_DELAY:?Missing RETRY_DELAY}"
    : "${TIMEOUT:?Missing TIMEOUT}"
    : "${WAIT_AFTER_OFF_BEFORE_ON:?Missing WAIT_AFTER_OFF_BEFORE_ON}"
    : "${WAIT_AFTER_ON_TIME:?Missing WAIT_AFTER_ON_TIME}"
    : "${SOFT_OFF_TIMEOUT:?Missing SOFT_OFF_TIMEOUT}"
    : "${GRACE_PERIOD:?Missing GRACE_PERIOD}"

    POWER_CYCLES=0
    LAST_SEEN_ALIVE=$(date +%s)

    check_server_alive() {
        retries=0
        while [ $retries -lt $MAX_RETRIES ]; do
            echo "Pinging $TARGET_IP (attempt $((retries + 1)) of $MAX_RETRIES)..."
            ping -c 1 -W $TIMEOUT "$TARGET_IP" > /dev/null 2>&1
            if [ $? -eq 0 ]; then
                echo "Server at $TARGET_IP is alive!"
                LAST_SEEN_ALIVE=$(date +%s)
                return 0
            fi
            retries=$((retries + 1))
            echo "No response. Retrying in $RETRY_DELAY seconds..."
            sleep $RETRY_DELAY
        done
        echo "Server is unreachable after $MAX_RETRIES attempts."
        return 1
    }

    check_power_state() {
        RESPONSE=$(curl -s -k -u "$PIKVM_USER:$PIKVM_PASS" "$PIKVM_URL/atx")
        if [ -z "$RESPONSE" ]; then echo "Error: No response from PiKVM"; return 2; fi
        if [[ "$RESPONSE" == *"Unauthorized"* ]]; then echo "Unauthorized - check credentials"; exit 1; fi
        POWER_STATE=$(echo "$RESPONSE" | grep -o '"power": [^,]*' | awk -F': ' '{print $2}' | tr -d '"')
        [[ "$POWER_STATE" == "true" ]]
    }

    wait_until_power_state_on() {
        echo "Waiting until power state is ON..."
        until check_power_state; do
            echo "Still OFF... waiting..."
            sleep 1
        done
        echo "Power state confirmed ON."
    }

    perform_power_cycle() {
        echo "Sending soft power off..."
        curl -X POST -k -u "$PIKVM_USER:$PIKVM_PASS" "$PIKVM_URL/atx/power?action=off"
        echo "Waiting for power off (timeout: $SOFT_OFF_TIMEOUT sec)..."
        START=$(date +%s)
        while check_power_state; do
            NOW=$(date +%s)
            if [ $((NOW - START)) -ge "$SOFT_OFF_TIMEOUT" ]; then
                echo "Soft off timed out. Sending hard power off..."
                break
            fi
            echo "Still ON... waiting..."
            sleep 1
        done
        if check_power_state; then
            echo "Sending HARD power off..."
            curl -X POST -k -u "$PIKVM_USER:$PIKVM_PASS" "$PIKVM_URL/atx/power?action=off_hard"
            until ! check_power_state; do
                echo "Waiting for hard off to complete..."
                sleep 1
            done
        fi
        echo "Waiting $WAIT_AFTER_OFF_BEFORE_ON seconds before power on..."
        sleep "$WAIT_AFTER_OFF_BEFORE_ON"
        echo "Powering on..."
        curl -X POST -k -u "$PIKVM_USER:$PIKVM_PASS" "$PIKVM_URL/atx/power?action=on"
        wait_until_power_state_on
        echo "Waiting $WAIT_AFTER_ON_TIME seconds for server to boot..."
        sleep "$WAIT_AFTER_ON_TIME"
    }

    echo "Startup: Waiting for ping (max wait: $((GRACE_PERIOD * 2)) sec)..."
    START=$(date +%s)
    while true; do
        if check_server_alive; then
            echo "Server is alive. Starting monitor."
            break
        fi
        NOW=$(date +%s)
        ELAPSED=$((NOW - START))
        if [ "$ELAPSED" -ge $((GRACE_PERIOD * 2)) ]; then
            echo "Startup timeout ($ELAPSED sec). Exiting."
            exit 1
        fi
        echo "Still no response ($ELAPSED sec)..."
        sleep "$RETRY_DELAY"
    done

    while true; do
        echo "Checking server status..."
        if check_power_state; then
            if check_server_alive; then
                echo "Server responsive."
            else
                NOW=$(date +%s)
                SINCE_ALIVE=$((NOW - LAST_SEEN_ALIVE))
                if [ "$SINCE_ALIVE" -lt "$GRACE_PERIOD" ]; then
                    echo "Unresponsive, but within grace period ($SINCE_ALIVE/$GRACE_PERIOD sec)."
                else
                    echo "Server unresponsive. Starting power cycle..."
                    perform_power_cycle
                    if check_server_alive; then
                        echo "Server recovered after power cycle."
                        POWER_CYCLES=0
                    else
                        POWER_CYCLES=$((POWER_CYCLES + 1))
                        echo "Still unresponsive. Attempt $POWER_CYCLES"
                        if [ "$POWER_CYCLES" -ge 3 ]; then
                            echo "Failed after 3 power cycles. Stopping."
                            break
                        fi
                    fi
                fi
            fi
        else
            echo "Server is OFF. Waiting to power on..."
            wait_until_power_state_on
            echo "Waiting $WAIT_AFTER_ON_TIME seconds for boot..."
            sleep "$WAIT_AFTER_ON_TIME"
            check_server_alive && LAST_SEEN_ALIVE=$(date +%s)
        fi
        echo "Sleeping 60 seconds before next check..."
        sleep 60
    done

🚀 Enable the Service

After saving both files, reload systemd and enable the service:

    sudo systemctl daemon-reload
    sudo systemctl enable --now pikvm-monitor.service

🪵 Monitor the Service Log

To follow the log output live:

    journalctl -u pikvm-monitor.service -f

Or to check logs after reboot:

    journalctl -u pikvm-monitor.service -b
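Since I can't properly review the script myself, one way to gain a bit of confidence is to pull the two pure decisions (reading the power state and the grace period check) into small functions that can be tried without touching the server. This is only a sketch: the function names parse_power_state and grace_expired are made up for illustration, and the sample JSON is my assumption of what the /atx endpoint returns, so compare it against the real output on your own PiKVM.

```shell
# Hypothetical helper: extract the power LED state from a PiKVM /atx reply.
# Prints "true" or "false"; returns 1 if no "power" field is found.
# Unlike the grep in the script above, it tolerates a missing space after the colon.
parse_power_state() {
    local response="$1" state
    state=$(printf '%s' "$response" \
        | grep -oE '"power"[[:space:]]*:[[:space:]]*(true|false)' \
        | grep -oE 'true|false' \
        | head -n 1)
    [ -n "$state" ] || return 1
    printf '%s\n' "$state"
}

# Hypothetical helper: succeeds once the server has been silent for at least
# the grace period (all arguments are in seconds).
grace_expired() {
    local now="$1" last_seen="$2" grace="$3"
    [ $((now - last_seen)) -ge "$grace" ]
}

# Example with an assumed reply shape (not captured from a real device):
# parse_power_state '{"ok": true, "result": {"leds": {"power": true, "hdd": false}}}'
```

Keeping an unreadable reply (return code 1) separate from "false" means a PiKVM that is briefly unreachable does not have to be treated the same as a server that is powered off.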
-
ASM1166 SATA FIS-based switching?
Hello. Can anyone confirm or deny that the ASM1166 supports FIS-based switching? According to this: https://www.asmedia.com.tw/product/45aYq54sP8Qh7WH8/58dYQ8bxZ4UR9wG5 it seems to only support command-based switching, but Google seems to suggest that some people believe it supports FIS-based switching. I know for a fact that the JMB585 supports it, but an extra port would be nice. I know that port multipliers are considered a big no for Unraid, but I have an HDD dock that supports up to 4 disks, which I sometimes use to copy to and from, so it's not part of my array. I have tried an ASM1064 controller that did not support FIS-based switching, and on a 4-disk HDD dock it slows the disk speed to about 30 MB/sec when accessing all disks. To my understanding, FIS-based switching should speed that up as long as both ends support it.
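For anyone who has the card in a Linux box, one thing that might help answer this: the kernel's ahci driver prints a "flags:" line per controller at boot, and that line includes pmp and fbs when the controller advertises port multiplier support and FIS-based switching. The sketch below only checks that line. The function name is made up, and the sample line in the comment is illustrative rather than taken from a real ASM1166. It also only shows what the controller claims, not how well it works with a given dock.

```shell
# Hypothetical helper: succeeds if an ahci "flags:" line from the kernel log
# contains the given capability flag as a whole word (e.g. "fbs" or "pmp").
ahci_line_has_flag() {
    local line="$1" flag="$2"
    # Keep only the part after "flags:" and pad with spaces for whole-word matching.
    case " ${line#*flags:} " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (may need root to read the kernel log):
#   dmesg | grep -i 'ahci.*flags:' | while IFS= read -r line; do
#       ahci_line_has_flag "$line" fbs && echo "FBS advertised: $line"
#   done
#
# Illustrative line format:
#   ahci 0000:03:00.0: flags: 64bit ncq sntf stag pm led clo pmp fbs pio slum part
```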
-
9600-24i for Unraid
I have the same experience as Wody: I rarely if ever see any of the deeper C states on the CPU pkg, simply because a lot of stuff is going on with dockers etc. If I stop the array it will go to C7 for like 40% of the time. I have a 9305-24i, which does not support ASPM, but I have placed it in a PCIE slot connected to the PCH instead of the CPU. This allows the CPU PKG to reach C7, at least on my W480 platform. I have an X710-DA2 in a CPU PCIE slot and this limits the PKG state to C7, otherwise I would see C8.
-
Observations and questions regarding power consumption
I have updated the original post with some new measurements
-
Trying 10GB LAN mainly for the tech tinkering aspect and some future proofing.
I've used a Mellanox X3 with a fiber connection to a MikroTik switch, and it worked fine. Now I'm using an Intel X710-DA2 with a DAC for reduced power consumption. I purchased a Dell-branded one on eBay, crossflashed it to the original Intel firmware, and removed the restrictions on the modules it will use. TBH, I would go with the X710, even if it requires a little bit of extra work to make it accept all modules, just for the sake of reduced power consumption.
-
Observations and questions regarding power consumption
I believe it is accurate. I also have some smart plugs I could use for comparison, but this energy meter is somewhat expensive compared to smart plugs, and also endorsed by the national power companies, so I would assume it has some accuracy. In my Unraid server I have both an APC Smart-UPS, which is reasonably accurate, and a Corsair AX760i, which is definitely not accurate. But I would like to wait until I have a good sense of what hardware to use before taking it apart.

I am actually considering the switch you made, from my current 9305-24i (actually a 9306-24i, but it's the same hardware, power consumption wise) to ASM1166. I have one ASM1166 coming in soon to get some indication of what kind of power figures I would be looking at. On the downside, I would need 4 x ASM1166 combined with a carrier board with some kind of PCIE packet switch, as my board does not support bifurcation. The packet switch will probably use a decent amount of power, so I would probably end up with something that consumes about the same as my current controller. The only upside would be that it would support ASPM. What kind of controller did you have before? And how much did it reduce your consumption to use ASM1166?

The second thing I considered was a 9500-16i, which supports ASPM, and then either replacing some 4TB drives with a 20TB to bring the number of SATA ports needed down to 16, or adding an ASM1166 as an extra PCIE controller, giving me a total of 22 drives where I am currently in need of 19. But it all depends on the impact it will have on the total consumption.

I have jumpers to disable onboard audio and ethernet ports. It did not make much of an impact, maybe 2W at most. I think the ethernet ports already go into low power mode when no cable is attached, so not a big impact, but everything counts.
-
Observations and questions regarding power consumption
I use a power meter on the primary side of the power supply. My point is, IF the difference between C3 and C8 is only 2-3 watts, then chasing PCIE devices with ASPM support for the sole purpose of the CPU PKG going beyond C3 is probably not worth it. But something else might be in play here, which is why I am asking.
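To put a number on "only 2-3 watts", here is a small sketch that converts a constant power difference into energy per year. The function name annual_kwh is just something I made up, and it assumes the server idles at that difference around the clock, which is a simplification.

```shell
# Hypothetical helper: energy per year (kWh) for a constant power draw in watts.
# kWh/year = watts * 24 hours * 365 days / 1000
annual_kwh() {
    awk -v w="$1" 'BEGIN { printf "%.2f\n", w * 24 * 365 / 1000 }'
}

annual_kwh 2   # 17.52
annual_kwh 3   # 26.28
```

Multiply the result by your own price per kWh to see whether the saving would justify buying different hardware.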
-
Observations and questions regarding power consumption
Hello, I am in the process of upgrading my Unraid system to the following setup:

Supermicro X12SAE
Intel Xeon 1290P
4 x 32GB ECC RAM

Since I had a running system already and had plenty of time to conduct a few tests, I decided to see if I could determine what kind of impact adding components would have on the power consumption.

The PSU used for the following is a Seasonic X-400 Fanless GOLD.
A single 120mm fan is attached to cool the CPU.
Mouse, keyboard and USB flash drive attached.
Monitor attached via DisplayPort.
I am booting off an Ubuntu Live USB and letting it sit idle in the console.
RAM is Kingston Premier KSM32ED8/32HC ECC UDIMM.

1 x RAM stick
~15.0W without ethernet cable in
~15.8W with ethernet cable in
~14.0W with ethernet cable in and powertop --auto-tune
~11.6W powertop --auto-tune and everything removed (USB flash drive, ethernet cable, keyboard, mouse and DisplayPort monitor)

2 x RAM sticks
~17.4W with ethernet cable in
~14.8W with ethernet cable in and powertop --auto-tune

3 x RAM sticks
~17.4W with ethernet cable in
~14.8W with ethernet cable in and powertop --auto-tune

4 x RAM sticks
~17.8W with ethernet cable in
~15.3W with ethernet cable in and powertop --auto-tune

According to powertop, all of the above scenarios are able to reach C8 on CPU pkg.

4 x RAM sticks + JMB585 PCIE controller
~19.5W with ethernet cable in
~17.8W with ethernet cable in and powertop --auto-tune

With the JMB585, CPU PKG is only able to reach C3. lspci -vvv confirms that ASPM is not supported on that card, so that makes sense.

So a few questions:

Why does adding RAM make such a little impact on consumption? Is it because I need to have some activity to make it actually consume power?

Secondly, I was under the impression that only being able to reach C3 on CPU pkg would have a bigger impact on the power consumption? If what I've measured is true, it barely makes any sense to specifically go for PCIE cards with ASPM support in order to save power (9500-16i/9600-24i vs 9305-24i).

The measurements are a bit difficult, as it is a gold power supply and it is being loaded at something like 5%, which makes the efficiency drop. This may also be why I am not seeing a lot of change when adding, for example, RAM sticks.

I tried limiting the CPU pkg to C3 in the BIOS without adding PCIE cards:

4 x RAM sticks
~20W with ethernet cable in
~18.0W with ethernet cable in and powertop --auto-tune

Confirmed that CPU PKG does not go below C3. It seems barely worth the effort if this is correct.

Update 21-04-2024:

Some more measurements. All of these are with 4 RAM sticks, ethernet cable in and powertop --auto-tune:

Intel X710-DA2, no cable attached:
CPU PCI-E: 19.1W, C7 confirmed
PCH: 18.4W, C8 confirmed

LSI 9306-24i, no cables attached:
CPU PCI-E: 32.7W, C3 confirmed
PCH: 31.7W, C8 confirmed

LSI 9306-24i (PCH) + X710-DA2 (CPU), no cables attached:
34.7W, C7 confirmed

1 x Lexar NM790 (M2 slots are attached to PCH):
15.9W, C8 confirmed

2 x Lexar NM790 (M2 slots are attached to PCH):
15.9W, C8 confirmed
lspci confirms that the NVMe is present

LSI 9306-24i (PCH) + X710-DA2 (CPU), no cables attached + 2 x NM790 (M2 slots are attached to PCH):
34.7W, C7 confirmed

So, as long as I stick to PCH PCI-E slots, it does not matter if the PCIE device supports ASPM.
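For reference, the ASPM check with lspci mentioned above can be narrowed down to the LnkCap line of a single device. The following is a rough sketch: the function name is made up, the device address in the usage comment is a placeholder, and the sample line format is what lspci typically prints rather than output copied from one of my cards.

```shell
# Hypothetical helper: report the ASPM capability named on one lspci "LnkCap:" line.
# Prints "none" when the device reports ASPM as unsupported, the advertised
# states (e.g. "L0s L1") otherwise, or "unknown" if the line has no ASPM field.
aspm_capability() {
    local line="$1" rest
    case "$line" in
        *"ASPM not supported"*)
            echo "none" ;;
        *"ASPM "*)
            rest="${line#*ASPM }"      # drop everything up to and including "ASPM "
            echo "${rest%%,*}" ;;      # keep the text before the next comma
        *)
            echo "unknown" ;;
    esac
}

# Usage (run as root so lspci can read the capabilities; 03:00.0 is a placeholder):
#   aspm_capability "$(lspci -vv -s 03:00.0 | grep 'LnkCap:')"
#
# Typical line format:
#   LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
```

Note that LnkCap only shows what the device supports. Whether ASPM is actually enabled is reported on the LnkCtl line.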
-
Verdict on LCC and Seagate Exos
I have 3 x X20 20TB, they have been running for about 6 months and have LCC ranging between 600 and 800. I would assume your 10TBs may be an older generation? It looks like they made it less aggressive on X20. I haven't tampered with anything other than spinning disks down after an hour of inactivity, but two of them are my parity drives and they are in the same range as the one that's just a member of the array
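If anyone wants to compare numbers, the load cycle count can be read from the SMART attribute table. A minimal sketch, assuming the drive reports it as attribute 193 Load_Cycle_Count in the usual smartctl -A table layout (the function name and /dev/sdX are placeholders):

```shell
# Hypothetical helper: print the raw Load_Cycle_Count value from
# "smartctl -A" output read on stdin (raw value is the 10th column).
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Usage (needs smartmontools and root; replace /dev/sdX with the real device):
#   smartctl -A /dev/sdX | load_cycle_count
```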
-
9600-24i for Unraid
Thank you for investigating, this certainly helps me pick the correct hardware. For ASPM support I was considering a carrier board with a PCIE packet switch and 4 x ASM1166 controllers, but I think the sheer number of connectors and components makes the setup a bit janky and error prone. With your feedback I will definitely take the 9600-24i into consideration. I am currently awaiting the new hardware and will conduct some experiments on how big an impact ASPM support will make on the total power consumption, and then make a decision.
-
9600-24i for Unraid
Thank you for the clarification on SATA 1, 2 and SAS 1, I did not know that. I don't think it will be an issue since all my drives are SATA-3. My main concern now is ASPM support, so I hope someone can confirm/deny that.
-
9600-24i for Unraid
Hello, I am considering a 9600-24i for Unraid. I am going to use it purely for mechanical SATA drives. Are there any quirks or issues I should be aware of? Also, is anyone able to confirm that it supports ASPM and will allow the CPU PKG to go beyond C3 state?
-
Samsung 870 QVO for Lancache
I found the 4TB QVO on sale and ended up purchasing two of them. I put them in RAIDZ-0 (why bother with parity for lancache?). Initial tests show that they can saturate a 1Gbit link when stuff is in the cache. Within a month I will upgrade to 10Gbit LAN for Unraid and 2.5Gbit for clients, so it will be interesting to see how it holds up then. I will write a follow-up if it begins to behave in an unexpected way.
-
Samsung 870 QVO for Lancache
Hello, What are your thoughts on a Samsung QVO 870 8TB for lancache use? I need something to use as storage for lancache, and in the 8TB segment they seem to be the cheapest drive by far. The reason I believe it would be a decent drive for lancache is that once the cache is prefilled, it would only be the occasional patches and updates, and therefore would not be affected too much by the performance decrease once the write cache is filled.
-
[support] digiblur's Docker Template Repository
I edited the Dockerfile to look for the neolink.toml file in a subdirectory rather than at /etc/neolink.toml. I don't think this is a good way of doing it, since it will no longer follow the thirtythreeforty repository. Anyway, if you'd like, you can simply change the repository to leetdonkey/neolink, then change the neolink_config path configuration, then put neolink.toml in /etc/mnt/user/appdata/neolink/ and start the docker. Note that the container uses the name neolink.toml instead of config.toml. If there is a way of linking directly to a file instead of a directory I am not aware of it, but I must confess that I am very new at this docker stuff.
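On linking directly to a file: Docker bind mounts can target a single file as well as a directory, so the path mapping can point at the file itself. The sketch below only builds the mount argument; the function names and the host path are placeholders I made up, and I have not checked how the Unraid template editor handles a file path. The host file has to exist before the container starts, otherwise Docker creates a directory with that name.

```shell
# Hypothetical helper: build a read-only bind-mount argument that maps one
# host file onto one file inside the container.
file_mount_arg() {
    local host_file="$1" container_file="$2"
    printf '%s:%s:ro' "$host_file" "$container_file"
}

# Sketch of how it could be used (not invoked here; HOST_CONFIG is an
# assumed variable holding the path to your own neolink.toml on the host):
start_neolink() {
    docker run -d --name neolink \
        -v "$(file_mount_arg "$HOST_CONFIG" /etc/neolink.toml)" \
        thirtythreeforty/neolink
}
```

With a mapping like this the original image could keep reading /etc/neolink.toml, so the Dockerfile change would not be needed.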
LeetDonkey