The builtin powerdown script can easily hang preventing a shutdown


unRAID OS Version: 6.2

 

Description: The builtin powerdown script depends on the webgui to shut down the server the same as if a user clicked shutdown on the webgui.  The powerdown script (webgui shutdown) is too easily hung waiting for something that may never happen.

 

How to reproduce: Not easily reproduced.  It depends on some errant plugins, Dockers, or VMs.

 

Expected results: The powerdown script will complete a shutdown.

 

Actual results: In some cases the server hangs indefinitely and never shuts down.

 

Other information: The powerdown plugin was developed to aid in the shutdown of unRAID even when there were issues that would hang the webgui shutdown.  The powerdown plugin has been deprecated and LT should now develop a more foolproof method of ensuring a shutdown, especially when power is lost and a UPS shutdown is initiated.  The builtin powerdown script should be modified to not use the webgui script and instead run bash commands to shut things down and push past whatever is hanging the shutdown and the unmounting of drives.

 

This has been an issue since the v5 days and it should really be addressed.  LT should not rely on a plugin to perform a robust shutdown.

 

Additional information: Executing the 'reboot' or 'shutdown now' commands at the CLI appears to bypass the event script processing and, I assume, other processing the webgui does when called as in the powerdown script.  These are common Linux commands and, if used, should result in a clean shutdown rather than a hang.  This also needs to be addressed.  The rc.local_shutdown and/or rc.6 scripts need to be modified to address these shortcomings.

Link to post

Here are the commands available:

 

/sbin/poweroff

/usr/local/sbin/powerdown

 

The 'poweroff' command should work correctly to power off the server regardless of running state AND cleanly shut down the md/unraid driver so that a parity sync is not required upon reboot.  This command directly invokes 'halt', which invokes 'shutdown', which uses the rc system to initiate shutting down the system.  It cannot be canceled.

 

The 'powerdown' command initiates the same operation but via the webGui API (that is, the same action as when the Poweroff button is clicked on the Main page).  This triggers a 'Stop' operation first, which can be used to trigger a graceful exit of all plugins, etc.  However, it is possible for this to 'hang' if there are processes holding open file descriptors on mounted disks.

 

All this is to minimize loss of data.  'powerdown' is a graceful shutdown, 'poweroff' is a hammer.

 

The current action of the power button is tied to 'powerdown'.

 

Here is the current powerdown script:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
# Works only if webGui is running and listening on port 80.

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown' in webGui.
/usr/bin/wget -q -O - localhost/update.htm?shutdown=apply >/dev/null

 

Maybe all we have to do is this:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
# Works only if webGui is running and listening on port 80.

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown' in webGui.
/usr/bin/wget -q --timeout 30 -O - localhost/update.htm?shutdown=apply >/dev/null
/sbin/poweroff

 

i.e., if that 'shutdown' operation times out, meaning the server is still alive after the timeout has elapsed, then drop the hammer.

 

The trick is to choose the right timeout.
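
The "graceful first, hammer on timeout" idea can be sketched as below. This is only an illustration under assumptions, not the stock script: it assumes the coreutils `timeout` utility is available (it exits with status 124 when the deadline expires), and the helper name `graceful_then_forced` is hypothetical. In the demo comment, `sleep` and `echo` stand in for the emhttp request and `/sbin/poweroff`.

```shell
# Sketch: run a graceful command under a deadline and escalate only if the
# deadline expired. graceful_then_forced is a hypothetical helper name.
#   $1 = deadline in seconds
#   $2 = graceful command (stand-in for the emhttp shutdown request)
#   $3 = forced command   (stand-in for /sbin/poweroff)
graceful_then_forced() {
  gtf_status=0
  # coreutils 'timeout' kills the command at the deadline and exits with 124.
  timeout "$1" sh -c "$2" || gtf_status=$?
  if [ "$gtf_status" -eq 124 ]; then
    sh -c "$3"
  fi
}

# Demo with harmless stand-ins: the graceful step hangs for 5 s, so the
# 1 s deadline fires and the forced step runs.
#   graceful_then_forced 1 'sleep 5' 'echo "forcing poweroff"'
```

One reason to wrap the whole request in an outer deadline: wget's `--timeout` limits individual network waits, and wget may retry a failed request, so the total time spent is not necessarily bounded by that one value.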

Link to post

I would do it this way:

 

/usr/local/sbin/powerdown

#!/bin/bash
#
if [ -z "${1}" ]
then OPT="-h"
else OPT="${1}"
fi

/sbin/shutdown -t5 ${OPT} now

 

Where:

'powerdown' - would shut system down

'powerdown -r' - would reboot system

 

Then add the emhttp shutdown to /etc/rc.d/rc.local_shutdown:

#!/bin/sh
#
# /etc/rc.d/rc.local_shutdown:  Local system shutdown script.
#

# Helper script to gracefully power-down unRAID server.
# Works only if webGui is running and listening on port 80.

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown' in webGui.
/usr/bin/wget -q --timeout 30 -O - localhost/update.htm?shutdown=apply >/dev/null

# Invoke the 'stop' script
if [ -f /boot/config/stop ]; then
  logger "Starting stop script"
  fromdos </boot/config/stop >/var/tmp/stop
  chmod +x /var/tmp/stop
  /var/tmp/stop
fi

# Shutdown cpuload daemon
if [ -x /etc/rc.d/rc.cpuload ]; then
  echo "Stopping cpuload daemon: /etc/rc.d/rc.cpuload"
  /etc/rc.d/rc.cpuload stop
fi

 

Then let /etc/rc.d/rc.6 complete the shutdown, because it can catch things that are missed if emhttp is abruptly interrupted, such as Docker and VM shutdown, drive unmounting, and other system tasks.

 

This would hook all powerdown events to the shutdown (rc.6) that would always do a clean or semi-clean shutdown/reboot.  For example, 'reboot' would run the /etc/rc.d/rc.6 script that would always run the emhttp shutdown so a clean or semi-clean shutdown would happen.

 

Edit: It doesn't seem to be as easy as I thought.  The emhttp call to shut down puts Linux into run level 6, so it can't be made after run level 6 has already been initiated.

Link to post

Should get rid of "Works only if webGui is running and listening on port 80" and allow any port instead:

 

# Helper script to gracefully power-down unRAID server.

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown' in webGui.
/usr/bin/wget -q --timeout 30 -O - localhost:$port/update.htm?shutdown=apply >/dev/null
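
If `lsof` finds no emhttp listener, `$port` is empty and the URL becomes `localhost:/update.htm`. A hedged sketch of the same extraction with an explicit fallback to port 80 follows. The function name `emhttp_port` is hypothetical, and `sed` is used here only so the pattern does not depend on `grep -P` being available.

```shell
# Read lsof-style listener lines on stdin and print the emhttp port,
# or 80 (the stock default) when no emhttp line is present.
# emhttp_port is a hypothetical helper name.
emhttp_port() {
  # Keep only the digits after the last colon on lines starting with "emhttp".
  port=$(sed -n 's/^emhttp.*:\([0-9][0-9]*\).*/\1/p' | head -n 1)
  echo "${port:-80}"
}

# Intended use on a live system (assumes lsof is installed):
#   port=$(lsof -i -P -sTCP:LISTEN | emhttp_port)
```

Falling back explicitly avoids depending on how wget treats a URL with an empty port.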

Link to post

Great idea.

Link to post

From my perspective (and I think most users) there need to be a few events that will ALWAYS reliably result in a (preferably clean) shutdown:

 

(a)  A power event where the UPS code has initiated a shutdown;

 

(b)  A quick press of the power button;

 

(c)  The command "Plink.exe -ssh -pw <password> root@<servername> powerdown"

("powerdown" can be whatever it needs to be)

 

(d)  Ctrl-Alt-Del (I never use this, but some do)

 

The first is by far the most important;  the 2nd is important in case the Web GUI is hung; the 3rd is VERY convenient for a "Shutdown" icon on the desktop and for scripts to turn off the server when done; and the 4th is used by a few folks to quickly shut down with a keyboard.

 

The plugin has always worked perfectly for all of these; although I understand there are issues with Dockers and VMs, which can be problematic to cleanly shut down.  I agree with dlandon that this functionality should be reliably part of the basic UnRAID, without requiring a plugin -- but it needs to absolutely shut the system down !!

 

 

 

Link to post

That's why I am suggesting for the emhttp shutdown to be in the rc.local_shutdown.  When any event takes Linux to run level 6, the rc.6 script executes and the rc.local_shutdown is executed.  As it stands, some events bypass the built in powerdown and the emhttp shutdown.

Link to post

From my perspective (and I think most users) there need to be a few events that will ALWAYS reliably result in a (preferably clean) shutdown:

 

(a)  A power event where the UPS code has initiated a shutdown;

<<< snip >>>

 

The first is by far the most important;  the 2nd is important in case the Web GUI is hung; the 3rd is VERY convenient for a "Shutdown" icon on the desktop and for scripts to turn off the server when done; and the 4th is used by a few folks to quickly shut down with a keyboard.

 

The plugin has always worked perfectly for all of these; although I understand there are issues with Dockers and VMs, which can be problematic to cleanly shut down.  I agree with dlandon that this functionality should be reliably part of the basic UnRAID, without requiring a plugin -- but it needs to absolutely shut the system down !!

 

Plus 1 for this one.  When a power outage occurs, many users' UPSes offer only a few minutes of runtime.  The user is often not there, or is unaware that the clock is ticking on the battery capacity of the UPS!  While the user is responsible for deciding when (after how long, or at what percentage of battery capacity) to initiate the shutdown, unRAID has to complete it within a short time window, without hanging, before the battery runs out and the UPS itself shuts down.  If it does not do this cleanly, data loss is a possibility.  Even if no data loss occurs, many users will assume that something is wrong, because an unclean shutdown results in a parity check when the server restarts.

Link to post

Tom,

 

I did a little testing and I think your idea might be best.  The one request I have is that you also provide a reboot option.  It would be nice to use 'powerdown' to shut down and 'powerdown -r' to reboot.  I've put together a script that does that:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.

OPT="shutdown"
if [ "${1}" == "-r" ]; then
OPT="reboot"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=30 -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

# If we get here just take it down hard
/sbin/poweroff

 

Your syntax appears to be wrong in the wget.

--timeout=30

Link to post

Should get rid of Works only if webGui is running and listening on port 80 and allow any port instead

 

# Helper script to gracefully power-down unRAID server.

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown' in webGui.
/usr/bin/wget -q --timeout 30 -O - localhost:$port/update.htm?shutdown=apply >/dev/null

 

 

Could this be the issue in my case?

I am running the GUI on port 90 because PlexConnect needs port 80.

When I run /usr/local/sbin/powerdown, nothing happens, LOL.

Whereas if I use the plugin, it shuts down nicely.

Link to post

Yes, stock powerdown script works only when GUI runs on port 80.

 

Link to post

Tom,

 

I did a little testing and I think your idea might be best.  The one request I have is that you also provide a reboot option.  It would be nice to use 'powerdown' to shut down and 'powerdown -r' to reboot.  I've put together a script that does that:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.

OPT="shutdown"
if [ "${1}" == "-r" ]; then
OPT="reboot"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=30 -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

# If we get here just take it down hard
/sbin/poweroff

 

Your syntax appears to be wrong in the wget.

--timeout=30

 

The /sbin/poweroff does not work here.  Execution always falls through the emhttp call, whether or not it times out, and runs poweroff.

Link to post

This script seems to get the job done:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
SECONDS=0
TIMEOUT=30

OPT="shutdown"
if [ "${1}" == "-r" ]; then
OPT="reboot"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=${TIMEOUT} -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

if ((${SECONDS} >= ${TIMEOUT})); then
# Take it down hard
/sbin/poweroff
fi

Link to post

Sounds like it's at least very close to the "final answer"  :)    If I understand correctly, this will do the first 3 things I listed above -- is that correct?  And the link to Ctrl-Alt-Del is available in another plugin, so all 4 of them can now be reliably achieved without the powerdown plugin.

 

... so now all we need is for the next version of UnRAID to replace the current powerdown with this nifty script -- and for the timeout value to be a setting that can be changed in the GUI  :)

 

Link to post

This script seems to get the job done:

 

Is that your final answer?  ;)

 

I think so.

 

One last request.  The inittab entry for the ctrl-alt-del should be changed:

# What to do at the "Three Finger Salute".
ca::ctrlaltdel:/usr/local/sbin/powerdown

Link to post

Sounds like it's at least very close to the "final answer"  :)    If I understand correctly, this will do the first 3 things I listed above -- is that correct?  And the link to Ctrl-Alt-Del is available in another plugin, so all 4 of them can now be reliably achieved without the powerdown plugin.

 

... so now all we need is for the next version of UnRAID to replace the current powerdown with this nifty script -- and for the timeout value to be a setting that can be changed in the GUI  :)

 

If the timeout value is to be set in the GUI, the 'Help' should include some guidance for determining a suitable value. 

 

Link to post

FWIW I'd think the 30 seconds that Dan & Tom have been using is probably a good choice for most cases; but if there are VMs and Dockers that need to shut down it would likely need to be longer ... and, for that matter, large arrays where the disks aren't spun up may need a bit longer as well.  I think 60 seconds would be long enough for just about any case -- so perhaps it should just be set to that and not adjustable in the GUI.  If there's concern about folks setting inappropriate values for it, perhaps it could just have 3 options -- short, medium, long (that would perhaps correspond to 30, 60, or 90 seconds).
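
The three-option idea could look something like this minimal sketch. The function name `preset_timeout` is hypothetical, and falling back to the medium value for an unrecognized name is an assumption, not something proposed in the thread.

```shell
# Map a preset name to a timeout in seconds (30, 60, or 90 as suggested above).
# Unknown names fall back to the medium value; that fallback is an assumption.
preset_timeout() {
  case "$1" in
    short)  echo 30 ;;
    medium) echo 60 ;;
    long)   echo 90 ;;
    *)      echo 60 ;;
  esac
}

# Example: TIMEOUT=$(preset_timeout long)
```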

 

 

Link to post

Tom suggested 30 seconds and said that it may not be the correct value.  He can determine the appropriate value.  I'm not sure this is a setting that should be available as a user setting.  Inappropriate settings i.e. too short, may result in a hard shutdown if the emhttp call times out.  After all this only breaks a hung emhttp call and should never take that long if the shutdown occurs as it should.

Link to post

How about having a separate log of shutdown? The script could insert a warning if the timeout expired and a force shutdown was called. If that warning is detected in the log, then you could either extend the timeout period, or do more troubleshooting to track down what is hanging the normal shutdown.
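
A rough sketch of that kind of shutdown log is below. The helper name `log_shutdown_result` and the message wording are hypothetical, and the log path is left to the caller; on a system whose root filesystem lives in RAM it would presumably need to point somewhere persistent to survive the reboot.

```shell
# Append one line describing the shutdown to a dedicated log file.
# log_shutdown_result is a hypothetical helper name.
#   $1 = elapsed seconds, $2 = timeout in seconds, $3 = log file path
log_shutdown_result() {
  if [ "$1" -ge "$2" ]; then
    # The timeout expired, so the forced path was (or is about to be) taken.
    echo "$(date '+%F %T') WARNING: timeout of $2 s expired after $1 s; forced shutdown" >> "$3"
  else
    echo "$(date '+%F %T') clean shutdown in $1 s" >> "$3"
  fi
}

# Example: log_shutdown_result "$SECONDS" "$TIMEOUT" /path/to/shutdown.log
```

Searching that file for `WARNING` after a restart would show whether the timeout needs extending or a hung process needs tracking down.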

Link to post

I agree it doesn't necessarily have to be a value that the user can set => it simply needs to be long enough that it's only going to happen if emhttp is hung.    I just feel like 30 seconds may be a bit too short for systems with a lot of drives that have to spin up => my main media server (18 drives) can easily take that long to shut down if I do it with no drives already spinning.    Not sure how much time VM's and Dockers may add to the mix -- but I doubt it's much more than that.

 

The counterpoint is you definitely don't want it to wait so long that a UPS-initiated shutdown won't finish before the UPS turns off.  I'd think 60 seconds would be plenty ... or at most 90.    If it's not done by then, it SHOULD be forced [hit with the "hammer"  :) ].

 

Link to post

Tom suggested 30 seconds and said that it may not be the correct value.  He can determine the appropriate value.  I'm not sure this is a setting that should be available as a user setting.  Inappropriate settings i.e. too short, may result in a hard shutdown if the emhttp call times out.  After all this only breaks a hung emhttp call and should never take that long if the shutdown occurs as it should.

How about allowing adjustment in the GUI, but not allowing less than the 30 seconds or whatever is determined to be a good value?  I've got some VMs that I would like to allow 2 minutes, as a Java app running on them slows the Windows shutdown to about 2 minutes.
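
Enforcing a minimum on a user-supplied value might be sketched like this. The function name `effective_timeout` is hypothetical, and the 30-second floor is just the figure discussed in this thread, not a confirmed default.

```shell
# Print the requested timeout, raised to the floor when it is missing,
# non-numeric, or too small. effective_timeout is a hypothetical helper name.
#   $1 = requested seconds, $2 = floor in seconds (assumed default: 30)
effective_timeout() {
  floor="${2:-30}"
  case "$1" in
    ''|*[!0-9]*) echo "$floor"; return ;;   # empty or not a whole number
  esac
  if [ "$1" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$1"
  fi
}

# Example: TIMEOUT=$(effective_timeout 120)   # a 2-minute allowance for slow VMs
```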
Link to post

This script seems to get the job done:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
SECONDS=0
TIMEOUT=30

OPT="shutdown"
if [ "${1}" == "-r" ]; then
OPT="reboot"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=${TIMEOUT} -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

if ((${SECONDS} >= ${TIMEOUT})); then
# Take it down hard
/sbin/poweroff
fi

 

Perhaps this is a silly question, but how is SECONDS being set to ever be greater than the timeout? Is that somehow set by wget?

 

Ah, it's an internal bash variable.

Link to post

Version 2.  I've added some logging.

 

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
PROG_NAME="Powerdown"
SECONDS=0
TIMEOUT=30

if [ "${1}" == "-r" ]; then
OPT="reboot"
logger "Rebooting server..." -t"${PROG_NAME}"
else
OPT="shutdown"
logger "Shutting down server..." -t"${PROG_NAME}"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=${TIMEOUT} -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

logger "Shutdown took ${SECONDS} seconds..." -t"${PROG_NAME}"

if ((${SECONDS} >= ${TIMEOUT})); then
logger "emhttp timed out - doing a hard poweroff..." -t"${PROG_NAME}"
# Take it down hard
/sbin/poweroff
fi

Link to post

This script seems to get the job done:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
SECONDS=0
TIMEOUT=30

OPT="shutdown"
if [ "${1}" == "-r" ]; then
OPT="reboot"
fi

port=$(lsof -i -P -sTCP:LISTEN|grep -Pom1 '^emhttp.*:\K\d+')

# Access a blank page in case this is first request since startup.
/usr/bin/wget -q -O - localhost:$port/update.htm >/dev/null

# Have emhttp do all the work as if user clicked 'shutdown or reboot' in webGui.
/usr/bin/wget -q --timeout=${TIMEOUT} -O - localhost:$port/update.htm?${OPT}=apply >/dev/null

if ((${SECONDS} >= ${TIMEOUT})); then
# Take it down hard
/sbin/poweroff
fi

 

Perhaps this is a silly question, but how is SECONDS being set to ever be greater than the timeout? Is that somehow set by wget?

 

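SECONDS is a built-in bash variable.  When you assign it a value, bash counts from there: each time it is read, it returns the assigned value plus the number of seconds elapsed since the assignment.  Setting it to zero at the top of the script therefore makes it the number of elapsed seconds.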

 

Try this script:

SECONDS=0
sleep 5
echo "Elapsed seconds ${SECONDS}"

Link to post