Safe Powerdown


jonp


Request:

Please implement Safe Powerdown so it can be activated from the console/telnet session.

 

Reason:

For the past several years I have been using only Cyberpower PFC UPSes.  They are reasonably affordable and designed for PFC power supplies.  Unfortunately, they don't work correctly with APCUPSD when "Power Down UPS After Shutdown" is set to yes.  I would like to be able to run Cyberpower's UPS software in a Docker container or VM and have it initiate a power down via a scripted telnet session to UnRAID.

 

Thinking ahead.  Still on UnRAID 5.

 

Link to comment

The 'powerdown' command seems to work fine from a console/telnet session in v6 beta 9 as far as I can see.

 

The built-in Powerdown does not shut down the array cleanly and can lead to an array parity check when restarted.  The Powerdown plugin deals with ensuring that the array is stopped cleanly.

Link to comment

Are you sure?  The log has messages stating that it is stopping the array.  There are also messages relating to stopping docker, VM's etc.  When I use it I do not get a parity check on restart so I was assuming that it was doing the job properly.

Link to comment

The powerdown command that comes with APCUPSD works and preserves parity state, though there were some hiccups with beta 6b6, I think. 

 

I presume that, with the intent of integrating APCUPSD into the core, that script will become the default powerdown script.

 

Dennis

 

Link to comment

It would be great if the process for powerdown were also included in a reboot script/process. If the GUI is unresponsive (or looping because of a process that won't allow the array to shut down) and you are remote, then powerdown isn't going to help you, but reboot would.

 

I may be mistaken, but I don't think reboot ensures a proper array shutdown currently, does it? I seem to remember rebooting and being stuck with a parity check on startup because of an unclean shutdown.

Link to comment

On a properly configured system, reboot and powerdown are exactly the same except for the final power state.

Normally I think the commands are poweroff and reboot.  On unRAID I notice that both of these are linked to the halt command.

 

An alternative would be for the LimeTech unRAID powerdown script to recognise the -r option as a reboot the same as the plugin does? 

Link to comment

Yup, that would be good - however if the reboot command could just be replaced it would be much  better. If you are calm and thinking clearly, powerdown -r may work, but reboot is an instinct command, so it would be nice if that did a clean shutdown as well.

Link to comment

If I remember correctly, the powerdown add on hooks into the rc.local_shutdown script.

Doesn't that get called on a reboot? (from shutdown)?

 

ls -l /sbin/halt /sbin/reboot /sbin/shutdown

 

-rwxr-xr-x 1 root root  9360 2008-04-02 22:40 /sbin/halt*

lrwxrwxrwx 1 root root    4 2014-01-11 14:37 /sbin/reboot -> halt*

-rwxr-xr-x 1 root root 16864 2008-04-02 22:40 /sbin/shutdown*
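A quick way to repeat that check on any box is to print what each command resolves to. This is only a sketch: describe_cmd is a made-up helper name, and /sbin/poweroff is added to the list as an assumption since it is not in the listing above.

```shell
#!/bin/sh
# Print whether a path is a symlink (and to what), a plain entry, or missing.
# A symlink to halt means halt's code decides the final power state.
describe_cmd() {
    path=$1
    if [ -L "$path" ]; then
        printf '%s -> %s\n' "$path" "$(readlink "$path")"
    elif [ -e "$path" ]; then
        printf '%s (not a symlink)\n' "$path"
    else
        printf '%s (missing)\n' "$path"
    fi
}

for cmd in /sbin/halt /sbin/reboot /sbin/poweroff /sbin/shutdown; do
    describe_cmd "$cmd"
done
```

This only shows how the binaries are linked. It does not tell you whether rc.local_shutdown is run on the way down.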

Link to comment

No idea. :)

 

I need someone better versed to confirm. I am moving all my filesystems from rfs to xfs so am in a 24 hour move period at the moment, so don't want to test (I also don't want to find out it's not the case and end up with a parity check on top of things).

 

Hopefully someone can verify this. It may have been a non-standard situation when I last ran into this and got the parity check... I just got to thinking about this while reading this forum, so I thought I would ask.

Link to comment

I have learned how unRAID shuts down and why it looked like docker was not being stopped on certain events.  unRAID has a built-in powerdown that uses the webgui to initiate a shutdown, either from the cli 'powerdown' command or when the stop button is pressed.  The webgui shutdown stops SMB, VMs and docker.  The powerdown plugin does not use the webgui-initiated shutdown.  Xen is stopped in an rc.local_shutdown script that powerdown was using.  I expected docker to be stopped in this script as well, and it is not.  I've corrected this in the powerdown plugin.

 

There are potentially two issues with using the webgui to shut down:

1 - The webgui may be hung and not responsive.  A shutdown will not happen.

2 - There may be plugins or other processes keeping the webgui stuck in the unmounting loop.

 

Powerdown does things like this:

1 - Walk through the /etc/rc.d/ directory issuing 'stop' commands to all scripts not in an excluded list (rcdstock).

2 - It then stops docker, samba, ntfsd, and atalk.

3 - It then kills any pids using fuse that are active on the array drives.

4 - Sync and unmount all drives.

5 - The array is then stopped using 'mdcmd stop'.

 

Using this procedure assures that the shutdown will not get stuck and keep the powerdown from completing.  The downside to this is that there could be an unclean shutdown causing a parity check on re-boot.
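Step 1 above could look roughly like the following. This is not the plugin's actual code: stop_rc_scripts and run are made-up names, the exclusion list is passed in as a plain string rather than read from rcdstock, and DRY_RUN defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# Call "stop" on every executable rc.* script in a directory, skipping any
# script whose name appears in the space-separated exclusion list.
stop_rc_scripts() {
    rc_dir=$1
    excluded=$2
    for script in "$rc_dir"/rc.*; do
        [ -x "$script" ] || continue
        name=${script##*/}
        case " $excluded " in
            *" $name "*) continue ;;
        esac
        run "$script" stop
    done
}

# Example (the excluded names here are assumptions):
#   stop_rc_scripts /etc/rc.d "rc.6 rc.S rc.M"
```

Printing first makes it easy to see which scripts would be hit before trusting it on a live array.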

Link to comment

Powerdown does things like this:

1 - Walk through the /etc/rc.d/ directory issuing 'stop' commands to all scripts not in an excluded list (rcdstock).

2 - It then stops docker, samba, ntfsd, and atalk.

3 - It then kills any pids using fuse that are active on the array drives.

4 - The array is then stopped using 'mdcmd stop'.

 


You missed the vital step, between 3 and 4: in order to power down "cleanly", all the file systems need to be unmounted.  This is because an unmount also does a 'sync', which forces all changed data to be written to non-volatile storage.

 

What the emhttp powerdown sequence does is:

1) stop all network protocol components (SMB, NFS, AFP, FTP) - this ensures no new files can be opened via network.

2) invoke all the plugin 'unmounting_disks' events - this lets any and all plugins invoke their specific 'stop' code

3) unmount all the mounted file systems (user shares, then disk and cache) - this ensures unraid driver will see no more writes.

4) 'stop' the array - this commits the super.dat file on the flash which holds a flag that says "cleanly shut down"

5) now run the linux /etc/rc.d/rc.6 powerdown script - this kills processes, etc, eventually halting or rebooting or powering off h/w.
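The point of the ordering can be sketched as a chain of stages where a later stage never runs if an earlier one failed. Everything here is a placeholder: the stage functions only print what they stand for and are not emhttp functions. Aborting on failure is also a simplification, since emhttp loops on the unmount rather than giving up.

```shell
#!/bin/sh
# Placeholder stages: each one only prints the step it represents.
stop_network_services() { echo "stop SMB, NFS, AFP, FTP"; }
run_unmounting_events() { echo "invoke plugin 'unmounting_disks' events"; }
unmount_filesystems()   { echo "unmount user shares, then disk and cache"; }
stop_array()            { echo "stop the array (records the clean-shutdown flag)"; }
run_rc6()               { echo "hand over to /etc/rc.d/rc.6"; }

# Run the given stages in order and stop at the first one that fails, so the
# array is never stopped while a file system is still mounted.
run_stages() {
    for stage in "$@"; do
        "$stage" || { echo "stage failed: $stage" >&2; return 1; }
    done
}

run_stages stop_network_services run_unmounting_events \
           unmount_filesystems stop_array run_rc6
```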

 

The community 'powerdown' plugin is a sort of out-of-band mechanism to try and duplicate the above when, for some reason, emhttp fails to do so.  It's a good approach, but a weakness is that it relies on a 'convention' that plugins install an 'rc.xxx' script with a stop method, and that this is the only thing necessary to shut them down.  Nothing enforces this rule and there are some reasons not to do it that way.

 

Why then does emhttp not always work?  The main reason is that something is preventing a file system from unmounting.  Here is a good article that enumerates some causes:

http://oletange.blogspot.com/2012/04/umount-device-is-busy-why.html

 

The most common reason of the bunch is there is an open file there somewhere.  To kill the process holding the open file nearly always guarantees some data loss - which is why we don't have a process killer.  Clearly though, we need to add it in order to facilitate shutdown following a power-loss where you're running on a battery.

 

The other common cause is a user who has an ssh or telnet shell open, with its 'current directory' left on an array/cache disk.  I guess my attitude for that has been "too bad, don't do that next time".
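For the 'current directory' case, a rough way to see who is holding a mount is to scan /proc. This is a Linux-only sketch: pids_with_cwd_under is a made-up helper name, and /mnt/disk1 in the example line is an assumed path, not something stated in this thread. It only catches working directories, not open files (tools such as lsof or fuser cover those).

```shell
#!/bin/sh
# Print the PID of every process whose current working directory is the
# given directory or somewhere below it. Processes we are not allowed to
# inspect are silently skipped.
pids_with_cwd_under() {
    target=${1%/}
    for link in /proc/[0-9]*/cwd; do
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd" in
            "$target"|"$target"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}

# Example: pids_with_cwd_under /mnt/disk1
```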

 

The other reason emhttp doesn't always work is that it's "hanging" on some kind of internal action it's taking.  This shouldn't happen - if it does it means there's a bug.  I've been rather stubborn in the past, wanting to root out such bugs rather than work around them, but it has obviously proven too difficult to find all the possible problems because they are by nature hard to reproduce.

Link to comment

I um ... like ... don't want to speak out of turn, or be too um ... pointed about it, but this might come across that way. 

 

UPS shutdown is one of those things unRaid should be able to do, with no quibbling.  So after reading your post I'm left with the question:

 

Knowing what you know, and knowing that almost anything is better than a random "plug pull", how would you go about enforcing a shutdown when on battery?

 

Even an open SSH can't be ignored despite your feeling of "well just don't do that next time". At the least process kill that.

Link to comment

Right, no choice but to kill processes.  The current behavior of just looping in un-mount was put there to remind the user: "hey go close everything that is preventing the unmount".  But when 'Stop' is invoked via ups-initiated shutdown, or power-button initiated shut-down, we will immediately and without remorse go kill everything preventing un-mount ;)
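One possible shape for that kill step is to ask nicely first and only then force it. This is a sketch, not what emhttp does: terminate_pid is a made-up name and the 5 second grace period is an assumption.

```shell
#!/bin/sh
# Send SIGTERM, wait up to a grace period for the process to exit, then
# send SIGKILL if it is still there.
terminate_pid() {
    pid=$1
    grace=${2:-5}    # seconds to wait before escalating (assumed default)
    # If the signal cannot be delivered the process is gone or not ours.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Example: terminate_pid 12345 10
```

SIGKILL still risks losing whatever the process had not yet written, which is the data loss concern raised earlier in the thread.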

Link to comment

Yes, the missing step in my post is done in the powerdown plugin.  I forgot to write it as a step.  Corrected.

 

I agree that the community powerdown is an out of the box attempt to shut down cleanly.  Things are better now with plugins moving to dockers.  This keeps those unruly plugins that wouldn't stop contained so they will behave.  I just removed the apcupsd plugin requirement that the powerdown plugin be installed.  The core powerdown will be used to shutdown and will suffice in most cases.  The powerdown plugin can be optionally installed.

 

If the rc scripts are not the way to get the job done, what would be better?  Maybe unRAID should track the plugins installed (it does) and run them with a 'method=shutdown' so they can do what is necessary to shut down cleanly.  This puts the onus on the plugin, not an add on program to brute force through the rc scripts to stop everything.

Link to comment

Right, no choice but to kill processes.  The current behavior of just looping in un-mount was put there to remind the user: "hey go close everything that is preventing the unmount".  But when 'Stop' is invoked via ups-initiated shutdown, or power-button initiated shut-down, we will immediately and without remorse go kill everything preventing un-mount ;)

 

Is this the current, or the planned way of operating?

Link to comment

If the rc scripts are not the way to get the job done, what would be better?  Maybe unRAID should track the plugins installed (it does) and run them with a 'method=shutdown' so they can do what is necessary to shut down cleanly.  This puts the onus on the plugin, not an add on program to brute force through the rc scripts to stop everything.

That's exactly how it works now - see, eg,

/usr/local/emhttp/dockerMan/event/unmounting_disks
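The idea of per-plugin event handlers can be sketched as a small dispatcher that runs every executable handler named after the event. This is not emhttp's code: run_event is a made-up name, and the directory layout (root/NAME/event/EVENT) is only inferred from the single path above.

```shell
#!/bin/sh
# Run every executable handler for an event, one per plugin directory.
# A failing handler is reported but does not stop the remaining ones.
run_event() {
    root=$1
    event=$2
    for handler in "$root"/*/event/"$event"; do
        [ -x "$handler" ] || continue
        "$handler" || echo "handler failed: $handler" >&2
    done
}

# Example: run_event /usr/local/emhttp unmounting_disks
```

This keeps the onus on each plugin to supply its own stop logic instead of relying on an rc script convention.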

 

 

Link to comment
