Joe L.

ACPI Questions? Will server Power button gracefully shut down the server?


One question worth answering is:

Will pressing the "Power" button on the front of the unRAID server shut it down cleanly?

 

This was recently brought up by reggie14:

One of the things that confuses me is that in the release notes for v4.5 beta 4 there's a line that says "- Pressing Power button gracefully shuts down the server."  That's what made me think that stock unRAID might safely shut down the box when pressing the power button.  So, is there any advantage to using the WeeboTech powerdown script over whatever the built-in functionality is?

 

The answer is a definite "maybe"

 

I've seen other posts basically saying that the motherboard's ACPI implementation can impact whether or not the WeeboTech script or the stock script work.

This is true.  The ability of the Power button on the case to cleanly shut down the server will depend on a correctly functioning (and enabled) ACPI implementation in the motherboard BIOS.  It will also depend on the software on the unRAID server that is configured to run when the button is pressed.

Is there any way to test what happens?  Are there motherboards that tend to work or not work (e.g., new boards vs. old boards, AMD vs. Intel boards, etc.)?  What setting(s) in the motherboard BIOS should be checked?

 

Let's state what we do know:

 

ACPI is the "Advanced Configuration and Power Interface" adopted by most modern BIOS.

  • If your motherboard does not support ACPI, pressing the power button on the case will NOT shut the server down cleanly.
  • If your motherboard's BIOS has a buggy implementation of ACPI, pressing the power button on the case may not shut the server down cleanly.
  • If you have disabled ACPI in your BIOS, pressing the power button on the case  will not shut down the server cleanly.
  • If you have disabled ACPI in your syslinux.cfg, by adding a "boot code", pressing the power button on the case  will not shut down the server cleanly.

 

Ok, above are 4 ways in which the Power button will NOT shut down the array cleanly.
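The last of those four cases can be checked from the console by looking at the kernel command line. This is only a sketch: it assumes the "boot code" in question is the standard Linux "acpi=off" parameter, and the function name is made up for illustration.

```shell
# Succeeds (returns 0) if the given kernel command line contains acpi=off.
acpi_disabled_in_cmdline() {
    for arg in $1; do              # split the command line on whitespace
        case "$arg" in
            acpi=off) return 0 ;;
        esac
    done
    return 1
}

# Check the running kernel, if /proc/cmdline is available.
if [ -r /proc/cmdline ] && acpi_disabled_in_cmdline "$(cat /proc/cmdline)"; then
    echo "ACPI is disabled on the kernel command line"
else
    echo "no acpi=off boot code found"
fi
```

This says nothing about the BIOS setting or a buggy BIOS; it only rules out the syslinux.cfg case.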

 

Now... assuming the ACPI in the BIOS is working, and enabled, it needs to be acted on by the unRAID software when the power button is pressed.  The software involved in Linux is configured in several places.  To determine if the power button will work, we need to see if those are as desired.

 

We can check if ACPI is enabled and has a process listening (waiting) for you to press the power button.

The process listening is built into the unRAID kernel.  It is a kernel thread named "kacpid".  It, in turn, sends its events to a user-space program, "/usr/sbin/acpid".

 

If it is running, you will see it in the process list by typing the following command

ps -ef | grep acpid | grep -v grep

 

The output will look something like this:

root      1416     1  0 Apr08 ?        00:00:00 /usr/sbin/acpid
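If you would rather get a yes/no answer than eyeball the list, something like this works. The function name is mine, and it assumes the daemon shows up under the path /usr/sbin/acpid as in the output above. Putting brackets around the first character of the pattern keeps grep from matching its own entry in the process list, so the extra "grep -v grep" is not needed. Matching the full path also skips the [kacpid] kernel thread.

```shell
# Reads "ps -ef" style output on standard input.
# Succeeds if a line mentioning /usr/sbin/acpid is present.
acpid_listed() {
    grep -q '[/]usr/sbin/acpid'
}

if ps -ef | acpid_listed; then
    echo "acpid is running"
else
    echo "acpid is NOT running"
fi
```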

 

According to this linux-gazette article the acpid process is reading from /proc/acpi/event, waiting patiently.

 

The same article stated we can kill the existing process listening to /proc/acpi/event and listen to it ourselves using the "cat" command.   This is really great, since we then have a way to test what will happen when we momentarily press the power button.

 

I tried this on my unRAID server:

 

I killed the acpid process

killall acpid

 

I then started a "cat" process reading from /proc/acpi/event.

cat /proc/acpi/event

 

I then briefly pressed the power button on my server.  The output on my screen looked like this:

button/power PWRF 00000080 00000001

 

I then pressed "control-C" to exit the "cat" program.  We'll need to re-start the acpid program... you can do that by typing:

/etc/rc.d/rc.acpid restart

 

Now I know, if I had left the original acpid program running, it would have been notified of the button being pressed.  
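If you want the console to tell you when it has seen the button press, instead of watching raw "cat" output, a small loop will do. A rough sketch with a made-up function name: it reads event lines on standard input and stops at the first one that starts with "button/power", which is what my server printed. Other hardware may report something different after that first field.

```shell
# Reads ACPI event lines on standard input.
# Returns 0 at the first power-button event, 1 if the input ends without one.
wait_for_power_event() {
    while IFS= read -r line; do
        case "$line" in
            button/power|button/power\ *)
                echo "power button event: $line"
                return 0 ;;
        esac
    done
    return 1
}

# On the server itself the test would look like this (not run here):
#   killall acpid
#   wait_for_power_event < /proc/acpi/event    # now press the power button briefly
#   /etc/rc.d/rc.acpid restart
```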

 

Now we need to determine what happens next.  We still do not have the unRAID server cleanly shutting down; we are only part way through the chain of events that occurs when you press the power button.

 

The next place we need to look is for the event handler program subsequently invoked by the acpid process when it gets a message from /proc/acpi/event.   On the unRAID server, it is defined in /etc/acpi/events.

 

If you look in that file, it has a single event handler for all events.  That is /etc/acpi/acpi_handler.sh

 

OK, ... now just what happens in acpi_handler.sh  ?

 

The comments at the top of acpi_handler.sh show that lime-tech edited it to do something different than usual.  The usual action is to call "init 0"  (which just shuts down the server, without cleanly stopping the array)

 

Instead of "init 0" lime-technology has the script invoke

/usr/local/sbin/powerdown

a shell program lime-tech wrote.

 

Looking at "/usr/local/sbin/powerdown", it invokes "wget" to press the "Power" button on the web-interface.

 

The first major flaw in this scheme is that it requires the web-interface to be running.  We've had many cases where it crashed, leaving no management web-page available (which is often the very reason we were trying to reboot the server in the first place).  If "emhttp" has crashed, it is not listening, and there is no "Power" button on it to press.  The ACPI handler will do nothing.

 

  If emhttp is not listening, and you have not installed any add-on processes or scripts, the "Power" button will not shut down the server cleanly.

 

Assuming the web-interface is running, in theory, if lime-tech wrote the emhttp process correctly, its handling of the "Power" button will cleanly shut down the server.   However... it is reported that this implementation is flawed, since the "Power" button is not normally visible unless the array is stopped.

 

I suspect, and will experiment later this evening, that invoking the /usr/local/sbin/powerdown command will press the "Power" button without first stopping the array, resulting in a non-clean shutdown of the array.

 

My suspicions were wrong.  As long as disks are not busy and can be un-mounted, and emhttp is running, the lime-tech supplied command will cleanly stop the array.

 

Now, if pressing the "Power" button first does "Stop" the array, the array will cleanly stop and I'll not get a parity check on powering up.

 

If you installed WeeboTech's powerdown package via a line or two in your "go" script, it would probably not have intercepted the ACPI event, and pressing the power button would not have invoked it.  You would still invoke lime-tech's command, which presses the "Power" button on the web-interface.

 

As of this past week, I added an instruction in the unMENU package that installs WeeboTech's powerdown command to edit /etc/acpi/acpi_handler.sh to have it invoke /sbin/powerdown (WeeboTech's command) instead of /usr/local/sbin/powerdown (the lime-tech command).
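For anyone wanting to make the same change by hand, the edit amounts to something like the following. This is not the actual unMENU package code, just an illustration of it. The function name is invented, and it prints the edited script to standard output instead of changing the file, so you can look at the result first.

```shell
# Prints the handler script with the active "power)" line pointed at
# /sbin/powerdown instead of /usr/local/sbin/powerdown.
# Commented-out lines (starting with #) are left alone.
rewrite_power_action() {
    sed 's|^\([[:space:]]*power)[[:space:]]*\)/usr/local/sbin/powerdown|\1/sbin/powerdown|' "$1"
}

# Example, on the server (not run here):
#   rewrite_power_action /etc/acpi/acpi_handler.sh > /tmp/acpi_handler.sh.new
#   diff /etc/acpi/acpi_handler.sh /tmp/acpi_handler.sh.new
```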

 

We are now far better prepared to answer the question:

Will pressing the "Power" button on the front of the unRAID server shut it down cleanly?

 

The answer is:

 "Maybe," but more likely it will not.

 

Next post will deal with what happens next... I think I can predict what lime-tech's version will do, but I'll test first.

 

edit: fixed typo

 

Joe L.

 


Thanks for that explanation.  So, I guess my main question now is, since I apparently can't trust whatever stock unRAID is doing, suppose I want to be able to shut down unRAID safely using the power button. What do I have to do, and what do I have to check to make sure it's working?

 

In my case, I had an ACPI process running.  Suppose I install unMenu, and I install the powerdown script via unRAID.  Will I be good to go then?


This is a very important thread.  Doing a shutdown when all is okay and the web interface is working is the trivial case.  You can stop the array and then issue shutdown or powerdown at the console or via the web interface or whatever. It would be nice if both the lime-tech main menu and the weebotech unMenu had a button clearly marked as "Stop Array and Shut Down System" with a confirmation required, and some diagnostic if it could not stop the array and then asking if the shut down process should continue. Beginners (like myself) are generally nervous and read all sorts of things into what buttons say.

 

The real issue is when the system has hung, nothing is responding and the only choice is to hit the power button on the front of the case.  I would like that to have the best possible chance of doing an orderly shutdown, yet I know if things are bad enough it probably can't.

 

What I would like to see is a short press (less than 4 seconds) attempt an orderly shutdown after confirming that that is what the user wants, with proper messages to syslog and, on a headed system, to the console. If the short press does not produce the confirmation message then the system is probably in trouble. A long press (greater than 4 seconds) would do a kill-power type of shutdown, as that is all that can be done.

 

On my mobo one of the bios options is to have a short press of the case's power button do an S3 suspend and a long press a power shutdown.  That does not appear to work.  A short press appears to do nothing (???) and a long press appears to do a power kill type of shutdown.


Thanks for that explanation.  So, I guess my main question now is, since I apparently can't trust whatever stock unRAID is doing, suppose I want to be able to shut down unRAID safely using the power button. What do I have to do, and what do I have to check to make sure it's working?

 

In my case, I had an ACPI process running.  Suppose I install unMenu, and I install the powerdown script via unRAID.  Will I be good to go then?

It is not that you cannot trust what unRAID will do, it is just you have not tested if it will work as desired.

 

I think all of us will have an ACPI process running.  That is not the definitive way of knowing what will happen when the power button is pressed.  It will probably be running even if ACPI is disabled in the BIOS. It will just never be sent any signals when the button is pressed.

 

To test if the power button will send an ACPI message, do as I outlined

 

Type

killall acpid

cat /proc/acpi/event

 

then momentarily press the power button (press it for less than a second).

 

If you see a message like this:

button/power PWRF 00000080 00000001

the power button will work to send a message to shut down the server.

 

You can then type

"control-C" on the screen with the "cat" command to stop it.

 

then type

/etc/rc.d/rc.acpid restart

 

to re-start the acpi daemon process you killed earlier.

 

Then type

grep power /etc/acpi/acpi_handler.sh

 

If you see

# tmm - power off via webGui

#     power) /sbin/init 0

     power) /usr/local/sbin/powerdown

you will invoke the powerdown command lime-tech supplies that pushes the "Power" button on the web-interface via "wget"

 

We still do not know if this results in a clean stop of the array before powering down the server.  So far, I suspect it will not, but I've not yet had the server idle to where I could run my test (and shut it down)

My suspicions were wrong. The lime-tech supplied command does attempt to stop the array. As long as emhttp has not crashed, and all disks can be un-mounted (none are busy), the array will be stopped cleanly.  If any disks are busy, it will not stop.  It will loop waiting for the disks to not be busy.

If you see:

# tmm - power off via webGui

#     power) /sbin/init 0

     power) /sbin/powerdown

It will indicate you installed WeeboTech's powerdown package through the most recent unMENU package installer.  It (unMENU's package) updated /etc/acpi/acpi_handler.sh to have it invoke /sbin/powerdown instead of /usr/local/sbin/powerdown.
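If you only want to see the command that will actually run, without the commented-out lines, a one-line sed does it. A sketch with an invented function name; it assumes the active line looks like the "power)" lines shown here.

```shell
# Prints whatever follows the active (uncommented) "power)" label
# in the given handler script.
power_action() {
    sed -n 's/^[[:space:]]*power)[[:space:]]*//p' "$1"
}

# Show the action on this machine, if the handler script exists.
if [ -r /etc/acpi/acpi_handler.sh ]; then
    power_action /etc/acpi/acpi_handler.sh
fi
```

A result of /usr/local/sbin/powerdown means the lime-tech command is in place; /sbin/powerdown means the handler has been pointed at WeeboTech's.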

 

This is the only way I know (so far) that will terminate processes holding disks busy, un-mount the disks, stop the array cleanly, save copies of log files, and then power down.  It does not depend on "emhttp" running at all.

 

As I said, when my server is idle I will simply type

/usr/local/sbin/powerdown

I'll watch it power down and see if it attempts to stop the array first.

 

Joe L.


I'm almost positive that Tom's version DOES stop the array first, at least that was my impression when I inadvertently tried it some time ago.  But sorry, I too don't feel like testing it just now, just to verify that.

 

It does not however perform all the extra checks that WeeboTech's version does, so the WeeboTech version is strongly recommended.  But I *think* that in normal situations, Tom's built-in version does perform a safe shutdown.

 

Still needs someone to verify ...



Well... I hedged my bets.  I first tried invoking the lime-tech supplied command, but with the powerdown package installed.

 

I had re-named the lime-tech supplied command as unraid_powerdown, so I invoked it under that name while I had a "tail -f" active in another window.

/usr/local/sbin/unraid_powerdown

 

Some good news... It appeared as if it did stop the array first.  Or, rather, it attempted to stop the array.

 

Let me explain...  the array will NOT stop, and the array will NOT power down if a disk is busy.

Instead, emhttp will loop, printing a message to the syslog each time it retries un-mounting the busy disks.

 

Now, my server has a process in place to stop the add-on scripts so the array can cleanly shut down.  It worked as expected and stopped my cache_dirs and spinup_when_accessed.sh scripts.  The disks were then not busy, and the array did shut down.
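To find out ahead of time what might hold a disk busy, you can list the array mount points and ask fuser about each one. A sketch only: the function name is mine, it assumes the mount points follow the usual /mnt/diskN, /mnt/cache and /mnt/user naming, and it assumes fuser is present on your server.

```shell
# Reads /proc/mounts style lines on standard input and prints the
# mount points that belong to the array, cache drive, or user shares.
mounted_array_paths() {
    awk '$2 ~ /^\/mnt\/(disk[0-9]+|cache|user)$/ { print $2 }'
}

# On the server (not run here; assumes fuser is installed):
#   mounted_array_paths < /proc/mounts | while read -r m; do fuser -vm "$m"; done
```

Anything fuser lists against a mount point is a process that would keep that disk from being un-mounted.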

 

Now... I also had the powerdown script installed, and it saved the syslogs, saved smart reports, etc.  It seemed to run after the array was already stopped.

 

I just hit the power button to power the server back up.  I'll see what is in the saved syslog.

 

Joe L.


I am waiting anxiously for you to complete your testing.  I suspect, knowing what little I do about you, that you will then describe exactly what needs to be done to make the power button a proper mechanism to shut down the server, providing that linux is not completely hosed.


Here is what happened when I invoked the lime-tech powerdown command

 

It made the server beep twice.

Apr 20 17:10:25 Tower emhttp: shcmd (179): beep -r 2

It stopped the network time daemon

Apr 20 17:10:25 Tower emhttp: shcmd (180): /etc/rc.d/rc.ntpd stop >/dev/null 2>&1

Apr 20 17:10:25 Tower ntpd[1604]: ntpd exiting on signal 15

Apr 20 17:10:26 Tower emhttp: _shcmd: shcmd (180): exit status: 1

It stopped samba and NFS daemons

Apr 20 17:10:26 Tower emhttp: shcmd (181): /etc/rc.d/rc.samba stop | logger

Apr 20 17:10:26 Tower emhttp: shcmd (182): /etc/rc.d/rc.nfsd stop | logger

It spins up all drives

Apr 20 17:10:27 Tower emhttp: Spinning up all drives...

Apr 20 17:10:27 Tower emhttp: shcmd (183): /usr/sbin/hdparm -S0 /dev/hdj >/dev/null

Apr 20 17:10:27 Tower kernel: mdcmd (53231): spinup 0

Apr 20 17:10:27 Tower kernel: mdcmd (53232): spinup 1

Apr 20 17:10:27 Tower kernel: mdcmd (53233): spinup 2

Apr 20 17:10:27 Tower kernel: mdcmd (53234): spinup 3

Apr 20 17:10:27 Tower kernel: mdcmd (53235): spinup 4

Apr 20 17:10:27 Tower kernel: mdcmd (53236): spinup 5

Apr 20 17:10:27 Tower kernel: mdcmd (53237): spinup 6

Apr 20 17:10:27 Tower kernel: mdcmd (53238): spinup 7

Apr 20 17:10:27 Tower kernel: mdcmd (53239): spinup 8

Apr 20 17:10:27 Tower kernel: mdcmd (53240): spinup 10

Apr 20 17:10:27 Tower kernel: mdcmd (53241): spinup 11

It syncs all the disks so they write any buffered data to the physical disks.

Apr 20 17:10:31 Tower emhttp: shcmd (184): sync

It starts the process of un-mounting disks and removing their mount points.

Apr 20 17:10:53 Tower emhttp: shcmd (185): umount /mnt/user >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (186): rmdir /mnt/user >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (187): umount /mnt/disk1 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (188): rmdir /mnt/disk1 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (189): umount /mnt/disk2 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (190): rmdir /mnt/disk2 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (191): umount /mnt/disk3 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (192): rmdir /mnt/disk3 >/dev/null 2>&1

Apr 20 17:10:53 Tower emhttp: shcmd (193): umount /mnt/disk4 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (194): rmdir /mnt/disk4 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (195): umount /mnt/disk5 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (196): rmdir /mnt/disk5 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (197): umount /mnt/disk6 >/dev/null 2>&1

Disk 6 is "busy" and cannot be un-mounted until the process keeping it busy is terminated

Apr 20 17:10:54 Tower emhttp: _shcmd: shcmd (197): exit status: 1

Apr 20 17:10:54 Tower emhttp: shcmd (198): rmdir /mnt/disk6 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: _shcmd: shcmd (198): exit status: 1

Apr 20 17:10:54 Tower emhttp: shcmd (199): umount /mnt/disk7 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (200): rmdir /mnt/disk7 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (201): umount /mnt/disk8 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (202): rmdir /mnt/disk8 >/dev/null 2>&1

Apr 20 17:10:54 Tower emhttp: shcmd (203): umount /mnt/disk10 >/dev/null 2>&1

Apr 20 17:10:55 Tower emhttp: shcmd (204): rmdir /mnt/disk10 >/dev/null 2>&1

Apr 20 17:10:55 Tower emhttp: shcmd (205): umount /mnt/disk11 >/dev/null 2>&1

Apr 20 17:10:55 Tower emhttp: shcmd (206): rmdir /mnt/disk11 >/dev/null 2>&1

Apr 20 17:10:55 Tower emhttp: shcmd (207): umount /mnt/cache >/dev/null 2>&1

Apr 20 17:10:55 Tower emhttp: shcmd (208): rmdir /mnt/cache >/dev/null 2>&1

It will repeatedly try to un-mount the disks that are "busy"

Apr 20 17:10:55 Tower emhttp: Retry unmounting disk share(s)...

My unraid_addon_control script notices the attempt to un-mount the busy disk.

Apr 20 17:10:56 Tower unraid_addon_control.sh: spin_disks: User-shares not online...

Apr 20 17:10:56 Tower spin_disks: User-shares not online...

Apr 20 17:11:00 Tower emhttp: shcmd (209): umount /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:00 Tower emhttp: _shcmd: shcmd (209): exit status: 1

Apr 20 17:11:00 Tower emhttp: shcmd (210): rmdir /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:00 Tower emhttp: _shcmd: shcmd (210): exit status: 1

Apr 20 17:11:00 Tower emhttp: Retry unmounting disk share(s)...

Apr 20 17:11:05 Tower emhttp: shcmd (211): umount /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:05 Tower emhttp: _shcmd: shcmd (211): exit status: 1

Apr 20 17:11:05 Tower emhttp: shcmd (212): rmdir /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:05 Tower emhttp: _shcmd: shcmd (212): exit status: 1

Apr 20 17:11:05 Tower emhttp: Retry unmounting disk share(s)...

Apr 20 17:11:10 Tower emhttp: shcmd (213): umount /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:10 Tower emhttp: _shcmd: shcmd (213): exit status: 1

Apr 20 17:11:10 Tower emhttp: shcmd (214): rmdir /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:10 Tower emhttp: _shcmd: shcmd (214): exit status: 1

Apr 20 17:11:10 Tower emhttp: Retry unmounting disk share(s)...

I have two processes running that need to be stopped before the array can be stopped.  The unraid_addon_control script will terminate them.

Apr 20 17:11:12 Tower unraid_addon_control.sh: Tue Apr 20 17:11:12 EDT 2010 Terminating Add-On Processes

Apr 20 17:11:12 Tower unraid_addon_control.sh: Tue Apr 20 17:11:12 EDT 2010 Stopping ADD-ON processes

Apr 20 17:11:12 Tower unraid_addon_control.sh: Stopping /etc/rc.d/unraid.d/rc.cache_dirs timer parent PID = 3198

Apr 20 17:11:12 Tower unraid_addon_control.sh: Stopped /etc/rc.d/unraid.d/rc.cache_dirs timer PID = 655

Apr 20 17:11:12 Tower unraid_addon_control.sh: Stopping /etc/rc.d/unraid.d/rc.cache_dirs stop

Apr 20 17:11:12 Tower unraid_addon_control.sh: Stopped /etc/rc.d/unraid.d/rc.cache_dirs stop, PID = 661

Apr 20 17:11:12 Tower unraid_addon_control.sh: killing cache_dirs process 28889

Apr 20 17:11:12 Tower cache_dirs: killing cache_dirs process 28889

Apr 20 17:11:13 Tower unraid_addon_control.sh: Stopping /etc/rc.d/unraid.d/rc.spinup_when_accessed timer parent PID = 3198

Apr 20 17:11:13 Tower unraid_addon_control.sh: Stopped /etc/rc.d/unraid.d/rc.spinup_when_accessed timer PID = 683

Apr 20 17:11:13 Tower unraid_addon_control.sh: Stopping /etc/rc.d/unraid.d/rc.spinup_when_accessed stop

Apr 20 17:11:13 Tower unraid_addon_control.sh: Stopped /etc/rc.d/unraid.d/rc.spinup_when_accessed stop, PID = 689

Apr 20 17:11:13 Tower spin_disks[706]: Started spinup_when_accessed.sh -q

Apr 20 17:11:13 Tower unraid_addon_control.sh: spin_disks[712]: killing spinup_when_accessed.sh process 29062

Apr 20 17:11:13 Tower spin_disks[712]: killing spinup_when_accessed.sh process 29062

Apr 20 17:11:14 Tower unraid_addon_control.sh: rc stop processing completed

Apr 20 17:11:14 Tower unraid_addon_control.sh: Tue Apr 20 17:11:14 EDT 2010 Add-On Processes Terminated - Array Can Now Stop

Once the add-on processes are stopped, disk6 is no longer "busy." emhttp has been checking once every 5 seconds and once it is able to un-mount the disk it will. 

Apr 20 17:11:15 Tower emhttp: shcmd (215): umount /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:15 Tower emhttp: _shcmd: shcmd (215): exit status: 1

Apr 20 17:11:15 Tower emhttp: shcmd (216): rmdir /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:15 Tower emhttp: _shcmd: shcmd (216): exit status: 1

Apr 20 17:11:15 Tower emhttp: Retry unmounting disk share(s)...

Apr 20 17:11:20 Tower emhttp: shcmd (217): umount /mnt/disk6 >/dev/null 2>&1

Apr 20 17:11:20 Tower emhttp: shcmd (218): rmdir /mnt/disk6 >/dev/null 2>&1

Once all the disks are un-mounted, the array can be cleanly stopped. 

Apr 20 17:11:20 Tower kernel: mdcmd (53246): stop

Apr 20 17:11:20 Tower kernel: md1: stopping

Apr 20 17:11:20 Tower kernel: md2: stopping

Apr 20 17:11:20 Tower kernel: md3: stopping

Apr 20 17:11:20 Tower kernel: md4: stopping

Apr 20 17:11:20 Tower kernel: md5: stopping

Apr 20 17:11:20 Tower kernel: md6: stopping

Apr 20 17:11:20 Tower kernel: md7: stopping

Apr 20 17:11:20 Tower kernel: md8: stopping

Apr 20 17:11:20 Tower kernel: md10: stopping

Apr 20 17:11:20 Tower kernel: md11: stopping

My cache disk is so old it does not report temperature...

Apr 20 17:11:21 Tower emhttp: disk_temperature: ATTR_Temperature_Celsius not found

emhttp now invokes the poweroff command.

Apr 20 17:11:21 Tower emhttp: shcmd (219): /sbin/poweroff

Apr 20 17:11:21 Tower shutdown[769]: shutting down for system halt

It then switches to run-level 0, that runs /etc/rc.d/rc.0

Apr 20 17:11:21 Tower init: Switching to runlevel: 0

rc.0 invokes "/etc/rc.d/rc.local_shutdown stop".  It invokes "/etc/rc.d/rc.unRAID stop".

This is part of WeeboTech's powerdown package. 

Apr 20 17:11:23 Tower rc.unRAID[779]: Stopping unRAID.

Apr 20 17:11:23 Tower version[780]: Linux version 2.6.32.9-unRAID (root@Develop) (gcc version 4.2.3) #1 SMP Fri Feb 26 19:35:20 MST 2010

Apr 20 17:11:23 Tower cmdline[781]: initrd=bzroot rootdelay=10 vga=extended BOOT_IMAGE=bzimage

Apr 20 17:11:23 Tower meminfo[782]: MemTotal:        497820 kB

Apr 20 17:11:23 Tower meminfo[782]: MemFree:          118288 kB

Apr 20 17:11:23 Tower meminfo[782]: Buffers:            1668 kB

Apr 20 17:11:23 Tower meminfo[782]: Cached:          354888 kB

Apr 20 17:11:23 Tower meminfo[782]: SwapCached:            0 kB

Apr 20 17:11:23 Tower meminfo[782]: Active:            62508 kB

Apr 20 17:11:23 Tower meminfo[782]: Inactive:          43400 kB

rc.unRAID saves all kinds of interesting system logs and data to the syslog.  It performs a SMART report on each disk.

I've cut the lengthy lines from here...

Apr 20 17:11:30 Tower status[1045]: SMART overall health assessment

Apr 20 17:11:30 Tower status[1045]: /dev/hda: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:30 Tower status[1045]: /dev/hdb: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:31 Tower status[1045]: /dev/hdc: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:31 Tower status[1045]: /dev/hdd: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:31 Tower status[1045]: /dev/hde: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:31 Tower status[1045]: /dev/hdf: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/hdg: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/hdi: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/hdj: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/hdk: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/hdl: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/sda: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:32 Tower status[1045]: /dev/sdb: SMART Health Status: OK

Apr 20 17:11:33 Tower status[1045]: /dev/sdc: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:33 Tower status[1045]: /dev/sdd: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:33 Tower status[1045]: /dev/sde: SMART overall-health self-assessment test result: PASSED

Apr 20 17:11:33 Tower status[1045]: /dev/sdf: SMART overall-health self-assessment test result: PASSED

We already killed any processes holding disks busy, so none are found.

Apr 20 17:11:33 Tower status[1045]: No active PIDS on the array

Apr 20 17:11:34 Tower rc.unRAID[1156]: Killing active pids on the array drives

The disks are already un-mounted, so no mounted disks are found.

Apr 20 17:11:34 Tower rc.unRAID[1160]: Umounting the drives

Apr 20 17:11:34 Tower rc.unRAID[1163]: umount: /mnt/disk*: not found

Apr 20 17:11:34 Tower rc.unRAID[1163]: Could not find /mnt/disk* in mtab

Apr 20 17:11:34 Tower rc.unRAID[1173]: Stopping the Array

rc.unRAID attempts to stop the array, but it is already stopped.

Apr 20 17:11:34 Tower kernel: mdcmd (53251): stop

Apr 20 17:11:34 Tower kernel: md: stop_array: not started

 

The server then powered down, as the rc.0 script eventually invokes /sbin/poweroff.
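If you save a syslog like this one, you can pull out which disks refused to un-mount and how many times. A sketch, assuming the emhttp log lines keep the exact layout shown above (the command number is the 7th field of the "shcmd" line and the 8th field of the matching "exit status" line). The function name is made up.

```shell
# Reads syslog lines on standard input and prints each mount point whose
# umount command reported a non-zero exit status, with the failure count.
busy_mounts() {
    awk '
        # Remember the mount point for every numbered umount command.
        $6 == "shcmd" && $8 == "umount" { mount[$7] = $9 }
        # Count a failure when the matching command reports a non-zero status.
        $6 == "_shcmd:" && $10 == "status:" && $11 != 0 && ($8 in mount) {
            failed[mount[$8]]++
        }
        END { for (m in failed) print m, failed[m] }
    '
}

# Example, on the server (not run here):
#   busy_mounts < /var/log/syslog
```

Failed rmdir commands are ignored on purpose, since they fail only because the un-mount before them failed.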

 


So...

If you have a completely stock unRAID server

and

have ACPI enabled in your BIOS

and

ACPI is not buggy

and

you invoke the lime-technology supplied /usr/local/sbin/powerdown command by pressing the front panel power button momentarily

and

the emhttp process is running

and

no disks are busy with any open files or processes

then

emhttp will cleanly stop the array and invoke /sbin/poweroff

to eventually power down after invoking /etc/rc.d/rc.0 to stop system services.

 

If emhttp has crashed, or a disk is busy, or ACPI is not enabled, the array will not stop.
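Boiled down to code, the stock behaviour looks something like this. It is only a summary of the conditions above, not anything lime-tech ships; the function name and the yes/no arguments are invented.

```shell
# Arguments, each "yes" or "no":
#   $1  ACPI is enabled and working (the button press reaches acpid)
#   $2  emhttp is running
#   $3  some disk is busy
# Succeeds only when the stock power button would stop the array cleanly.
stock_button_stops_array() {
    [ "$1" = yes ] && [ "$2" = yes ] && [ "$3" = no ]
}

if stock_button_stops_array yes yes no; then
    echo "array stops cleanly, then /sbin/poweroff"
fi
if ! stock_button_stops_array yes no no; then
    echo "emhttp not running: the array will not stop"
fi
```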


If you've installed WeeboTech's powerdown package AND have, as part of the installation, changed the acpi_handler.sh script to invoke it when the power button is pressed, it will terminate any processes holding disks busy, then unmount the disks, then stop the array cleanly.

 

This does have some risk, as it will terminate processes that are writing to disks and holding them busy (possibly resulting in an incomplete file).

 

Joe L.


If you've installed WeeboTech's powerdown package AND have as part of the installation changed the acpi_handler.sh script to invoke it when the power button is pressed....

WeeboTech's powerdown package had better take care of that change (hopefully in his upcoming version 3).  Weebo, are you listening?  ;)

 


Apr 20 17:10:27 Tower emhttp: Spinning up all drives...

Apr 20 17:10:27 Tower emhttp: shcmd (183): /usr/sbin/hdparm -S0 /dev/hdj >/dev/null

Apr 20 17:10:27 Tower kernel: mdcmd (53231): spinup 0

Apr 20 17:10:27 Tower kernel: mdcmd (53232): spinup 1

Apr 20 17:10:27 Tower kernel: mdcmd (53233): spinup 2

Apr 20 17:10:27 Tower kernel: mdcmd (53234): spinup 3

Apr 20 17:10:27 Tower kernel: mdcmd (53235): spinup 4

Apr 20 17:10:27 Tower kernel: mdcmd (53236): spinup 5

Apr 20 17:10:27 Tower kernel: mdcmd (53237): spinup 6

Apr 20 17:10:27 Tower kernel: mdcmd (53238): spinup 7

Apr 20 17:10:27 Tower kernel: mdcmd (53239): spinup 8

Apr 20 17:10:27 Tower kernel: mdcmd (53240): spinup 10

Apr 20 17:10:27 Tower kernel: mdcmd (53241): spinup 11

It syncs all the disks so they write any buffered data to the physical disks.

Apr 20 17:10:31 Tower emhttp: shcmd (184): sync

 

I am a little curious as to why emhttp issued that "hdparm -S0" command on disk hdj. 

Is that hdj outside your protected array by any chance?  Is that unRAID's way of waking up sleeping disks?

 

The reason it got me curious is that I used to wake up sleeping disks exactly the same way up until 4.5-beta6 or something.

After 4.5-beta6, if I do that on a spundown WD-EADS, bad things happen in my syslog.  (frozen, resetting, etc..)

 

Also wondering why He's even bothering to spin up disks... as He is issuing the sync command which would spin up all disks anyway. 

(Of course it's not for us to be able to read His mind, I was just thinking aloud)

 


I am a little confused.  Let's assume a system with keyboard and monitor. Why would anyone want to use the power button to shut down the machine?  Why not always be safe and first stop the array from the web page and then power down the system from the web page?

 

If we assume the webpage is not there because of a serious emhttp error and it can not be restarted then can the array be stopped from the keyboard and then the system shutdown from the keyboard?

 

To my way of thinking the power button on the case has always been the "last resort"; used only when the OS is hung to a degree that nothing is working and you must force a powerdown in order to reboot.

 

Am I missing something?



No, you're not missing anything, you said it right.  Some people though like to go straight for the power button.

 


I am a little curious as to why emhttp issued that "hdparm -S0" command on disk hdj.  

Is that hdj outside your protected array by any chance?  Is that unRAID's way of waking up sleeping disks?

I had enabled a cache drive for the test. /dev/hdj was the cache drive.

apparently, it was spinning up the cache drive.


I am a little confused.  Let's assume a system with keyboard and monitor. Why would anyone want to use the power button to shut down the machine?  Why not always be safe and first stop the array from the web page and then power down the system from the web page?

If the web-page is not responding, and the machine is being run headless perhaps?

If we assume the webpage is not there because of a serious emhttp error and it can not be restarted then can the array be stopped from the keyboard and then the system shutdown from the keyboard?

If there is a keyboard, and if you know how to cleanly stop the array, yes. But 99.99% of unRAID users will not know the commands to cleanly shut down from the keyboard.

To my way of thinking the power button on the case has always been the "last resort"; used only when the OS is hung to a degree that nothing is working and you must force a powerdown in order to reboot.

 

Am I missing something?

Nope, you are not.  It is nice that the power button might be able to shut down the server cleanly.  Since it is a last resort, I'd prefer that it kill any processes holding disks busy and then shut the array down as cleanly as it can.

 

Joe L.

Share this post


Link to post

I just duplicated what Joe L. suggested as a test procedure to ascertain what is going on with power down. I saw exactly the same displays he did.  When I tried powering down with the power button, I got two beeps, then nothing for a while, and then the console started to indicate that things were shutting down. It shut down properly; total elapsed time was about 45 seconds.  As far as I could tell there was no disk activity when I asked for the shutdown.

 

In the motherboard BIOS I have ACPI turned on, with the option that a short press puts it in state S3 and a long press (4 sec+) powers down.  The short press definitely shuts it down properly, and since that is what I want I will leave things be.

 

Doesn't the command powerdown, typed in at the root prompt, just cleanly power down the system?  I sort of remember seeing that.  I believe the console command shutdown will power down the system but will not stop the array, hence a "dirty" shutdown.

Share this post


Link to post

Doesn't the command powerdown, typed in at the root prompt, just cleanly power down the system?  I sort of remember seeing that.

The answer to your question is: 

 

maybe.

 

WeeboTech wrote "powerdown" quite a long time ago.  He installs it in /sbin.  It was much later that lime-tech wrote their version of "powerdown" and installed it in /usr/local/sbin.

 

Now, if you do not specify an explicit path to "powerdown" you'll get the one that is found first in the search path.  That is the one that lime-tech supplied, as unRAID has /usr/local/sbin in the $PATH before /sbin.
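
To check which copy wins on a particular server, something like the following sketch should do. The function name find_in_path is made up for illustration; it just walks a colon-separated search path in order and prints every executable match, so the first line printed is the one that runs when you type the bare command.

```shell
# Print every executable named $1 found on a colon-separated search path,
# in search order. $2 defaults to the current $PATH.
# (find_in_path is a hypothetical helper, not part of unRAID.)
find_in_path() {
    cmd=$1
    search_path=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        [ -n "$dir" ] || dir=.          # an empty entry means the current directory
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            printf '%s\n' "$dir/$cmd"
        fi
    done
    IFS=$old_ifs
}

# The first line of output is the copy that a bare "powerdown" would run.
find_in_path powerdown
```

In bash, "type -a powerdown" reports the same information; the function only makes the search order explicit.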

 

It, as we have learned, will cleanly shut down the server IF no disks are busy.  If a disk is busy, it will loop and wait forever until the disk is not busy.  Therefore, your server might not shut down at all. (and why I said "maybe")
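
The open-ended wait is the part that bites. Purely as an illustration (this is not how either powerdown script is written), a wait loop can be given a deadline so that it reports failure instead of hanging. The names wait_until_idle and disk_busy are invented for this sketch, and disk_busy assumes that fuser from psmisc is installed.

```shell
# Wait for a "busy" check to clear, but give up after a deadline
# instead of looping forever.
# $1 = timeout in seconds; remaining args = a command that succeeds while busy.
# (wait_until_idle is a hypothetical helper for illustration.)
wait_until_idle() {
    timeout=$1
    shift
    waited=0
    while "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still busy at the deadline
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # the check reports nothing is busy
}

# Hypothetical busy check: "fuser -s -m" exits 0 when some process is using
# the given mount. If fuser is missing, this reports "not busy", which is an
# assumption you would want to tighten on a real server.
disk_busy() {
    fuser -s -m "$1"
}

# Example (not run here): give /mnt/disk1 up to 30 seconds to go idle.
# wait_until_idle 30 disk_busy /mnt/disk1 || echo "disk1 is still busy"
```

What to do once the deadline passes (kill the offending processes, or give up and say so) is a separate decision; the sketch only keeps the wait from being open-ended.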

 

I believe the console command shutdown will power down the system but will not stop the array hence a "dirty" shutdown.

maybe.

    (you are really going to hate me, but I'll explain)

 

We learned that if the powerdown command that WeeboTech wrote is installed, it will be invoked by /etc/rc.d/rc.0.

rc.0 invokes "/etc/rc.d/rc.local_shutdown stop".  It, in turn, invokes "/etc/rc.d/rc.unRAID stop".

 

rc.unRAID (WeeboTech's command) will attempt to cleanly stop the array.
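
If you want to see whether that chain is even present on your server, a quick check like this sketch will tell you. The function name check_shutdown_chain and its optional root-prefix argument are made up for illustration; the three paths are the ones named above. It only tests that each file exists and is executable, not that one actually calls the next.

```shell
# Report which scripts in the shutdown chain are present and executable.
# $1 = optional root prefix, used only for dry runs against a copy of the
# filesystem; leave it empty to check the live system.
# (check_shutdown_chain is a hypothetical helper for illustration.)
check_shutdown_chain() {
    root=${1:-}
    for script in /etc/rc.d/rc.0 /etc/rc.d/rc.local_shutdown /etc/rc.d/rc.unRAID; do
        if [ -x "$root$script" ]; then
            echo "present: $script"
        else
            echo "missing: $script"
        fi
    done
}

check_shutdown_chain
```

A "missing" line for rc.unRAID would suggest WeeboTech's package is not installed, in which case the chain described above does not apply.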

 

I've not tested this, but I think it might just cleanly stop the array.  (And why I said "maybe")

 

Joe L.

Share this post


Link to post

I'm confused by this discussion because it says that 4.5 introduced "maybe" clean powerdown by pressing the power button. I have two issues.

 

First, I'm 99% sure that I used to have WeeboTech's powerdown configured so that it would allow me to power down by pressing the power button. Handy in emergencies. However, that doesn't seem to work anymore, and I think it corresponds to when I started installing it with unMenu instead of manually. I'm not sure how to get that functionality back with it in unMenu since there are no configuration options for the package.

 

Second, if it was added natively to 4.5 and I'm running 4.5.6, then why do I get nothing when I press the power button?

Share this post


Link to post

Also, I'm not sure if it is directly related, but I just installed a new MB and I'm getting the text below in my syslog. I don't think I used to get it with my old MB. Is it a problem, and could it be preventing the power button from working? Then again, I don't think I got the message with my old MB, and the power button wasn't working there either.

 

Sep 16 02:29:21 WatchTower kernel: ACPI: I/O resource w83627ehf [0x295-0x296] conflicts with ACPI region HWRE [0x290-0x299]
Sep 16 02:29:21 WatchTower kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

Share this post


Link to post

Also, I'm not sure if it is directly related, but I just installed a new MB and I'm getting the text below in my syslog. I don't think I used to get it with my old MB. Is it a problem, and could it be preventing the power button from working? Then again, I don't think I got the message with my old MB, and the power button wasn't working there either.

 

Sep 16 02:29:21 WatchTower kernel: ACPI: I/O resource w83627ehf [0x295-0x296] conflicts with ACPI region HWRE [0x290-0x299]
Sep 16 02:29:21 WatchTower kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

The procedure to use to test what happens when you momentarily press the power button is described in the first post in this thread.  If the BIOS does not initiate the event, nothing will happen.  If the script it invokes is the lime-technology supplied one that uses "wget" to press the button on the web-management console and you have a disk busy then it will NOT power down.

 

Only you can analyze your own system.  Follow the steps I did and you'll see how your server is configured.
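
As one small addition to that procedure, it can help to pull the ACPI-related lines out of the syslog before and after pressing the button. This is a sketch, assuming the syslog lives at /var/log/syslog; the function name acpi_lines is made up for illustration.

```shell
# Print every line of a log file that mentions ACPI (case-insensitive).
# $1 defaults to /var/log/syslog, which is an assumption about your system.
# (acpi_lines is a hypothetical helper for illustration.)
acpi_lines() {
    log=${1:-/var/log/syslog}
    if [ -r "$log" ]; then
        grep -i 'acpi' "$log"
    else
        echo "cannot read $log" >&2
        return 1
    fi
}

# grep exits non-zero when nothing matches, so do not treat that as an error.
acpi_lines || true
```

If nothing new appears after a press, that is consistent with the BIOS not raising the event, though it does not prove it.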

 

Joe L.
