apcupsd error



I'm working on a new package for unMENU. It will include the sleep fix:

PACKAGE_INSTALLATION sed -i -e "s/\$0 stop/\$0 stop\n       sleep 3/" /etc/rc.d/rc.apcupsd
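For anyone wondering what that sed line does: it adds a "sleep 3" line after each "$0 stop" in the rc script, presumably so a restart pauses before starting the daemon again. Here is a minimal sketch you can try on a scratch file, assuming GNU sed (for -i and for \n in the replacement). The apply_sleep_fix name is made up for illustration and is not part of the package.

```shell
#!/bin/sh
# apply_sleep_fix FILE
# Hypothetical wrapper around the substitution from the package line above.
# Inserts an indented "sleep 3" line after every "$0 stop" in FILE.
# Assumes GNU sed: -i edits in place, \n in the replacement is a newline.
apply_sleep_fix() {
    sed -i -e "s/\$0 stop/\$0 stop\n       sleep 3/" "$1"
}
```

Try it on a copy of the rc script first (for example a file under /tmp), not on /etc/rc.d/rc.apcupsd itself. Running it twice adds the sleep twice.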

 

Please try it out and report back:

http://lime-technology.com/forum/index.php?topic=27051.msg253098#msg253098

I'm running the newer apcupsd on my newer server.  So far, it is working as expected.

 

just noticed the rollover works fine.

Jul 21 04:40:01 husky apcupsd[1516]: apcupsd exiting, signal 15
Jul 21 04:40:01 husky apcupsd[1516]: apcupsd shutdown succeeded
Jul 21 04:40:06 husky apcupsd[28940]: apcupsd 3.14.10 (13 September 2011) slackware startup succeeded
Jul 21 04:40:06 husky apcupsd[28940]: NIS server startup succeeded

 

Does for me also

Link to comment

Same here:

Jul 21 04:40:01 Tower apcupsd[12583]: apcupsd exiting, signal 15

Jul 21 04:40:01 Tower apcupsd[12583]: apcupsd shutdown succeeded

Jul 21 04:40:06 Tower apcupsd[10883]: apcupsd 3.14.10 (13 September 2011) slackware startup succeeded

Jul 21 04:40:06 Tower apcupsd[10883]: NIS server startup succeeded

 

apcupsd re-started just as expected.

Link to comment

I cobbled together a little box to test with.

It is on 5.0 RC16c. I did some more testing today with the latest apcupsd package, but with the clean powerdown package NOT installed.

 

apcupsd is set to issue a shutdown after a 20-second outage. I pulled power and checked the log and UPS status:

(from /sbin/apcaccess status)
APC      : 001,037,0962
DATE     : 2013-07-23 10:30:28 -0500  
HOSTNAME : testtower
VERSION  : 3.14.10 (13 September 2011) slackware
UPSNAME  : testtower
CABLE    : USB Cable
DRIVER   : USB UPS Driver
UPSMODE  : Stand Alone
STARTTIME: 2013-07-23 10:25:43 -0500  
MODEL    : Back-UPS BR1500G 
STATUS   : SHUTTING DOWN
LINEV    : 000.0 Volts
LOADPCT  :   0.0 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT : 279.0 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 5 Minutes
MAXTIME  : 20 Seconds
SENSE    : Medium
LOTRANS  : 088.0 Volts
HITRANS  : 147.0 Volts
ALARMDEL : No alarm
BATTV    : 26.1 Volts
LASTXFER : Unacceptable line voltage changes
NUMXFERS : 1
XONBATT  : 2013-07-23 10:28:58 -0500  
TONBATT  : 94 seconds
CUMONBATT: 94 seconds
XOFFBATT : N/A
SELFTEST : NO
STATFLAG : 0x07160210 Status Flag
SERIALNO : 4B1120P04730  
BATTDATE : 2011-05-09
NOMINV   : 120 Volts
NOMBATTV :  24.0 Volts
NOMPOWER : 865 Watts
FIRMWARE : 865.L2 .D USB FW:L2
END APC  : 2013-07-23 10:30:32 -0500  
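If you only want one value out of that status dump (say STATUS or MAXTIME) while testing, a small filter helps. This is just a sketch: apc_field is a made-up helper name, and it assumes the "KEY : value" layout shown above.

```shell
#!/bin/sh
# apc_field KEY
# Hypothetical helper: reads apcaccess-style "KEY : value" lines on stdin
# and prints the value of the first line whose key equals KEY.
apc_field() {
    awk -v key="$1" '
        {
            k = $0
            sub(/[ \t]*:.*/, "", k)            # keep the text before the first colon
            if (k == key) {
                v = $0
                sub(/^[^:]*:[ \t]*/, "", v)    # keep the text after the first colon
                sub(/[ \t]+$/, "", v)          # drop trailing blanks
                print v
                exit
            }
        }'
}
```

Usage would be something like: /sbin/apcaccess status | apc_field STATUS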

 

Jul 23 10:25:44 Tower apcupsd[8781]: NIS server startup succeeded
Jul 23 10:25:44 Tower apcupsd[8781]: apcupsd 3.14.10 (13 September 2011) slackware startup succeeded
Jul 23 10:28:58 Tower apcupsd[8781]: Power failure. (Minor Issues)
Jul 23 10:29:04 Tower apcupsd[8781]: Running on UPS batteries.
Jul 23 10:29:25 Tower apcupsd[8781]: Reached run time limit on batteries.
Jul 23 10:29:25 Tower apcupsd[8781]: Initiating system shutdown!
Jul 23 10:29:25 Tower apcupsd[8781]: User logins prohibited

 

I'm staring at the console right now, and it did not shut down...

Looking at "/etc/apcupsd/doshutdown", I do see that the package put in "/sbin/powerdown" followed by "exit 99".

I would have thought it would on 5.x, since the powerdown package was really created for 4.x...

I now see that '/sbin/powerdown' doesn't exist on 5.x; it's actually '/usr/local/sbin/powerdown'. I ran this manually and, sure enough, the box shut down.

 

So do we detect '/sbin/powerdown' and '/usr/local/sbin/powerdown' and then use whichever one the user has? What if they have both? What about 4.x users?

 

 

Link to comment

It is the same on any version of unRAID that has a powerdown script at all.

 

Ideally,  the stock power down script would have an option to NOT wait for all disks to be idle, but to kill processes as needed in a power failure.  It has no such option.

 

You could probably use the stock powerdown if it exists (the install of the clean powerdown package renames the original to "/usr/local/sbin/unraid_powerdown").

 

Problem was, at one time there was NO powerdown script.  The "clean-powerdown" was created and named /sbin/powerdown.

 

Then, Tom created /usr/local/sbin/powerdown. (which invokes a URL from emhttp@localhost )  But... /usr/local/sbin is FIRST in the $PATH, so I was forced to rename the lime-tech supplied command so the other would be executed when someone just typed "powerdown"

 

The best solution on 5.x would be to invoke the original, which should invoke all the "event" scripts to stop everything; the array will then stop as needed and finally the box will power off.  Problem is... not all add-ons have event-related triggers to stop themselves.  The disk could be busy from a "telnet" session, or from a process using it as its current working directory.

 

We basically need a parallel process (in addition to /usr/local/sbin/powerdown) that, if it sees the array is waiting for a busy disk to become idle, will after an appropriate length of time take over and start terminating the processes holding disks busy.  That parallel process cannot even be certain emhttp will be listening for the /usr/local/sbin/powerdown command.  (It just invokes the URL on emhttp that is equivalent to the powerdown button.)

 

for now...I'd see what happens with something like this:

/usr/local/sbin/powerdown &

test -f /usr/local/sbin/powerdown && sleep 60

/sbin/powerdown
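The same idea written as a function with the paths and the wait passed in, so it can be tried with harmless stub scripts before pointing it at the real commands. This is only a sketch of the approach above: two_stage_powerdown and its arguments are made-up names, and the 60 second default is simply the value from the snippet.

```shell
#!/bin/sh
# two_stage_powerdown STOCK FALLBACK [GRACE_SECONDS]
# Hypothetical wrapper: start the stock script in the background, give it
# GRACE_SECONDS (default 60) to stop things in an orderly way, then run the
# fallback script, which is expected to force the issue.
two_stage_powerdown() {
    stock=$1
    fallback=$2
    grace=${3:-60}

    if [ -x "$stock" ]; then
        "$stock" &            # ask for the orderly shutdown, do not block on it
        sleep "$grace"        # only wait when there was a stock script to wait for
    fi

    if [ -x "$fallback" ]; then
        "$fallback"           # take over if the box is still up
    fi
}
```

On a real box the call would look like: two_stage_powerdown /usr/local/sbin/powerdown /sbin/powerdown 60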

 

 

Link to comment

So when the clean powerdown package is installed, there is a '/sbin/powerdown'. I just thought from the 5.0rc16 thread that '/sbin/powerdown' existed on 5.x from Tom, but it turns out that it's in /usr/local/sbin.

 

Anyways, on updating the apcupsd package:

 

So, on 4.x we definitely should tell people to install the clean powerdown package, as it does currently. We don't force it or require it as a dependency.

On 5.x we should tell people to install the clean powerdown package (and use it if installed), and if it's NOT installed then we should just use /usr/local/sbin/powerdown instead. That way it at least does something?

 

I've seen people saying that the powerdown package needs updating for 5.x... what is that about? It seems to work fine.

 

 

Link to comment

OK, on 5.0 rc16c: installed clean powerdown and apcupsd. Set it to shut down after a 20-second outage and then issue the command to shut down the UPS.

Unplugged the UPS from the wall. It shut down unRAID (the array was running), then about 1 minute later the UPS itself shut down, just like it was supposed to.

 

Relevant syslog entries:

Jul 23 17:14:36 testtower apcupsd[1909]: Power failure. (Minor Issues)
Jul 23 17:14:42 testtower apcupsd[1909]: Running on UPS batteries.
Jul 23 17:15:03 testtower apcupsd[1909]: Reached run time limit on batteries.
Jul 23 17:15:03 testtower apcupsd[1909]: Initiating system shutdown!
Jul 23 17:15:03 testtower apcupsd[1909]: User logins prohibited
Jul 23 17:15:03 testtower logger: Powerdown initiated
Jul 23 17:15:03 testtower rc.unRAID[2129]: Stopping unRAID.

 

Notice that the apcupsd killups command never makes it to the syslog (since it's issued AFTER unRAID shuts down).

 

Side note, when shutting down via the clean powerdown script it does a 'status' report which sends a whole bunch of stuff to the syslog.

 

Joe, several things in that report show up as 'errors' because of the regex matching. Here are the relevant syslog messages if you want to exclude them:

Jul 23 17:15:03 testtower mounts[2137]: /dev/sda1 /boot vfat rw,noatime,nodiratime,fmask=0000,dmask=0000,allow_utime=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 0 0 (Errors)
Jul 23 17:15:03 testtower hdparm[2142]: ^I   *^ISMART error logging (Errors)
Jul 23 17:15:03 testtower smartctl[2149]:                                         without error or no self-test has ever  (Errors)
Jul 23 17:15:03 testtower smartctl[2149]: Error logging capability:        (0x01)        Error logging supported. (Errors)
Jul 23 17:15:03 testtower smartctl[2149]: 184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0 (Errors)
Jul 23 17:15:03 testtower smartctl[2149]: SMART Error Log Version: 1 (Errors)
Jul 23 17:15:03 testtower smartctl[2149]: No Errors Logged (Errors)
Jul 23 17:15:03 testtower ifconfig[2160]:           RX packets:993 errors:0 dropped:0 overruns:0 frame:0 (Errors)
Jul 23 17:15:03 testtower ifconfig[2160]:           TX packets:1225 errors:0 dropped:0 overruns:0 carrier:0 (Errors)
Jul 23 17:15:03 testtower ethtool[2165]:      tx_errors: 0 (Errors)
Jul 23 17:15:03 testtower ethtool[2165]:      rx_errors: 0 (Errors)
Jul 23 17:15:03 testtower ethtool[2165]:      align_errors: 0 (Errors)

 

Let me know if you want the full syslog to see the context they are in.

 

 

Link to comment

An easy one to ignore (it would eliminate 3 of those above) looks to be:

_errors:

and this one would take care of 4 more (with case-insensitive matching):

\sErrors?\sLog
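A quick way to check both patterns is to count their hits with grep against the messages quoted earlier in the thread. This is a sketch: the samples function is a made-up helper, the lines are copied from that post with the timestamp, hostname and trailing "(Errors)" tag trimmed, and the second pattern is written with [[:space:]] on the assumption that whitespace is what was meant around "Errors?".

```shell
#!/bin/sh
# samples: the flagged messages from the post above, trimmed of timestamp,
# hostname and the trailing "(Errors)" tag added by the syslog viewer.
samples() {
cat <<'EOF'
mounts[2137]: /dev/sda1 /boot vfat rw,noatime,nodiratime,fmask=0000,dmask=0000,allow_utime=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 0 0
hdparm[2142]: ^I   *^ISMART error logging
smartctl[2149]:                                         without error or no self-test has ever
smartctl[2149]: Error logging capability:        (0x01)        Error logging supported.
smartctl[2149]: 184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
smartctl[2149]: SMART Error Log Version: 1
smartctl[2149]: No Errors Logged
ifconfig[2160]:           RX packets:993 errors:0 dropped:0 overruns:0 frame:0
ifconfig[2160]:           TX packets:1225 errors:0 dropped:0 overruns:0 carrier:0
ethtool[2165]:      tx_errors: 0
ethtool[2165]:      rx_errors: 0
ethtool[2165]:      align_errors: 0
EOF
}

# Case-sensitive literal: the three ethtool counters.
samples | grep -c '_errors:'                                 # prints 3

# Case-insensitive: whitespace, "error" or "errors", whitespace, "log".
samples | grep -Eic '[[:space:]]errors?[[:space:]]log'       # prints 4
```

That leaves the mounts, "without error", End-to-End_Error and ifconfig lines, which need their own entries.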

 

Link to comment

Apparently, if disks are busy, unRAID gets upset when they are un-mounted by clean-powerdown.

 

Try this

log in via telnet

cd to /mnt/disk1/

 

then pull power and see what happens when the disk remains busy because of your telnet session.
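Before pulling power, it can be handy to confirm the session really is holding the disk. A rough sketch that lists processes whose current working directory is on a given mount, using /proc on Linux. It is not how clean-powerdown finds busy processes, pids_with_cwd_under is a made-up name, and it only looks at the working directory, not at open files.

```shell
#!/bin/sh
# pids_with_cwd_under DIR
# Hypothetical helper: print the PID of every process whose current working
# directory is DIR or somewhere below it. Linux only (reads /proc/PID/cwd).
pids_with_cwd_under() {
    dir=$1
    for p in /proc/[0-9]*; do
        # Processes can vanish, or be unreadable for non-root users; skip those.
        cwd=$(readlink "$p/cwd" 2>/dev/null) || continue
        case $cwd in
            "$dir"|"$dir"/*) echo "${p#/proc/}" ;;
        esac
    done
}
```

With the telnet session sitting in /mnt/disk1, running pids_with_cwd_under /mnt/disk1 from the console should print the PID of that login shell.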

 

Regarding false error highlighting: you can look at /boot/unmenu/syslog_match.conf. (I'm not near my server right now, actually about 775 miles from it, but I think that is the name of the file with the regular expressions.)

 

The easiest fix is to put specific matches for lines that are not errors near the top of that file and color them "black".  The first match is used, so that will get rid of the false hits.

 

Joe L.

Link to comment

So far these take care of all the ones I see, though I can't be certain they won't hide any legitimate ones. Do you have a sample syslog you use to test with?

# testing
match_case||"SMART error logging"||black
match_case||" End-to-End_Error"||black
match_case||" errors:0 dropped:"||black
match_case||",errors=remount-ro"||black
match_case||"_errors:"||black
match_case||"without error or no self-test has ever"||black
any_case||" Errors? Log"||black
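To see why rules placed near the top win, here is a rough sketch of first-match-wins classification over rules in that mode||"pattern"||color layout. It is not unMENU's code: classify, the awk parsing, the "none" fallback, and any catch-all rule you add while experimenting are assumptions for illustration only.

```shell
#!/bin/sh
# classify RULES_FILE
# Hypothetical illustration of "first match is used". Reads log lines on
# stdin and prints "<color><TAB><line>", using the first rule in RULES_FILE
# whose pattern matches. Lines starting with # and empty lines are skipped.
classify() {
    awk -v rules="$1" '
    BEGIN {
        while ((getline r < rules) > 0) {
            if (r ~ /^#/ || r == "") continue
            split(r, f, /\|\|/)              # mode || "pattern" || color
            gsub(/^"|"$/, "", f[2])          # strip the surrounding quotes
            n++
            mode[n] = f[1]; pat[n] = f[2]; color[n] = f[3]
        }
    }
    {
        out = "none"                         # no rule matched
        for (i = 1; i <= n; i++) {
            hay = $0; p = pat[i]
            if (mode[i] == "any_case") { hay = tolower(hay); p = tolower(p) }
            if (hay ~ p) { out = color[i]; break }   # first match wins
        }
        print out "\t" $0
    }'
}
```

If a broad rule such as a made-up any_case||"error"||red sits below the black rules, "No Errors Logged" comes out black and a genuine read error still comes out red. Swap the order and everything containing "error" turns red.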

Link to comment

OK, did that. I had a 'ping -c 9999 www.google.com' running as well:

Jul 23 20:40:26 testtower in.telnetd[1869]: connect from 192.168.0.2 (192.168.0.2)
Jul 23 20:40:28 testtower login[1870]: ROOT LOGIN  on '/dev/pts/0' from '192.168.0.2'
Jul 23 20:41:51 testtower apcupsd[1837]: Power failure.
Jul 23 20:41:57 testtower apcupsd[1837]: Running on UPS batteries.
Jul 23 20:42:18 testtower apcupsd[1837]: Reached run time limit on batteries.
Jul 23 20:42:18 testtower apcupsd[1837]: Initiating system shutdown!
Jul 23 20:42:18 testtower apcupsd[1837]: User logins prohibited
Jul 23 20:42:18 testtower logger: Powerdown initiated
Jul 23 20:42:18 testtower rc.unRAID[1920]: Stopping unRAID.
...
Jul 23 20:42:19 testtower status[1967]: ACTIVE PIDS on the array
Jul 23 20:42:19 testtower status[1967]: root      1870  1869  0 20:40 pts/0    00:00:00 -bash
Jul 23 20:42:19 testtower status[1967]: root      1901  1870  0 20:41 pts/0    00:00:00 ping -c 9999 www.google.com
Jul 23 20:42:20 testtower rc.unRAID[1995]: Killing active pids on the array drives
Jul 23 20:42:20 testtower rc.unRAID[1998]: root      1870  1869  0 20:40 pts/0    00:00:00 -bash
Jul 23 20:42:20 testtower rc.unRAID[1998]: root      1901  1870  0 20:41 pts/0    00:00:00 ping -c 9999 www.google.com
Jul 23 20:42:21 testtower rc.unRAID[1998]: root      1870  1869  0 20:40 pts/0    00:00:00 -bash
Jul 23 20:42:22 testtower rc.unRAID[1998]: root      1870  1869  0 20:40 pts/0    00:00:00 -bash
Jul 23 20:42:23 testtower rc.unRAID[2018]: Umounting the drives
Jul 23 20:42:23 testtower rc.unRAID[2022]: /dev/md1 umounted
Jul 23 20:42:24 testtower rc.unRAID[2026]: Stopping the Array
Jul 23 20:42:24 testtower kernel: mdcmd (31): stop 
Jul 23 20:42:24 testtower kernel: md1: stopping
...
Jul 23 20:42:27 testtower logger: Initiating Shutdown with 
Jul 23 20:42:27 testtower shutdown[2046]: shutting down for system halt
Jul 23 20:42:27 testtower init: Switching to runlevel: 0
Jul 23 20:42:29 testtower rc.unRAID[2057]: Stopping unRAID.
...
Jul 23 20:42:32 testtower rc.unRAID[2130]: Killing active pids on the array drives
Jul 23 20:42:32 testtower rc.unRAID[2146]: Umounting the drives
Jul 23 20:42:32 testtower rc.unRAID[2150]: umount: /mnt/disk1: not mounted
Jul 23 20:42:32 testtower rc.unRAID[2150]: Could not find /mnt/disk1 in mtab
Jul 23 20:42:32 testtower rc.unRAID[2152]: Stopping the Array
Jul 23 20:42:32 testtower kernel: mdcmd (32): stop 
Jul 23 20:42:32 testtower kernel: md: stop_array: not started

 

so it looks like it worked?

Link to comment

I modified the unMENU apcupsd package a little bit.

- If the user has the clean powerdown package, we use that for the apcupsd 'doshutdown' (same as before).

- If the user doesn't have the clean powerdown package but does have the unRAID one (5.x only?), we use that at least.
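Roughly, the selection works like the sketch below. This is not the actual package code: pick_powerdown is a made-up name, and the optional ROOT prefix exists only so the logic can be tried in a scratch directory instead of against the real /sbin and /usr/local/sbin.

```shell
#!/bin/sh
# pick_powerdown [ROOT]
# Hypothetical helper: print the powerdown command doshutdown should call.
# Prefers the clean powerdown package (/sbin/powerdown), falls back to the
# stock unRAID script (/usr/local/sbin/powerdown), returns 1 if neither exists.
pick_powerdown() {
    root=${1:-}
    if [ -x "$root/sbin/powerdown" ]; then
        echo "$root/sbin/powerdown"
    elif [ -x "$root/usr/local/sbin/powerdown" ]; then
        echo "$root/usr/local/sbin/powerdown"
    else
        return 1
    fi
}
```

Since the clean powerdown install renames the stock script to unraid_powerdown, the "both exist" case should not normally come up. If it does, this ordering still picks the clean powerdown one.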

 

So, I installed apcupsd without the clean powerdown package and pulled power. It shut down much quicker than it did with the clean powerdown package.

related syslog messages:

Jul 24 18:10:23 testtower apcupsd[1541]: Power failure. (Minor Issues)
Jul 24 18:10:29 testtower apcupsd[1541]: Running on UPS batteries.
Jul 24 18:10:50 testtower apcupsd[1541]: Reached run time limit on batteries.
Jul 24 18:10:50 testtower apcupsd[1541]: Initiating system shutdown!
Jul 24 18:10:50 testtower apcupsd[1541]: User logins prohibited
Jul 24 18:10:50 testtower emhttp: shcmd (33): beep -r 2 (Other emhttp)
Jul 24 18:10:51 testtower emhttp: shcmd (34): /usr/local/sbin/emhttp_event stopping_svcs (Other emhttp)
Jul 24 18:10:51 testtower kernel: mdcmd (31): nocheck  (unRAID engine)
Jul 24 18:10:51 testtower kernel: md: nocheck_array: check not active (unRAID engine)
Jul 24 18:10:51 testtower emhttp_event: stopping_svcs (Other emhttp)
Jul 24 18:10:51 testtower emhttp: Stop AVAHI... (Other emhttp)
Jul 24 18:10:51 testtower emhttp: shcmd (35): /etc/rc.d/rc.avahidaemon stop |& logger (Other emhttp)
Jul 24 18:10:51 testtower logger: Stopping Avahi mDNS/DNS-SD Daemon: stopped
Jul 24 18:10:51 testtower avahi-daemon[1100]: Got SIGTERM, quitting.
Jul 24 18:10:51 testtower avahi-dnsconfd[1114]: read(): EOF
Jul 24 18:10:51 testtower avahi-daemon[1100]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.0.117. (Network)
Jul 24 18:10:51 testtower avahi-daemon[1100]: avahi-daemon 0.6.31 exiting.
Jul 24 18:10:51 testtower emhttp: shcmd (36): /etc/rc.d/rc.avahidnsconfd stop |& logger (Other emhttp)
Jul 24 18:10:51 testtower logger: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped
Jul 24 18:10:51 testtower emhttp: shcmd (37): ps axc | grep -q rpc.mountd (Other emhttp)
Jul 24 18:10:51 testtower emhttp: _shcmd: shcmd (37): exit status: 1 (Other emhttp)
Jul 24 18:10:51 testtower emhttp: Stop SMB... (Other emhttp)
Jul 24 18:10:51 testtower emhttp: shcmd (38): /etc/rc.d/rc.samba stop |& logger (Other emhttp)
Jul 24 18:10:51 testtower emhttp: shcmd (39): rm /etc/avahi/services/smb.service &> /dev/null (Other emhttp)
Jul 24 18:10:51 testtower emhttp: Spinning up all drives... (Other emhttp)
Jul 24 18:10:51 testtower kernel: mdcmd (32): spinup 1 (Routine)

 

so from the look of that, it seems to work fine as well.

Link to comment
