APC UPS Issues and Fixes


I've been using unRAID for several years now, and have been extremely happy with it.

 

I moved house a few years ago, however, and my utility power, while not exactly poor, does have a few outages each year; where I lived previously, there were no outages at all in the three and a half years I was there.

 

In any case, I decided that it was time to get a UPS, and settled on an APC BackUPS Pro 1000 (BR1000G). The primary reason for choosing this model was run time. From 1000VA upwards, the APC BackUPS units have two batteries, which gives nearly double the run time of competitors such as CyberPower.

 

After setting up the UPS, it was time to add the apcupsd plugin; this plugin uses the apcupsd package 3.14.10. Installation went fine, but I felt that the default plugin script behaviour should be modified so that the unRAID machine did not power itself off when shut down. Depending on the motherboard used, it may not be possible to set the system to boot whenever power is supplied, but 'last state' is a very common option, and that only brings the machine back up if it was still on when the UPS cut the power. Removing the references to /sbin/poweroff in the plugin script solved this issue for me.
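Before editing a script like this, it helps to list exactly which lines mention /sbin/poweroff so each one can be reviewed by hand. The snippet below is only a sketch: `find_references` is a hypothetical helper and `sample_script` is a made-up stand-in, not the real plugin script.

```python
def find_references(script_text, needle="/sbin/poweroff"):
    """Return (line_number, line) pairs for every line that mentions needle."""
    return [
        (number, line)
        for number, line in enumerate(script_text.splitlines(), start=1)
        if needle in line
    ]


# Made-up stand-in for the plugin script; the real file will look different.
sample_script = """\
#!/bin/bash
echo "stopping the array"
/sbin/poweroff
"""

hits = find_references(sample_script)
for number, line in hits:
    print(f"{number}: {line}")  # 3: /sbin/poweroff
```

Listing the lines rather than deleting them automatically is deliberate: a reference may sit inside an if/fi block, where simply removing the line would break the script.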

 

After I'd modified the apcupsd plugin file, I set about testing the behaviour of the UPS, and ran through a series of tests to simulate power outages and restorations at different stages of system shutdown.

 

To cut a long story short, I discovered a bug in the firmware of the BackUPS BR (and possibly BX) series UPS products from APC, which results in the UPS getting stuck in an endless loop of un-clean shutdowns. The results of my early testing, including demonstration videos (which I sent to APC) can be found here: https://app.box.com/s/x5wmvyr0ntz4vjm8dyxr.

 

By working with the lead developer of apcupsd, and other members of the apcupsd mailing list, we were able to track down the problem and work around it.

 

I'll post a summary of the problem and fix in the following post to this thread, but the important thing is that the apcupsd version be updated to the latest one (3.14.12) in order to include the required patch for these UPS products.

 

Since 3.14.12 is so new (only 1 day old), there doesn't appear to be a Slackware package prepared for it yet. I am not confident building from source myself, and wondered if someone else would be able to do this, and then update the plugin to include this new package.

 

Failing that, if someone could point me at some information on how I could go about creating the package myself, ideally using my unRAID installation to compile from the source code, I can have a look at doing that myself.

 

Thanks in advance for the help!

 

Neil.

Link to comment

Full details of the issue and fix:

 

After a lot of testing, I discovered that my UPS (a BackUPS Pro 1000G [BR1000G]) would get stuck in a loop. After the UPS had been issued a ‘killpower’ command by apcupsd and had power cycled its protected outlets, any USB enumeration (the initialisation that a USB port goes through in any and every OS, as well as in the BIOS) would cause the ‘killpower’ command to be repeated, resulting in potentially continuous power cycling every 60 seconds. In a real-world environment, this would happen every time the connected PC booted up!

 

I noticed this bug in the first days of owning my BR1000G, and found a thread from 2010 on the APC forum where a BX1500G user had experienced the same issue. I can therefore only assume that the affected products are those from the BackUPS BRxxxxG or BXxxxxG range. During my investigation, I did receive a replacement BR1000G unit from APC, and this showed exactly the same unusual behaviour.

 

This behaviour was consistent using apcupsd and apctest across two flavours of Linux, Mac OS X, as well as the Windows version of apctest. In fact the only version of apcupsd that resulted in correct behaviour was the Windows version of apcupsd running with the -p switch.

 

The unique thing about the Windows implementation of apcupsd is that Windows does not allow the file system to be re-mounted read-only just before halting the system. The only way to power off the UPS, therefore, is to issue the ‘killpower’ command just before Windows starts shutting down, and hope that the OS has shut down before the UPS shutdown sequence reaches the end of its timer and cuts power to its protected outlets (60 seconds when called by apcupsd / 120 seconds when called by PowerChute Personal). During Windows shutdown, the apcupsd daemon continues to run for a few seconds before it is stopped, and during this time it continues to request status information from the UPS after the ‘killpower’ signal has been sent. The ‘killpower’ signal is therefore not the last command that the UPS sees before it cycles power.

 

In all other OS situations, the ‘killpower’ command is the absolute last command sent to the UPS, just before the OS halts. All versions of apctest, including Windows, also only issue a single command to ‘killpower’ with no other commands following it, and this also results in the unusual UPS behaviour.

 

It seems that the firmware bug causes the UPS to store the last received command, and issue it again when the USB connection on the connected computer is enumerated. If this last command was the ‘killpower’ command, the UPS will go through its shutdown and restart sequence. The UPS will actually end up stuck in this loop until one of two things happens:

 

- The UPS receives some other command from apcupsd or PowerChute Personal:

This different command will presumably replace the ‘killpower’ command stored in the UPS, and it is this command that will presumably end up being repeated at enumeration. If the connected PC is able to boot up to the point where apcupsd starts before the UPS cycles power, the loop will be broken (although an un-clean power down will still take place). If the PC takes too long to boot up, the UPS will continue rebooting the connected PC indefinitely!

 

- The UPS is allowed to power down completely:

The only situation where this happens is when utility power is lost for long enough for the UPS to shut down totally. This is about 30 seconds after the UPS cuts power to its protected outlets, and is indicated by the LEDs on the UPS buttons turning off.

 

The fix that Adam, the apcupsd lead developer, was able to implement was to update apcupsd and apctest so that whenever a ‘killpower’ command is sent to the UPS, it is followed by an additional, benign read-data command. Presumably this benign command ends up being repeated, but doesn’t cause any problems.
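To make the theory concrete, here is a toy model of the behaviour described in this post. It is only a sketch of the hypothesis (the UPS remembers the last command and replays it on USB enumeration); it is not apcupsd code or APC firmware, and `SimulatedUps`, `read_status` and the other names are made up for illustration.

```python
# Toy model of the suspected firmware behaviour. This is NOT apcupsd or APC
# firmware code; every name here is made up for illustration.

class SimulatedUps:
    def __init__(self):
        self.last_command = None  # assumption: firmware remembers the last command
        self.power_cycles = 0     # how many times the outlets were cycled

    def receive(self, command):
        """A command arrives from the host over USB."""
        self.last_command = command
        if command == "killpower":
            self.power_cycles += 1

    def enumerate_usb(self):
        """BIOS or OS enumerates the USB port; the stored command is replayed."""
        if self.last_command == "killpower":
            self.power_cycles += 1


def shutdown_unpatched(ups):
    ups.receive("killpower")      # killpower is the very last command sent


def shutdown_patched(ups):
    ups.receive("killpower")
    ups.receive("read_status")    # stand-in for the benign read that follows


def spurious_cycles(shutdown, boots=3):
    """Count power cycles caused purely by USB enumeration after a shutdown."""
    ups = SimulatedUps()
    shutdown(ups)
    before = ups.power_cycles
    for _ in range(boots):
        ups.enumerate_usb()
    return ups.power_cycles - before


print(spurious_cycles(shutdown_unpatched))  # 3
print(spurious_cycles(shutdown_patched))    # 0
```

In this model every boot attempt after an unpatched shutdown triggers another power cycle, while the patched sequence leaves a harmless command in the slot that gets replayed.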

 

Like the Windows version of apcupsd (with the -p switch), PowerChute Personal probably continues to monitor the UPS after it’s issued the ‘killpower’ command to the UPS, and the firmware bug therefore never manifests itself. The bug is still present of course, but it seems that it has gone un-noticed for several years.

 

Neil.

Link to comment

So, if I understand correctly, this 'bug' only becomes apparent if utility power returns between the killpower command being issued, and the UPS actually shutting down?

 

Almost Peter, but not quite...

 

The 'bug' becomes apparent if utility power returns between the start of the connected PC shutdown (which on my unRAID system is nearly 120 seconds before the 'killpower' command is issued), and the UPS fully shutting down.

 

The standard process for a shutdown sequence in the event of a utility power outage is something like:

  • PC starts to shut down <- 0 seconds
  • At the end of the PC shutdown, 'killpower' command issued <- ~120 seconds
  • UPS waits to ensure all connected devices are really truly shut down, then kills power to protected outlets <- 60 seconds
  • UPS goes through its own shutdown sequence, and powers off fully (presumably emptying the command buffer) <- 30 seconds

This gives a window of 3 and a half minutes (~210 seconds). Although utility power being restored in this relatively short window could perhaps be considered un-likely, the result if this does happen is quite nasty - continuous un-clean shut-downs every ~60 seconds, during the boot process!
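For anyone wanting to plug in the timings of their own system, the window is simply the sum of the three stages. The figures below are the approximate ones from the list above; the variable names are made up for this sketch.

```python
# Durations in seconds, taken from the list above (all approximate).
pc_shutdown = 120     # PC shutdown until 'killpower' is issued
outlet_delay = 60     # UPS wait before cutting the protected outlets
ups_power_down = 30   # UPS's own shutdown until the button LEDs go out

window = pc_shutdown + outlet_delay + ups_power_down
print(window, "seconds =", window / 60, "minutes")  # 210 seconds = 3.5 minutes
```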

 

The UPS is only fully shut down when the button LEDs have gone out; not simply when power is cut from the protected outlets.

Link to comment

I managed to compile a 64-bit build of APCUPSD 3.14.12 (see thread here).

 

PLEASE NOTE: this is the 64-bit version. It has been tested on Unraid 6 BETA and will not work on Unraid 5.x.

 

Edit:

Since I don't have an Unraid 5 system any more, if anyone is interested in trying to compile the 32-bit version, here are the instructions:

 

To compile you need the APCUPSD source code and also the customized compilation file from SlackBuilds.org to tailor APCUPSD for Slackware.

 

1. Install the Slackware version matching Unraid 5.x (is it still 13.1?). Do a full install including X-Windows; apparently APCUPSD depends on some libraries from X-Windows.

2. Download the correct APCUPSD SlackBuild from http://slackbuilds.org/result/?search=Apcupsd&sv= 

3. Download the current source code for APCUPSD: http://www.apcupsd.com/

4. Unpack the file you got from SlackBuilds ( tar xvfz {file name} )

5. Copy the APCUPSD source archive into the apcupsd directory created by step 4.

6. Modify apcupsd.SlackBuild to reflect the new APCUPSD version.

7. Run ./apcupsd.SlackBuild and hopefully it will compile without errors. If all goes well, it will make a tgz package file.

 

If the compile gives errors, maybe try a different version of the apcupsd SlackBuild from the SlackBuilds repository (I did not have to change anything for the Slackware 14.1 version [Unraid 6]).
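Step 6 usually comes down to changing one line. The sketch below assumes the script follows the usual SlackBuilds.org template, where the version is set by a line of the form `VERSION=${VERSION:-x.y.z}`; check the actual apcupsd.SlackBuild before relying on that. `bump_slackbuild_version` is a hypothetical helper and `sample` is a made-up excerpt, not the real script.

```python
import re


def bump_slackbuild_version(script_text, new_version):
    """Replace the default in a VERSION=${VERSION:-x.y.z} line (template assumption)."""
    pattern = re.compile(r"^(VERSION=\$\{VERSION:-)([^}]*)(\})", re.MULTILINE)
    updated, count = pattern.subn(
        lambda match: match.group(1) + new_version + match.group(3),
        script_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no VERSION=${VERSION:-...} line found")
    return updated


# Made-up excerpt in the style of a SlackBuild header.
sample = "PRGNAM=apcupsd\nVERSION=${VERSION:-3.14.10}\nBUILD=${BUILD:-1}\n"

print(bump_slackbuild_version(sample, "3.14.12"))
```

If the script does use that form, the default can also be overridden from the environment when running it, without editing the file at all.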

 

Link to comment

In the release notes for 3.14.12 there is:

 

  * Fix issue with certain Back-UPS USB models repeatedly cycling power on/off after killpower is issued

 

Is this the fix you are talking about, that is, is it only necessary to build 3.14.12 and no other patches need to be applied?

Link to comment


Yes, that is the fix that I am talking about.

 

I have now created a 3.14.12 package using a clean Slackware 13.1 install and the SlackBuilds script. Simply updating the plugin to call for the new build would be very simple, but I'm thinking that there might be some other updates to the plugin from the testing that has been done in the x64 thread that could be incorporated at the same time. I have posted in that thread and am waiting to hear from them about the changes that have been made over the 3.14.10 plugin.

Link to comment


It is Robby Workman's build of 3.14.10 which has been used in recent plugins.

 

I have exchanged emails with Robby - he doesn't have time to do the builds right now, but he has had a chance to create the new build script, which I attach here.  Robby seems to add some patches to the standard build.

apcupsd.tar.gz.txt

Link to comment
The 'bug' becomes apparent if utility power returns between the start of the connected PC shutdown (which on my unRAID system is nearly 120 seconds before the 'killpower' command is issued), and the UPS fully shutting down.

 

I had a few moments of concern earlier today.  We had a short powercut, with the power returning after the unRAID server had powered off, but before the UPS cut power to its outputs.  However, there was no problem and I restarted the system without any problem.  I'm using a Back-UPS BX1100CI-MS.

 

  • UPS waits to ensure all connected devices are really truly shut down, then kills power to protected outlets <- 60 seconds

 

Is this your configuration of KILLDELAY?

 

  • UPS goes through its own shutdown sequence, and powers off fully (presumably emptying the command buffer) <- 30 seconds

 

Both my BX1100 and my CS650 turn off almost instantly after cutting the power to the output terminals.

Link to comment

I had a few moments of concern earlier today.  We had a short powercut, with the power returning after the unRAID server had powered off, but before the UPS cut power to its outputs.  However, there was no problem and I restarted the system without any problem.  I'm using a Back-UPS BX1100CI-MS.

 

Where are you located Peter? I did a quick Google search for the BackUPS BX1100CI-MS and found that most images pointed to a product that looks like this, with what appear to be UK type plug sockets on the rear, and 240v rather than 110v:

 

[Image: Back-UPS BX1100CI-MS]

 

I apologise if my assumption of affected UPS models incorrectly included yours (as I think it might have). I personally have identified the problem in my (and one other replacement) BR1000G model, and I noted the issue being identified by someone with a BX1500G on the APC forum, as well as another apcupsd user on their mailing list with a BR700G unit.

 

Perhaps it would be better to say that the fault most likely affects BRxxxG and BXxxxG units. Yours being a BXxxxCI may therefore very well not be affected.

 

For reference, the BRxxxG and BXxxxG models all look similar to this:

 

[Image: BRxxxG / BXxxxG model]

 

Is this your configuration of KILLDELAY?

 

Possibly, but I believe the BackUPS does not take notice of the KILLDELAY setting, and this is only used for SmartUPS units. I might be wrong about this, and I didn't change the setting from default.

 

Both my BX1100 and my CS650 turn off almost instantly after cutting the power to the output terminals.

 

Your model looks significantly different from mine, and the other BRxxxG / BXxxxG models, so that doesn't really surprise me at all. These newer models have a display lifted from the newer SmartUPS range, and some energy saving modes which switch off some outlets based on the current drain on others (configurable). Since these units are more complex, it stands to reason that the shutdown delay might be longer.

 

If you would like to see a more accurate demonstration of the fault, the timing etc., please take a look at the videos that I made and uploaded to Box here: https://app.box.com/s/x5wmvyr0ntz4vjm8dyxr

Link to comment
Possibly, but I believe the BackUPS does not take notice of the KILLDELAY setting, and this is only used for SmartUPS units. I might be wrong about this, and I didn't change the setting from default.

 

I've just re-read the apcupsd documentation, and I think I was wrong about this being ignored by BackUPS units. There is no reason why I need this set at 60 seconds (the default in apcupsd.conf), so I think I'll change it and see how it behaves.

Link to comment
I've just re-read the apcupsd documentation, and I think I was wrong about this being ignored by BackUPS units. There is no reason why I need this set at 60 seconds (the default in apcupsd.conf), so I think I'll change it and see how it behaves.

 

Scratch that... I've just taken a look at my apcupsd.conf and seen that KILLDELAY is already set at 0, so it's my UPS that is presumably 'adding' 60 seconds on to that.
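For anyone else wanting to check what their own configuration says, a quick way is to read the directive straight out of the file. This is a minimal sketch: `read_directive` is a hypothetical helper, and `sample_conf` is a made-up excerpt rather than my actual apcupsd.conf (which typically lives at /etc/apcupsd/apcupsd.conf).

```python
def read_directive(conf_text, name):
    """Return the value of the first uncommented `name` directive, or None."""
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == name:
            return parts[1].strip() if len(parts) > 1 else ""
    return None


# Made-up excerpt; a real apcupsd.conf has many more directives.
sample_conf = """\
## apcupsd.conf (trimmed example)
UPSCABLE usb
UPSTYPE usb
#KILLDELAY 60
KILLDELAY 0
"""

killdelay = read_directive(sample_conf, "KILLDELAY")
print(killdelay)  # 0
```

To run it against a real file, pass the file's contents in place of `sample_conf`.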

Link to comment
