[Plugin] NUT v2 - Network UPS Tools


dmacias

40 minutes ago, Rysz said:

 

So I've been able to reproduce the problem on my test server at last, thanks for your patience and testing. The culprit is that the UPS seems to send an "OFF" event during the self-test, which basically says "Hey, I'm offline and no longer providing power to your server". NUT acts on that and starts a shutdown sequence because it requires at least one functional, online UPS.

 

This is the line where this happens:

Oct  8 17:30:11 UNRAID upsmon[26302]: UPS [email protected]: administratively OFF or asleep

 

What's even stranger is that the UPS seems to be "OL" (online) and "OFF" (offline) at the same time; these two states shouldn't be able to coexist. I'm guessing this is an APC driver issue in NUT, so I'll have to take this up with the NUT backend developers. The UNRAID plugin is basically just a frontend for the NUT backend (which is developed for more systems than UNRAID). Curiously, I've found no reports from other APC users where this happens, which makes me wonder whether it only happens on your UPS or UPS series.
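To make the contradiction concrete: NUT exposes the UPS state as space-separated flags in `ups.status`. The following is only a minimal sketch, assuming the flags arrive as a plain string such as "OL OFF". The helper name `conflicting_flags` and the pair list are made up for illustration and are not part of NUT; the second pair comes from a NUT log message quoted later in this thread.

```python
# Illustrative only: pairs of ups.status flags that this thread treats as
# suspicious when reported together. This is NOT an official NUT list.
SUSPICIOUS_PAIRS = [("OL", "OFF"), ("OL", "DISCHRG")]


def conflicting_flags(status: str) -> list[tuple[str, str]]:
    """Return the suspicious flag pairs present in a ups.status string."""
    flags = set(status.split())
    return [pair for pair in SUSPICIOUS_PAIRS if set(pair) <= flags]


if __name__ == "__main__":
    # Sample strings are hypothetical, not captured from a real UPS.
    for sample in ("OL", "OL OFF", "OL DISCHRG", "OB LB"):
        print(f"{sample!r}: {conflicting_flags(sample)}")
```

A check like this only helps with spotting odd states in logs; it does not change how upsmon reacts to them.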

 

What you can try in the meantime:
Change the NUT backend to "legacy (2.7.4)" in NUT Settings

Change the NUT backend to "stable (2.8.0)" in NUT Settings

And please report back if the problem also happens on the different backends.

 

If it doesn't work with the other backends, I might have one more idea you could try.

It's not an ideal one, so I'll keep this as a last "solution" in case all else fails for now.

 

Please also let me know the exact UPS vendor and model that you have!

 

 

Tested everything now:

 

Change the NUT backend to "legacy (2.7.4)" in NUT Settings, Kill UPS Power off
Worked perfect

 

Change the NUT backend to "stable (2.8.0)" in NUT Settings, Kill UPS Power off
Worked perfect

 

Change the NUT backend to "stable (2.8.0)" in NUT Settings, Kill UPS Power on
Worked perfect

 

So it seems to have been something with the backend, then. And I can run on stable, I guess.

 

I REALLY want to thank you for the time you put into this. And for all the good info.


UPS Model

Back-UPS RS 900MI
https://www.apc.com/se/sv/product/BR900MI/apc-backups-pro-900va-230v-avr-lcd-6-iecuttag/

 

Not to be confused with the one below (mine has IEC outlets and the one below has Schuko outlets). There are other differences as well.
https://www.apc.com/se/sv/product/BR900G-GR/powersaving-backups-pro-900-230-v-schuko/?%3Frange=61888-backups-pro&parent-subcategory-id=88975&selected-node-id=27590292604

 


 

Edited by ABEIDO
Link to comment

Thanks a lot for your testing and patience with this, I'm very glad that this is working for you at last.

 

Unfortunately some UPS don't really play well with the newer backends, but I'll raise an issue with the NUT developers so this can be fixed in newer versions (hopefully).

 

In the meantime you can revert all your settings to your preferred values now, just make sure to keep the backend on one that is working for you and also keep the usernames different for monitor and slave. 

 

Edited by Rysz
Link to comment

Thanks for all of the testing and updates here.

 

I've had two random NUT-related shutdowns in the last month, so I'm thinking I may have something similar, however I don't have enough logging enabled to verify everything.

 

That my system came back online without a parity check would indicate a safe shutdown. Both my main server, which monitors the UPS, and my backup server, which monitors through NUT, did clean shutdowns at the same time, so I'm pretty sure that's where the issue is.

 

Also switched my config to stable 2.8.0 to see if the problem is solved.

 

device.model Back-UPS RS 1350MS

Edited by Nogami
Link to comment

Thanks for the information, it's very likely the same problem as the user above experienced with that APC series and the two week self test (the time frame fits) - so using the legacy or stable backend should solve the problem for you too. I have reported this issue to the NUT (backend) developers in the meantime so hopefully we will see a fix before too long.

 

Edited by Rysz
Link to comment

Hey folks, really appreciate the work to make this plugin exist! Not sure if this is a feature request or perhaps evidence that I need to debug my issue further, but my log is filled with the following repeating errors:

 

Quote

Oct 14 16:29:38 Nostromo root: nut_libusb_get_string: Pipe error

Oct 14 16:29:38 Nostromo root: nut_libusb_get_string: Input/Output Error

Oct 14 16:29:38 Nostromo root: nut_libusb_get_report: Input/Output Error

 

From what I can tell, everything is working properly despite these errors, and after searching forums, the only advice I can find related to these errors is to ignore them. I believe I have verified that I am using the correct driver for my UPS (CyberPower CP1500PFCRM2U).

 

Would it be possible to have an option to suppress these errors in the syslog at the plugin level, if they are indeed 'ignorable' ? 

 

Also, on my other Unraid server that is set as slave, again.... I am receiving UPS metrics and looks good across the board, however, my log is flooded with intermittent entries like this:

 

Quote

Oct 14 16:46:09 Volta upsmon[3885]: UPS [[email protected]]: connect failed: Connection failure: Connection refused

 

Is there a value I should be changing to address this? maybe it's polling too frequently and failing intermittently due to that?
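For the "Connection refused" entries, one thing that can be checked independently of upsmon is whether the master's upsd port accepts TCP connections at all. The snippet below is only a rough sketch: 3493 is NUT's default port, while the function name `upsd_reachable` and the address `192.0.2.10` are placeholders invented for this example.

```python
import socket


def upsd_reachable(host: str, port: int = 3493, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError subclasses on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; replace it with the master's IP.
    print("upsd reachable:", upsd_reachable("192.0.2.10"))
```

Running this in a loop on the slave while the log entries appear would show whether the refusals line up with moments when upsd on the master is not listening; it says nothing about why.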

Edited by Rusty6285
Link to comment

I've read about this a couple of times in the bug tracker of NUT (backend) so far and, as I understood it, this behaviour essentially boils down to the USB driver's implementation in the Linux kernel in combination with that specific USB controller on either the server, the UPS device or both - so this is something that can't really be influenced much from the NUT side of things.

 

What's always worth attempting is changing the USB port or USB cable, some users have reported connection losses and such I/O-errors because they were using a bad USB port or USB cable.

 

Over-polling is indeed sometimes an issue with some UPS, there are some ups.conf settings you can tune:

[Screenshot: grafik.png]

 

Just make sure to put them in the safe-zone (below line 12, marked with the red arrow):

  • pollinterval = The status of the UPS will be refreshed after a maximum delay which is controlled by this setting. This is normally 2 seconds. This setting may be useful if the driver is creating too much of a load on your monitoring system or network.
  • pollfreq = Set polling frequency for full updates, in seconds. Compared to the quick updates performed every "pollinterval" (the latter option is described in ups.conf(5)), the "pollfreq" interval is for polling the less-critical variables. The default value is 30 (in seconds).
  • synchronous = By enabling the synchronous flag (value = yes), the driver will wait for data to be consumed by upsd, prior to publishing more. This can be enabled either globally or per driver. The default of auto acts like no (i.e. asynchronous mode) for backward compatibility of the driver behavior, until communications fail with a "Resource temporarily unavailable" condition, which happens when the driver has many data points to send in a burst, and the server can not handle that quickly enough so the buffer fills up.

I've used the default values in my screenshot, you'll need to play around with different values to see the difference - please let us know if it had any positive effect on your problem.
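Since the screenshot is not reproduced here, the three settings described above can be sketched as text. This is an illustrative helper only, assuming the default values quoted in the descriptions (2, 30 and auto) and the usual `key = value` syntax of ups.conf; the function name `render_poll_settings` is hypothetical and not part of NUT or the plugin.

```python
def render_poll_settings(pollinterval: int = 2,
                         pollfreq: int = 30,
                         synchronous: str = "auto") -> str:
    """Build the three ups.conf lines discussed above as one string."""
    if pollinterval <= 0 or pollfreq <= 0:
        raise ValueError("polling values must be positive numbers of seconds")
    if synchronous not in ("yes", "no", "auto"):
        raise ValueError("synchronous must be 'yes', 'no' or 'auto'")
    return "\n".join([
        f"pollinterval = {pollinterval}",
        f"pollfreq = {pollfreq}",
        f"synchronous = {synchronous}",
    ])


if __name__ == "__main__":
    # Defaults first, then a slower polling variant to experiment with.
    print(render_poll_settings())
    print()
    print(render_poll_settings(pollinterval=5, pollfreq=60, synchronous="yes"))
```

The generated lines would go into the "safe zone" of the configuration editor mentioned above; which values actually help depends on the UPS.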

 

Edited by Rysz
Link to comment

Thanks for the suggestions @Rysz, I tried a variety of settings there and a different USB socket, but no joy - I did however confirm that the "nut_libusb_get_string: Input/Output Error" error appears at whatever frequency the polling rate is set to. 

 

For next steps, I will try a PCIe USB card to see if the controller on my motherboard is possibly at fault, and I can also try using my second Unraid server as the master. As a last option, I can try making my TrueNAS SCALE server the master; however, I'm not sure whether it can act as a NUT master.

 

Will report back with any progress made, thanks again

Link to comment

Great work so far!

I have a question regarding the customizability of the new plugin. I have a Liebert Vertiv GXT5 3000VA connected via USB to a Raspberry Pi 3B running NUT server 2.8.0. The generic UPS driver does give the important info (battery charge level, UPS state, available UPS runtime). However, the UPS runtime is reported incorrectly in both the NUT server and the Unraid NUT client. My best guess is that the standard UPS runtime is given in seconds and the clients convert it to minutes. The Vertiv GXT5, however, reports raw runtime in minutes, so my Unraid servers treat that value as seconds (i.e., 40 minutes of runtime reported by the UPS shows as 40 seconds in the NUT client). This makes it impossible to configure safe shutdowns based on remaining runtime.

Is there anywhere in the plugin files where I can force NUT to report my runtime accurately?
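The unit mismatch described above comes down to one multiplication. The following sketch assumes that clients expect `battery.runtime` in seconds (the NUT convention) while the device hands over minutes; the function names and the `reported_unit` parameter are invented for illustration and do not correspond to any plugin setting.

```python
def runtime_to_seconds(raw_value: float, reported_unit: str = "seconds") -> int:
    """Normalize a raw runtime value to seconds."""
    if reported_unit == "seconds":
        return int(raw_value)
    if reported_unit == "minutes":
        return int(raw_value * 60)
    raise ValueError(f"unknown unit: {reported_unit}")


def format_runtime(seconds: int) -> str:
    """Format a runtime in seconds as minutes and seconds for display."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}m {rest:02d}s"


if __name__ == "__main__":
    raw = 40  # the value from the example above
    print("read as seconds:", format_runtime(runtime_to_seconds(raw)))
    print("read as minutes:", format_runtime(runtime_to_seconds(raw, "minutes")))
```

With the raw value 40, reading it as seconds gives well under a minute of runtime, while reading it as minutes gives 2400 seconds, which is why shutdown thresholds based on remaining runtime misbehave.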

Link to comment

Where and how does the NUT server (the one not on UNRAID) show you the wrong runtime? Usually the NUT server just reports back the raw values from the UPS and doesn't interpret them in any way (on other Linux distributions), so where are you seeing the wrong value on your RPi?

 

Regarding UNRAID, since a few users have asked for this already I'll implement a switch between seconds and minutes reported for UPS runtime in the next update.

 

Edited by Rysz
Link to comment

I suppose saying that the NUT server reports incorrectly was a bad way to put it. It reports exactly what the UPS itself says is the estimated runtime. But it's awesome to hear there's a fix for it down the road!

Link to comment

I've just updated the plugin, the new version includes that setting. 🙂

Link to comment

Like some others, I also suffer from the "data stale" problem after some hours of NUT-ting.. 😞

 

It is obviously bound to the USB Nut Driver running. The only help (for some short time) is to restart the driver.

 

Since this is no real option on my UNRAID, I've installed a strange but currently "stable" workaround:

 

Instead of UNRAID, I attached the UPS (an Amazon Basics one, btw) to a Pi nearby. It suffers from the same "data stale" problem, but there I can restart the driver regularly via cron.

 

On UNRAID I've installed NUT in slave mode, asking the PI for the data. This works almost well.

 

Almost, because there is a small chance that just in the moment UNRAID asks, the PI is restarting the driver and does not answer fast enough then. This gives a warning logged but UNRAID recovers.

 

It looks to me as if the polls happen every 30s. That is a bit unfortunate, because cron can only fire on full minutes, so the clash is to be expected.

 

Would be good if I can tell UNRAID just to check every 29 or 31 seconds...
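The reasoning about 30 s versus 29 or 31 s can be checked with a little arithmetic. This is a rough model only: it assumes the first poll coincides with a cron tick (the worst case), that cron restarts the driver every 60 s, and that each restart leaves the driver unresponsive for a made-up window of 5 s. The function name `count_collisions` and all default values are assumptions for illustration.

```python
def count_collisions(poll_period_s: int,
                     cron_period_s: int = 60,
                     restart_window_s: int = 5,
                     horizon_s: int = 3600) -> int:
    """Count polls that fall inside a restart window within the horizon.

    Polls happen at t = 0, poll_period_s, 2 * poll_period_s, ...
    A restart window covers the first restart_window_s seconds after
    every multiple of cron_period_s.
    """
    return sum(
        1
        for t in range(0, horizon_s, poll_period_s)
        if t % cron_period_s < restart_window_s
    )


if __name__ == "__main__":
    for period in (29, 30, 31):
        hits = count_collisions(period)
        print(f"poll every {period}s: {hits} polls hit a restart window per hour")
```

Under these assumptions a 30 s poll lands in a restart window on every second poll, while an off-by-one period drifts through the minute and hits the window far less often. It does not avoid clashes entirely, it only spreads them out.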

 

Any idea how to change the poll rate in the docker???

 

Link to comment

Sorry, no idea about the Docker version of NUT - that one's not maintained by me.

Link to comment
Oops, sorry, my fault. No Docker, YOUR Version :-)))

(I guess I once had a docker but then switched it someday)

 

So once again: where can I change the poll frequency?

Edited by MAM59
Link to comment

See this post for the values:

 

Edited by Rysz
Link to comment
4 minutes ago, RichardIstSauer said:

After some time, the NUT plugin no longer detects the UPS, and the syslog gets flooded with this:

 

Oct 19 15:56:26 SauerRaid upsmon[2015]: Poll UPS [[email protected]] failed - Data stale

Oct 19 15:56:26 SauerRaid usbhid-ups[1972]: libusb1: Could not open any HID devices: insufficient permissions on everything

 

What can I do to fix it?

 

I'd try another USB port and see if the problem persists, such logs have been occurring for other users where USB ports or cables died. 

Link to comment

I've changed the values to 5s and 31s and enabled synchronous (for whatever good it may do). But only on the Pi so far; I'm not sure where to put them in UNRAID.

Every time I try to add them in the "NUT Configuration Editor" of the plugin, it tells me "configuration error" "/usr/local/emhttp" "Error saving configuration file".

:(

 

Link to comment

POLLFREQ 5

 

This is what you'd put in upsmon.conf on UNRAID via the NUT configuration editor in the GUI. You might need to "Reset Config" since yours seems broken somehow.

 

The other lines (from the post) are only for the server the UPS is physically connected to.

 

Edited by Rysz
Link to comment

Thank you for the recent updates! I've never had issues with my UPS, or energy provider, but twice now my Unraid machine turned off, and it took down (gracefully) my router as well which is a "slave" via NUT. The first time it did it, I was in another country, so that was frustrating.

 

Luckily I happened to set up remote syslog yesterday, and NUT just did it again. I captured:

UPS [email protected]: administratively OFF or asleep

 

I googled that error and the first link was this thread.

 

I was sitting about 2 meters from the UPS and never heard an alarm or anything, but my battery wasn't full so it had definitely been used. Super weird. I didn't even know it did a self test :)

 

Anyway, I have now switched to stable (2.8.0), rebooted, and will hope for the best!

Link to comment

This is unfortunately a known bug in the NUT backend that happens when some APC devices do their bi-monthly battery self tests. Switching to stable or legacy backend has fixed the problem for the other users affected and a fix for this from the NUT backend developers is on the way. I'm guessing you also have an APC device?

Link to comment
4 minutes ago, Chunks said:

Indeed.

device.model: Back-UPS RS 900G

 

Never had issues with "old Nut" but I guess that's because it was using a (much?) older back end.

 

Yes, that's the affected series. The old NUT used the legacy 2.7.4 backend (and had a range of problems on its own unfortunately). If you choose the NUT stable or legacy backends you'll only get the GUI updates when updating NUT but the backend will always stay on the chosen backend (if that's what you want 🙂)

 

Edited by Rysz
Link to comment

Saw this in my log from last night:

 

  • seems that UPS [ups] is in OL+DISCHRG state now. Is it calibrating or do you perhaps want to set 'onlinedischarge' option? Some UPS models (e.g. CyberPower UT series) emit OL+DISCHRG when offline.

 

This is using 2.8 stable.  Didn't trigger a shutdown this time, if it was the same issue.  Was only in that mode for 2 seconds, then switched back on and:

 

Link to comment

Thanks for reporting back, this is indeed the battery self test, we've luckily been able to fix the APC shutdown issue in the "default (recent master)" backend also since your report. So you can stay on 2.8.0 stable if it works well for you or update NUT and switch back to "default (recent master)". 🙂

 

Edited by Rysz
Link to comment
