
Powerdown package for unRAID v5 and v6 (DEPRECATED)


dlandon


I have a problem with the powerdown plugin on unRAID v6b9.

When I ssh into the server as root and type 'powerdown', the server shuts down. But after starting it up again, it runs a parity check every time.

So I think the server doesn't do a clean powerdown, or something on the array is not being stopped.

 

Am I doing something wrong? The command is just 'powerdown', right?

 

Or do I need to add a script that stops the Docker containers?

 

I have added the tail end of the server log below:

 

Sep 27 13:12:55 Tower powerdown[15626]: Powerdown initiated
Sep 27 13:12:55 Tower powerdown[15630]: Powerdown V2.08
Sep 27 13:12:55 Tower rc.unRAID[15632][15633]: Processing /etc/rc.d/rc.unRAID.d/ kill scripts.
Sep 27 13:12:55 Tower powerdown[15638]: Initiating Shutdown with ''
Sep 27 13:12:55 Tower shutdown[15639]: shutting down for system halt
Sep 27 13:13:14 Tower init: Switching to runlevel: 0
Sep 27 13:13:16 Tower rc.unRAID[16000][16001]: Powerdown V2.08
Sep 27 13:13:16 Tower rc.unRAID[16000][16002]: Stopping Plugins.
Sep 27 13:13:16 Tower rc.unRAID[16000][16010]: Running: "/etc/rc.d/rc.snap stop"
Sep 27 13:13:16 Tower rc.unRAID[16000][16014]: ... Snap stopped
Sep 27 13:13:18 Tower rc.unRAID[16000][16068]: Stopping unRAID.
Sep 27 13:13:19 Tower rc.unRAID[16000][16101]: umount2: Device or resource busy
Sep 27 13:13:19 Tower rc.unRAID[16000][16101]: umount: /var/lib/docker: device is busy.
Sep 27 13:13:19 Tower rc.unRAID[16000][16101]:         (In some cases useful info about processes that use
Sep 27 13:13:19 Tower rc.unRAID[16000][16101]:          the device is found by lsof(8) or fuser(1))
Sep 27 13:13:19 Tower rc.unRAID[16000][16101]: umount2: Device or resource busy
Sep 27 13:13:19 Tower rc.unRAID[16000][16107]: Killing active pids on the array drives
Sep 27 13:13:20 Tower rc.unRAID[16000][16111]: Sync filesystems
Sep 27 13:13:20 Tower rc.unRAID[16000][16128]: Umounting the drives
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: /dev/md1 has been unmounted
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: /dev/md2 has been unmounted
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: /dev/md3 has been unmounted
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: umount2: Device or resource busy
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: umount: /mnt/disk4: device is busy.
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]:         (In some cases useful info about processes that use
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]:          the device is found by lsof(8) or fuser(1))
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: umount2: Device or resource busy
Sep 27 13:13:20 Tower rc.unRAID[16000][16132]: /dev/md5 has been unmounted
Sep 27 13:13:20 Tower rc.unRAID[16000][16138]: Active pids left on the array drives
Sep 27 13:13:20 Tower rc.unRAID[16000][16141]:                      USER        PID ACCESS COMMAND
Sep 27 13:13:20 Tower rc.unRAID[16000][16141]: /dev/md1:            root       3017 F.... java
Sep 27 13:13:20 Tower rc.unRAID[16000][16141]: /dev/md4:            root     kernel mount /mnt/disk4
Sep 27 13:13:20 Tower rc.unRAID[16000][16141]:                      root       2706 f.... shfs
Sep 27 13:13:20 Tower rc.unRAID[16000][16142]: Stopping the Array
Sep 27 13:13:20 Tower kernel: mdcmd (417): stop 
Sep 27 13:13:20 Tower kernel: md: 9 devices still in use.
Sep 27 13:13:20 Tower rc.unRAID[16000][16148]: /root/mdcmd: line 11: echo: write error: Device or resource busy
Sep 27 13:13:25 Tower mdstatusdiff[16231]: --- /tmp/mdcmd.16000.1#0112014-09-27 13:13:20.832959946 +0200
Sep 27 13:13:25 Tower mdstatusdiff[16231]: +++ /tmp/mdcmd.16000.2#0112014-09-27 13:13:25.842934189 +0200
Sep 27 13:13:25 Tower mdstatusdiff[16231]: @@ -85,7 +85,7 @@
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  rdevSize.4=1465138552
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  rdevId.4=SAMSUNG_HD154UI_S1XWJ1BZ114191
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  rdevNumErrors.4=0
Sep 27 13:13:25 Tower mdstatusdiff[16231]: -rdevLastIO.4=1411816400
Sep 27 13:13:25 Tower mdstatusdiff[16231]: +rdevLastIO.4=1411816405
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  rdevSpinupGroup.4=0
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  diskNumber.5=5
Sep 27 13:13:25 Tower mdstatusdiff[16231]:  diskName.5=md5

 

And a second question: what do I need to put in the go script if I want a scheduled clean powerdown every night?


I have a problem with the powerdown plugin on unRAID v6b9.

 

Or do I need to add a script that stops the Docker containers?

 


 

And a second question: what do I need to put in the go script if I want a scheduled clean powerdown every night?

 

Docker is not stopped when the CLI 'powerdown' command is used.  You can work around this until LT fixes it.  Look earlier in this thread and you'll see a way to do it with a powerdown K** script.  I'll also post the method in the OP.
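For reference, a kill script along these lines can stop the containers before the array goes down. The file name and paths here are assumptions (modeled on the K00.sh naming used elsewhere in this thread), so treat it as a sketch, not the canonical script:

```shell
#!/bin/bash
# Hypothetical /etc/rc.d/rc.unRAID.d/K00.sh kill script, executed by
# powerdown before the array is stopped.  It stops every running Docker
# container so /var/lib/docker and the array disks can unmount cleanly.
echo "stopping docker ..."
containers=$(docker ps -q)
[ -n "$containers" ] && docker stop $containers
# Then stop the docker daemon itself via its rc script.
/etc/rc.d/rc.docker stop
```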

 

Yes, you can set up a cron job to power down the array.  I'm not an expert on cron, so I can't suggest exactly how to do that.
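As a rough sketch only (the 03:00 schedule, the binary path, and the idea of appending the entry from the go script are all my assumptions; use whatever 'powerdown' command works from your ssh session), a nightly clean powerdown could look like:

```shell
# In /boot/config/go, install a root cron entry at boot that runs
# 'powerdown' every night at 03:00.
(crontab -l 2>/dev/null; echo "0 3 * * * /usr/local/sbin/powerdown") | crontab -
```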


removed rc.cache_dirs

 

Curiosity: Why?

 

The proper way to shut down plugins is for each plugin to have an event/unmounting_disks script that is executed whenever the array disks are about to be unmounted.  The way powerdown currently shuts down plugins, by calling the rc scripts directly, is not the proper way to do it.

 

I am converting powerdown to execute the event/unmounting_disks scripts to shut down plugins.  If cache_dirs needs to be stopped, then that plugin must provide an event/unmounting_disks script.

 

In the past, powerdown brute-forced its way through the plugins to shut them down.  That was really a patch to force a shutdown, making up for V5 plugins that did not provide a shutdown script.

 

This needs to change for V6.  Powerdown will follow exactly the same shutdown strategy as unRAID itself.
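A plugin's event script can be very small. The plugin name and rc script below are hypothetical, but the shape follows the mechanism described above:

```shell
#!/bin/bash
# Hypothetical /usr/local/emhttp/plugins/myplugin/event/unmounting_disks
# unRAID runs every plugin's unmounting_disks event script just before
# the array disks are unmounted; it should stop anything the plugin
# runs that holds files open on /mnt/*.
if [ -x /etc/rc.d/rc.myplugin ]; then
    /etc/rc.d/rc.myplugin stop
fi
```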


Version 2.11 of powerdown is available.  Powerdown was converted from walking through the /etc/rc.d scripts to executing all 'unmounting_disks' event scripts in the plugin directories.  This is how unRAID stops plugins, and powerdown now does the same thing.

 

Powerdown now uses the same shutdown strategy as emhttp, except that powerdown does not issue the stop command to emhttp itself.  Powerdown performs the same shutdown procedure that emhttp does.

 

unRAID's core shutdown functionality works fine and should be used unless you need the special features that powerdown offers.  These are:

  • 'K' and 'S' scripts for array start and stop event processing.
  • Historical logs that are kept on the flash drive.  Log location and how many to keep can be configured.
  • Additional logging of shutdown process so problems in shutdown can be identified.

 

I don't expect any more changes to powerdown before V6 final unless there are bugs to fix.


The first two power cuts of the day resulted in clean shutdowns but the third one didn't.  There have been no system modifications today.

Oct 10 16:27:33 Tower apcupsd[1871]: Power failure.
Oct 10 16:27:33 Tower apcupsd[1871]: Power is back. UPS running on mains.
[... the same "Power failure." / "Power is back. UPS running on mains." pair repeats roughly twice a second until 16:28:04 ...]
Oct 10 16:28:04 Tower apcupsd[1871]: Power failure.
Oct 10 16:28:10 Tower apcupsd[1871]: Running on UPS batteries.
Oct 10 16:33:11 Tower apcupsd[1871]: Reached run time limit on batteries.
Oct 10 16:33:11 Tower apcupsd[1871]: Initiating system shutdown!
Oct 10 16:33:11 Tower apcupsd[1871]: User logins prohibited
Oct 10 16:33:11 Tower powerdown[28852]: Powerdown initiated
Oct 10 16:33:11 Tower powerdown[28856]: Powerdown V2.12
Oct 10 16:33:11 Tower rc.unRAID[28858][28859]: Processing /etc/rc.d/rc.unRAID.d/ kill scripts.
Oct 10 16:33:11 Tower rc.unRAID[28858][28863]: Running: "/etc/rc.d/rc.unRAID.d/K00.sh"
Oct 10 16:33:11 Tower rc.unRAID[28858][28866]: stopping docker ...
Oct 10 16:33:12 Tower rc.unRAID[28858][28866]: 76fd469db8e4
Oct 10 16:33:13 Tower rc.unRAID[28858][28866]: 807cf60a0038
Oct 10 16:33:13 Tower rc.unRAID[28858][28866]: 06b4203382ef
Oct 10 16:33:14 Tower avahi-daemon[2999]: Withdrawing workstation service for vethf624.
Oct 10 16:33:14 Tower kernel: docker0: port 4(vethf624) entered disabled state
Oct 10 16:33:14 Tower kernel: device vethf624 left promiscuous mode
Oct 10 16:33:14 Tower kernel: docker0: port 4(vethf624) entered disabled state
Oct 10 16:33:15 Tower rc.unRAID[28858][28866]: 541f36c519ad
Oct 10 16:33:15 Tower avahi-daemon[2999]: Withdrawing workstation service for veth991e.
Oct 10 16:33:15 Tower kernel: docker0: port 3(veth991e) entered disabled state
Oct 10 16:33:15 Tower kernel: device veth991e left promiscuous mode
Oct 10 16:33:15 Tower kernel: docker0: port 3(veth991e) entered disabled state
Oct 10 16:33:15 Tower rc.unRAID[28858][28866]: 4b32e542da76
Oct 10 16:33:16 Tower avahi-daemon[2999]: Withdrawing workstation service for vethfbe2.
Oct 10 16:33:16 Tower kernel: docker0: port 1(vethfbe2) entered disabled state
Oct 10 16:33:16 Tower kernel: device vethfbe2 left promiscuous mode
Oct 10 16:33:16 Tower kernel: docker0: port 1(vethfbe2) entered disabled state
Oct 10 16:33:16 Tower rc.unRAID[28858][28866]: unmounting docker loopback
Oct 10 16:33:19 Tower powerdown[28980]: Initiating Shutdown with ''
Oct 10 16:33:19 Tower shutdown[28981]: shutting down for system halt
Oct 10 16:33:20 Tower init: Switching to runlevel: 0
Oct 10 16:33:22 Tower rc.unRAID[28997][28998]: Powerdown V2.12
Oct 10 16:33:22 Tower rc.unRAID[28997][29003]: Stopping Plugins.
Oct 10 16:33:22 Tower rc.unRAID[28997][29004]: Running: "/usr/local/emhttp/plugins/dockerMan/event/unmounting_disks"
Oct 10 16:33:22 Tower logger: Stopping docker.io
Oct 10 16:33:22 Tower logger: stopping docker ...
Oct 10 16:33:22 Tower rc.unRAID[28997][29012]: Running: "/usr/local/emhttp/plugins/dovecot/event/unmounting_disks"
Oct 10 16:33:22 Tower rc.unRAID[28997][29015]: Stopping dovecot...
Oct 10 16:33:22 Tower rc.unRAID[28997][29018]: Running: "/usr/local/emhttp/plugins/fan_speed/event/unmounting_disks"
Oct 10 16:33:22 Tower rc.fan_speed: WARNING: fan_speed called to stop with SERVICE not = disabled
Oct 10 16:33:22 Tower rc.unRAID[28997][29030]: Running: "/usr/local/emhttp/plugins/mpop/event/unmounting_disks"
Oct 10 16:33:23 Tower rc.unRAID[28997][29042]: Running: "/usr/local/emhttp/plugins/powerdown/event/unmounting_disks"
Oct 10 16:33:23 Tower rc.unRAID[28997][29047]: Running: "/usr/local/emhttp/plugins/tftp-hpa/event/unmounting_disks"
Oct 10 16:33:23 Tower rc.unRAID[28997][29050]: command /etc/rc.d/rc.tftp-hpa stop 
Oct 10 16:33:23 Tower rc.unRAID[28997][29052]: Running: "/usr/local/emhttp/plugins/xenMan/event/unmounting_disks"
Oct 10 16:33:24 Tower logger: Stopping XEN domains:  /etc/rc.d/rc.xendomains
Oct 10 16:33:24 Tower logger: Shutting down Xen domains:  [done] 
Oct 10 16:33:24 Tower rc.unRAID[28997][29073]: Stopping unRAID.
Oct 10 16:33:24 Tower avahi-daemon[2999]: Got SIGTERM, quitting.
Oct 10 16:33:24 Tower avahi-dnsconfd[3008]: read(): EOF
Oct 10 16:33:24 Tower avahi-daemon[2999]: Leaving mDNS multicast group on interface docker0.IPv4 with address 172.17.42.1.
Oct 10 16:33:24 Tower avahi-daemon[2999]: Leaving mDNS multicast group on interface br0.IPv4 with address 10.2.0.100.
Oct 10 16:33:24 Tower avahi-daemon[2999]: avahi-daemon 0.6.31 exiting.
Oct 10 16:33:24 Tower rpc.mountd[3770]: Caught signal 15, un-registering and exiting.
Oct 10 16:33:25 Tower rc.unRAID[28997][29097]: Killing active pids on the array drives
Oct 10 16:33:25 Tower kernel: nfsd: last server has exited, flushing export cache
Oct 10 16:33:25 Tower rc.unRAID[28997][29101]: Sync filesystems
Oct 10 16:33:25 Tower rc.unRAID[28997][29103]: Umounting the drives
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]: /dev/md1 has been unmounted
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]: umount2: Device or resource busy
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]: umount: /mnt/disk2: device is busy.
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]:         (In some cases useful info about processes that use
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]:          the device is found by lsof(8) or fuser(1))
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]: umount2: Device or resource busy
Oct 10 16:33:25 Tower rc.unRAID[28997][29107]: /dev/md3 has been unmounted
Oct 10 16:33:32 Tower rc.unRAID[28997][29107]: /dev/md4 has been unmounted
Oct 10 16:33:39 Tower rc.unRAID[28997][29107]: /dev/md5 has been unmounted
Oct 10 16:33:39 Tower rc.unRAID[28997][29141]: Active pids left on the array drives
Oct 10 16:33:39 Tower rc.unRAID[28997][29144]:                      USER        PID ACCESS COMMAND
Oct 10 16:33:39 Tower rc.unRAID[28997][29144]: /dev/md2:            root     kernel mount /mnt/disk2
Oct 10 16:33:39 Tower rc.unRAID[28997][29145]: Stopping the Array
Oct 10 16:33:39 Tower rc.unRAID[28997][29151]: /root/mdcmd: line 11: echo: write error: Device or resource busy
Oct 10 16:33:39 Tower kernel: mdcmd (32): stop 
Oct 10 16:33:39 Tower kernel: md: 1 devices still in use.
Oct 10 16:33:44 Tower kernel: docker0: port 2(veth4516) entered disabled state
Oct 10 16:33:44 Tower kernel: device veth4516 left promiscuous mode
Oct 10 16:33:44 Tower kernel: docker0: port 2(veth4516) entered disabled state

 

On the face of it, the log would suggest that a root user decided to mount disk2 just at the time disks were being dismounted.

 

What do I need to do to prevent this happening in the future?
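One way to investigate (my suggestion, not something from the log itself) is to ask, while the system is still up, which processes hold the mount busy, using the same tools the umount error message points at:

```shell
# Show processes with open files on the busy mount point.
fuser -vm /mnt/disk2
# lsof reports the same thing with more detail per process.
lsof /mnt/disk2
```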

 

I'm no longer autostarting a Xen VM ... the functions of that have been replaced by five dockers, which are autostarted.

 

Here's hoping that the power stays up long enough for the parity check to complete ... currently 10% through and forecast to run another 7 hours, 3 minutes, at 107MB/sec.


The first two power cuts of the day resulted in clean shutdowns but the third one didn't.  There have been no system modifications today.

Oct 10 16:33:39 Tower kernel: mdcmd (32): stop 
Oct 10 16:33:39 Tower kernel: md: 1 devices still in use.
Oct 10 16:33:44 Tower kernel: docker0: port 2(veth4516) entered disabled state
Oct 10 16:33:44 Tower kernel: device veth4516 left promiscuous mode
Oct 10 16:33:44 Tower kernel: docker0: port 2(veth4516) entered disabled state

 


 

Looks to me like docker was still running.  If that is the case, LT will have to review the rc.docker shutdown script and verify that it waits for docker to shut down completely.
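In the meantime, a stop script could poll until the daemon is really gone before the unmount is attempted. A rough sketch, where the process name and the 30-second timeout are my assumptions:

```shell
# Wait up to 30 seconds for the docker daemon to exit completely
# before trying to unmount its loopback filesystem.
for i in $(seq 1 30); do
    pgrep -x docker >/dev/null || break   # daemon gone: safe to proceed
    sleep 1
done
umount /var/lib/docker
```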

Here's hoping that the power stays up long enough for the parity check to complete ... currently 10% through and forecast to run another 7 hours, 3 minutes, at 107MB/sec.

 

Well, it was a vain hope.  The power went off again when the parity check had reached around 68%.

 

At least the shutdown was clean this time so the parity check didn't restart!

 

Argh ... it's just gone off again!

 

 

Edit:

... and back again, almost an hour later, and it was a clean shutdown again. :)


Well, it was a vain hope.  The power went off again when the parity check had reached around 68%.

 

At least the shutdown was clean this time so the parity check didn't restart!

If a parity check is not allowed to run to completion, you risk a failed rebuild if a data drive goes bad. A clean shutdown during a parity check should NOT result in the parity being trusted on the next startup. That in itself is a bug: the clean-shutdown flag shouldn't be updated until a parity check has been successfully completed.

Damn dude.  You need a bigger UPS, maybe a petrol-powered one!

 

I have a diesel-powered one, but the neighbours complain if I run it in the middle of the night - those last two cuts were around midnight and 1am!

 

Re the parity check: my parity drive is 3TB but my largest data drive is 2TB (also, the drive which didn't dismount is only 1TB), so at 68% the parity check was already past the end of my data drives.  I will need to run a complete parity check before I install any 3TB data drives.  However, completing a parity check (or, even worse, a pre-clear on a 3TB drive) is difficult here.


It's easy to feel sorry for Peter, but look at it this way: Peter is our best hope of getting the clean powerdown fully tested and debugged.

LOL! Dang, that's pretty cold.  ;D

The poor guy is trying to keep a server running, and you promote (demote) him to guinea pig.

 

Seriously though, it's usually pretty easy to hack a UPS to use 1 or more 12V lead acid marine deep cycle or car batteries, and you can get some really good runtime that way. Daisy chaining UPS units doesn't usually work out too well, each conversion wastes a bunch of energy as heat, it's better to use a battery bank sized for the load on a single UPS.


It's easy to feel sorry for Peter, but look at it this way: Peter is our best hope of getting the clean powerdown fully tested and debugged.

LOL! Dang, that's pretty cold.  ;D

The poor guy is trying to keep a server running, and you promote (demote) him to guinea pig.

 

Ha-ha - I'm resigned to that role!

 

Seriously though, it's usually pretty easy to hack a UPS to use 1 or more 12V lead acid marine deep cycle or car batteries, and you can get some really good runtime that way. Daisy chaining UPS units doesn't usually work out too well, each conversion wastes a bunch of energy as heat, it's better to use a battery bank sized for the load on a single UPS.

 

Provided that I can guarantee a clean powerdown, I'm not too worried about extended run time.  The main functions of the server are of no interest when the TVs and other computers in the house are off!  What is much more important is the ability to cope with multiple successive power cuts, when the batteries don't have a chance to recharge in between.

 

Extended run time is only useful for:

1) Completing parity checks ... not too important if I can guarantee clean powerdown.

2) Pre-clears, but I don't need to do these very often.  However, a restartable pre-clear would be cool!

3) Keeping torrents and other downloads running.

 

I recently 'upgraded' my unRAID UPS from 650VA to 1100VA in order to deal more effectively with multiple successive cuts.  Unfortunately, it has taken APC several months to discover a firmware fault in the new model which, apparently, kills the batteries in about a month.  I'm awaiting yet another replacement unit, this time with revised firmware.  I think I'm a guinea pig for APC too!

 

The old 650VA unit has been redeployed to keep the router running (together with a Raspberry Pi running apcupsd) and, at 3% of maximum load, that system can stay up for several hours.

 
