WebGUI becomes unresponsive when docker hangs


created as per jonp:

 

I'm having an issue that followed me from 14b:

 

Sometimes one of my docker containers becomes unresponsive, in this case binhex's sabnzbd.  I'll try to stop the container from the webui and it hangs, and the webui becomes unreachable.  Attempting a docker stop/kill/rm -f from the CLI also just hangs indefinitely until I hit Ctrl+C.

 

The syslog shows nothing of use: just mover operations, plus entries from when I pulled the container logs and then attempted to stop the container via the webui.

 

The only useful info I could find is from the container logs:

 

**repeated a lot**
2015-04-23 19:47:31,435 DEBG 'sabnzbd' stderr output:
2015-04-23 19:47:31,435::INFO::[__init__:1060] Restarting crashed scheduler
2015-04-23 19:47:31,435::INFO::[scheduler:172] Setting schedule for midnight BPS reset
**

2015-04-23 19:48:44,699 WARN received SIGTERM indicating exit request
2015-04-23 19:48:44,699 DEBG killing sabnzbd (pid 7) with signal SIGTERM
2015-04-23 19:48:44,701 INFO waiting for sabnzbd to die
2015-04-23 19:48:44,702 DEBG 'sabnzbd' stderr output:
2015-04-23 19:48:44,701::WARNING::[__init__:172] Signal 15 caught, saving and exiting...

2015-04-23 19:48:47,705 INFO waiting for sabnzbd to die
2015-04-23 19:48:50,709 INFO waiting for sabnzbd to die
2015-04-23 19:48:53,712 INFO waiting for sabnzbd to die

 

Any idea what's going on here?  I'm going to have to manually stop the array and reboot to fix this issue :(

 

 

Additional info:

This has happened to me a few times: I'll notice the webui for one of my container apps isn't responding, I try to stop/start it via the docker page, and it causes the webui to hang.  I've also tried to stop the array when one of the containers is not responding; it gets caught up in the unmounting process forever and the webui becomes unresponsive.

 

Either way the only way to resolve this has been to manually stop the array and reboot from ssh/console.

 

Docker itself isn't logging any sort of error to the syslog so it's really hard to determine what the issue is and why it can't forcefully kill containers in certain conditions.
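
For the CLI side of the hang described above (docker stop/kill sitting there until Ctrl+C), one workaround is to put a deadline on the command so the shell gets control back on its own. This is only a rough sketch: run_with_deadline is a made-up helper name, it assumes the coreutils timeout command is available on the box, and it does nothing to fix the stuck container itself.

```shell
# run_with_deadline SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND but gives up after SECONDS so the
# calling shell is never blocked indefinitely.
run_with_deadline() {
    local seconds="$1" rc=0
    shift
    # -s TERM: signal sent at the deadline; -k 5: follow up with KILL 5s later
    timeout -s TERM -k 5 "$seconds" "$@" || rc=$?
    # timeout exits 124 on a normal timeout, 137 (128+9) if KILL was needed
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "gave up after ${seconds}s: $*" >&2
        return 124
    fi
    return "$rc"
}

# Example (not run here): run_with_deadline 30 docker stop SABnzbd
```

A return code of 124 then means "the command hung", which is easier to script around than a terminal that never comes back.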

Link to post

Running into this issue with needo's sabnzbd docker where the container crashes and is unable to be stopped by docker.

 

When I did docker ps, I saw the sabnzbd container still running. Then I sent a kill pid. The process is still there, but the description just shows defunct. Here's the output of ps -ef | grep docker:

 

root@Tower:~# docker ps
CONTAINER ID        IMAGE                  COMMAND             CREATED             STATUS              PORTS                    NAMES
58b6f4155c2c        needo/sabnzbd:latest   "/sbin/my_init"     5 days ago          Up 42 hours         0.0.0.0:8081->8080/tcp   SABnzbd     
root@Tower:~# ps -ef | grep docker
root     15580 15579  0 15:43 ?        00:00:00 /bin/sh /usr/local/emhttp/plugins/dynamix.docker.manager/event/unmounting_disks unmounting_disks
root     15582 15580  0 15:43 ?        00:00:00 /bin/sh /etc/rc.d/rc.docker stop
root     15593 15582  0 15:43 ?        00:00:00 docker stop 8b7082948abd a58320b05b65 1f20158faba2 827d387e9439 0488dd923874 55775b1c673e 58b6f4155c2c 729e5f4ae1b3
root     18041 17973  0 16:01 pts/1    00:00:00 grep docker
root     18440     1  0 Apr23 ?        00:00:49 /usr/bin/docker -d -p /var/run/docker.pid --storage-driver=btrfs
root     18659 18440  0 Apr23 ?        00:00:06 [docker] <defunct>

 

pid 18659 used to be the sabnzbd container. Sending a kill -9 does nothing either.

 

Only killing the docker daemon process (pid 18440 above) solves the problem.
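
Some background on why kill -9 has no effect here: a defunct (zombie) process has already exited and is just waiting for its parent to reap it, so there is nothing left to signal. That fits with things only clearing up once the parent (the docker daemon) is killed. A small sketch for spotting zombies and their parents follows; list_zombies is a made-up helper name, and it reads ps-style output on stdin so it can be tried on sample data.

```shell
# list_zombies: read "PID PPID STAT COMMAND" rows (with a header line) on
# stdin and print every process whose state starts with Z (zombie/defunct).
# Hypothetical helper, not part of docker or unRAID.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $1, $2, $4 }'
}

# Live usage (not run here): ps -eo pid,ppid,stat,comm | list_zombies
```

The parent pid it reports is the process that would have to reap (or be restarted) for the zombie to go away.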

 

In the meantime, the array cannot be stopped. If there was a power outage, I would have a problem once the UPS died.

 

I started noticing this after the upgrade to beta15, perhaps docker 1.5 is to blame. But it is happening pretty frequently. I'm tired of logging into the terminal every time sab crashes to kill all docker processes and start over.

Link to post

I've also been running into this issue - docker containers are causing the web UI to hang. I've also run into the exact same one just noted - needo's sabnzbd docker froze, I tried to stop it via the web UI and the web UI stopped responding.

 

Unfortunately it also seems to stop the powerdown plugin command from working: running it doesn't appear to kill docker, so it locks up and leaves the array in a state where I can't stop it via the web UI, SSH or directly on the machine (tty1 is running powerdown, tty2-8 don't accept any input for whatever reason). This means the only option is to power off the machine without stopping the array, which results in the need for another parity check...

 

It's worth noting that I've only just upgraded to 6.0 b15, from 4.7 via 5, so no intermediate 6 betas have been run on this machine. I've been running 4.7 with the same hardware for about 2 years now with no issues.

 

If I can provide any extra info or logs to help track down this issue please let me know.

Link to post

I had the situation where Docker ran out of space and I had to create a larger image file.

 

Can you telnet into your system and do:

 

# df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      10G  2.9G  6.8G  31% /var/lib/docker

And post the result.
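
If you'd rather have that check scripted, here is a rough sketch that pulls the Use% figure out of df output and warns above a threshold. The helper names (usage_percent, check_docker_image) and the 90% default are my own assumptions, not anything unRAID ships.

```shell
# usage_percent: read df output on stdin and print the Use% column of the
# first filesystem as a bare number (e.g. 31).
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_docker_image [LIMIT]: warn when the usage read from stdin is at or
# above LIMIT percent (default 90). Hypothetical helper.
check_docker_image() {
    local limit="${1:-90}" pct
    pct=$(usage_percent)
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: docker image is ${pct}% full"
        return 1
    fi
    echo "OK: docker image is ${pct}% full"
}

# Live usage (not run here): df -P /var/lib/docker | check_docker_image 90
```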

Link to post

Ok, let me ask this question for those experiencing this issue.  When you can't stop the container and you attempt to stop it from command line, what command are you using?  docker stop or docker kill?

 

If you haven't tried docker kill please try that and report back if it successfully stops the container.

Link to post

it just happened to me again after manually rebooting the server this weekend, same deal, sabnzbd is the culprit.  I was going to switch to needo's container but apparently it doesn't matter.

 

Neither stop nor kill works.  I can host a webex or something if you need to investigate further.

 

 

 

 

Link to post

yes, I have a postprocessing script for movies:

 

#!/bin/bash
# -p: create the directory if it is missing, and do not fail if it already exists
mkdir -p /mnt/cache/movies

 

basically it just ensures that /mnt/cache/movies exists when a movie has been downloaded so CP can copy from /mnt/cache/downloads to /mnt/cache/movies

Link to post

I tried both stop and kill and neither worked.

 

I don't have any postprocessing set up in sab, but I realized that around the same time I upgraded to beta 15 I also turned on the multi core processing in sab. I've since turned it off and haven't experienced a crash yet. I'm hoping that was the problem. I'll let you guys know if it crashes again.

 

Link to post

Rebooted my server and set par2_multicore = 0 in sabnzbd.ini. I've got 60GB of queue built up so we'll see what happens.  Also worth mentioning that I'm running two sabnzbd containers; the second one, "sabnzbp", has never seen this issue, but it downloads far less.
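
For anyone who wants to make the same par2_multicore change from the command line, a rough sketch is below. set_ini_key is a made-up helper, the ini path in the example is an assumption (use wherever your container's config volume lives), and it is probably safest to do this while SABnzbd is stopped, since it saves its settings on exit.

```shell
# set_ini_key FILE KEY VALUE
# Hypothetical helper: rewrites an existing "KEY = ..." line in FILE,
# leaving a FILE.bak backup next to it. Lines for other keys are untouched.
set_ini_key() {
    local file="$1" key="$2" value="$3"
    sed -i.bak -E "s|^(${key})[[:space:]]*=.*|\1 = ${value}|" "$file"
}

# Example (path is an assumption, not run here):
# set_ini_key /config/sabnzbd.ini par2_multicore 0
```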

 

 

**edit**

jonp - does docker automatically spread the container load across multiple host cores? I'm still seeing high load on all 4 of my unraid cores when sab does repair/unrar operations.

Link to post

Docker can spread load across multiple cores unless you pin the containers to specific CPUs.  I have a thread in the Docker forum about how to do this (getting fancy with Docker and CPU pinning).

Link to post

So in theory, we shouldn't need to invoke any sort of forced multithreadedness in the application to get good performance, right?  Or can docker only spread multiple threads across cores? In the case of par2 in sab, can docker load balance that operation, or should we be looking at using multicore par2 and cpu pinning?

Link to post

So in theory, we shouldn't need to invoke any sort of forced multithread in the application to get good performance, right?

 

That's theory only.  The only fact I can tell you is that if you use --cpuset=#,#... to define the CPUs to bind the container to, then the container will be restricted to only using those CPUs.  How it uses those CPUs, however, could be controlled inside the container.
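
To make the pinning flag mentioned above concrete, here is a dry-run sketch that only prints the docker run command line rather than executing it. pinned_run_cmd is a made-up helper, and the flag name is passed in as an argument because, as far as I know, newer Docker releases spell it --cpuset-cpus instead of --cpuset; check docker run --help on your version.

```shell
# pinned_run_cmd FLAG CPUS IMAGE
# Hypothetical helper: prints (does not run) a docker run command that
# restricts the container to the given host CPUs, e.g. "0,1" or "0-3".
pinned_run_cmd() {
    local flag="$1" cpus="$2" image="$3"
    printf 'docker run -d %s=%s %s\n' "$flag" "$cpus" "$image"
}

# Show the command for pinning the image from this thread to cores 0 and 1
pinned_run_cmd --cpuset 0,1 needo/sabnzbd:latest
```

Printing first makes it easy to eyeball the command before adding your usual ports and volume mappings and running it for real.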

Link to post

beat me to my edit, i'll have to investigate using pinning and mc par2.  now with the app issues solved... any luck decoupling the webui from a hung docker process?

 

Not yet.  This could be a Docker bug, but honestly not sure just yet.  This is the first time we're seeing host instability be impacted by a container.  The fact that you aren't giving the container any privileged access or host networking and the issue still persists is a little concerning.  We are actively investigating.

Link to post

i forked needo's sabnzbd container a while ago when i didn't know that you could just put in your own volumes direct from the template.

 

my docker of it now is using base image phusion/baseimage:0.9.16 and i've had no issues with it crashing anything.

 

i download pretty much constantly too, have 12 or so dockers in total at any one time.

 

FWIW, my dockerfile.

 

FROM phusion/baseimage:0.9.16

ENV DEBIAN_FRONTEND noninteractive

# Set correct environment variables
ENV HOME /root
ENV TERM xterm
# Use baseimage-docker's init system
CMD ["/sbin/my_init"]

# Add local files
ADD sabnzbd.sh /root/sabnzbd.sh

# Fix a Debianism of the nobody's uid being 65534
RUN usermod -u 99 nobody && \
usermod -g 100 nobody && \
add-apt-repository ppa:jcfp/ppa && \
add-apt-repository "deb http://us.archive.ubuntu.com/ubuntu/ trusty universe multiverse" && \
add-apt-repository "deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates universe multiverse" && \
add-apt-repository ppa:jon-severinsson/ffmpeg && \
apt-get update -q && \
apt-get install -qy unrar par2 sabnzbdplus wget ffmpeg sabnzbdplus-theme-mobile curl && \
# Install multithreaded par2
apt-get remove --purge -y par2 && \
wget -P /tmp http://www.chuchusoft.com/par2_tbb/par2cmdline-0.4-tbb-20100203-lin64.tar.gz && \
tar -C /usr/local/bin -xvf /tmp/par2cmdline-0.4-tbb-20100203-lin64.tar.gz --strip-components 1 && \
# Download folders
mkdir -p /mnt/Downloads /mnt/XBMC-Media /mnt/Incomplete && \
# Add sabnzbd to runit
mkdir /etc/service/sabnzbd && \
mv /root/sabnzbd.sh /etc/service/sabnzbd/run && \
chmod +x /etc/service/sabnzbd/run

EXPOSE 8080
EXPOSE 9090

# Path to a directory that only contains the sabnzbd.conf
VOLUME /config

 

 

and my volume mappings etc...

 

 

FTJgd7R.png

 

network mode is bridge and it is not privileged.

 

no postprocessing scripts.

Link to post

sab crashed again, this time with the multicore option turned off. It was in the middle of downloading (a crappy one, please don't judge :-).

 

The docker is still alive.  Attached is what was in the log from today.

 

Attempting to stop from webgui results in emhttp hanging.

 

here is the terminal ps -ef output:

 

root@Tower:~# ps -ef | grep SABnzbd
root     11836  1696  0 20:08 ?        00:00:00 sh -c cd /usr/local/emhttp; /usr/bin/php /usr/local/src/wrap_post.php update.php '' '%23command=%2Fusr%2Fbin%2Fdocker+stop+SABnzbd'
root     11837 11836  0 20:08 ?        00:00:00 /usr/bin/php /usr/local/src/wrap_post.php update.php  %23command=%2Fusr%2Fbin%2Fdocker+stop+SABnzbd
root     11838 11837  0 20:08 ?        00:00:00 /usr/bin/docker stop SABnzbd
root     14798 14333  0 20:13 pts/0    00:00:00 grep SABnzbd

 

Only after I kill pids 11836 and 11837 does emhttp become responsive again, but the sab docker is still active.

 

"docker kill SABnzbd" seems to hang forever

 

***Interesting find***

After I issued the docker stop command, the sab log was appended with the following:

 

*** Shutting down runit daemon (PID 10)...
2015-04-28 20:08:08,917::WARNING::[__init__:172] Signal 15 caught, saving and exiting...

 

But sab webgui is still not responding and the docker container is still active and cannot be stopped
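
For context on the "Signal 15 caught" line: docker stop sends SIGTERM first, waits a grace period (10 seconds by default), and then sends SIGKILL. Below is a rough sketch of that same escalation against a plain host process, just to illustrate the sequence. term_then_kill is a made-up helper, not what docker runs internally, and note that kill -0 also succeeds for a defunct process that has not been reaped, so the loop cannot tell a zombie from a live process.

```shell
# term_then_kill PID [GRACE]
# Hypothetical helper: send SIGTERM, poll for up to GRACE seconds
# (default 10), then escalate to SIGKILL if the pid is still present.
term_then_kill() {
    local pid="$1" grace="${2:-10}" waited=0
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to signal
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null
            echo "escalated to SIGKILL for pid $pid"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "pid $pid exited after SIGTERM"
}

# Example (not run here): term_then_kill 12345 10
```

If even the SIGKILL step leaves the process around, the problem is below the application, which matches what is being reported in this thread.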

log.txt

Link to post

We are actively looking into this and testing with Docker 1.6 to see if this problem exists there. If not, phew. If so, we will need to open a bug report with docker. Bottom line: you should ALWAYS be able to kill a running container. If that doesn't work, something is broken.

Link to post
