WebGUI becomes unresponsive when docker hangs



Figured I'd add to this. Just upgraded to 6.1 from 6.0.1.

 

 

Attempted to install a docker - the webgui hung. The docker never installed, just left an orphan image. Had to hard reset the server.

 

Please post contents of this file on your usb flash:

 

config/plugins/dynamix/dynamix.cfg

Link to post

Still happens to me in 6.1.2.  Thought I'd solved it by moving the Plex housekeeping to a different time from the mover, but it's started happening again in 6.1.2

So for you too, the crash happens when the mover hits?  What if you manually invoke the mover through the webgui? Does it crash every time?

Link to post
  • 4 weeks later...

Bump.  Curious for an update on this, with three weeks since the last post.  Is there a bug thread besides this one, which seems to be petering out unresolved?  I'm monitoring a server 8000 miles away, so a hard restart requires a neighbor to come in to push buttons for me.

 

I'm still on 6.0.0 and updating from remote is iffy.  My WebUI freezes when SAB crashes, symptoms just like Veezer's.

 

Is it certain that it happens during Mover?

 

Dennis

Link to post
  • 2 weeks later...

I can also confirm that the webUI becomes unresponsive on any type of docker install. The docker install also seems to fail. I've tried upgrading a filebot docker, and installing a puddletag docker. Instead of the other window giving me a status, it just hangs forever. I am on the newest unRAID as well.

Link to post
  • 2 weeks later...

I am also experiencing this problem.  I have been following some of the suggestions in this thread however - to no avail.

 

1.  Web GUI becoming unresponsive

2.  Appears linked to Docker - I think it is my DelugeVPN that is crashing

3.  Telnet works, however all commands hang

    a:- "Diagnostics" does not run - starts but does not finish

    b:- All other console commands hang

    c:- I can cancel the telnet session and telnet back in - however all commands continue to hang

    d:- "powerdown" also hangs

 

It appears Docker related - however I am unable to work out definitely what the problem is as all console commands hang.  The only option I have is to hard restart the server. 

 

Open to all and any further suggestions.  This is happening now - between every 8 and 12 days.

 

Link to post
  • 2 weeks later...

This has randomly started happening to me every morning..

 

- Web GUI is unresponsive

- commands like `lsof` and `diagnostics` hang the terminal

  - If I try to kill -9 them, I end up with processes stuck in the D (uninterruptible sleep) state

- All Dockers still work - sabnzbd/crashplan/etc is accessible

- Can't connect via SMB

- Can't connect via AFP

 

Here's output of `ps aux` -> https://gist.github.com/JustinAiken/cba34143ed352f2c7ce6
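
A minimal sketch for picking the stuck processes out of a listing like the one linked above, assuming a procps-style `ps`. The function name `list_d_state` is only illustrative. It keeps the header line plus every process whose STAT column starts with `D` (uninterruptible sleep), which is the state where `kill -9` has no effect until the blocked I/O call returns.

```shell
# Print the header plus only the processes whose STAT column starts with D
# (uninterruptible sleep, usually blocked on disk or network I/O).
# Reads `ps` output on stdin, so it also works on a saved listing.
# Assumes STAT is the second column, as in the usage line below.
list_d_state() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Usage:
#   ps -eo pid,stat,wchan:32,args | list_d_state
```

With `ps aux` output the STAT column is the eighth field rather than the second, so the awk test would need `$8` instead. The `wchan` column shows the kernel function the process is sleeping in, which may hint at whether it is waiting on a disk or on the network.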

 

Link to post

Update. I am still using the same version of unRAID, 6.1.3, and dockers DO install; they just post no status screen and take an extremely long time compared to before. I just installed the musicbrainz one from Sparkly's repo and it took over 15 minutes before I had the GUI back and responsive again (I'm aware it's a large docker). I didn't lose anything else in the process, all other dockers were reachable, just no status of the installation of the docker, or update of a docker, and the UI hangs the entire time until the install is finished, which takes a long time. I believe this is why users think it's completely unresponsive, because the first 2 times it happened to me, after 10 minutes I was already trying to reboot and stop dockers (which I could not) and the array. Just waiting it out seems to work, but that's way too long to not have access to the unRAID UI, so something is definitely going on. Is there still supposed to be a status popup screen showing everything it downloads etc?

Link to post


 

OK - I think I have rectified my problem and prevented it from hanging.

 

I followed the previous advice by setting my dockers to use specific CPUs as per this thread - http://lime-technology.com/forum/index.php?topic=36257.0

 

As of 5 mins ago I had the exact same prelude to the routine 12 or so day hang.  Deluge appeared to crash (which is what has happened before each other hang), but I was able to restart Deluge from the WebGUI without the usual hang of the whole system.  If you are still experiencing this problem, give the CPU pinning a try.
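
For anyone trying the pinning, here is a rough sketch, assuming it is done with Docker's `--cpuset-cpus` run option (the linked thread is not reproduced here, so treat that as an assumption). The helper name `cpuset_flag` and the core numbers are purely illustrative.

```shell
# Build a --cpuset-cpus flag from a list of core numbers.
# Example: `cpuset_flag 2 3` prints --cpuset-cpus=2,3
cpuset_flag() {
  cores=$(printf '%s,' "$@")          # "2,3,"
  printf '%s%s\n' '--cpuset-cpus=' "${cores%,}"   # strip the trailing comma
}

# Usage (illustrative; paste the printed flag into the container's
# extra run parameters, or pass it to `docker run` directly):
#   cpuset_flag 2 3
```

The idea is to leave at least one core that no container is allowed to use, so the WebGUI and the rest of the host still get CPU time when a container misbehaves.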

 

Thanks

Link to post
  • 4 weeks later...

Welp, the sabnzbd docker has just caused WebGUI to hang. Trying to remotely solve this issue. First time it's ever happened to me.

 

Running unRAID 6.1.4

Running binhex's sabnzbd docker

Had a queue of 11 items of varying sizes (10mb to 500mb). Wondering if post processing caused it to hang??

 

Is a hard reboot the only fix?

 

Update:

ran the command ps -ef | grep docker

used the kill command to kill the two processes below

root     20937  9784  0 09:49 ?        00:00:00 /usr/bin/php /usr/local/src/wrap_post.php update.php  %23command=%2Fplugins%2Fdynamix.docker.manager%2Fscripts%2Fdocker&%23arg%5B1%5D=restart&%23arg%5B2%5D=binhex-sabnzbd
root     20938 20937  0 09:49 ?        00:00:00 /usr/bin/docker restart binhex-sabnzbd

 

WebGUI is now responding, but docker tab shows sabnzbd still running. Can't access sabnzbd WebUI though.

 

Docker stop and docker kill just hang.

 

I tried to kill the process

root     10824 10382  0 Dec01 ?        00:00:09 docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9090 -container-ip 172.17.0.3 -container-port 8080

 

But it returns as

root     10824 10382  0 Dec01 ?        00:00:09 [docker] <defunct>

 

 

Update 2:

In WebGUI I went to Settings -> Docker and noticed this nice little banner above the settings

"Your existing Docker image file needs to be recreated due to an issue from an earlier beta of unRAID 6. Failure to do so may result in your docker image suffering corruption at a later time. Please do this NOW!"

 

I went ahead and shut off Docker from the WebGUI in Settings -> Docker.

 

Attempting to stop the array and stuck at "Retry unmounting disk share(s)." Doing a ps -ef still shows a <defunct> process stuck open

nobody   10926 10842  0 Dec01 ?        00:22:08 [python2] <defunct>

The PPID of the <defunct> process is 10842, which is the [supervisord] process. I can't seem to kill it either.
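
A small sketch for listing every <defunct> process together with its parent, assuming a procps-style `ps`. The function name `list_zombies` is made up for this example. A zombie is already dead, so `kill` cannot remove it; the entry only goes away once the parent reaps it or the parent itself exits.

```shell
# List zombie (<defunct>) processes together with their parent PID.
# Expects `ps -eo pid,ppid,stat,comm` output on stdin.
list_zombies() {
  awk 'NR > 1 && $3 ~ /^Z/ { printf "zombie pid=%s parent=%s (%s)\n", $1, $2, $4 }'
}

# Usage:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

If the parent shown here is itself stuck and cannot be killed, the zombie will stay until a reboot, which matches what is described above.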

 

Update 3:

Nothing else to do but:

shutdown now -r

Link to post
  • 1 month later...

 


 

Lucky you.  My sabnzbd hang even prevents a remote reboot (powerdown -r); I have to get my neighbor to come in and push the power button.

 

I've been managing by restarting sabnzbd from the command line about once per week.  It seems to crash for me if sabnzbd is busy.  It doesn't seem related to mover activity, since I disabled the mover and it has crashed twice since.

 

 

I'm still on 6.01 since I live a continent away and don't want to upgrade while remote.

 

Link to post
  • 1 month later...
  • 2 weeks later...

I am having the exact same issues as everyone is describing above. I am running 6.1.7. Are there any fixes yet? It is happening with a few of my dockers (Deluge, Plex, MineOS)

 

I ran into the same issue as well with 6.1.8. The Plex Media Server docker caused the WebGUI to become extremely slow and eventually completely unresponsive. I was able to SSH in and manually kill the docker process to get things back up and running. CPU and memory usage during this time was extremely low (monitoring with HTOP and free -m).

 

For now, my PMS, NZBGet, and Sonarr are installed on a VM to avoid this problem again.

Link to post

I'm having the same issue, but a step beyond.

 

It even manages to crash Kodi on another computer (which would be kinda impressive if it wasn't so annoying). At first I thought it was my weak hardware. I had a single-core Sempron with a Passmark score of less than 400 and 2GB of RAM. I upgraded to an i5-4440 and 16GB of RAM and I'm still having this issue.

 

The crashes stopped once I disabled the dockers on the old hardware, and I'm getting them again now on the new hardware, which should be more than good enough to run Sabnzbd and Sonarr.

 

I thought the whole point of docker was that the applications would be isolated on their own little bubble and not affect the whole array, but the way it's working is completely broken. What's even the point?

 

Should we just go back to the old plug ins and forget about dockers? I never had these kind of issues with plug-ins before, so I'm wondering how dockers became the more accepted method to run apps if they're this broken.

 

Link to post

For those with dockers that use Python and https (SSL).

 

I was getting the almost daily UI "crashes", docker GUIs sometimes available and sometimes not, RDP connections to VMs dropping, unable to telnet in, shares unavailable. It all seemed quite intermittent though, with things suddenly being available again (even if for a few seconds before being unavailable again). Went through all sorts of things. Found, weirdly, that when I unplugged the Cat6 cable from the back of the box and plugged it back in, it sometimes seemed to free the server up again. This told me that the GUI wasn't truly crashing. I also played with setting cron to restart the network interface daily and restart docker hourly. Seemed to work but was not a long-term solution.

 

Thoroughly tested network h/w, replaced switches and tried a different router. Issue persisted.

 

In the syslog I started to note Python segfaults being caused by "something" but couldn't easily see by what. Ran memtest, all perfect. So after a few weeks, this Feb I isolated the issue to the Sabnzbd docker. I went on further to see what was happening in there.
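
A quick sketch for tallying which programs are segfaulting, assuming the kernel logs them in its usual `name[pid]: segfault at ...` form and that the log lives at `/var/log/syslog`. The function name `segfault_summary` is only illustrative.

```shell
# Count kernel "segfault at" messages per program name in a syslog file.
# $1 is the path to the log file.
segfault_summary() {
  grep 'segfault at' "$1" |
    sed -n 's/.* \([^ ]*\)\[[0-9]*\]: segfault at.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage:
#   segfault_summary /var/log/syslog
```

The same lines usually also name the library the fault happened in at the end of the message, which may help tie the crashes to SSL as described in this post.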

 

I have been debugging an issue with this for a while now and I have all but tied it to the number of SSL connections the docker was making. I tried all manner of different config settings. I use Usenet in my own business to share non-sensitive 3D drawing files, images and survey info between 5 companies, so it can run 24x7 some days (as the files are BIG), so the crashes (and the debugging) were annoying me, let alone disrupting business.

 

Then I came across a post on a Linux forum tying Python segfaults to SSL. Then I went to look at how Sabnzbd was using SSL.

 

I make SSL connections using Sabnzbd and had set Sabnzbd to default to the max connections my accounts allow. 30+30+30 = 90 always on connections. My connection is 25Mb/s so I don't really need all those connections but it was a case of I'm paying for them so I'll use them. Seems that was a mistake.

 

I can saturate my connection by having 4 connections on each server totalling just 12 now not 90 so I made the change. It has been running 5 days now with no issue. No segfaults, no GUI responsive issues. All good.
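
To check how many connections are actually open before and after a change like this (90 versus 12 here), something along these lines may help. It is a sketch that assumes `netstat -tn` output and that the news servers are reached on port 563, the registered NNTP-over-TLS port; your provider may use a different one. The function name `count_established` is illustrative.

```shell
# Count ESTABLISHED TCP connections to a given remote port.
# Expects `netstat -tn` output on stdin; $1 is the remote port.
count_established() {
  awk -v port="$1" '
    $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++ }
    END { print n + 0 }
  '
}

# Usage:
#   netstat -tn | count_established 563
```

With bridged docker networking the container's connections will probably not show up in the host's `netstat`, so this likely needs to be run inside the container, assuming `netstat` is available there.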

 

Not sure if this is the same / similar issue but hopefully it helps some. It's taken me a while and some pain to debug.

Link to post
  • 2 weeks later...

The issue is not solved, and the freezes are really annoying.  Version 6.1.9 does not resolve the problem. I had freezes with needo's SABnzbd docker on large files and high-speed downloads. Now I have NZBGet from binhex and the problem is the same. Lowering the number of connections only reduces the problem: even with 1 connection, 60GB of data kills my unRAID after a few hours or less. Has anybody tried installing SAB or NZBGet outside docker?

Link to post

Anyone experiencing this issue that upgrades to the 6.2 beta, please report back if it continues for you.

 

I just noticed that I'm having this issue on the 6.2 beta. Previously had no issues on 6.1.9. Any help would be greatly appreciated! It seems to lock up sab when it's downloading and unpacking downloads. It keeps my whole system pegged at 50% CPU usage. When I try to stop the sab docker, the webui becomes unresponsive.

 

Also, to echo danioj's post about the SSL issues with sabnzbd: I just looked in the system log as I tried to stop the Sab docker, and I see a line that says "TCP: request_sock_TCP: Possible SYN flooding on port 9090. Sending cookies. Check SNMP counters". Sabnzbd is the only program I have that is running on port 9090.

Link to post


 

A bit of an anecdotal comment BUT I am coming to the conclusion that SAB doesn't work well with unRAID at all, whether it be via a Plugin or a Docker.

 

My network connection to the unRAID server (on an X10SL7-F) running SAB becomes unstable VERY regularly under heavy usage. With limited to no usage, the server is stable. As I have noted before, I have done some testing which has included switch, cable and router replacements, none of which has fixed the issue. The problem gives the "appearance" that the WebGUI is down, BUT in fact for me it is the network connection that becomes unstable. The only fix I have found is to do one of these 3 things:

 

- Restart the SAB Container (docker restart sabnzbd);

- Restart Docker; (/etc/rc.d/rc.docker restart)

- Restart Inet1 (/etc/rc.d/rc.inet1 restart)

 

Those only work if you can telnet or SSH into the CLI. Sometimes you can, sometimes not.
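
If you do get a shell, a rough sketch like the following tries the three commands in that order and gives up on each after a time limit instead of hanging the session. It assumes the coreutils `timeout` command is available; the function name `recover` and the 120 second limit are made up for illustration.

```shell
# Try each recovery command in order, giving each at most $1 seconds.
# Stops at the first command that exits successfully.
recover() {
  limit=$1
  shift
  for cmd in "$@"; do
    if timeout "$limit" sh -c "$cmd"; then
      echo "recovered with: $cmd"
      return 0
    fi
    echo "failed or timed out: $cmd" >&2
  done
  return 1
}

# Usage with the three commands listed above:
#   recover 120 'docker restart sabnzbd' '/etc/rc.d/rc.docker restart' '/etc/rc.d/rc.inet1 restart'
```

Two caveats: a zero exit status only means the command finished, not that the network is healthy again, and `timeout` cannot terminate a process that is stuck in uninterruptible sleep, so this will not help in the cases where every command hangs.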

 

If you can't telnet or ssh to the CLI then there are only 2 options I have found which preclude a hard reset and an unclean shutdown:

 

- utilising IPMI JAVA console, run one of those commands (as I mention below, the 3rd one is probably the best) OR

- Remove the network cable from the LAN port on your Server and Plug it back in again.

 

One of those first 3 seems to always do the trick. I feel that ultimately 1 & 2 end up "resetting" the network interface, and as such doing point 3 is probably the best thing to do. I have set a cron job to do this daily at 5:20am.

 

telnet into server

 

nano /boot/config/plugins/dynamix/restartnetwork.cron

 

If the "restartnetwork.cron" file exists, nano will open it for editing. If not, it will create it. Paste the following into this file:

 

# Stop then Start the Network service every day at 5:20am
20 5 * * * /etc/rc.d/rc.inet1 restart |& logger

 

<CTRL> + <X> to exit and Save output. The Cron Job is now saved. I have noticed that the dynamix cron files are not loaded just because you create a file BUT you can force this. Just go into the Settings>Scheduler AND make a change (so the Apply Button gets un-greyed out) and change it back (Apply Button remains un-greyed out) and Hit Apply (essentially not changing ANYTHING) then the cron file gets loaded.
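
If you would rather not use an editor, the same file can be written in one step. This is a sketch, not the only way to do it: the function name `write_restart_cron` is made up, and `2>&1 |` is swapped in for `|&` (they mean the same thing in bash) so the line does not depend on which shell cron uses.

```shell
# Write the cron entry from this post to a file without opening an editor.
# $1 is the destination path.
# Cron fields: minute=20, hour=5, then any day of month, month, weekday.
write_restart_cron() {
  cat > "$1" <<'EOF'
# Stop then Start the Network service every day at 5:20am
20 5 * * * /etc/rc.d/rc.inet1 restart 2>&1 | logger
EOF
}

# Usage:
#   write_restart_cron /boot/config/plugins/dynamix/restartnetwork.cron
```

The Settings > Scheduler step described above is still needed afterwards so that the file actually gets loaded.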

 

The server doesn't go down and it doesn't "seem" to impact dockers. I get a minor issue whereby the local DNS names of VMs become unresolvable for a time, BUT that seems to rectify itself.

 

Since I have done that I seem to be having fewer issues. I don't know if this has something to do with large volumes of traffic, connections (as mentioned in my previous post) or what, BUT I know what appears to be working.

Link to post

Thanks for the reply to this issue! At least I have some more things to try the next time it happens. Funny thing is, I never once had an issue with Sab when I was on 6.1.7-6.1.9. This issue only just recently started with the 6.2 beta. Previously my Sab would be downloading many files at once, as well as unpacking and processing them. I did turn off SSL for port 9090 on sab, as well as limit my connections to the servers, and pause any current downloads. My issue at the moment seems to be when Sab is unpacking large files and going through verification of them.

Link to post


 

You can also try a few of these other SAB Config things (from the SAB wiki) when you are undertaking high speed downloading:

 

- Set the article cache in Config -> General. This will keep articles in memory and not write them to disk (which is slower). Depending on how much RAM you can spare two good values are 70M or 120M (the M denotes megabytes and is required). If you download a lot of rar files that are 100MB or larger then use the latter value.

- Lower your connection count in Config -> Servers. It may seem counter-intuitive, since more connections should be faster than fewer connections, but if you use the 50+ connections some hosts give you, the overhead from constantly opening and closing connections can slow you down. So start at the max allowed connections and slowly lower your count until you max out your speed. Or do it the other way round: start with 5 connections, measure the speed, and raise to 7, measure again, 9, measure again, etc. Normally 10 connections are enough.

- Enable the Pause Downloading During Post-Processing Switch. This pauses downloading at the start of post processing and resumes when finished.

 

Note that if none of the above methods makes a difference, have a look at how much CPU is being used while downloading; if it is at 100%, it is likely your processor is too slow to maintain high enough speeds.

 

My personal view is that these "help" BUT I personally feel there is something to do with SSL, connection count, Python and unRAID. Something is not quite right with the interaction. I don't know what it is BUT I am not the only one with this issue. I just don't know what the cause is to try and fix it. I feel like I am getting there though ...

Link to post


 

Thanks for all the tips. I'll try some of them out. My processor is definitely more than enough though. I have a dual Xeon 5530 setup (16 threads). Also I believe my article caching was set pretty high by default. I think it was something like 200 or 256M.

Link to post

I'm also experiencing this: I'm unable to stop the docker process, and first SAB hangs, then the whole unRAID UI hangs. It seems to happen during par2, but I don't believe my CPU is at fault as I've never had this problem before. I have tried scrubbing my cache drive and had 0 errors, but the SMART reports for it do indeed contain some old age and prefail attributes. Is it possible that my cache drive is causing this and needs to be replaced? Or is there a way to find a definitive answer before spending the $$ on one to find out? I can't imagine everyone with this issue has a cache drive on the way out, but I suppose it's possible? I'm also using needo's docker still. Should I switch to another?

Link to post
