greg_gorrell Posted January 29, 2022 (edited)

Hello, I have an HP ML350P with plenty of resources, and I have been attempting to upgrade from 6.8.2 to 6.9.2 for quite some time now. Each time I do, I am unable to start more than a few Docker containers before the web GUI starts acting erratically. I am not sure what the cause is, and the only thing the logs have in common across occurrences is the following "upstream timed out" error message:

Jan 29 09:49:19 ML350P nginx: 2022/01/29 09:49:19 [error] 9325#9325: *902 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.51, server: , request: "POST /plugins/community.applications/scripts/notices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "10.0.0.101", referrer: "http://10.0.0.101/Main"

After this error shows in the log, no commands to the server will work. While I can still navigate the web interface, aside from the Docker page, which just tries to load endlessly, I am unable to send a command to stop or restart the Docker service. The machine will not reboot either, whether with the button in the GUI or from the command line. The syslog indicates the system is going down for reboot, but then nothing happens after that.

I have not been able to pin this down to a particular container, and the system seems to be fine when no containers are running. As soon as I fire up three or so, I can expect the issue to occur again. Also, I should note that I am using ZFS, and that is where my Docker config and containers are located. I have also tried deleting the Docker image file, and now I cannot even get the containers to run from the templates. I am thinking this may be a ZFS issue, but is there anywhere else I can look for some clues?

Here is what happened when I tried to add my Docker containers back:

Jan 29 10:20:44 ML350P nginx: 2022/01/29 10:20:44 [error] 7556#7556: *2249 upstream timed out (110: Connection timed out) while reading upstream, client: 10.0.0.51, server: , request: "POST /Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate= HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "10.0.0.101", referrer: "http://10.0.0.101/Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate="

Thanks in advance!

Attachment: ml350p-diagnostics-20220129-1040.zip

Edited February 22, 2022 by greg_gorrell

Squid Posted January 29, 2022

OK. If you think that it's being caused by the first error listed there, you can turn that off via Apps - Settings (or Settings - Community Applications) by disabling Emergency notifications.

greg_gorrell Posted January 29, 2022 (Author)

Thanks for the reply, Squid. I honestly am not sure what is causing this issue, but the "upstream timed out" error seems to be in the log every time it happens. To clarify, though, I don't suspect CA has anything to do with it; it just happens to be related to the job I was performing the most recent time this occurred. Generally, the timeout occurs when I am accessing the Docker page and not the CA Apps page, as if it times out when querying the service.

Since I am getting no other information from the logs, I have no clue where to start, but it seems that the issue lies in the query from the web GUI to the Docker service. Maybe not, but when I start the array and run it with either Docker disabled completely or Docker enabled with no containers running, the issue does not manifest itself. After a couple of Docker containers are running, at some point this timeout will occur, and any subsequent commands to control a service will not execute, whether sent over SSH or the web interface.

I am going to attempt to move all of the Docker-related data off the zpool and onto a cache drive managed by Unraid, but any assistance on how to better troubleshoot this would be greatly appreciated.
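
One way to see whether the timeouts cluster on a particular page is to tally them by request path. The following is a minimal sketch, not an Unraid tool: the regular expression is an assumption derived only from the two nginx lines quoted in this thread, and `count_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the two nginx error lines quoted in this thread.
# It is an assumption and may need adjusting for other log formats.
TIMEOUT_RE = re.compile(
    r'upstream timed out \(110: Connection timed out\).*?'
    r'request: "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"'
)


def count_timeouts(lines):
    """Count upstream timeouts per request path, ignoring the query string."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Drop everything after "?" so repeated requests group together.
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle on a saved copy of the syslog (the location of that file is up to you), and print `count_timeouts(handle).most_common()`. If nearly every hit is on a Docker page, that would fit the observation that the timeout shows up while the GUI is querying the Docker service.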

greg_gorrell Posted January 30, 2022 (Author)

Perhaps I am not asking in the correct way. Could somebody please explain to me how the web interface interacts with the underlying services? When I click "reboot server" on the main page, what has to happen for the "shutdown -r" command to be executed by the system? Is it possible that the web server component of Unraid sends a command to the system and will not move on until that command is completed? Is it possible that I have an issue with Docker or ZFS and that this issue is why the timeout is occurring, making the timeout more of a symptom than a cause of the problem?

Thanks again.
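
The "symptom rather than cause" idea can be illustrated with a small sketch. This is not how Unraid's web server is implemented; it only shows the general pattern in which a caller waits on a backend command with a deadline and reports a timeout when the command does not return, which is what the nginx lines above say happened while it waited on the php-fpm socket. `run_with_deadline` is a hypothetical name, and the sleeping child process is a stand-in for a backend call that hangs.

```python
import subprocess
import sys


def run_with_deadline(command, seconds):
    """Run a command and report (finished, returncode).

    finished is False when the deadline passes before the command exits,
    which mirrors a front end giving up on a backend that never answers.
    """
    try:
        completed = subprocess.run(command, timeout=seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        return False, None
    return True, completed.returncode


# Stand-in for a backend call that is stuck (for example, waiting on storage).
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
# Stand-in for a backend call that returns promptly.
fast = [sys.executable, "-c", "pass"]

print(run_with_deadline(slow, 0.5))  # (False, None)
print(run_with_deadline(fast, 30))   # (True, 0)
```

In this picture the timeout message is produced by the waiting side, so it says little about why the backend stalled. The stall itself would have to be traced on the backend, here the Docker service and the storage it sits on.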

trurl Posted January 30, 2022

3 hours ago, greg_gorrell said: "ZFS"

Is that the reason for all of the "dumpster" mounts?

greg_gorrell Posted February 22, 2022 (Author)

Yes, that is the name of the zpool. I did notice that everything works fine with a fresh docker.img file created on the cache or array via the settings, with the appdata folders left on the zpool, so it appears to be some weird little bug with keeping the Docker image on ZFS. It works for now; I'll see what happens with the official ZFS implementation when it comes around.

Apologies for the delay in responding. I had a drive fail on the other server, which took priority lately, but thank you for taking the time to check out the diagnostics and offer input.