
Docker containers randomly lose access to the network


Hi!

 

I have an issue that I don't quite know how to troubleshoot...

 

I have a bunch of docker containers running on my Unraid server. An nginx reverse proxy and a few services (Seafile, Immich, Plex, some static websites, etc.). I'm also running tailscale directly on the server for remote access, but not in the containers. For about two years, this was working perfectly. But during the last 3-6 months, a weird issue started appearing:

 

The containers all should have access to the outside network. However, after running for some time, eventually some of the containers just... lose that access? And once that happens, the computer cannot be restarted normally and needs to be turned off by disconnecting power. This is an issue because some of those services do actually need to talk to each other, or to some APIs that live on the internet.

 

Detailed description:

After a fresh start, each container can access the other containers and the internet (e.g. ping/wget to google.com works). However, after some random period of time, one of the containers (any one of them, not a specific one) will lose access to the outside network. I can still access the service that is running in that container, but running ping/wget against any URL outside of the container just times out. At this point, if I try to restart the computer, it will just freeze and I have to unplug the power. Once it reaches this "error state", I can still turn docker on/off in the Unraid interface, but that does not restore the network connectivity. I believe that completely deleting the problematic container and creating it from scratch does help, but I haven't tried that in a while.
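
For anyone trying to narrow down a symptom like this, it can help to separate "the container cannot route out at all" from "the container can route but cannot resolve names". The following is only a rough sketch, not something taken from this thread: the function names are made up, 1.1.1.1 is an arbitrary literal IP chosen for the probe, and it assumes ping and nslookup exist inside each image (a container that lacks them will be misreported as "no-route").

```shell
#!/bin/sh
# Classify outbound connectivity from two probe exit codes:
#   $1 = exit status of a ping to a literal IP (no DNS involved)
#   $2 = exit status of a name lookup
classify_probe() {
    ip_status=$1
    dns_status=$2
    if [ "$ip_status" -ne 0 ]; then
        echo "no-route"      # cannot reach even a literal IP
    elif [ "$dns_status" -ne 0 ]; then
        echo "dns-broken"    # IP traffic works, name resolution does not
    else
        echo "ok"
    fi
}

# Probe every running container and print "<name><TAB><verdict>".
# Assumption: ping and nslookup are available inside the container.
check_containers() {
    for name in $(docker ps --format '{{.Names}}'); do
        docker exec "$name" ping -c 1 -W 2 1.1.1.1 >/dev/null 2>&1
        ip_status=$?
        docker exec "$name" nslookup google.com >/dev/null 2>&1
        dns_status=$?
        printf '%s\t%s\n' "$name" "$(classify_probe "$ip_status" "$dns_status")"
    done
}
```

Running check_containers from the Unraid terminal once while everything works and again once a container is in the "error state" should show which of the two cases applies.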

 

Any idea what to look for or what to try? 

Solved by daemontus

  • Community Expert

Most likely a misconfiguration and a timeout in the reverse proxy.

The docker resolv configs (which hold the DNS settings in the containers) and the network mode type may be playing a role here as well.

Are you using macvlan? or IPvlan?

Please post unraid diagnostics.

  • 2 weeks later...
  • Author

Hi! Thank you for the suggestion!

(1) Sadly, macvlan. I am currently only running a basic router box that gets very upset when I enable IPvlan, but I might be able to make it work somehow.

 

(2) Regarding the misconfiguration: You mean the reverse proxy container? That one is usually fine. What is breaking most often are the services (although it does seem to be random, sometimes it is indeed the reverse proxy that breaks). What I mean by that: Let's say I have a Seafile container that I am accessing through the reverse proxy, and the Seafile container needs to connect to an identity provider (to authenticate users) that is not on my network. Once the issue occurs, the Seafile container cannot access the identity provider anymore, but everything works fine otherwise. So, as long as I am logged in, the service is completely fine, but the login itself is not working because Seafile can't talk to the identity provider. Once I reboot the server, it works again for a week or two. If I open a console directly in the Seafile container and try to run wget on something on my server, it works fine. If I wget the identity provider, it times out. (If I wget the identity provider from the Unraid terminal directly, it of course works).

 

(3) Also, the reverse proxy config has not changed for two years now, and this is only happening for the last ~6 months.

 

(4) Diagnostics are now attached :)
 

zavazadlo-diagnostics-20241031-1653.zip

  • Author

Also... I am somewhat suspicious of my routing table? It almost looks like docker is doing something weird with its networks... I tried to clean it up by removing the "none", but it eventually re-appears. This is captured when everything is working... I'll see if anything changes once the system breaks again.

 

root@Zavazadlo:~# docker network ls
NETWORK ID     NAME             DRIVER    SCOPE
0882f7a85bf4   bridge           bridge    local
0ea9b4c23941   eth1             macvlan   local
deb2eb468f02   host             host      local
6c9a6201c032   immich_default   bridge    local
b160a6775829   none             null      local
55613d251b52   seafile-net      bridge    local

Screenshot2024-10-31at16_55_53.thumb.png.401b3f7ad4a47075a86f231d13265a88.png

  • Community Expert

You may need to run docker inspect commands to grab the additional networking information.

docker ps

 

docker inspect <container id #######>

The br-### interfaces are created at boot and tell me that you have 2 dockers that make their own docker network bridge (assumed; probably from a compose file).
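
As a rough illustration of the inspect step and of where the br-### names come from: Docker normally names the Linux interface of a user-defined bridge network "br-" followed by the first 12 characters of the network ID (the default bridge network uses docker0 instead, and a custom bridge name option would override this). The helper names below are hypothetical, and this is a sketch rather than a definitive procedure.

```shell
#!/bin/sh
# Print "<network> <ip> <gateway>" for every network a container is attached to.
container_networks() {
    docker inspect --format \
        '{{range $name, $net := .NetworkSettings.Networks}}{{println $name $net.IPAddress $net.Gateway}}{{end}}' \
        "$1"
}

# Derive the host interface name Docker uses by default for a user-defined
# bridge network: "br-" plus the first 12 characters of the network ID.
bridge_ifname() {
    printf 'br-%s\n' "$(printf '%s' "$1" | cut -c1-12)"
}

# List user-defined bridge networks together with their expected interface.
list_bridge_ifnames() {
    docker network ls --filter driver=bridge --format '{{.ID}} {{.Name}}' |
    while read -r id name; do
        [ "$name" = "bridge" ] && continue   # default bridge uses docker0
        printf '%s\t%s\n' "$name" "$(bridge_ifname "$id")"
    done
}
```

With the docker network ls output posted above, bridge_ifname 6c9a6201c032 gives br-6c9a6201c032, which should be the interface backing immich_default.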

Ideally, if you are running a reverse proxy like nginx, you want nginx to be in the same subnet as the services behind it so they can talk to each other.

Please confirm that you have the Unraid docker setting "host access" enabled, to help with some of the IP routing you described:
image.png.1756ab158dd3b58f03261c1de223f326.png

It looks like you have 2x 10 Gb interfaces, eth0 and eth1, on the network. It also appears you have bridging enabled (if I'm reading the diag logs right). Based on the network you described, I would also enable bonding.

image.png.eba88bb5fa2e6ed75ccb61d942fda240.png

review:


Where I attempted to explain why to turn bridging off (before Unraid potentially fixed the macvlan issues...), why to use bonding, and other Linux networking basics...
*There are still some other networking glitches with it, though; they are more an issue with how vhost / vbr0 tap back in to make a parent interface for internal Unraid networking.

At the moment I don't see a problem with your networking or routes. What you have described sounds like the dockers not being able to make it out to the internet, again because of the DNS / resolv config.

If you are using compose to set up the networking, I would advise using the default network bridge rather than letting the containers make their own bridge networks.

Thank you for sharing your docker network ls

6c9a6201c032 immich_default bridge local

55613d251b52 seafile-net bridge local

^ These may need to be removed, with the containers run under the default network bridge that Unraid provides, so they can all talk to each other under 1 subnet.

^ This is also why you have 2 br-### interfaces: each one is what a custom network taps into to get an IP and network-related services.

You may be dealing with a layer 3 / layer 2 networking issue, as I think Immich is a network-based docker.

Review the post on the resolv config and set a custom DNS for the dockers.
Use a public DNS like Google's 8.8.8.8 for testing.
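
To see which resolvers a container is actually using before changing anything, something along these lines may help. It is a minimal sketch with hypothetical helper names. Note that containers on user-defined bridge networks normally list Docker's embedded DNS at 127.0.0.11 rather than the host's servers.

```shell
#!/bin/sh
# Print the nameserver addresses from resolv.conf-formatted text on stdin.
nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# Show the resolvers configured inside a running container.
container_nameservers() {
    docker exec "$1" cat /etc/resolv.conf | nameservers
}

# Show the resolvers configured on the host, for comparison.
host_nameservers() {
    nameservers < /etc/resolv.conf
}
```

For the test itself, docker run accepts a --dns option, so adding --dns=8.8.8.8 to the container's Extra Parameters in the Unraid template is one way to pin a public resolver.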

 

  • Author

Thank you very much for the detailed answer!

 

Regarding your questions:

 * Host access is enabled.

 * The two interfaces are just 1gbit, but you are right that bonding is disabled. Original plan was to passthrough one of them to a VM for streaming games, but in the end it stayed unused. 

 

I guess this fixes the "I don't know where to start troubleshooting" :)

I now have a few things to look for and test. (1) I'll try to experiment with bonding. (2) I'll reconfigure the docker containers to get rid of the extra bridge networks.

 

I'll report back once I have more info. The failure has not happened since we started this thread, so I haven't had a chance to look into it while it's not working. I guess this will also yield some new info.

Once more, thank you very much.

  • Author
  • Solution

Soo... I think it's fixed.

 

The problem was tailscale, even though tailscale is not used by any of the docker containers directly. Some more info is available here: https://github.com/tailscale/tailscale/issues/12108 and here https://www.reddit.com/r/docker/comments/1do20m6/docker_container_randomly_losing_ability_to/, but overall what happened seems to be this:

1. Tailscale is installed on the Unraid server for remote management. The docker containers don't use it for anything relevant.

2. However, tailscale changes the DNS settings of the Unraid server to allow name resolution across the VPN network. As I learned, this is largely fine, as long as the containers use the same DNS config as the host machine.

3. If a DHCP IP renewal happens, this also changes the DNS settings, at which point tailscale automatically comes in and "fixes" the config. But this may not be correctly reflected by the docker containers, especially if that container has its own IP address or it uses a custom docker network (in such case, the container does not even use the same config as the host machine anyway).

4. My understanding is that the reason why the issue was appearing randomly is because it was triggered by the DHCP renewal.

 

5. Earlier this year, tailscale added a security feature to prevent "unauthorized" DNS requests on the VPN network. Unfortunately, if the configuration in the docker container and on the host machine are out of sync, this means that any DNS request coming from that container is recognized as malicious and ignored by the tailscale resolver, which previously would have forwarded it to the "normal" DNS server.

6. So for now, using --stateful-filtering=false with tailscale seems to have fixed the issue.
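
A quick way to check whether a container is affected by the sequence above is to look for Tailscale's resolver in its resolv.conf. This is a sketch under assumptions: 100.100.100.100 is the address Tailscale normally uses for its MagicDNS resolver (it is not mentioned in this thread), and the function names are made up for illustration.

```shell
#!/bin/sh
# Succeed if resolv.conf text on stdin lists Tailscale's resolver
# (assumed to be 100.100.100.100) as an active nameserver line.
uses_tailscale_dns() {
    grep -Eq '^nameserver[[:space:]]+100\.100\.100\.100([[:space:]]|$)'
}

# Report running containers whose resolv.conf points at the Tailscale resolver.
report_tailscale_dns_containers() {
    for name in $(docker ps --format '{{.Names}}'); do
        if docker exec "$name" cat /etc/resolv.conf 2>/dev/null | uses_tailscale_dns; then
            echo "$name: resolv.conf points at 100.100.100.100"
        fi
    done
}
```

The post names only the flag; on recent Tailscale versions it can typically be applied with tailscale set --stateful-filtering=false, though how it was applied on this server is an assumption.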

Anyway, thank you again for the help. I was looking at the wrong things, but at least you gave me a list of stuff that I checked to make sure they are probably ok and so I know I have to look elsewhere :)

 

 
