• [6.9.2] Docker with multiple connected networks cannot start on reboot / docker daemon restart


    hdlineage
    • Retest Minor

    Hello,

    I'm currently running the Home Assistant Docker container connected to two networks: one is a custom network (created with docker network create), the other is br0 (my LAN network).

     

    I found that if a second network is connected to the container (using docker network connect), then on an Unraid reboot, or even just after turning the Unraid Docker daemon off and on, the container will not auto-start. When I try to start the container manually I get the error message "No such container."

     

    The only way to fix it is to force an image update or to edit/update the container config (so that a new container is created).

     

     

     




    User Feedback

    Recommended Comments

    I used Portainer to add br0 to my containers that use a proxy network, but I never really found a way to make it stick.

    However, I never really dug into it. Hopefully someone else has and it's easy to do.

    Link to comment

    A few days ago I had this issue as well, so I spent a little time researching it.

     

    Unraid re-creates the Docker networks on reboot (actually, the service script destroys them on stop then re-creates them on start, so this happens even if you just restart the docker service).

    When the networks are re-created they get new network ids.

     

    The problem is that the container config still references the old network ids (you can see this with docker container inspect <container>), so in syslog you'll see messages like these:

     

    unraid rc.docker: Plex-Media-Server: Error response from daemon: network ea31527bec520a923c5c8a466c0e265b775ba66262c7613d08684c786f5de5b4 not found
    unraid rc.docker: Error: failed to start containers: Plex-Media-Server
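
To confirm this on your own system, you can compare the network ids a container references with the ids that currently exist. This is only a minimal sketch and not part of rc.docker: the function names stale_network_ids and check_container are made up for illustration, and it assumes the NetworkID field that docker container inspect reports for each attached network.

```shell
#!/bin/bash
# Print every id from the first list that is missing from the second list.
# $1: space-separated network ids referenced by the container
# $2: space-separated network ids that currently exist
stale_network_ids() {
  local referenced=$1 existing=" $2 " id
  for id in $referenced; do
    [[ $existing == *" $id "* ]] || echo "$id"
  done
}

# Hypothetical helper: gather both lists from docker for one container.
# It needs a running docker daemon, so it is only defined here, not called.
check_container() {
  local container=$1 referenced existing
  referenced=$(docker container inspect "$container" \
    --format '{{range .NetworkSettings.Networks}}{{.NetworkID}} {{end}}')
  existing=$(docker network ls -q --no-trunc | tr '\n' ' ')
  stale_network_ids "$referenced" "$existing"
}
```

Running check_container Plex-Media-Server would then print any network ids the container still references that no longer exist; no output means nothing is stale.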

     

    If you try to start it from the GUI, the actual error message is not relayed; you just get a 'No Such Container' error (as you noted).

     

    If you look carefully through /etc/rc.d/rc.docker you'll see how Unraid deals with this for the 'default' network assigned to the container (the one you chose in the GUI when you created the container). It reads the GUI-stored container 'config' from an xml template under /boot/config/plugins/dockerMan/templates-user/my-<container>.xml, which contains a <Network> tag with the network name. It makes a note of the container and the network in an array, and later in the script it loops over the array and re-connects the networks to the containers, so the network id is current and they will start.
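
As an illustration of the template lookup, here is a minimal sketch of pulling the <Network> value out of a template file in bash. rc.docker uses a helper called read_dom for this; its definition is not shown in this thread, so the version here is the common bash idiom and only an assumption about what the real one looks like. The template_network function is a made-up name.

```shell
#!/bin/bash
# Read one XML element per call: ENTITY gets the tag name, CONTENT the text after it.
# This is the common bash idiom; the real read_dom in rc.docker may differ.
read_dom() {
  local IFS=\>
  read -r -d \< ENTITY CONTENT
}

# Print the <Network> value stored in a template file ($1).
template_network() {
  local xmlfile=$1 network=
  while read_dom; do
    [[ $ENTITY == Network ]] && network=$CONTENT
  done <"$xmlfile"
  echo "$network"
}
```

This only handles the simple flat layout of a template; it is not a general XML parser.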

     

    The problem is that it ignores any additional networks that are attached later (outside the GUI, or even in the GUI with the 'Post Arguments' trick you so often see in the forums).

     

    Sooo... I spent a little time 'massaging' the rc.docker script to also add any additional networks into the same array so they, too, will be re-attached later.

     

    If you want to do the same you'll need to edit /etc/rc.d/rc.docker and around line 179 you should see a block of code like this:

     

      # get container settings for custom networks to reconnect later
      declare -A NETRESTORE
      CONTAINERS=$(docker container ls -a --format='{{.Names}}'|tr '\n' ' ')
      for CONTAINER in $CONTAINERS; do
        # the file case (due to fat32) might be different so use find to match
        XMLFILE=$(find /boot/config/plugins/dockerMan/templates-user -maxdepth 1 -iname my-${CONTAINER}.xml)
        if [[ -n $XMLFILE ]]; then
          THIS_NETWORK=
          THIS_IP=
          while read_dom; do
            [[ $ENTITY == Network ]] && THIS_NETWORK=$CONTENT
            [[ $ENTITY == MyIP ]] && THIS_IP=${CONTENT// /} && THIS_IP=${THIS_IP//,/;}
          done <$XMLFILE
          # only restore valid networks
          if [[ -n $THIS_NETWORK ]]; then
            THIS_ID=$(docker container inspect "$CONTAINER" --format='{{.ID}}')
            NETRESTORE[$THIS_NETWORK]="$THIS_ID,$THIS_IP ${NETRESTORE[$THIS_NETWORK]}"
          fi
        fi #<-- insert additional code BETWEEN this line
      done #<-- and this line
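
After this loop, NETRESTORE maps each network name to a space-separated list of id,ip entries. The reconnect loop that consumes it is not shown in this thread, so the following is only a sketch of how such an array could be walked. It prints the docker network connect commands instead of running them, the function name print_reconnect_commands is made up, and it assumes that multiple addresses are separated by ';' (as the MyIP line above produces) and that an address containing ':' is IPv6.

```shell
#!/bin/bash
declare -A NETRESTORE

# Print (not run) one "docker network connect" command per stored entry.
print_reconnect_commands() {
  local network entry id ips ip args
  for network in "${!NETRESTORE[@]}"; do
    for entry in ${NETRESTORE[$network]}; do
      id=${entry%,*}     # text before the last comma: container id
      ips=${entry#*,}    # text after the first comma: zero or more addresses
      args=
      for ip in ${ips//;/ }; do
        # assumption: an address containing ':' is IPv6
        if [[ $ip == *:* ]]; then args+=" --ip6 $ip"; else args+=" --ip $ip"; fi
      done
      echo "docker network connect$args $network $id"
    done
  done
}
```

The real script may do more than this (for example, handle an endpoint that is still attached), so treat the output as a way to inspect the array rather than as a replacement for the reconnect code.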

     

    You'll need to insert the following additional code between the last 'fi' and the last 'done' (see my 'insert here' comments above):

     

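        # the loop above is based on the xml template which only defines one network - what about
        # containers where someone has done docker network connect <second network> <container>?
        # for those additional networks we need to cycle through them and add them to the array also
        # THIS_ID is only set above when a template with a network was found, so look it up again
        # here instead of reusing a stale value from a previous container
        THIS_ID=$(docker container inspect "$CONTAINER" --format='{{.ID}}')
        ALL_CONTAINER_NETWORKS=$(docker container inspect "$CONTAINER" \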
          --format='{{range $key, $value := .NetworkSettings.Networks}}{{$key}},
            {{if $value.IPAMConfig}}
              {{if $value.IPAMConfig.IPv4Address}}{{$value.IPAMConfig.IPv4Address}}{{end}}
              {{if $value.IPAMConfig.IPv6Address}}{{$value.IPAMConfig.IPv6Address}}{{end}}
            {{end}}
          |{{end}}'\
        )
        # an unfortunate side-effect of spreading the command above across multiple lines (for readability) is those newlines
        # sometimes end up in the final string, so take this opportunity to remove extra spaces and newlines from the result
        ALL_CONTAINER_NETWORKS=${ALL_CONTAINER_NETWORKS//[ $'\n']/}
        for CN in ${ALL_CONTAINER_NETWORKS//|/ }; do
          AN_ADDITIONAL_CONTAINER_NETWORK=${CN%,*}
          AN_ADDITIONAL_CONTAINER_IP=${CN#*,}
          # quote the right-hand side so the names are compared literally, not as a pattern
          if [[ -n $AN_ADDITIONAL_CONTAINER_NETWORK ]] && [[ $AN_ADDITIONAL_CONTAINER_NETWORK != "$THIS_NETWORK" ]]; then
            echo "container $CONTAINER has an additional network that will be restored: $CN" | logger -t "$(basename "$0")"
            NETRESTORE[$AN_ADDITIONAL_CONTAINER_NETWORK]="$THIS_ID,$AN_ADDITIONAL_CONTAINER_IP ${NETRESTORE[$AN_ADDITIONAL_CONTAINER_NETWORK]}"
          fi
        done
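
To make the string handling above easier to follow, here is a small standalone sketch that runs the same clean-up and splitting on a sample string instead of real docker output. The function name parse_container_networks and the sample values are made up for illustration.

```shell
#!/bin/bash
# Turn the raw template output into one "network=... ip=..." line per network.
parse_container_networks() {
  local raw=$1 cn
  raw=${raw//[ $'\n']/}          # drop spaces and newlines left by the multi-line format
  for cn in ${raw//|/ }; do      # entries are separated by '|'
    echo "network=${cn%,*} ip=${cn#*,}"
  done
}

# Sample input shaped like the inspect output: one network without a fixed IP, one with.
parse_container_networks $'proxynet,\n    \n  |br0,\n    192.168.1.50\n  |'
```

With that sample the first network comes out with an empty ip and the second with 192.168.1.50, which is the same network,ip split the loop above relies on.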

     

    Then save it and try to stop/start the docker service & see if it works OK.

     

    If so, you'll need a way to make it permanent since /etc is sitting on a ramdisk and will be blown away on the next reboot.

     

    The simplest way I found is to store a copy of the modified rc.docker script under /boot/scripts (I created the scripts directory for this) and then add a line to /boot/config/go to replace the one under /etc/rc.d with the modified one, like so:

     

    mv /etc/rc.d/rc.docker /etc/rc.d/rc.docker_orig && cp /boot/scripts/rc.docker.hack /etc/rc.d/rc.docker && chmod 755 /etc/rc.d/rc.docker
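
One thing to be aware of with the one-liner: if the mv succeeds but the cp fails (for example because the file under /boot/scripts is missing), /etc/rc.d/rc.docker is left absent. A slightly more defensive variant could look like the sketch below. The function name install_rc_override is made up, and it assumes the go file is run by bash.

```shell
#!/bin/bash
# Replace $2 with $1, keeping a backup as "$2_orig", but only if $1 exists and is non-empty.
install_rc_override() {
  local src=$1 dst=$2
  if [[ ! -s $src ]]; then
    echo "skipping: $src is missing or empty, keeping stock $dst" >&2
    return 1
  fi
  cp -p "$dst" "${dst}_orig" && cp "$src" "$dst" && chmod 755 "$dst"
}

# In /boot/config/go this could be called as:
# install_rc_override /boot/scripts/rc.docker.hack /etc/rc.d/rc.docker
```

The backup is made with cp rather than mv, so the stock script stays in place until the modified copy has actually been written over it.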

     

    This will run very early on in the boot process, before docker is started.

     

    This seems to work well for me on 6.10, which is what I'm running at the moment. I make no guarantees that it will work for everyone, but feel free to tweak it for your needs if it doesn't!

     

    What would be great is if some form of this could make it into the next release so we don't have to make hacks like these!

     

    Link to comment

    I am impressed by the work you put in to find the problem and provide a working solution.

    Thank you for the effort, and yes I agree this should be included in future releases. 

    I believe this is an important feature, as some services need to run on multiple networks (Apple HomeKit, for example).

    Link to comment

    So... since I am still having the issue, I figured I would come here.

    Now, the thing is... I understand what you're saying and what I need to do.

    However... how do I get to the file? :D

     

    Perhaps this could be a userscript?

    Link to comment

    HA! that was the first thing I tried.

     

    In order to use a user script you would have to mark the container to NOT auto-start, both attach your network and start the container in the user script, and schedule the script to run at first array start.

     

    The issue with that approach is that it leaves a LOT of holes. For example, simply stopping and starting the docker service will destroy and re-create the networks; then, when you manually start your container (because it's not set to auto-start, remember?), it will fall flat on its face.

     

    When I went down this path it turned out to be full of holes and the effort to work around all the holes was far more than the solution I noted above.

     

    In order to do this properly you need to intervene AFTER the rc script has destroyed & re-created the docker networks but BEFORE it tries to start up the containers.

     

    No amount of go file or user script magic will work, unfortunately.

     

    The only place to intervene is that spot in the rc script I noted in my previous post.

     

    To get you started do:

     

    mkdir /boot/scripts
    cp /etc/rc.d/rc.docker /boot/scripts/rc.docker.hack
    nano /boot/scripts/rc.docker.hack
    

     

    Then make the changes I described in the nano editor and save it (ctrl-x, then answer y when it asks to save). To test it, copy it to /tmp (since Unraid won't allow anything to execute from the flash drive, even if you chmod it executable):

     

    cp /boot/scripts/rc.docker.hack /tmp/ && chmod u+x /tmp/rc.docker.hack
    
    #stop
    /tmp/rc.docker.hack stop
    
    #start
    /tmp/rc.docker.hack start
    
    #restart
    /tmp/rc.docker.hack restart
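
Before running the stop/start test above, it may be worth checking that the edited file still parses, since a stray 'fi' or 'done' is easy to introduce when pasting into the middle of a loop. A minimal sketch, using a made-up helper name syntax_ok:

```shell
#!/bin/bash
# Return 0 if the script parses cleanly; bash -n reads it without executing anything.
syntax_ok() {
  bash -n "$1" 2>/dev/null
}

# Example use:
# syntax_ok /boot/scripts/rc.docker.hack && echo "syntax looks fine"
```

This only catches syntax errors, not logic errors, so the stop/start test is still needed.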

     

    Once you're satisfied it works copy it back to /boot/scripts/:

     

    cp /tmp/rc.docker.hack /boot/scripts/

     

    (it'll overwrite) and then edit /boot/config/go:

     

    nano /boot/config/go

     

    And add the line I noted in my previous post, save it like before.

     

    This copies your 'hack' script into /etc/rc.d very early in the boot process (since lines in the go file are run very early), so YOUR script, not the original one, is run to start docker at boot as well as any time the docker service is restarted (either by you from the GUI or when the array is stopped and started, etc.).

     

    This is the cleanest way to get things working as they should.

     

    Link to comment

    Just wanted to reply & note this is now fixed in the 6.11.1 release:

     

    On 10/6/2022 at 5:56 PM, limetech said:

    Updated docker to v20.10.18 and improved networking:

    • When DHCP is used, wait for IPv4 assignment before proceeding on system startup, this avoids a possible race-condition at boot time when host access to custom networks is enabled.
    • Allow user defined networks to be reconnected at docker service start. Now all defined networks will be automatically reconnected.
    Link to comment

    Awesome. Been waiting for this!

     

    Also the DHCP vs local thing. Many times I've had the issue where it would simply assign a duplicate network address because it couldn't communicate over the VLANs yet... I can finally make it dynamic again!

    Link to comment

    Stumbled upon this issue while trying to resolve the same error I’m getting from the CA Backup / Restore Appdata plug-in. Is this something that would need to be fixed in the plug-in?
     

    I’m on 6.11.5

    Link to comment

    I think this problem still persists when I connect a container to two networks created by unRAID, say, br0 and br0.10. 

    Link to comment

    It was fixed for a time, then it came back in 6.12, reported here:

    Quote

     

    Not sure how to elevate it to the devs, it's a simple fix.

    In the meantime I went back to my workaround/hack.

    Link to comment
    1 hour ago, user12345678 said:

    It was fixed for a time, then it came back in 6.12, reported here:

    Not sure how to elevate it to the devs, it's a simple fix.

    In the meantime I went back to my workaround/hack.

    Appreciate it! I wasn't able to find this page. You saved my day!

    Link to comment



