
Unable To Start Docker Engine After Minor Update


Solved by Sn3akyP3t3


I think I narrowed down the problem to "custom networks" and I did eventually mount and load all containers from the original docker image, but I don't understand how or why it is working now if that is the correct root cause.

To start with, I decided to update from 6.12.8 to 6.12.9.  Immediately after the update, the Docker tab of the web UI showed "Docker Service Failed to Start".  I checked the logs and saw a lot of this repeating endlessly:

Quote

Mar 30 14:20:34 FakeServerName unraid-api[3848]: ✔️ UNRAID API started successfully!
Mar 30 14:20:35 FakeServerName unraid-api[3848]: ⚠️ Caught exception: connect ECONNREFUSED /var/run/docker.sock
Mar 30 14:20:35 FakeServerName unraid-api[3848]: ⚠️ UNRAID API crashed with exit code 1
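For anyone trying to confirm they are in the same restart loop, here is a minimal Python sketch that counts the three unraid-api messages quoted above in a list of syslog lines. The function name `summarize_api_events` and the counter keys are made up for illustration, and the regular expression only assumes the line layout visible in the quote (timestamp, host, `tag[pid]:`, message).

```python
import re
from collections import Counter

# Layout assumed from the quoted lines: "Mar 30 14:20:35 host tag[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s\S+\s(?P<tag>[^\[:\s]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)


def summarize_api_events(lines):
    """Count unraid-api start, socket-refused and crash messages (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("tag") != "unraid-api":
            continue  # ignore other daemons and lines that do not fit the layout
        msg = match.group("msg")
        if "started successfully" in msg:
            counts["started"] += 1
        elif "ECONNREFUSED /var/run/docker.sock" in msg:
            counts["socket_refused"] += 1
        elif "crashed with exit code" in msg:
            counts["crashed"] += 1
    return dict(counts)
```

Fed the lines of a saved syslog, roughly equal and steadily growing counts for all three keys would match the "repeating endlessly" pattern described here. The counts only show that the API cannot reach the Docker socket; they do not show why dockerd is not listening.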


I figured that the update must have done something bad so I tried to downgrade back to 6.12.8, but the issue with Docker unable to start persisted.  I don't know what to make of that.  Only that the trigger seems to have been the update.


After reading up on possible failures I figured perhaps there was a problem with I/O.  The cache drive where the appdata is stored and the docker.img is loaded from had only around 50% utilization, so it wasn't out of space.  I didn't see any OOM exceptions.  I figured maybe the docker.img file was corrupt or out of room.  I grew the Docker image size from 200GB to 400GB without remembering that I can't reduce it once it has gone up (will have to revisit that later).  I mounted the array to start Docker again and got the same error.  Next I set the vDisk type from btrfs to xfs, and UnRaid created a different filename, so the old image was preserved.  I rebooted and found docker could mount and I was able to install apps, but as expected there were no docker custom networks.  I shut down docker and restored the vDisk type back to btrfs.  From here I figured it must have something to do with networking.

 

I had been using a static IP up until now, with static DNS set in the UnRaid network config, but I figured maybe I had botched that, so I had my router set up with a static route for the MAC of the UnRAID eth0 connection, and that seems to cover the need for the static IP.  I had also set the box to use VLAN, but never got around to putting that into effect, so I disabled it.  Starting docker with those changes did nothing.

I figured then something might be editable with the Docker config itself so I started playing with networking settings from there alone.  Turns out that changing this

Quote

Preserve user defined networks:  Yes

To This

Quote

Preserve user defined networks:  No

was the only trick required.  I truly don't understand why, because when I started up Docker after that and checked the custom network listing with "docker network ls", they were all there.  A log snippet showed one error, then carried on with some degree of success.  It's been running happily now for hours, but I think this is a dangerous state for me to operate in, since I don't want to lose the custom network settings.  A log snippet of said success:
 

Mar 30 14:21:15 FakeServerName emhttpd: shcmd (1199): /etc/rc.d/rc.docker start
Mar 30 14:21:15 FakeServerName root: starting dockerd ...
Mar 30 14:21:15 FakeServerName unraid-api[7190]: ⚠️ Caught exception: connect ECONNREFUSED /var/run/docker.sock
Mar 30 14:21:15 FakeServerName unraid-api[7190]: ⚠️ UNRAID API crashed with exit code 1
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/ssh.service) successfully established.
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/smb.service) successfully established.
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/sftp-ssh.service) successfully established.
Mar 30 14:21:21 FakeServerName unraid-api[7697]: ✔️ UNRAID API started successfully!
Mar 30 14:21:36 FakeServerName nmbd[7060]: [2024/03/30 14:21:36.755237,  0] ../../source3/nmbd/nmbd_become_lmb.c:398(become_local_master_stage2)
Mar 30 14:21:36 FakeServerName nmbd[7060]:   *****
Mar 30 14:21:36 FakeServerName nmbd[7060]:   
Mar 30 14:21:36 FakeServerName nmbd[7060]:   Samba name server FakeServerName is now a local master browser for workgroup FAKEWORKGROUP on subnet 10.10.8.130
Mar 30 14:21:36 FakeServerName nmbd[7060]:   
Mar 30 14:21:36 FakeServerName nmbd[7060]:   *****
Mar 30 14:22:01 FakeServerName kernel: docker0: port 1(vethb1f1076) entered blocking state
Mar 30 14:22:01 FakeServerName kernel: docker0: port 1(vethb1f1076) entered disabled state
Mar 30 14:22:01 FakeServerName kernel: device vethb1f1076 entered promiscuous mode
Mar 30 14:22:04 FakeServerName kernel: eth0: renamed from veth71d8bae
Mar 30 14:22:04 FakeServerName kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb1f1076: link becomes ready
Mar 30 14:22:04 FakeServerName kernel: docker0: port 1(vethb1f1076) entered blocking state
Mar 30 14:22:04 FakeServerName kernel: docker0: port 1(vethb1f1076) entered forwarding state
Mar 30 14:22:04 FakeServerName kernel: IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered blocking state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered disabled state
Mar 30 14:22:09 FakeServerName kernel: device veth8cc6977 entered promiscuous mode
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered blocking state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered forwarding state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered disabled state
Mar 30 14:22:14 FakeServerName kernel: eth0: renamed from veth90def87
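The bridge port messages in that snippet flip between states several times, which makes it hard to see where each port ended up. A small Python sketch along these lines keeps only the last state seen per bridge and veth pair. The function name `last_port_states` is made up, and the pattern assumes nothing beyond the kernel message format shown in the snippet.

```python
import re

# Kernel bridge messages in the snippet look like:
#   "kernel: docker0: port 1(vethb1f1076) entered forwarding state"
PORT_STATE_RE = re.compile(
    r"kernel: (?P<bridge>\S+): port \d+\((?P<veth>\w+)\) entered (?P<state>\w+) state"
)


def last_port_states(lines):
    """Return {(bridge, veth): last state seen}, reading the lines in log order."""
    states = {}
    for line in lines:
        match = PORT_STATE_RE.search(line)
        if match:
            key = (match.group("bridge"), match.group("veth"))
            states[key] = match.group("state")  # later lines overwrite earlier ones
    return states
```

Run over the snippet above, this leaves the docker0 port in the forwarding state and the br-659b348d1ec9 port in the disabled state. Since the snippet stops shortly afterwards, that last entry alone does not say whether the container on the custom network came up properly.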


Before I signed off on this being the only setting that needed flipping, I recreated the blocker, with the same UnRAID API crashing behavior, just by flipping this back:

Quote

Preserve user defined networks:  Yes

and, just as before, flipping it back to No allowed me to bring Docker back online with all those custom networks still readily available.  At the first occurrence of success I decided to re-apply the upgrade to 6.12.9.  I flipped "Preserve user defined networks" a handful of times because I couldn't believe what I was observing and thought maybe I had done something more significant that I had forgotten, but nope.  It was that simple.  The logs may be confusing because of the errors coming and going at the trailing end.  (This would be the definition of madness I guess.)

Just in case anyone asks: yes, I am using those custom docker networks.  They're not vestigial.
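Since the worry here is losing the custom network definitions, one hedge is to capture them while Docker is still up. The sketch below is only an illustration, not something Unraid provides: it takes the JSON printed by `docker network inspect` and turns each non-default network into a `docker network create` command line. The function name `recreate_commands` is made up, and it only carries over the driver, the IPAM subnet and gateway, and driver options, so anything else in a network's definition would be lost.

```python
import json
import shlex

# Networks that Docker creates by itself and that should not be recreated.
BUILTIN_NETWORKS = {"bridge", "host", "none"}


def recreate_commands(inspect_json):
    """Build `docker network create` command lines from `docker network inspect` JSON."""
    commands = []
    for net in json.loads(inspect_json):
        name = net.get("Name", "")
        if not name or name in BUILTIN_NETWORKS:
            continue
        args = ["docker", "network", "create", "--driver", net.get("Driver", "bridge")]
        ipam = net.get("IPAM") or {}
        for cfg in ipam.get("Config") or []:
            if cfg.get("Subnet"):
                args += ["--subnet", cfg["Subnet"]]
            if cfg.get("Gateway"):
                args += ["--gateway", cfg["Gateway"]]
        # Driver options, e.g. the parent interface of a macvlan network.
        for key, value in sorted((net.get("Options") or {}).items()):
            args += ["--opt", f"{key}={value}"]
        args.append(name)
        commands.append(shlex.join(args))
    return commands
```

The input can be produced with `docker network inspect $(docker network ls -q)` and saved to a file before changing any settings. The generated commands are meant to be reviewed by hand, not run blindly, especially for networks that Unraid manages itself.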

fakeservername-diagnostics-20240330-1733.zip

Edited by Sn3akyP3t3
Forgot to mention that I reverted back to the update after first sign of success.
Link to comment
  • 3 weeks later...
Posted (edited)

Updated to 6.12.10 today.  The problem now is that the workaround mentioned above no longer works.  I'm unclear on what triggered this in the first place.  I'll be rebuilding my docker container structure tomorrow in an attempt to recover from this situation.

 

I also saw the comments in the release notes about changing from macvlan (which is the setting currently applied) to ipvlan, but that option appears to be greyed out.  I don't know why I would have switched to macvlan, but if it is required for docker custom networks then that is likely why.

 

I did notice one awkward behavior, never observed before, while operating in this manner.  Whenever a docker container updates, it deselects the custom docker network that it was previously using.  This might be expected given the setting `Preserve user defined networks: No`, but I wouldn't know for sure, as I was only running this way because it is the only combination of settings that worked.
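One way to catch a container silently dropping its custom network after an update is to compare `docker inspect` output taken before and after. The following Python sketch assumes two saved JSON dumps from `docker inspect` and reports containers whose attached network names differ. Both function names are made up for illustration.

```python
import json


def container_networks(inspect_json):
    """Map container name -> sorted list of attached network names."""
    result = {}
    for container in json.loads(inspect_json):
        name = container.get("Name", "").lstrip("/")  # inspect prefixes names with "/"
        networks = (container.get("NetworkSettings") or {}).get("Networks") or {}
        result[name] = sorted(networks)
    return result


def network_changes(before, after):
    """Return {container: (before_nets, after_nets)} where the attachments differ."""
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }
```

For example, saving `docker inspect $(docker ps -aq)` to a file before the auto-update runs and again afterwards, then passing both through `container_networks` and `network_changes`, would list any container that moved off its custom network. Containers that exist in only one of the two dumps are ignored by this sketch.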

Edited by Sn3akyP3t3
Link to comment
20 hours ago, Vr2Io said:

If that is the case, does it cause a conflict? What is so special about those custom networks that they must be preserved, with no other way to do without them?

No idea at the moment.  I have kept the docker image around for now in case there's some means to exhume information that would help answer that question.  The logs from docker in this area are not much help in identifying what exactly the root conflict really is:

Quote

failed to start containerd: timeout waiting for containerd to start


I suspect the problem is related to "custom networks", but I have no solid proof.  All I have is that bringing the docker image back into service required disabling preservation of custom networks, and that one of the quirks observed while operating in this degraded state was that selected custom networks deselected themselves whenever a docker container updated itself through the auto-update feature.

Link to comment
  • Solution

My resolution for all of this was to recreate the docker image using the previous apps feature.  Rather than risk resurrecting the nondeterministic behavior with custom networks, I've decided to go the whole enchilada with VLAN tagging instead.  This provides the desired network segregation, similar to what I was getting with custom networks, but is probably far more granular and likely enables possibilities that I'm not yet aware of.

Link to comment

I'm no expert on docker networking, but I'm interested in what other people are doing and why they need to preserve custom networks. At first I wondered whether some VPN plugin or docker container was causing the problem.

Anyway, you found another solution. By the way, I put the docker path in /tmp, so after every reboot all the docker containers need to be re-downloaded; that may help fix some hidden issues.  I also use VLANs to separate things as I need.

Edited by Vr2Io
Link to comment
