Sn3akyP3t3 · Posted March 30 (edited)

I think I narrowed the problem down to "custom networks." I did eventually mount and load all containers from the original Docker image, but if that is the correct root cause, I don't understand how or why it is working now.

To start, I decided to update from 6.12.8 to 6.12.9. Immediately after the update, the web Docker tab showed "Docker Service Failed to Start". I checked the logs and saw a lot of this repeating endlessly:

Quote
Mar 30 14:20:34 FakeServerName unraid-api[3848]: ✔️ UNRAID API started successfully!
Mar 30 14:20:35 FakeServerName unraid-api[3848]: ⚠️ Caught exception: connect ECONNREFUSED /var/run/docker.sock
Mar 30 14:20:35 FakeServerName unraid-api[3848]: ⚠️ UNRAID API crashed with exit code 1

I figured the update must have done something bad, so I tried downgrading back to 6.12.8, but Docker still wouldn't start. I don't know what to make of that, only that the trigger seems to have been the update.

After reading up on possible failures, I suspected an I/O problem. The cache drive holding appdata and docker.img was only around 50% utilized, so it wasn't out of space, and I didn't see any OOM exceptions. I then figured the docker.img file might be corrupt or out of room, so I grew the Docker image from 200GB to 400GB, forgetting that I can't shrink it again once it's grown (will have to revisit that later). I mounted the array to start Docker again and got the same error.

Next I switched the vDisk type from btrfs to xfs; Unraid created a file under a different name, so the old one was preserved. After a reboot, Docker could mount and I was able to install apps, but as expected there were no custom Docker networks. I shut Docker down and restored the vDisk back to btrfs. From here I figured there must be something to do with networking.
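An aside for anyone retracing this: the cache drive's usage and the usage *inside* docker.img are two different numbers. A minimal sketch of that sanity check — the `df` path is the usual Unraid mount point but may differ, and the numbers below are stand-ins, not from this system:

```shell
# On the server, inside-image usage comes from something like:
#   df -h /var/lib/docker
# while the GUI shows the cache drive holding docker.img.
# Stand-in numbers below illustrate the ~50% utilization described above.
used_gb=98    # hypothetical "used" figure from df, in GB
size_gb=200   # docker.img size before growing it
pct=$(( used_gb * 100 / size_gb ))
echo "docker.img utilization: ${pct}%"
```

If that percentage is low, growing the image is unlikely to help, which matches what happened here.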
I had been using a static IP up until now, with static DNS set in the Unraid network config. Figuring maybe I had botched that, I set up my router with a static assignment for the MAC of the Unraid eth0 connection, which keeps the static IP without relying on the Unraid setting. I had also enabled VLANs on the box but never got around to putting them into effect, so I disabled that. Starting Docker after those changes did nothing.

I then figured something might be fixable in the Docker config itself, so I started playing with the networking settings there alone. It turns out that changing this:

Quote
Preserve user defined networks: Yes

to this:

Quote
Preserve user defined networks: No

was the only change required. I truly don't understand why, because when I started Docker afterward and listed the custom networks with "docker network ls", they were all there. The log snippet below shows one error, then things settle into success. It's been running happily for hours now, but I think this is a dangerous state for me to operate in, since I don't want to lose the custom network settings.

A logging snippet of that success:

Mar 30 14:21:15 FakeServerName emhttpd: shcmd (1199): /etc/rc.d/rc.docker start
Mar 30 14:21:15 FakeServerName root: starting dockerd ...
Mar 30 14:21:15 FakeServerName unraid-api[7190]: ⚠️ Caught exception: connect ECONNREFUSED /var/run/docker.sock
Mar 30 14:21:15 FakeServerName unraid-api[7190]: ⚠️ UNRAID API crashed with exit code 1
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/ssh.service) successfully established.
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/smb.service) successfully established.
Mar 30 14:21:15 FakeServerName avahi-daemon[7169]: Service "FakeServerName" (/services/sftp-ssh.service) successfully established.
Mar 30 14:21:21 FakeServerName unraid-api[7697]: ✔️ UNRAID API started successfully!
Mar 30 14:21:36 FakeServerName nmbd[7060]: [2024/03/30 14:21:36.755237, 0] ../../source3/nmbd/nmbd_become_lmb.c:398(become_local_master_stage2)
Mar 30 14:21:36 FakeServerName nmbd[7060]: *****
Mar 30 14:21:36 FakeServerName nmbd[7060]:
Mar 30 14:21:36 FakeServerName nmbd[7060]: Samba name server FakeServerName is now a local master browser for workgroup FAKEWORKGROUP on subnet 10.10.8.130
Mar 30 14:21:36 FakeServerName nmbd[7060]:
Mar 30 14:21:36 FakeServerName nmbd[7060]: *****
Mar 30 14:22:01 FakeServerName kernel: docker0: port 1(vethb1f1076) entered blocking state
Mar 30 14:22:01 FakeServerName kernel: docker0: port 1(vethb1f1076) entered disabled state
Mar 30 14:22:01 FakeServerName kernel: device vethb1f1076 entered promiscuous mode
Mar 30 14:22:04 FakeServerName kernel: eth0: renamed from veth71d8bae
Mar 30 14:22:04 FakeServerName kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb1f1076: link becomes ready
Mar 30 14:22:04 FakeServerName kernel: docker0: port 1(vethb1f1076) entered blocking state
Mar 30 14:22:04 FakeServerName kernel: docker0: port 1(vethb1f1076) entered forwarding state
Mar 30 14:22:04 FakeServerName kernel: IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered blocking state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered disabled state
Mar 30 14:22:09 FakeServerName kernel: device veth8cc6977 entered promiscuous mode
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered blocking state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered forwarding state
Mar 30 14:22:09 FakeServerName kernel: br-659b348d1ec9: port 1(veth8cc6977) entered disabled state
Mar 30 14:22:14 FakeServerName kernel: eth0: renamed from veth90def87

Before signing off that this was the only setting that needed flipping, I reproduced the blocker, with the same UNRAID API crashing behavior, just by flipping this back:

Quote
Preserve user defined networks: Yes

and, just as before, flipping it back to No brought Docker back online with all those custom networks still readily available.

At the first sign of success I re-applied the upgrade to 6.12.9. I flipped "Preserve user defined networks" a handful of times because I couldn't believe what I was observing and thought maybe I had done something more significant that I had forgotten, but nope, it was that simple. (This would be the definition of madness, I guess.) The logs can be confusing at the trailing end, with the errors coming and going.

Just in case anyone asks: yes, I am using those custom Docker networks. They're not vestigial.

fakeservername-diagnostics-20240330-1733.zip

Edited March 30 by Sn3akyP3t3: Forgot to mention that I reverted back to the update after the first sign of success.
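For anyone who wants to script around this toggle: I believe it persists on the flash drive in /boot/config/docker.cfg under a DOCKER_USER_NETWORKS key, but both the file location and the key name are assumptions, so verify on your own box first. A sketch, simulated here with a throwaway file so nothing on the flash gets touched:

```shell
# Simulate the assumed /boot/config/docker.cfg with a temp file, then pull
# out the user-networks setting the way you might on the server itself.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
DOCKER_ENABLED="yes"
DOCKER_USER_NETWORKS="remove"
EOF
setting=$(grep '^DOCKER_USER_NETWORKS=' "$cfg")
echo "$setting"
rm -f "$cfg"
# After flipping the toggle and restarting Docker, confirm the custom
# networks survived with:  docker network ls
```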
Sn3akyP3t3 · Posted April 20 (Author, edited)

Updated to 6.12.10 today. The problem now is that the workaround mentioned above no longer works. I'm still unclear what triggered this in the first place. I'll be rebuilding my Docker container structure tomorrow in an attempt to recover from this situation.

I also saw the comments in the release notes about changing from macvlan (the setting currently applied) to ipvlan, but that option appears to be greyed out. I don't know why I would have switched to macvlan, but if it's required for Docker custom networks, then that is likely why.

I did notice one awkward behavior never observed before while operating in this manner: whenever a Docker container updates, it deselects the custom Docker network it was previously using. This might be expected with `Preserve user defined networks: No`, but I can't say for sure, since I only chose that because it was the only combination of settings that worked.

Edited April 21 by Sn3akyP3t3
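If anyone else hits the deselection quirk, reattaching a container to its network from the CLI is one possible stopgap. The network and container names below are placeholders; `docker network connect` is a standard Docker command, but I haven't verified this exact flow on 6.12.10, so treat it as a sketch:

```shell
# Build the reattach command for a container that lost its custom network
# after an auto-update. Placeholder names; confirm the real ones first with
# `docker network ls` and `docker ps`, then run the command manually.
net="my-custom-net"
ctr="my-container"
cmd="docker network connect $net $ctr"
echo "$cmd"
# eval "$cmd"   # uncomment to actually run it on the server
```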
Vr2Io · Posted April 21

On 3/31/2024 at 7:05 AM, Sn3akyP3t3 said:
I think I narrowed down the problem to "custom networks"

If that's the case, does it create a conflict? What is special about those custom networks that they must be preserved, with no other way to eliminate them?
Sn3akyP3t3 · Posted April 22 (Author)

20 hours ago, Vr2Io said:
If that, does it make conflict ? What special of that custom networks must preserve and no other method to eliminate it.

No idea at the moment. I have kept the Docker image around for now in case there's some way to exhume information that would help answer that question. The Docker logs in this area are of little help in identifying what the root conflict really is:

Quote
failed to start containerd: timeout waiting for containerd to start

I suspect the problem is related to "custom networks", but I have no solid proof, only that the fix for bringing the Docker image back into service was disabling preservation of custom networks, and that one of the quirks observed while operating in this degraded state was that selected custom networks deselected themselves whenever a container updated itself via the auto-update feature.
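For what it's worth, grepping the daemon log for containerd lines is about the only visibility available here. My understanding is that Unraid writes dockerd output to /var/log/docker.log (an assumption — check your own system); simulated below with a sample log so the commands are self-contained:

```shell
# Count containerd-related lines in a (simulated) Docker daemon log.
# On the server you would point grep at /var/log/docker.log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
time="2024-04-22T10:00:00" level=info msg="starting containerd"
failed to start containerd: timeout waiting for containerd to start
EOF
hits=$(grep -c 'containerd' "$log")
echo "containerd mentions: $hits"
rm -f "$log"
```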
Sn3akyP3t3 · Posted April 22 (Author, marked as Solution)

My resolution for all of this was to recreate the Docker image using the Previous Apps feature. Rather than risk resurrecting the nondeterministic behavior around custom networks, I've decided to go whole enchilada with VLAN tagging instead. This provides the desired network segregation, similar to what I was getting with custom networks, but probably far more granular, and it likely opens up possibilities I'm not yet aware of.
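For comparison, the VLAN-tagged equivalent of a custom network can also be created by hand with Docker's macvlan driver bound to a tagged sub-interface. The subnet, gateway, VLAN ID, and interface below are made-up examples, not my actual layout; adjust them to your own VLAN plan:

```shell
# Compose a macvlan network creation command pinned to VLAN 20 on eth0.
# All values are example placeholders.
parent="eth0.20"           # assumes VLAN 20 is tagged on eth0
subnet="10.10.20.0/24"
gateway="10.10.20.1"
cmd="docker network create -d macvlan --subnet=$subnet --gateway=$gateway -o parent=$parent vlan20"
echo "$cmd"
# eval "$cmd"   # run on the server once VLAN 20 is tagged on the switch port
```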
Vr2Io · Posted April 22 (edited)

I'm no expert on Docker networking, but I'm interested in what other people are doing and why the custom networks needed to be preserved. My first thought was that some VPN plugin or container was causing the problem. Anyway, you found another solution. By the way, because I point my Docker path at /tmp, every reboot re-downloads all the containers, which may help flush out hidden issues like this. I also use VLANs to separate things as needed.

Edited April 22 by Vr2Io