Docker service is crashing my server

June 4, 2025

Hi all

I suddenly ran into the following problem:

Today my server suddenly became unresponsive and I had to hard reset it. After the reboot I ran into the issue again, but this time I was able to see the culprit:

/usr/bin/dockerd -p /var/run/dockerd.pid --log-opt max-size=500m --log-opt max-file=1 --log-level=fatal --storage-driver=btrs

This process was using almost all of my CPU, and its memory usage increased minute by minute until it finally crashed. After that the server was unresponsive over SSH, the WebUI and direct console access. I rebooted again and tried to stop some containers etc., but the same thing happened. This time, though, SSH came back; the WebUI never did, but I was able to collect diagnostics that way:

I have run these containers for a long time and today is the first time this has happened. What could be the issue? Should I replace the flash drive? The cache SSD? Anything else? The SMART values look okay.

However, in the syslog I can clearly see the out-of-memory error. The worst offenders seem to be dockerd and also meilisearch, so I will try rebooting without the latter shortly.

Does anyone see anything else out of order here?

Thanks

redstore-diagnostics-20250604-2205.zip

June 5, 2025

Community Expert

You should first try to find out which container(s) is causing the OOM errors. Try limiting their RAM usage, or enable just half of them and, if there are no issues, try the other half.
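The halving approach can be sketched as a simple bisection. This is only an illustration, assuming exactly one container is at fault; `is_healthy` is a hypothetical callback that would start the given subset, watch the server for a while, and report whether it stayed stable.

```python
from typing import Callable, Optional, Sequence


def find_culprit(
    containers: Sequence[str],
    is_healthy: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect container names to find the single one that destabilizes the host.

    is_healthy(subset) is a hypothetical check: run only `subset` and
    return True if no OOM or runaway memory growth was observed.
    """
    candidates = list(containers)
    # If the full set is fine, there is nothing to find.
    if not candidates or is_healthy(candidates):
        return None
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        # Keep whichever half still reproduces the problem.
        if is_healthy(first_half):
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0]
```

With N containers this needs roughly log2(N) test runs instead of N, but it only works if the problem is caused by a single container and not by the Docker daemon itself.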

June 6, 2025

Author

Just did that; it seems better but not perfect yet. I limited all my containers to reasonable values, and while I didn't run into complete OOM errors, the WebUI still became unresponsive after some time and the dockerd process reached very high memory and CPU usage. I also didn't run the containers I suspect of using a lot of memory, like the LLM containers. It ran fine for a while with Plex and many other things running, but I definitely saw much higher load than just two days ago (and during the months before that), and it all seemed to be caused by the dockerd process.

Many other weird things are happening too, and my logs are filled with errors from other components, including the GPUs. docker stats, however, showed that all containers stayed well under their memory limits, and adding those up looked fine, yet the memory usage of the dockerd processes kept creeping up.
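Adding up the per-container figures can be sketched like this. It is a minimal example that assumes the input was produced with `docker stats --no-stream --format "{{.Name}}\t{{.MemUsage}}"`, where each memory column looks like `1.5GiB / 4GiB`; the function names are made up for illustration.

```python
import re

# Binary units as printed in the MemUsage column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE = re.compile(r"\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)\s*")


def parse_size(text: str) -> float:
    """Convert a string such as '512MiB' into a number of bytes."""
    match = _SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def total_container_memory(stats_output: str) -> float:
    """Sum the 'used' part of every 'name<TAB>used / limit' line, in bytes."""
    total = 0.0
    for line in stats_output.splitlines():
        if not line.strip():
            continue
        _name, usage = line.split("\t", 1)
        used, _limit = usage.split("/", 1)
        total += parse_size(used)
    return total
```

If this total stays far below what the host reports as used memory, the growth is happening outside the containers' own accounting, which matches what is described here.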

I'm now running a memory check and collecting other logs to investigate further. Something is clearly very wrong. It is a bit concerning that everything worked perfectly, with no errors, for many months during which I didn't change or start any new containers. Yes, I do run a lot of containers, but they had been running for quite a while, which makes me think something may be wrong with the hardware, and I want that ruled out first.

So the next step for me is the memory check; then I'll check and replace the flash drive to rule out any defects there. The disks and filesystems seem to be fine so far, but I'll double-check that. This morning my docker.log file was filled with these lines repeating:
2025/06/06 03:09:05 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:88)

time="2025-06-06T03:09:06.239570146+02:00" level=error msg="post event" error="context deadline exceeded"

time="2025-06-06T03:09:06.511020566+02:00" level=error msg="ttrpc: received message on inactive stream" stream=1503

2025/06/06 03:09:22 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:88)

So there still seems to be a memory leak somewhere. My plan of action:

Check Hardware
Check disks, file system etc.
Try even smaller subsets of containers than before
Go from here
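To see which of the repeating docker.log messages dominates, the lines can be grouped by message with the timestamps stripped. This is a rough sketch that only assumes the two line shapes shown in the excerpt above (a leading `YYYY/MM/DD HH:MM:SS` stamp, or a `msg="..."` field); `summarize` is an illustrative name.

```python
import re
from collections import Counter
from typing import Iterable

_MSG_FIELD = re.compile(r'msg="([^"]*)"')
_PLAIN_TIMESTAMP = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def summarize(lines: Iterable[str]) -> Counter:
    """Count log lines by message, ignoring timestamps and other fields."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        field = _MSG_FIELD.search(line)
        # Structured lines are keyed by msg="..."; plain lines lose their timestamp.
        key = field.group(1) if field else _PLAIN_TIMESTAMP.sub("", line)
        counts[key] += 1
    return counts
```

Calling `summarize(open("docker.log"))` and then `.most_common(5)` on the result would list the five most frequent messages; the file path is an assumption and depends on where the log is stored.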

Thank you for your help so far.

June 6, 2025

Author

Turns out it apparently isn't one of my containers. This is the output from htop when no container is started:

I think this has somehow messed up my docker config. I don't use a docker image but a docker folder, so I might need to back this up and start over.

Edit:

It's getting worse: a few minutes later it has filled up my RAM again and is crashing the WebUI and other things:

Edited June 6, 2025 by RedXon

June 6, 2025

Community Expert

Looking at the diags, you are using a docker folder with the zfs native driver, which is a known issue. Recreate it using the overlay2 driver, or better yet IMHO, change to an image:

https://docs.unraid.net/unraid-os/release-notes/7.0.0/#add-support-for-overlay2-storage-driver

June 6, 2025

Author

Ah, I see, let me try that. Funnily enough this wasn't an issue before, but good to know. I switched from image to folder recently because my image kept getting full: I have a lot of images, and my configs weren't always ideal with docker volumes, most of which I have since switched to folder mounts.

Also, something to consider: I don't really use the appstore much; almost everything I run is started directly with docker-compose.

June 6, 2025

Community Expert

1 hour ago, RedXon said:
Funnily enough this wasn't an issue before

It is more of an issue with 7.1 than with 7.0, due to the kernel change and an existing zfs bug with the newer kernels.

2 hours ago, RedXon said:
Also, something to consider, i don't really use the appstore much, almost all of which I run is running directly with docker-compose.

That should not make a difference in this case.

June 6, 2025

Author

Thank you. In a limited test with just some selected containers, it seems to work for now after switching to overlay2.

However, I just saw that memory limits don't seem to work on Unraid when using compose. I get this error for now, but need to check further:

Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

So I'll see how that goes after I restart most of my "important" compose files.

June 7, 2025

Community Expert

Sorry, can't really help with compose, you can try asking on the existing plugin support thread.

June 7, 2025

Author

Unfortunately it only seemed to get better with overlay2 instead of native. It was working just fine for a while, but now my dockerd process is using over 40 GB again.

I might need to check whether changing back to the docker image resolves this. Very weird behaviour. At the moment dockerd is filling the RAM and making the WebUI unresponsive. The weird thing is that I still haven't figured out where these 40 GB of usage come from. It doesn't seem to be the containers: I limited all the containers that are currently running so that they can consume a maximum of 20 GB of memory in total, and until dockerd drove the server OOM again they were using around 12 GB. So something else is clearly going on.
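To track the daemon's own memory separately from the containers, its resident set size can be read from `/proc`. A minimal sketch, assuming a Linux host and that the PID file is `/var/run/dockerd.pid` as in the dockerd command line quoted earlier in the thread; the function names are illustrative.

```python
import re
from typing import Optional


def rss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (reported by the kernel in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident memory of a running process."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())


if __name__ == "__main__":
    # Assumption: dockerd was started with -p /var/run/dockerd.pid.
    with open("/var/run/dockerd.pid") as pid_file:
        dockerd_pid = int(pid_file.read().strip())
    print(f"dockerd RSS: {read_rss_kib(dockerd_pid)} kB")
```

Logging this value every minute alongside the container total would show whether the 40 GB belongs to dockerd itself, as htop suggests, and not to any container's accounted memory.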

June 9, 2025

Author
Solution

Alright, just another status update (for you, anyone interested and anyone seeing this post in the future with a similar problem).

After overlay2 didn't bring the desired effect, I switched back to a docker image. I have now tested it for just under 24 hours and it seems to be good. I see the dockerd processes in htop, but they each consume just 0.1% memory, whereas before (with native and overlay2 docker storage in a folder) usage would creep up until it was well over 40% memory and rising, until dockerd would crash and be killed by the Linux OOM killer.

I don't have all the containers up yet, just the ones I tested with before (the arr suite, Plex, Nextcloud and some databases), and so far so good. I just started the LLM containers, my Git, Immich and some others again, so I'm sitting at 39 running containers with about 45% memory usage and 39% usage of the docker.img file. Seems to be working so far.

For some background on why I initially switched away from docker.img: a few years ago (I've been using Unraid since about 2016, maybe longer) I got fed up with the way Unraid handled containers, the appstore etc., as I was used to working with compose and, at that point, also Kubernetes. So I moved all my containers to a few VMs and ran everything with compose. On the VMs I could naturally use docker volumes, among other things, as the volumes would be in a folder and not in an image with a limited file size. The VMs did have downsides though, as mounting the array in a VM was never an ideal solution for some tasks.

A bit later I figured out a good workflow for running compose directly on Unraid, so I moved everything back to the host itself, still using compose. Because I had missed some docker volumes and had many images (with as many containers as I had at the time), my docker.img filled up very fast, which is why I switched to the folder method, thinking it was the better option. Now that I have changed almost all of the docker volumes to folder mounts on /mnt/cache/appdata, I hopefully should not run into the problem of a full docker image again. So fingers crossed this will work for good now.

And while writing this long paragraph (which, to be fair, probably no one asked for and most will skip anyway) I started the rest of the containers I was running when my problems started, and so far I can report no memory leaks or high CPU load like before. The dockerd service is sitting happily at an average of 1-2% CPU load and 0.1-0.2% memory, as it did before the problems started. I will keep watching for a few more days and mark this as solved if it doesn't happen again.

So TL;DR for anyone else I guess:

If you're running a ZFS cache on Unraid 7.1.x with a docker folder and see this process using a lot of resources in htop:

/usr/bin/dockerd -p /var/run/dockerd.pid --log-opt max-size=500m --log-opt max-file=1 --log-level=fatal --storage-driver=btrs

Then try switching back to a docker image and see if that helps.

Some other users reported having this issue when the container mounts were on /mnt/user/appdata instead of /mnt/cache/appdata, which suggests that the dockerd process does not cope well with slow I/O. So another option for those affected would be to check that, and also storage integrity.
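Checking which containers still bind-mount from /mnt/user can be sketched as follows. This is a minimal example that assumes the input is the JSON array printed by `docker inspect <container>...`, where each entry has a `Name` and a `Mounts` list with `Type` and `Source` fields; `user_share_mounts` is an illustrative name.

```python
import json
from typing import List, Tuple


def user_share_mounts(inspect_json: str) -> List[Tuple[str, str]]:
    """Return (container, host path) pairs for bind mounts under /mnt/user."""
    hits: List[Tuple[str, str]] = []
    for container in json.loads(inspect_json):
        # docker inspect reports names with a leading slash, e.g. "/plex".
        name = container.get("Name", "").lstrip("/")
        for mount in container.get("Mounts", []):
            source = mount.get("Source", "")
            is_user_share = source == "/mnt/user" or source.startswith("/mnt/user/")
            if mount.get("Type") == "bind" and is_user_share:
                hits.append((name, source))
    return hits
```

Any pair it returns is a candidate for moving to the matching /mnt/cache path, provided the share actually lives on the cache pool.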

Edited June 9, 2025 by RedXon

Solved by RedXon