• [6.11.5 - 6.12.0] Docker is Maxing out CPU and Memory, Triggering `oom-killer`


    Ryonez
    • Urgent

    When working with docker, I've been having issues for the past month, though I have at least found what seems to trigger it.
    The first time this happened was when I updated all my containers, around the 1st of May for me. It forced an unclean reboot after oom-killer was triggered, and the containers loaded up all right after the OS came back up, albeit with high CPU usage, though I can't remember if the memory usage spiked at the time.
    The second was a few days later, May the 4th for me. It was again triggered during an update and forced another unclean reboot after oom-killer was triggered. This time the containers did not come back up, and docker choked the system on both CPU and memory.

    Symptoms:
    Changes to the docker image seem to elicit high CPU usage from docker. It also causes high spikes in memory usage.
    The spike in memory usage can cause memory to get completely used, triggering the oom-killer.
    The more containers in use, the more likely this seems to happen. The number of containers required to trigger oom-killer is currently unknown.
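
    To help pin down whether it is the docker daemon itself (rather than the containers) whose memory climbs during these operations, a small monitor can log the daemon's resident memory over time. Below is a minimal sketch only; it assumes the usual process names (dockerd, containerd, containerd-shim*) and a readable /proc, and is just a diagnostic aid, not official tooling.

```python
#!/usr/bin/env python3
"""Log the resident memory of the docker daemon processes every few
seconds, so a spike can be matched against the time a container was
edited or updated. Process names below are assumptions; adjust them."""
import os
import time

WATCH = ("dockerd", "containerd")  # containerd-shim* also starts with "containerd"

def rss_by_name():
    """Return {process name: total RSS in MiB} for the watched names."""
    totals = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            if not comm.startswith(WATCH):
                continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        totals[comm] = totals.get(comm, 0) + int(line.split()[1]) / 1024
                        break
        except OSError:
            continue  # process exited between listing and reading
    return totals

if __name__ == "__main__":
    while True:
        usage = ", ".join(f"{k}={v:.0f}MiB" for k, v in sorted(rss_by_name().items()))
        print(time.strftime("%H:%M:%S"), usage or "no docker processes found", flush=True)
        time.sleep(5)
```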

    Things tried to resolve this:
    Checking the docker image filesystem (Result: No problems).
    Deleting and rebuilding the docker image, several times.
    Did a cache filesystem check (BTRFS, Result: Errors found; a quick way to re-check the device error counters after replacing the drives is sketched after this list).
    Replaced the cache drives completely.
    Re-flashing the boot USB and only copying configs back over.
    Started shifting almost all my docker container configs to Portainer, as the docker templates duplicate fields and restore unwanted fields from template updates. "Currently" (I'm having to rebuild the docker image again at the moment), only 3-4 containers are unRaid managed.
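
    Since the BTRFS check on the cache did find errors, one quick follow-up is to read the per-device error counters and confirm they stay at zero on the replacement drives. A minimal sketch; it assumes the cache pool is mounted at /mnt/cache and shells out to the btrfs CLI:

```python
#!/usr/bin/env python3
"""Flag any non-zero BTRFS device error counters on the cache pool.
The mount point is an assumption; adjust it for your setup."""
import subprocess
import sys

MOUNTPOINT = "/mnt/cache"  # assumed Unraid cache pool mount point

def main():
    out = subprocess.run(
        ["btrfs", "device", "stats", MOUNTPOINT],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = [line for line in out.splitlines() if line.split() and line.split()[-1] != "0"]
    if bad:
        print("Non-zero error counters found:")
        print("\n".join(bad))
        sys.exit(1)
    print("All BTRFS device error counters are zero.")

if __name__ == "__main__":
    main()
```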

    Currently it seems that if I only load containers up in small batches, I can get every container up, but only from a clean docker image, or when editing a small number of containers at a time (this last part I'm feeling iffy on).
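
    If batching turns out to be the only reliable workaround, the start-up can be scripted instead of done by hand. Here is a minimal sketch of that idea; the batch size and pause are guesses to tune, and it simply shells out to the docker CLI:

```python
#!/usr/bin/env python3
"""Start stopped containers in small batches, pausing between batches so
the docker daemon's CPU/RAM spike can settle before the next group.
Batch size and pause length are guesses; tune them for your server."""
import subprocess
import time

BATCH_SIZE = 3
PAUSE_SECONDS = 60

def stopped_containers():
    """Names of all containers that exist but are not running."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--filter", "status=exited", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [name for name in out.splitlines() if name]

def main():
    names = stopped_containers()
    for i in range(0, len(names), BATCH_SIZE):
        batch = names[i:i + BATCH_SIZE]
        print(f"Starting batch: {', '.join(batch)}")
        subprocess.run(["docker", "start", *batch], check=True)
        if i + BATCH_SIZE < len(names):
            time.sleep(PAUSE_SECONDS)

if __name__ == "__main__":
    main()
```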

    For the first diagnostic log attached (atlantis-diagnostics-20230507-1710), this seems to have the logs from the 4th of May mentioned above.
    It'll have the logs from when docker was no longer able to be started; I ended up leaving docker off after it triggered the oom-killer. The log on the 7th shows that zombie processes from the docker containers were preventing the cache from unmounting, and an unclean reboot was done.
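
    For the next time this happens, it can help to list the zombie processes and anything still holding files open under the cache before attempting the unmount. A minimal sketch, assuming the cache is mounted at /mnt/cache and the script is run as root so every /proc entry is readable:

```python
#!/usr/bin/env python3
"""List zombie processes and any process still holding files open under
the cache mount (what blocked the unmount here). Run as root.
The mount point is an assumption; adjust it for your setup."""
import os

MOUNTPOINT = "/mnt/cache"

def main():
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                raw = f.read()
            # the process name is wrapped in parentheses and may contain spaces
            comm = raw[raw.index("(") + 1 : raw.rindex(")")]
            state = raw[raw.rindex(")") + 2 :].split()[0]
            if state == "Z":
                print(f"zombie: pid={pid} comm={comm}")
                continue
            fd_dir = f"/proc/{pid}/fd"
            for fd in os.listdir(fd_dir):
                target = os.readlink(os.path.join(fd_dir, fd))
                if target.startswith(MOUNTPOINT):
                    print(f"holds {target}: pid={pid} comm={comm}")
                    break
        except OSError:
            continue  # process exited, or insufficient permission

if __name__ == "__main__":
    main()
```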

    For the second diagnostic log attached (atlantis-diagnostics-20230521-1029), I was working on getting a small container stack of 3 images going. I was configuring them, meaning they were reloaded several times in a row, seemingly creating a situation where the CPU and memory spikes stacked up enough for docker to trigger the oom-killer.

    Marking this as urgent, as docker killing itself via oom-killer leaves a big risk of data corruption, along with leaving docker pretty much useless currently.

    atlantis-diagnostics-20230507-1710.zip atlantis-diagnostics-20230521-1029.zip




    User Feedback

    Recommended Comments

    13 minutes ago, Squid said:

     


    I don't believe this to be an issue with the docker containers memory usage.

    For example, the three images I was working on today:

    1. lscr.io/linuxserver/diskover
    2. docker.elastic.co/elasticsearch/elasticsearch:7.10.2
    3. alpine (elasticsearch-helper)

     

    diskover wasn't able to connect to elasticsearch, elasticsearch was crashing due to permission issues I was looking at, and elasticsearch-helper runs a line of code and shuts down.

    I don't see any of these using 10+GB of RAM in a non-functional state. The system when running steady uses 60% of 25GB of RAM. And this wasn't an issue with the set of containers I was using until the start of this month.

    I believe this to be an issue in docker itself currently.
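
    One quick way to back that up at the moment of a spike is a per-container snapshot from docker stats. A minimal sketch that just shells out to the docker CLI and prints a small table:

```python
#!/usr/bin/env python3
"""Snapshot per-container memory and CPU so a RAM spike can be
attributed (or not) to a specific container rather than the daemon."""
import subprocess

def main():
    out = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"{'CONTAINER':30} {'MEMORY':25} CPU")
    for line in out.splitlines():
        name, mem, cpu = line.split("\t")
        print(f"{name:30} {mem:25} {cpu}")

if __name__ == "__main__":
    main()
```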

    Link to comment

    Just a heads up on my progress from this morning.

    My containers are back up, and they didn't pass 50% of my RAM during this. So I'm currently up and running until docker craps itself again.

    Link to comment
    22 hours ago, Ryonez said:

    The system when running steady uses 60% of 25GB of ram.

     

    22 hours ago, Ryonez said:

    And this wasn't an issue with the set of containers I was using until the start of this month. 

     

    Diskover (indexing) may suddenly use up all memory. Once OOM happens, a system crash is also to be expected. I run all my docker containers in /tmp (RAM), even the CCTV recordings, and since memory usage stays really steady I haven't had trouble.

     

    Try the method below to identify which folder will use up memory, and ideally map it out to an SSD.

     

     

    Edited by Vr2Io
    Link to comment
    5 minutes ago, Vr2Io said:

    Diskover (indexing) may suddenly use up all memory. Once OOM happens, a system crash is also to be expected. I run all my docker containers in /tmp (RAM), even the CCTV recordings, and since memory usage stays really steady I haven't had trouble.


    Diskover just happened to be what I was trying out when the symptoms occurred yesterday. It had not been present during the earlier times the symptoms occurred. After rebuilding docker yesterday I got diskover working and had it index the array with minimal memory usage (I don't think I saw diskover go above 50MB).

     

     

    8 minutes ago, Vr2Io said:

    Try the method below to identify which folder will use up memory, and ideally map it out to an SSD.


    I'm not really sure how this is meant to help with finding folders that are using memory. It's actually suggesting moving log folders to RAM, which would increase RAM usage. If the logs were causing a blowout of data usage, that usage would go to the cache/docker location, not memory.
    It might be good to note that I use docker in the directory configuration, not the .img, so I am able to access those folders if needed.
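
    Since the directory layout is accessible, a quick per-folder size report can show whether logs or image layers are actually ballooning on the cache. A minimal sketch; the docker root path below is an assumption for a directory-mode setup, so point it at the real location:

```python
#!/usr/bin/env python3
"""Report the largest subfolders of a directory-mode docker root, to see
whether container logs or image layers are growing out of control.
The path is an assumption; point it at your docker directory."""
import os

DOCKER_DIR = "/mnt/cache/system/docker"  # assumed location, adjust as needed

def folder_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable
    return total

def main():
    entries = [e for e in os.scandir(DOCKER_DIR) if e.is_dir(follow_symlinks=False)]
    sizes = sorted(((folder_size(e.path), e.name) for e in entries), reverse=True)
    for size, name in sizes[:15]:
        print(f"{size / 1024**3:8.2f} GiB  {name}")

if __name__ == "__main__":
    main()
```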

    Link to comment

    Right, you run docker on storage, not RAM, so something (maybe an individual docker container) is causing the OOM.

    Link to comment

    So, for some behavioural updates.

    Noticed my RAM usage was at 68%, not the 60% it normally rests at. The server isn't really doing anything, so I figured it was from the two new containers, diskover and elasticsearch, mostly from elasticsearch. So I shut that stack down.

    That caused docker to spike the CPU and RAM a bit.

    The containers turned off in about 3 seconds, and RAM usage dropped to 60%. Then, docker started doing its thing:
    [Screenshot: CPU and RAM usage spiking after the containers were stopped]

    RAM usage started spiking up to 67% in this image, and CPU usage is spiking as well.

    After docker settles with whatever it's doing:
    [Screenshot: usage after docker settles]

    This is from just two containers being turned off. It gets worse the more docker operations you perform.
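
    To put numbers on that settling behaviour, one option is to sample system memory and the docker daemon's CPU time across a single operation, such as stopping one container. A rough sketch only; the process name and the 30-second settle window are assumptions, and it shells out to the docker CLI:

```python
#!/usr/bin/env python3
"""Measure how much CPU time dockerd burns and how system memory moves
across a single docker operation (e.g. stopping one container).
Usage: python3 measure_op.py <container-name>"""
import os
import subprocess
import sys
import time

def dockerd_cpu_seconds():
    """Total user+system CPU seconds consumed so far by dockerd."""
    ticks = 0
    hz = os.sysconf("SC_CLK_TCK")
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != "dockerd":
                    continue
            with open(f"/proc/{pid}/stat") as f:
                raw = f.read()
            fields = raw[raw.rindex(")") + 2:].split()
            ticks += int(fields[11]) + int(fields[12])  # utime + stime
        except OSError:
            continue
    return ticks / hz

def available_mib():
    """MemAvailable from /proc/meminfo, in MiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1024
    return 0.0

def main():
    name = sys.argv[1]
    cpu_before, mem_before = dockerd_cpu_seconds(), available_mib()
    start = time.time()
    subprocess.run(["docker", "stop", name], check=True)
    time.sleep(30)  # let the daemon finish whatever it does afterwards
    elapsed = time.time() - start
    print(f"dockerd CPU used : {dockerd_cpu_seconds() - cpu_before:.1f}s over {elapsed:.0f}s")
    print(f"MemAvailable     : {mem_before:.0f} MiB -> {available_mib():.0f} MiB")

if __name__ == "__main__":
    main()
```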

    Link to comment



