  • [6.12.10] Cannot fork: Resource temporarily unavailable


    Hendrik112
    • Urgent

    Hi guys, my Unraid server had been running 24/7 for a few months when Docker services suddenly started to fail and the Docker daemon became unavailable. I tried logging in to the WebUI, but it was completely frozen. Sadly, I wasn't able to copy diagnostics, since SSH was also unavailable.

    Using the IPMI, I managed to capture the following screenshot while initiating an ACPI shutdown. The server was completely unresponsive for a few minutes until it finally shut down.
    Upon restart, a parity check was automatically triggered, which indicates to me that the “soft shutdown” wasn't performed correctly.

    The same thing happened again a day later: the same messages in the IPMI, and the WebUI barely usable, indicating "no flash drive". I'm not sure if this was just a frontend bug or the flash drive actually having issues.

     

    I have been using unraid for 5 years now and never saw anything like this before.

    The diagnostics file was generated after the restart was performed.

     

    I searched around for the error message, but didn't find anything related to my case. My system has 128GB of ECC RAM, so it's unlikely to be an out-of-memory condition like those reported for 2GB systems.

     

    unraid.png

    anton-diagnostics-20240609-0925.zip

     

    Edit:

     

    Flash drive issues are the likely cause of these errors, beginning with this log message:

    emhttpd: Unregistered Flash device error (ENOFLASH7)

     

    Followed by this:

    emhttpd: Plus key detected, GUID: 0781-5581-0000-100124105314 FILE: /boot/config/Plus.key
    emhttpd: error: device_read_smart, 9567: Cannot allocate memory (12): device_spinup: stream did not open: nvme1n1

     

    After multiple of these flash drive errors, followed by more memory allocation errors, the whole system becomes increasingly unresponsive. As a result, Docker containers can no longer allocate resources and become more and more unstable.

    Eventually this produces the original error message:

    Cannot fork: Resource temporarily unavailable

    At this point the Unraid WebGUI is barely responsive. This stage can last for about 1-3 hours, during which I am still able to gracefully shut down the server using the IPMI KVM with the powerdown command.

    Uptime Kuma is the only service still able to push out a Discord notification that a ping has failed at this stage.

    If I miss this window, the server locks up completely, so that even an ACPI shutdown doesn't get through. The only way to shut it down then is by cutting power at the PSU.
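For anyone hitting the same wall: "Cannot fork: Resource temporarily unavailable" is the kernel's EAGAIN error, returned once a process/PID limit is exhausted. A quick way to check for PID exhaustion while the system is still semi-responsive (standard Linux tools, nothing Unraid-specific assumed):

```shell
# Compare the live process count against the kernel's PID ceiling.
ps -e | wc -l                    # processes currently running
cat /proc/sys/kernel/pid_max     # system-wide PID limit (often 32768 or 4194304)
# fork() starts failing with EAGAIN ("Resource temporarily unavailable")
# once per-user or cgroup task limits are hit, even well below pid_max.
```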
     





    Recommended Comments



    I ran the exact same setup in a fresh Debian KVM on Proxmox. The Debian Docker installation is not vulnerable to this Docker 'fork bomb': the PIDs are successfully limited to 7000, and the only effect is the container status changing to unhealthy. This is achieved with no additional configuration whatsoever.


    This confirms that this is not a problem with Docker itself, but rather with Unraid's implementation of Docker.

    Link to comment

    Thanks, I've already reported this issue to LT to make sure they are aware, and to see if they can do something to prevent this.

    Link to comment
    On 6/25/2024 at 2:45 AM, Hendrik112 said:


     

    Thank you for tracking down this issue.  Here is a way to limit the number of processes any container is permitted to create:

     

    From a terminal session, edit the file /boot/config/docker.cfg and add this line to the end:

     

    DOCKER_OPTS="--default-ulimit nproc=256:1024"

     

    You must then disable Docker and re-enable it for the settings to take effect.

     

    The string 256:1024 sets the soft and hard limits on the number of processes a container can create. This setting applies to all containers. There is a way to set these limits per container, but that would require a code change. Of course, you can set different values.

     

    Please let me know if this solves the 'fork bomb' issue.
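To check the effect without restarting the whole Docker service, the same ulimit can be passed to a single docker run invocation (a sketch; assumes the docker CLI and the public alpine image are available):

```shell
# Start a throwaway container with the same nproc ulimit and print the
# soft limit it sees from the inside:
docker run --rm --ulimit nproc=256:1024 alpine sh -c 'ulimit -u'
# Should report the soft limit, i.e. 256.
```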

    Link to comment
    1 hour ago, limetech said:

     


    Hi, thank you for looking into this.

    I set DOCKER_OPTS according to your instructions and quickly noticed that 256:1024 doesn't seem sufficient for my 30 Docker containers: CPU usage was high, and multiple containers reported they were running out of memory and failed to start.
    After increasing nproc to 1024:2048, all containers started successfully.

    The slow 'fork bomb' was, however, not stopped by this setting; it was using over 7500 PIDs when I stopped it to avoid crashing my system again.

    I then used compose down to remove all containers and lowered nproc back to the initial 256:1024 limit. After restarting Docker, I created only the authelia stack, and it again used over 7500 PIDs. The Docker setting that can contain it is pids_limit: 50 in a compose file. This appears to be a much harder limit than the one set by Debian, causing the container to crash.
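For reference, the compose-side setting mentioned above looks like this (a minimal sketch; the service name and image are placeholders for whatever the stack actually runs):

```yaml
# docker-compose.yml fragment: hard-cap the PIDs a service may create.
services:
  authelia:
    image: authelia/authelia   # placeholder image
    pids_limit: 50             # enforced by the kernel's pids cgroup controller
```

Unlike a ulimit, which is accounted per user, pids_limit is enforced per container by the pids cgroup, so processes can't sidestep it by switching users.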

    In case you want to reproduce this issue yourself, I created a detailed bug report on the authelia github which should contain all the necessary steps.

    Edited by Hendrik112
    Link to comment

    After changing the DOCKER_OPTS line, did you go to Settings/Docker, disable Docker, hit Apply, then enable Docker and hit Apply?
    You can open console for a container and type:

    ulimit -a

    and see if the limits were applied. Look for 'max user processes' in the output.

     

    Also: "256:1024" is just a guess; I don't know what will ultimately be the best setting. In a future release we'll probably make this a per-container config setting.

    Link to comment

    Yes, my workflow is the following:
    1. Settings->Docker->Enable Docker: No->Apply
    2. Set docker.cfg
    3. Settings->Docker->Enable Docker: Yes->Apply

    Output of ulimit -a inside the authelia docker container:
     

    core file size (blocks)         (-c) 0
    data seg size (kb)              (-d) unlimited
    scheduling priority             (-e) 0
    file size (blocks)              (-f) unlimited
    pending signals                 (-i) 514764
    max locked memory (kb)          (-l) unlimited
    max memory size (kb)            (-m) unlimited
    open files                      (-n) 40960
    POSIX message queues (bytes)    (-q) 819200
    real-time priority              (-r) 0
    stack size (kb)                 (-s) unlimited
    cpu time (seconds)              (-t) unlimited
    max user processes              (-u) 256
    virtual memory (kb)             (-v) unlimited
    file locks                      (-x) unlimited


    Output of ps inside the authelia docker container:
     

    PID   USER     TIME  COMMAND
        1 99        0:00 authelia
       99 root      0:00 [ssl_client]
      110 root      0:00 [ssl_client]
      118 root      0:00 [ssl_client]
      126 root      0:00 [ssl_client]
      134 root      0:00 [ssl_client]
      142 root      0:00 [ssl_client]
    ...
     1397 root      0:00 [ssl_client]
     1398 root      0:00 {healthcheck.sh} /bin/sh /app/healthcheck.sh
     1404 root      0:00 wget --quiet --no-check-certificate --tries=1 --spider https://localhost:9091/api/health
     1405 root      0:00 ssl_client -s3 -n localhost -I
     1406 root      0:00 ps


    Output of ps -e | wc -l inside the authelia docker container:
     

    1177

     

    Edit: The setting seems to be applied correctly, but it doesn't limit the number of processes at all. I have the nvidia-driver plugin installed, in case that makes a difference.

    Edited by Hendrik112
    Link to comment

    OK, forget about that DOCKER_OPTS string; you can remove it.

     

    Instead, edit the authelia container, select Advanced View (upper right toggle) and include this string in the 'Extra Parameters' setting:

     

    --pids-limit 1000

     

    You can of course set any value besides 1000. The total number of processes inside that container should now be limited.
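Since the 'Extra Parameters' field is appended to the underlying docker run command, the equivalent plain-docker invocation, with the resulting limit inspected from inside the container, would look like this (a sketch; assumes the docker CLI, the alpine image, and a cgroup v2 host):

```shell
# Run a container with a hard cap of 1000 PIDs and show the limit the
# pids cgroup controller enforces for it (path assumes cgroup v2):
docker run --rm --pids-limit 1000 alpine cat /sys/fs/cgroup/pids.max
# On cgroup v1 hosts the file is /sys/fs/cgroup/pids/pids.max instead.
```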

    Link to comment

    That did it. I guess --pids-limit is just the docker run equivalent of the pids_limit setting I already tried in docker compose.

    I added --pids-limit 1000 and --health-interval 0.01s (to speed things up) to the 'Extra Parameters' setting. The processes were successfully capped at 1000, and the container became unhealthy as intended.
    This solution requires setting a PID limit for each container, which could be automated by the GUI. Sadly, that won't help people who run docker compose via the compose plugin or Portainer.
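For containers that are already running (including ones created by the compose plugin or Portainer), a cap can also be retrofitted without recreating them, since docker update accepts --pids-limit; a sketch assuming the docker CLI:

```shell
# Apply a PID cap of 1000 to every currently running container.
for c in $(docker ps -q); do
  docker update --pids-limit 1000 "$c"
done
```

Note that this change does not survive recreating a container, so a compose-managed stack still needs pids_limit in its compose file.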

    Edited by Hendrik112
    Link to comment





