[6.12.10] SSH Sessions creates a new cgroup that is never deleted after connection is closed.

KnF · May 20

Anyone had a chance to check this out?

user12345678 · September 5

I am also experiencing this issue.

user12345678 · September 7

Here is a script that can be added to User Scripts plugin and run on a schedule, however often is necessary.

It is undoubtedly the wrong way to address this, but it works for now.

#!/bin/bash

#
# This issue crept up in unraid 6.12.x, presumably with a switch from cgroup v1 to v2.
#
# cgroups are used for SSH connections (via elogind) and when the SSH session ends the cgroup is not cleaned up.
#
# This results in old cgroups continuing to collect until errors like this start preventing new connections:
#
#    unraid2 elogind-daemon[1504]: Failed to create cgroup c65514: No space left on device
#
# Here are some mentions on the unraid forums:
#
#    https://forums.unraid.net/topic/45586-ssh-and-denyhosts-updated-for-v61/?do=findComment&comment=1398585
#    https://forums.unraid.net/bug-reports/stable-releases/61210-ssh-sessions-creates-a-new-cgroup-that-is-never-deleted-after-connection-is-closed-r2981/
#
# In the first link, the workaround was to comment the line in /etc/pam.d/sshd for pam_elogind.so which works, but 
# is that the right way to do it?
#
# Seems this used to be handled properly up through 6.11.x when cgroups v1 was in use via the release_agent paradigm.
# Indeed, for cgroup v1 it used to be mounted like this:
#
#    cgroup /sys/fs/cgroup/elogind cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib64/elogind/elogind-cgroups-agent,name=elogind 0 0
#
# cgroup v2 has no concept of release_agent and in 6.12.x /lib64/elogind/elogind-cgroups-agent still exists, so it seems it's an incomplete migration from v1 to v2.
#
# cgroup v2 has a cgroup.events file under each individual cgroup which contains values that an external process is supposed to use inotify
# events to monitor and when the populated value goes to zero it can be cleaned up.
#
# this process is either part of systemd (which is not complete/fully present in slackware) or it is supposed to be implemented
# by sshd/elogind and isn't in unraid.
#
# in any case, the script below can look for cgroups where populated is zero and assume they can safely be deleted via the cgdelete command.
#
# it can be ran however often is necessary to match whatever rate of SSH connections are made/terminated.
#
# *This script is undoubtedly also the wrong way to handle this, but it at least leaves elogind in-play and continues
# to allow cgroups to be used for SSH connections.
#

CGROUP_BASE_DIR="/sys/fs/cgroup"

for CGRP in $(find ${CGROUP_BASE_DIR} -type d -maxdepth 1 -regextype sed -regex "^.*/c[0-9]\{1,\}$" -printf "%f\n")
do
  { sed -n '/populated 0/Q 1'; i=$?; } < ${CGROUP_BASE_DIR}/${CGRP}/cgroup.events
  { sed -n '/frozen 0/Q 1'; j=$?; } < ${CGROUP_BASE_DIR}/${CGRP}/cgroup.events
  if [[ $i -eq 1 && $j -eq 1 ]]
  then
    echo "cleaning up old cgroup ::/${CGRP}"
    /usr/bin/cgdelete -r "::/${CGRP}"
  else
    echo "cgroup ::/${CGRP} still active, skipping"
  fi
done

ich777 · September 12

I'm currently looking into this and also into other issues in regards to cgroup2.

EDIT: @user12345678 is the script actually working for you that you've posted.

EDIT2: @KnF do you have maybe Diagnostics from when that occured, it would be very interesting what happened on your system since I really can't reproduce the "no space left on device" I'm not at around 10000 ssh login/logouts and everything is working over here.

user12345678 · September 12

@ich777 - Thank you for looking into this.

I did not experience the actual error, I just noticed the same symptoms (cgroups not being cleaned up after SSH sessions are terminated) while setting up an SFTP connection and could clearly see that I would run into it eventually so I did some research (which included finding this forum topic and another, similar one) and came up with this workaround script.

It does indeed work. I just run it about every hour or-so and it will find the abandoned cgroups and delete them.

I think the way it is intended to work in cgroups v2 is there should be a daemon process monitoring inotify events for the cgroup.events file, checking to see if populated and frozen (or at the very least, populated) goes to zero and, if so, clean up the cgroup.

It may be that the daemon is either part of systemd (which I don't think is fully present in slackware so this part may be missing in unraid) or it could be some sort of plugin/add-on to elogind and is missing here, I am not sure on any of that.

Thanks again!

ich777 · September 12

7 minutes ago, user12345678 said:

It may be that the daemon is either part of systemd (which I don't think is fully present in slackware so this part may be missing in unraid) or it could be some sort of plugin/add-on to elogind and is missing here, I am not sure on any of that.

I already have a solution in mind to solve this issue.

systemd is a different kind of beast...

For systemd it's actually the case that it works in combination with cgroup and does the cleanup of all necessary or better speaking unused cgroups on it's own.

11 minutes ago, user12345678 said:

It does indeed work. I just run it about every hour or-so and it will find the abandoned cgroups and delete them.

I tested the script and on my system it does basically nothing, maybe it would be better to look at cgroup.procs since the active PIDs listed in there and if they are empty no processes are running anymore.

user12345678 · September 12

27 minutes ago, ich777 said:

I tested the script and on my system it does basically nothing, maybe it would be better to look at cgroup.procs since the active PIDs listed in there and if they are empty no processes are running anymore.

Here is an example of how it works for me:

root@unraid:~# ls -d /sys/fs/cgroup/c*/
/bin/ls: cannot access '/sys/fs/cgroup/c*/': No such file or directory
root@unraid:~# ssh -l XXX unraid.localdomain
([email protected]) Password: 
Last login: XXXXXXXXXXXX from XXX.XXX.XXX.XXX
Linux 6.1.106-Unraid.
XXX@unraid:~$ ls -d /sys/fs/cgroup/c*/
/sys/fs/cgroup/c18//
XXX@unraid:~$ exit
logout
Connection to unraid.localdomain closed.
root@unraid:~# ls -d /sys/fs/cgroup/c*/
/sys/fs/cgroup/c18//
root@unraid:~# bash /boot/scripts/cgroup2_cleanup.sh 
cleaning up old cgroup ::/c18
root@unraid:~# ls -d /sys/fs/cgroup/c*/
/bin/ls: cannot access '/sys/fs/cgroup/c*/': No such file or directory
root@unraid:~#

The documentation for cgroups v2 specifically says to monitor cgroups.events so that's the one I went after, if cgroups.procs is better than so be it, I am not an expert on this

https://docs.kernel.org/admin-guide/cgroup-v2.html

cgroup.events

    A read-only flat-keyed file which exists on non-root cgroups. The following entries are defined. Unless specified otherwise, a value change in this file generates a file modified event.

        populated

            1 if the cgroup or its descendants contains any live processes; otherwise, 0.
        frozen

            1 if the cgroup is frozen; otherwise, 0.

https://man7.org/linux/man-pages/man7/cgroups.7.html

Cgroups v2 cgroup.events file
       Each nonroot cgroup in the v2 hierarchy contains a read-only
       file, cgroup.events, whose contents are key-value pairs
       (delimited by newline characters, with the key and value
       separated by spaces) providing state information about the
       cgroup:

           $ cat mygrp/cgroup.events
           populated 1
           frozen 0

       The following keys may appear in this file:

       populated
              The value of this key is either 1, if this cgroup or any
              of its descendants has member processes, or otherwise 0.

       frozen (since Linux 5.2)
              The value of this key is 1 if this cgroup is currently
              frozen, or 0 if it is not.

       The cgroup.events file can be monitored, in order to receive
       notification when the value of one of its keys changes.  Such
       monitoring can be done using inotify(7), which notifies changes
       as IN_MODIFY events, or poll(2), which notifies changes by
       returning the POLLPRI and POLLERR bits in the revents field.

   Cgroup v2 release notification
       Cgroups v2 provides a new mechanism for obtaining notification
       when a cgroup becomes empty.  The cgroups v1 release_agent and
       notify_on_release files are removed, and replaced by the
       populated key in the cgroup.events file.  This key either has the
       value 0, meaning that the cgroup (and its descendants) contain no
       (nonzombie) member processes, or 1, meaning that the cgroup (or
       one of its descendants) contains member processes.

       The cgroups v2 release-notification mechanism offers the
       following advantages over the cgroups v1 release_agent mechanism:

       •  It allows for cheaper notification, since a single process can
          monitor multiple cgroup.events files (using the techniques
          described earlier).  By contrast, the cgroups v1 mechanism
          requires the expense of creating a process for each
          notification.

       •  Notification for different cgroup subhierarchies can be
          delegated to different processes.  By contrast, the cgroups v1
          mechanism allows only one release agent for an entire
          hierarchy.

Edited September 12 by user12345678

ich777 · September 12

14 minutes ago, user12345678 said:

The documentation for cgroups v2 specifically says to monitor cgroups.events so that's the one I went after, if cgroups.procs is better than so be it, I am not an expert on this

Ah, now I get why it is not working for me, you are searching for directories starting with "c" followed by a number here:

On 9/7/2024 at 4:47 PM, user12345678 said:
-regex "^.*/c[0-9]\{1,\}$"

I completely overlooked that part, on my systems the ssh sessions don't start with a "c" that's why it fails, however I'm currently working on another solution to empty the cgroups which are currently not in use.

14 minutes ago, user12345678 said:

https://docs.kernel.org/admin-guide/cgroup-v2.html

Thanks, already read that, I think both ways are a valid way of deleting unused cgroups however for ssh sessions it would be better to use pam for that.

user12345678 · September 12

2 minutes ago, ich777 said:

... however I'm currently working on another solution to empty the cgroups which are currently not in use.

I am 100% positive your way will be better than mine

Thank you for working on this!!

ich777 · September 18

This should be fixed in the next release.

KnF · September 18

Hey guys! Thanks for looking into this issue.

@ich777 I disabled my HomeAssistant scripts so it wouldn't fail anymore, but I'll enable them now so in a couple days the groups will hit the cap. As soon as that happens I will post the Diagnostics like you asked.

For what I get you already have a solution, but I guess this will provide you more information to confirm if your solution works as intended

Thank you both for taking time to look into this issue!

ich777 · September 18

42 minutes ago, KnF said:

@ich777 I disabled my HomeAssistant scripts so it wouldn't fail anymore, but I'll enable them now so in a couple days the groups will hit the cap. As soon as that happens I will post the Diagnostics like you asked.

I don't think that's necessary anymore, the next version from Unraid should have a daemon included which automatically removes unused cgroups.

[6.12.10] SSH Sessions creates a new cgroup that is never deleted after connection is closed.

User Feedback

Recommended Comments

KnF 1

Link to comment

user12345678 5

Link to comment

user12345678 5

Link to comment

ich777 4,147

Link to comment

user12345678 5

Link to comment

ich777 4,147

Link to comment

user12345678 5

Link to comment

ich777 4,147

Link to comment

user12345678 5

Link to comment

ich777 4,147

Link to comment

KnF 1

Link to comment

ich777 4,147

Link to comment

Join the conversation