Services not starting after 6.12.8 upgrade (pid files cannot be created)



After upgrading from 6.12.6 and rebooting, services fail to start because they cannot create their PID files ("No space left on device"). I'm not sure which storage the PID files are created on; the USB flash drive is only 13% used. Since nothing starts, I expect rolling back will be a pain.

 

Feb 17 08:46:24 Tower emhttpd: shcmd (35): /etc/rc.d/rc.avahidaemon start
Feb 17 08:46:24 Tower root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D
Feb 17 08:46:24 Tower avahi-daemon[13749]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214).
Feb 17 08:46:25 Tower winbindd[13609]: [2024/02/17 08:46:25.024111,  0] ../../source3/winbindd/winbindd_samr.c:71(open_internal_samr_conn)
Feb 17 08:46:25 Tower winbindd[13609]:   open_internal_samr_conn: Could not connect to samr pipe: NT_STATUS_CONNECTION_DISCONNECTED
Feb 17 08:46:25 Tower avahi-daemon[13749]: Successfully dropped root privileges.
Feb 17 08:46:25 Tower avahi-daemon[13749]: write(): No space left on device
Feb 17 08:46:25 Tower avahi-daemon[13749]: Failed to create PID file: No space left on device
Feb 17 08:46:25 Tower emhttpd: shcmd (35): exit status: 1
Feb 17 08:46:25 Tower emhttpd: shcmd (36): /etc/rc.d/rc.avahidnsconfd start
Feb 17 08:46:25 Tower root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon:  /usr/sbin/avahi-dnsconfd -D
Feb 17 08:46:25 Tower avahi-dnsconfd[13766]: write(): No space left on device
Feb 17 08:46:25 Tower avahi-dnsconfd[13766]: Failed to create PID file: No space left on device
Feb 17 08:46:25 Tower emhttpd: shcmd (36): exit status: 1
Feb 17 08:46:25 Tower emhttpd: Autostart enabled
Feb 17 08:46:27 Tower emhttpd: shcmd (41): /etc/rc.d/rc.php-fpm start
Feb 17 08:46:27 Tower php-fpm[13795]: [ERROR] Unable to write to the PID file.: No space left on device (28)
Feb 17 08:46:27 Tower php-fpm[13795]: [ERROR] FPM initialization failed
Feb 17 08:46:27 Tower root: Starting php-fpm [ERROR] Unable to write to the PID file.: No space left on device (28)
Feb 17 08:46:27 Tower root: [ERROR] FPM initialization failed
Feb 17 08:46:27 Tower root:  failed
Feb 17 08:46:27 Tower emhttpd: shcmd (41): exit status: 1
Feb 17 08:46:27 Tower emhttpd: shcmd (42): /etc/rc.d/rc.unraid-api install
Feb 17 08:46:28 Tower root: unraid-api installed
Feb 17 08:46:28 Tower emhttpd: shcmd (43): /etc/rc.d/rc.nginx start
Feb 17 08:46:29 Tower root: Starting Nginx server daemon...
Feb 17 08:46:29 Tower nginx: 2024/02/17 08:46:29 [crit] 13925#13925: pwrite() "/var/run/nginx.pid" failed (28: No space left on device)

 


Adding this to the go file mostly solved the issue:

#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M


 

But dbus and elogind still fail since they run before the go file:
 

Feb 17 09:24:14 Tower dbus-daemon[3719]: Failed to start message bus: Failed to close "/var/run/dbus/dbus.pid": No space left on device
Feb 17 09:24:14 Tower elogind-daemon[3731]: Failed to write PID file /run/elogind.pid: No space left on device

 

Is there anywhere else the size of the /run mount can be changed, so that it applies before anything else starts?
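
For anyone wanting to confirm they are hitting the same limit, here is a minimal sketch for checking how full the filesystem behind /run is. The function name run_usage_pct is made up for illustration; it relies only on the POSIX df -P layout, where the fifth column of the data row is the capacity percentage.

```shell
#!/bin/bash
# Print the use% (without the % sign) of the filesystem backing a path.
# Defaults to /run. "run_usage_pct" is a hypothetical helper name.
run_usage_pct() {
    df -P "${1:-/run}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Show the current figure for /run.
run_usage_pct /run
```

Running it before and after the remount line in the go file should show the percentage drop once the size is raised.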


Adding these lines to the go file seems to be working and no services complaining about dbus:

#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M
#start dbus
/usr/bin/dbus-uuidgen --ensure
/usr/bin/dbus-daemon --system
#start elogind
/etc/rc.d/rc.elogind start

 

Still don't think this is a great solution, ideally the size of /run should be increased at the first mount before any service starts.

6 hours ago, bubbl3 said:

Still don't think this is a great solution, ideally the size of /run should be increased at the first mount before any service starts.

PID files should not really take that much space.  Post the output of this command:

ls -la /run/

 

13 minutes ago, dlandon said:

PID files should not really take that much space.  Post the output of this command:

ls -la /run/

 

 

 

root@Tower:/# ls -la /run/
total 80
drwxr-xr-x 19 root  root   1040 Feb 17 17:59 ./
drwxr-xr-x 21 root  root    460 Feb 17 14:55 ../
-rw-r--r--  1 root  root      0 Feb 17 14:52 acpid.pid
srw-rw-rw-  1 root  root      0 Feb 17 14:52 acpid.socket=
-rw-------  1 root  root      0 Feb 17 14:53 agetty.reload
-rw-r--r--  1 root  root      6 Feb 17 14:53 apcupsd.pid
-rw-r--r--  1 root  root      0 Feb 17 14:52 atd.pid
drwxr-xr-x  2 avahi avahi    80 Feb 17 14:54 avahi-daemon/
-rw-r--r--  1 root  root      6 Feb 17 14:54 avahi-dnsconfd.pid
drwxr-xr-x  2 root  root     80 Feb 17 14:55 blkid/
drw-------  3 root  root     60 Feb 17 14:54 containerd/
drwxr-xr-x  2 root  root     40 Feb 17 14:52 cron/
drwx------  2 root  root     40 Feb 17 14:53 cryptsetup/
drwxr-xr-x  2 root  root     80 Feb 17 14:53 dbus/
drwx------  8 root  root    180 Feb 17 14:54 docker/
srw-rw----  1 root  docker    0 Feb 17 14:54 docker.sock=
-rw-r--r--  1 root  root      5 Feb 17 14:54 dockerd.pid
drwxr-xr-x  8 root  root    160 Feb 17 14:54 elogind/
-rw-r--r--  1 root  root      6 Feb 17 14:53 elogind.pid
srw-rw-rw-  1 root  root      0 Feb 17 14:53 emhttpd.socket=
drwxr-xr-x  2 root  root     40 Feb 17 14:52 faillock/
-rw-r--r--  1 root  root      6 Feb 17 14:53 inetd.pid
drwxr-xr-x 12 root  root    440 Feb 17 14:54 libvirt/
drwxr-xr-x  4 root  root     80 Feb 17 14:52 lock/
drwx------  2 root  root     40 Feb 17 14:52 lvm/
drwxr-xr-x  2 root  root     40 Feb 17 14:51 mount/
-rw-r--r--  1 root  root    216 Feb 17 17:30 nchan.pid
-rw-r--r--  1 root  root      6 Feb 17 14:53 nginx.pid
srw-rw-rw-  1 root  root      0 Feb 17 14:53 nginx.socket=
-rw-r--r--  1 root  root      0 Feb 17 14:52 ntpd.pid
-rw-r--r--  1 root  root      5 Feb 17 14:53 php-fpm.pid
srw-rw----  1 root  users     0 Feb 17 14:53 php5-fpm.sock=
-rw-------  1 root  root     25 Feb 17 14:53 qga.state
-rw-r--r--  1 rpc   rpc       6 Feb 17 14:53 rpc.statd.pid
drwxr-x---  2 rpc   root     40 Feb 17 14:53 rpcbind/
-r--r--r--  1 root  root      0 Feb 17 14:53 rpcbind.lock
srw-rw-rw-  1 root  root      0 Feb 17 14:53 rpcbind.sock=
-rw-r--r--  1 root  root      5 Feb 17 14:54 rsyslogd.pid
-rw-r--r--  1 root  root      1 Feb 17 14:52 runlevel
drwxr-xr-x  4 root  root     80 Feb 17 14:52 samba/
-rw-r--r--  1 root  root      6 Feb 17 17:58 samba-dcerpcd.pid
-rw-------  1 root  root      6 Feb 17 14:53 sm-notify.pid
-rw-r--r--  1 root  root      6 Feb 17 14:54 smbd.pid
-rw-r--r--  1 root  root      6 Feb 17 14:53 sshd.pid
-rw-r--r--  1 root  root      5 Feb 17 14:54 syslogd.pid
lrwxrwxrwx  1 root  root      7 Feb 17 14:52 systemd -> elogind/
drwxr-xr-x  8 root  root    180 Feb 17 16:56 udev/
srwxr-xr-x  1 root  root      0 Feb 17 14:53 unraid-api.sock=
drwxr-xr-x  3 root  root     60 Feb 17 14:55 user/
-rw-rw-r--  1 root  utmp   4224 Feb 17 14:55 utmp
-rw-r--r--  1 root  root      6 Feb 17 14:54 winbindd.pid
-rw-------  1 root  root      0 Feb 17 14:52 xtables.lock

 

10 hours ago, dlandon said:

Can you take a little closer look and see if you can figure out what is taking all the space?  I'm seeing Apparent size: 511.3 KiB.

That is correct: most of the files are 0 bytes, and the total data that can be read from them is about 512 KB (confirmed by the output of tree --du -h -a /run > run.txt, attached here). That said, the block size is 4 KB, so each file that holds any data occupies at least one full block on the partition. Multiplying the number of files by the minimum allocation gives a rough upper bound that shows why so much space is taken: 9843 x 4 KB = 39372 KB, or roughly 38 MB.

If I've learned anything in my years of working with file systems, it's that disk usage with many small files often gets out of hand because of the block size. ncdu is great for this; with tree (or anything else that reports only the apparent size) you would be chasing ghosts.
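
The arithmetic above can be reproduced with a small sketch. The helper names estimate_kib and count_files are invented for illustration, and the estimate is a worst case that assumes every file occupies one full block.

```shell
#!/bin/bash
# Worst-case estimate in KiB: number of files times the block size in KiB.
estimate_kib() {
    local files=$1 block_kib=$2
    echo $(( files * block_kib ))
}

# Count regular files under a directory (default: /run) without
# crossing into other mounted filesystems.
count_files() {
    find "${1:-/run}" -xdev -type f 2>/dev/null | wc -l
}

# The figures from this post: 9843 files, 4 KiB blocks.
estimate_kib 9843 4    # prints 39372
```

On a live system, estimate_kib "$(count_files /run)" 4 gives the same kind of estimate, and comparing du -sk --apparent-size /run with du -sk /run (GNU du) shows the gap between readable data and allocated space.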

 

This also doesn't happen with 6.12.6, so it must be some change made between that and 6.12.8.

run.txt

2 hours ago, Amane said:

I found a solution:

 


Thanks, but this is a workaround rather than a solution. I'd also suggest checking your syslog after the change, as dbus may still be failing (it starts before the go file runs), which will cause all sorts of underlying issues:

On 2/17/2024 at 11:31 AM, bubbl3 said:

Adding these lines to the go file seems to be working and no services complaining about dbus:

#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M
#start dbus
/usr/bin/dbus-uuidgen --ensure
/usr/bin/dbus-daemon --system
#start elogind
/etc/rc.d/rc.elogind start

 

Still don't think this is a great solution, ideally the size of /run should be increased at the first mount before any service starts.

 

At least this suggests why the bug affects us: you're on a 64-core Threadripper and I'm on a 64-core EPYC, so we have more devices creating udev files than most. I bet /run/udev is abnormally big on other systems as well, even if it doesn't affect them.
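
To check this on a given system, here is a rough sketch that counts regular files per top-level directory under /run, so /run/udev can be compared against the rest. files_per_dir is a hypothetical helper name, not an Unraid tool. Symlinked directories (such as systemd -> elogind in the listing earlier in the thread) are skipped to avoid counting the same files twice.

```shell
#!/bin/bash
# For each top-level directory under a path (default: /run), print
# "<file count> <directory>", largest count first.
files_per_dir() {
    local base=${1:-/run} d
    for d in "$base"/*/; do
        [ -L "${d%/}" ] && continue   # skip symlinks to directories
        [ -d "$d" ] || continue       # skip an unmatched glob
        printf '%s %s\n' "$(find "$d" -xdev -type f 2>/dev/null | wc -l)" "$d"
    done | sort -rn
}

# Show the five directories holding the most files.
files_per_dir /run | head -n 5
```

If the udev theory holds, /run/udev/ should appear at or near the top of the output on the affected machines.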
