bubbl3 Posted February 17 (edited)

Rebooted after upgrading from 6.12.6. It seems services cannot create their PID files for lack of space on the device. I'm not sure which storage they are created on; the USB is only 13% used. Of course, since nothing starts, I guess rolling back will be a pain.

Feb 17 08:46:24 Tower emhttpd: shcmd (35): /etc/rc.d/rc.avahidaemon start
Feb 17 08:46:24 Tower root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D
Feb 17 08:46:24 Tower avahi-daemon[13749]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214).
Feb 17 08:46:25 Tower winbindd[13609]: [2024/02/17 08:46:25.024111, 0] ../../source3/winbindd/winbindd_samr.c:71(open_internal_samr_conn)
Feb 17 08:46:25 Tower winbindd[13609]: open_internal_samr_conn: Could not connect to samr pipe: NT_STATUS_CONNECTION_DISCONNECTED
Feb 17 08:46:25 Tower avahi-daemon[13749]: Successfully dropped root privileges.
Feb 17 08:46:25 Tower avahi-daemon[13749]: write(): No space left on device
Feb 17 08:46:25 Tower avahi-daemon[13749]: Failed to create PID file: No space left on device
Feb 17 08:46:25 Tower emhttpd: shcmd (35): exit status: 1
Feb 17 08:46:25 Tower emhttpd: shcmd (36): /etc/rc.d/rc.avahidnsconfd start
Feb 17 08:46:25 Tower root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon: /usr/sbin/avahi-dnsconfd -D
Feb 17 08:46:25 Tower avahi-dnsconfd[13766]: write(): No space left on device
Feb 17 08:46:25 Tower avahi-dnsconfd[13766]: Failed to create PID file: No space left on device
Feb 17 08:46:25 Tower emhttpd: shcmd (36): exit status: 1
Feb 17 08:46:25 Tower emhttpd: Autostart enabled
Feb 17 08:46:27 Tower emhttpd: shcmd (41): /etc/rc.d/rc.php-fpm start
Feb 17 08:46:27 Tower php-fpm[13795]: [ERROR] Unable to write to the PID file.: No space left on device (28)
Feb 17 08:46:27 Tower php-fpm[13795]: [ERROR] FPM initialization failed
Feb 17 08:46:27 Tower root: Starting php-fpm [ERROR] Unable to write to the PID file.: No space left on device (28)
Feb 17 08:46:27 Tower root: [ERROR] FPM initialization failed
Feb 17 08:46:27 Tower root: failed
Feb 17 08:46:27 Tower emhttpd: shcmd (41): exit status: 1
Feb 17 08:46:27 Tower emhttpd: shcmd (42): /etc/rc.d/rc.unraid-api install
Feb 17 08:46:28 Tower root: unraid-api installed
Feb 17 08:46:28 Tower emhttpd: shcmd (43): /etc/rc.d/rc.nginx start
Feb 17 08:46:29 Tower root: Starting Nginx server daemon...
Feb 17 08:46:29 Tower nginx: 2024/02/17 08:46:29 [crit] 13925#13925: pwrite() "/var/run/nginx.pid" failed (28: No space left on device)

Edited February 19 by bubbl3: fixed title
bubbl3 Posted February 17 (Author)

In case you were going to ask, /var/run is writable:

lrwxrwxrwx 1 root root 4 Feb 16 01:14 run -> /run/
bubbl3 Posted February 17 (Author, edited)

But it does look like the tmpfs is not large enough:

Edited February 17 by bubbl3
bubbl3 Posted February 17 (Author)

What's the best way to increase the size of /run at boot?
bubbl3 Posted February 17 (Author)

Added this to the go file and it mostly solved the issue:

#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M

But dbus and elogind still fail, since they run before the go file:

Feb 17 09:24:14 Tower dbus-daemon[3719]: Failed to start message bus: Failed to close "/var/run/dbus/dbus.pid": No space left on device
Feb 17 09:24:14 Tower elogind-daemon[3731]: Failed to write PID file /run/elogind.pid: No space left on device

Is there anywhere else the size of the /run mount can be changed so that it applies before anything else?
bubbl3 Posted February 17 (Author)

Adding these lines to the go file seems to be working, with no services complaining about dbus:

#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M
#start dbus
/usr/bin/dbus-uuidgen --ensure
/usr/bin/dbus-daemon --system
#start elogind
/etc/rc.d/rc.elogind start

I still don't think this is a great solution; ideally the size of /run should be increased at the first mount, before any service starts.
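A minimal sketch of how the remount in the post above might be guarded so that it only runs when /run is actually smaller than the target. The helper names (fs_size_kib, needs_resize, grow_run) are hypothetical and not part of Unraid; the mount command is the one from the post, and 65536 is simply 64M expressed in KiB.

```shell
# Hypothetical helpers for the go file; the names are illustrative only.

# Print the total size, in KiB, of the filesystem that holds the path in $1.
# df -P guarantees one output line per filesystem, so line 2 is the data row.
fs_size_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $2 }'
}

# Succeed (exit 0) when the current size ($1) is below the wanted size ($2).
needs_resize() {
    [ "$1" -lt "$2" ]
}

# Remount /run at 64M, but only if it is currently smaller than that.
# The mount command is copied from the post; 65536 KiB = 64M.
grow_run() {
    if needs_resize "$(fs_size_kib /run)" 65536; then
        /bin/mount -t tmpfs tmpfs /run -o remount,size=64M
    fi
}
```

Calling grow_run from the go file would replace the bare mount line. It does not change the ordering problem described in the post: anything that starts before the go file still sees the original size.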
bubbl3 Posted February 17 (Author)

Diagnostic file attached here.

tower-diagnostics-20240217-1149.zip
dlandon Posted February 17

6 hours ago, bubbl3 said:
Still don't think this is a great solution, ideally the size of /run should be increased at the first mount before any service starts.

PID files should not really take that much space. Post the output of this command:

ls -la /run/
bubbl3 Posted February 17 (Author)

13 minutes ago, dlandon said:
PID files should not really take that much space. Post the output of this command: ls -la /run/

root@Tower:/# ls -la /run/
total 80
drwxr-xr-x 19 root root 1040 Feb 17 17:59 ./
drwxr-xr-x 21 root root 460 Feb 17 14:55 ../
-rw-r--r-- 1 root root 0 Feb 17 14:52 acpid.pid
srw-rw-rw- 1 root root 0 Feb 17 14:52 acpid.socket=
-rw------- 1 root root 0 Feb 17 14:53 agetty.reload
-rw-r--r-- 1 root root 6 Feb 17 14:53 apcupsd.pid
-rw-r--r-- 1 root root 0 Feb 17 14:52 atd.pid
drwxr-xr-x 2 avahi avahi 80 Feb 17 14:54 avahi-daemon/
-rw-r--r-- 1 root root 6 Feb 17 14:54 avahi-dnsconfd.pid
drwxr-xr-x 2 root root 80 Feb 17 14:55 blkid/
drw------- 3 root root 60 Feb 17 14:54 containerd/
drwxr-xr-x 2 root root 40 Feb 17 14:52 cron/
drwx------ 2 root root 40 Feb 17 14:53 cryptsetup/
drwxr-xr-x 2 root root 80 Feb 17 14:53 dbus/
drwx------ 8 root root 180 Feb 17 14:54 docker/
srw-rw---- 1 root docker 0 Feb 17 14:54 docker.sock=
-rw-r--r-- 1 root root 5 Feb 17 14:54 dockerd.pid
drwxr-xr-x 8 root root 160 Feb 17 14:54 elogind/
-rw-r--r-- 1 root root 6 Feb 17 14:53 elogind.pid
srw-rw-rw- 1 root root 0 Feb 17 14:53 emhttpd.socket=
drwxr-xr-x 2 root root 40 Feb 17 14:52 faillock/
-rw-r--r-- 1 root root 6 Feb 17 14:53 inetd.pid
drwxr-xr-x 12 root root 440 Feb 17 14:54 libvirt/
drwxr-xr-x 4 root root 80 Feb 17 14:52 lock/
drwx------ 2 root root 40 Feb 17 14:52 lvm/
drwxr-xr-x 2 root root 40 Feb 17 14:51 mount/
-rw-r--r-- 1 root root 216 Feb 17 17:30 nchan.pid
-rw-r--r-- 1 root root 6 Feb 17 14:53 nginx.pid
srw-rw-rw- 1 root root 0 Feb 17 14:53 nginx.socket=
-rw-r--r-- 1 root root 0 Feb 17 14:52 ntpd.pid
-rw-r--r-- 1 root root 5 Feb 17 14:53 php-fpm.pid
srw-rw---- 1 root users 0 Feb 17 14:53 php5-fpm.sock=
-rw------- 1 root root 25 Feb 17 14:53 qga.state
-rw-r--r-- 1 rpc rpc 6 Feb 17 14:53 rpc.statd.pid
drwxr-x--- 2 rpc root 40 Feb 17 14:53 rpcbind/
-r--r--r-- 1 root root 0 Feb 17 14:53 rpcbind.lock
srw-rw-rw- 1 root root 0 Feb 17 14:53 rpcbind.sock=
-rw-r--r-- 1 root root 5 Feb 17 14:54 rsyslogd.pid
-rw-r--r-- 1 root root 1 Feb 17 14:52 runlevel
drwxr-xr-x 4 root root 80 Feb 17 14:52 samba/
-rw-r--r-- 1 root root 6 Feb 17 17:58 samba-dcerpcd.pid
-rw------- 1 root root 6 Feb 17 14:53 sm-notify.pid
-rw-r--r-- 1 root root 6 Feb 17 14:54 smbd.pid
-rw-r--r-- 1 root root 6 Feb 17 14:53 sshd.pid
-rw-r--r-- 1 root root 5 Feb 17 14:54 syslogd.pid
lrwxrwxrwx 1 root root 7 Feb 17 14:52 systemd -> elogind/
drwxr-xr-x 8 root root 180 Feb 17 16:56 udev/
srwxr-xr-x 1 root root 0 Feb 17 14:53 unraid-api.sock=
drwxr-xr-x 3 root root 60 Feb 17 14:55 user/
-rw-rw-r-- 1 root utmp 4224 Feb 17 14:55 utmp
-rw-r--r-- 1 root root 6 Feb 17 14:54 winbindd.pid
-rw------- 1 root root 0 Feb 17 14:52 xtables.lock
bubbl3 Posted February 17 (Author)

Looks like udev is taking all the space:
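One way to see which entry under /run is consuming the space is to sort du output by allocated size. This is a sketch only: usage_by_entry is a hypothetical helper name, and it relies on du -sk reporting allocated blocks in KiB rather than the apparent file size that ls shows.

```shell
# Hypothetical helper: list the entries directly under the directory in $1
# by allocated size in KiB, smallest first, so the largest consumer ends up
# on the last line. Hidden entries are not matched by the glob.
usage_by_entry() {
    du -sk "$1"/* 2>/dev/null | sort -n
}
```

Running usage_by_entry /run and reading the last few lines should show whether a single directory, such as udev in this thread, accounts for most of the usage.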
bubbl3 Posted February 17 (Author)

Bunch of 4KB files in there, weird:
dlandon Posted February 17

We're going to take a look at this. I believe it may be a bug.
dlandon Posted February 17

3 hours ago, bubbl3 said:
Bunch of 4KB files in there, weird:

Can you take a little closer look and see if you can figure out what is taking all the space? I'm seeing Apparent size: 511.3 KiB.
bubbl3 Posted February 18 (Author, edited)

10 hours ago, dlandon said:
Can you take a little closer look and see if you can figure out what is taking all the space? I'm seeing Apparent size: 511.3 KiB.

That is correct: most of the files are 0 bytes, and the total data that can be read from them is 512KB (confirmed by the output of tree --du -h -a /run > run.txt, attached here). That said, the block size is 4KB, so every file will consume that amount of space on the partition. If you multiply the number of files by the minimum allocation, you get a rough idea of why so much space is being taken:

9843 files x 4KB = 39372KB (roughly 38MB)

If I have learned anything in my years of working with file systems, it is that disk usage with small files often gets out of hand due to the default block size. NCDU is actually great for this; with tree or df you would be chasing ghosts, as they just list the actual size.

This also doesn't happen with 6.12.6, so it must be some change made between that and 6.12.8.

run.txt

Edited February 18 by bubbl3
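The estimate in the post above can be reproduced with a couple of lines of shell. This is only a sketch under the post's own assumption that every file occupies at least one 4KB block; min_alloc_kib and count_files are hypothetical helper names, and the figures 9843 and 4 are taken from the post.

```shell
# Estimate the minimum allocated space in KiB when every file takes at
# least one block. Arguments: number of files, block size in KiB.
min_alloc_kib() {
    echo $(( $1 * $2 ))
}

# Count the regular files below the directory given in $1.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

With the numbers from the post, min_alloc_kib 9843 4 gives 39372. On a live system, min_alloc_kib "$(count_files /run)" 4 would give the same kind of rough lower bound for the current contents of /run.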
bubbl3 Posted February 18 (Author)

Also look here, all 0-byte files in /run/udev/data are using 4KB:
bubbl3 Posted February 18 (Author)

2 hours ago, Amane said:
I found a solution:

Thanks, but this is not a solution, just a workaround. I would also suggest checking your syslog after the change, as you may also have DBUS failing (it starts before the go file), which will cause all sorts of underlying issues:

On 2/17/2024 at 11:31 AM, bubbl3 said:
Adding these lines to the go file seems to be working and no services complaining about dbus:
#increase /run size
/bin/mount -t tmpfs tmpfs /run -o remount,size=64M
#start dbus
/usr/bin/dbus-uuidgen --ensure
/usr/bin/dbus-daemon --system
#start elogind
/etc/rc.d/rc.elogind start
Still don't think this is a great solution, ideally the size of /run should be increased at the first mount before any service starts.

At least this confirms why the bug is affecting us: you're on a 64-core Threadripper and I'm on a 64-core EPYC, so we have more devices creating udev files than others do. I bet their /run/udev is abnormally big as well, even if it doesn't affect them.
dlandon Posted February 18

The remount is only a workaround. We will have to come up with a solution.
Amane Posted February 18

Yes, of course you're right. It's probably because I'm happy that my system is working again that I called it a solution.. 😅
bubbl3 Posted February 19 (Author)

For the sake of completeness, disabling SMT seems to be a better workaround if that's an option in your setup: https://forums.unraid.net/bug-reports/stable-releases/6128-no-webgui-run-full-r2854/?do=findComment&comment=27345
Willmsy32 Posted February 26

@dlandon Ran into a similar problem myself on a private server. I was able to SSH in and confirmed /udev getting too big. I tried increasing it using the steps above but never quite got it to a point where it would work. I grabbed some stuff before eventually deciding to roll back to 6.12.4. Figured I would share some logs in case it helps the devs investigate.

deepthought-diagnostics-20240225-1149.zip
JorgeB Posted February 26

10 hours ago, Willmsy32 said:
Ran into a similar problem

It's the same issue. It seems to affect users with a 128-thread CPU. It should be fixed in the next release; for now you can resize /run or disable SMT.
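A quick, hedged way to check whether a machine falls into the group described in the reply above. The function name may_be_affected is hypothetical, and the 128-thread figure is only the observation made in this thread, not a documented limit.

```shell
# Hypothetical check: succeed (exit 0) when the logical CPU count in $1 is
# at or above the 128-thread figure mentioned in this thread.
may_be_affected() {
    [ "$1" -ge 128 ]
}
```

On a running system this could be called as may_be_affected "$(nproc)" && echo "128 or more threads: check /run usage", where nproc is the GNU coreutils command that prints the number of available processing units.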