sazrocks Posted June 17, 2023

My logs are being spammed with messages about a "Stale file handle" to the point that /var/log is 100% filled and I have to restart my server. Here's an excerpt:

Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278312: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278314: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278316: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278318: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278321: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278322: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278324: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278326: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278329: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278330: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278331: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278335: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278337: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278338: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278339: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278342: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278799: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278800: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278802: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/280171: Stale file handle
Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/280172: Stale file handle

How can I fix this and/or stop it from filling up my logs?
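One way to see which process these messages point at is to tally them per PID. The following is only a minimal sketch, not something posted in the thread: it assumes every message has exactly the shape shown in the excerpt above, and the name `count_stale_by_pid` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt, for example:
#   Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278312: Stale file handle
STALE_RE = re.compile(r"Cannot stat file /proc/(\d+)/fd/\d+: Stale file handle")


def count_stale_by_pid(lines):
    """Return a Counter mapping PID -> number of 'Stale file handle' lines."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Demo on three lines copied from the thread (two stale-handle lines, one unrelated).
sample = [
    "Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278312: Stale file handle",
    "Jun 16 04:40:16 Tower root: Cannot stat file /proc/10916/fd/278314: Stale file handle",
    "Jun 16 19:51:00 Tower root: Fix Common Problems Version 2023.04.26",
]
for pid, count in count_stale_by_pid(sample).most_common():
    print(f"PID {pid}: {count} stale file handle messages")
# prints "PID 10916: 2 stale file handle messages"
```

To run it against a live log, pass an open file instead of the sample list, for example `count_stale_by_pid(open("/var/log/syslog", errors="replace"))`. The path `/var/log/syslog` is an assumption; adjust it to wherever the system writes its syslog.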
JorgeB Posted June 17, 2023

Please post the diagnostics.
sazrocks Posted June 18, 2023

18 hours ago, JorgeB said: Please post the diagnostics.

Trying, but every time I try to download the diagnostics, the tab in Chrome eventually crashes with an out-of-memory error. I'm trying now in Edge, but it has been about 20 minutes so far, with memory usage slowly increasing. It seems to be running sed on a bunch of files on my cache drive. If it ever finishes, I'll upload the diagnostics.
sazrocks Posted June 18, 2023

18 hours ago, JorgeB said: Please post the diagnostics.

Here we go. About 2 GB of RAM and 40 minutes later, here are the diagnostics: tower-diagnostics-20230617-2056.zip
JorgeB Posted June 18, 2023

Nothing obvious that I can see. Are you using Time Machine with that share? Also, is NFS enabled for any shares?
dlandon Posted June 18, 2023

You have multiple things going on:

Jun 16 19:45:29 Tower rc.docker: pihole: Error response from daemon: network br0 not found
Jun 16 19:45:29 Tower rc.docker: Error: failed to start containers: pihole
Jun 16 19:46:15 Tower root: Minecraft-polytech-direwolf20-1.19: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 16 19:46:15 Tower root: urBackup-Server: Could not download icon http://192.168.1.90:55414/images/urbackup.png
Jun 16 19:46:24 Tower smbd[17971]: [2023/06/16 19:46:24.485175, 0] ../../source3/smbd/close.c:1397(close_directory)
Jun 16 19:46:24 Tower smbd[17971]: close_directory: Could not get share mode lock for polytech-direwolf20-1.19
Jun 16 19:46:24 Tower smbd[17971]: [2023/06/16 19:46:24.485224, 0] ../../source3/smbd/fd_handle.c:39(fd_handle_destructor)
Jun 16 19:51:00 Tower root: Fix Common Problems Version 2023.04.26
Jun 16 19:51:00 Tower root: Fix Common Problems: Error: eth0 does not have a valid IP address

You have networking issues you need to straighten out. I'd also get all your Docker Containers updated and turn off mover logging.
sazrocks Posted June 18, 2023

3 hours ago, dlandon said: You have networking issues you need to straighten out.

Working on it already.

3 hours ago, dlandon said: I'd also get all your Docker Containers updated and turn off mover logging.

I'll update the containers and try turning off mover logging. Are you saying the stale file handle messages are from the mover?

5 hours ago, JorgeB said: Nothing obvious that I can see, are you using Time Machine with that share? Also is NFS enabled for any shares?

I have Time Machine enabled for that share in anticipation of potentially using it, but nothing is actually using the share for Time Machine. NFS is disabled for all shares.
dlandon Posted June 18, 2023

1 minute ago, sazrocks said: I'll update the containers and try turning off mover logging. Are you saying the stale file handle messages are from the mover?

Not necessarily. The mover is very chatty, though, and makes the log hard to read, which is not helping your log-filling issue. That said, the stale file handle messages do seem to appear when the mover is running. Once you get your networking sorted out, post the diagnostics when the stale file handle messages appear again.

I would also try running your system without any Docker Containers running to see if they are contributing to the stale file handle messages. I saw some log messages about the minecraft Docker Container that didn't look right.
sazrocks Posted June 18, 2023

16 minutes ago, dlandon said: Not necessarily, the mover is very chatty and makes the log hard to read though. That is not helping your log filling issue. That being said, the stale file handle messages seem to appear when the mover is running. Once you get your networking sorted out, post the diagnostics when the stale file handle messages appear again.

Makes sense. I’ll monitor the log now that I’ve disabled mover logging and will post diagnostics if/when it happens again.

17 minutes ago, dlandon said: I would also try running your system without any Docker Containers running to see if that's contributing to the stale file handle messages. I saw some log messages about the minecraft Docker Container that didn't look right.

Unfortunately, I run a number of services, not just for myself, that I try to minimize downtime for, so turning off all Docker containers is something I’d like to save for later if possible. You say you saw some logs from a minecraft Docker container that don’t look right. Is this the message you’re talking about?

Quote: close_directory: Could not get share mode lock for polytech-direwolf20-1.19

This is one of my currently running containers. I'm not sure what the message means exactly, and I haven’t seen any strange behavior from the container.
sazrocks Posted June 29, 2023

@dlandon Tried to start up a VM today and got a message that it couldn't start because there was no space left on the log device. /var/log is indeed completely full. I was able to download the diagnostics and have attached them to this post. I have *not* rebooted yet, so the system is still in this state if we need to do any further investigation. tower-diagnostics-20230629-0106.zip
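While the system is still in this state, it may help to confirm how full the log filesystem is and which files are taking the space. The following is a minimal sketch under stated assumptions, not a tool from the thread: `/var/log` is taken from the posts above, while the helper names `largest_files` and `percent_used` are hypothetical.

```python
import os
import shutil


def largest_files(root, top=5):
    """Return up to `top` (size_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or cannot be read; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0


# Assumption: /var/log is the log location described in the thread.
LOG_DIR = "/var/log"
if os.path.isdir(LOG_DIR):
    print(f"{LOG_DIR} is {percent_used(LOG_DIR):.1f}% full")
    for size, path in largest_files(LOG_DIR):
        print(f"{size / 1024 ** 2:8.1f} MiB  {path}")
```

If one file dominates the listing, that is the file worth searching for the repeating message.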
dlandon Posted June 29, 2023

Here is what I see:

Jun 23 04:40:02 Tower root: Fix Common Problems Version 2023.04.26
Jun 23 04:40:03 Tower root: Fix Common Problems: Warning: Unraid OS not up to date
Jun 23 04:40:03 Tower root: Fix Common Problems: Warning: Docker Application thespaghettidetective_redis_1 has an update available for it
Jun 23 04:40:03 Tower root: Fix Common Problems: Warning: Share Secure-storage is set for both included (disk4) and excluded (disk1,disk2,disk3) disks
Jun 23 04:40:03 Tower root: Fix Common Problems: Warning: Share VMs is set for both included (disk3) and excluded (disk1,disk2) disks
Jun 23 04:40:03 Tower root: Fix Common Problems: Warning: Deprecated plugin ca.backup2.plg
Jun 23 04:40:05 Tower root: Fix Common Problems: Warning: Docker application Dolphin has moderator comments listed
Jun 23 04:40:05 Tower root: Fix Common Problems: Warning: Docker application PlexMediaServer has moderator comments listed
Jun 23 04:40:05 Tower root: Fix Common Problems: Warning: Docker application steamcache-DNS has moderator comments listed ** Ignored
Jun 23 04:40:06 Tower root: Fix Common Problems: Warning: NerdPack.plg Not Compatible with Unraid version 6.11.5
Jun 23 04:40:07 Tower root: minecraft-James-revelation: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:07 Tower root: Minecraft-polytech-direwolf20-1.19: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:08 Tower root: urBackup-Server: Could not download icon http://192.168.1.90:55414/images/urbackup.png
Jun 23 04:40:10 Tower root: minecraft-James-revelation: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:11 Tower root: Minecraft-polytech-direwolf20-1.19: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:11 Tower root: urBackup-Server: Could not download icon http://192.168.1.90:55414/images/urbackup.png
Jun 23 04:40:11 Tower root: minecraft-James-revelation: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:11 Tower root: Minecraft-polytech-direwolf20-1.19: Could not download icon https://vignette.wikia.nocookie.net/minecraft/images/7/7b/Grass_block.png
Jun 23 04:40:12 Tower root: urBackup-Server: Could not download icon http://192.168.1.90:55414/images/urbackup.png
Jun 23 04:40:12 Tower root: Fix Common Problems: Warning: Write Cache is disabled on disk5
Jun 23 04:40:13 Tower root: Fix Common Problems: Other Warning: Unassigned Devices Plus not installed
Jun 23 04:40:14 Tower root: Fix Common Problems: Warning: Docker Update Patch not installed
Jun 23 04:47:31 Tower apcupsd[7934]: Communications with UPS lost.

Here are my suggestions:

- Go through all the FCP issues and get them sorted out. Be sure to remove the NerdPack plugin as it is not compatible with 6.11. You have some disk mapping issues.
- Find out why your UPS keeps losing communications.
- Run Unraid without any Docker containers running and see if the log messages and filling of the log stops.
sazrocks Posted June 29, 2023

2 hours ago, dlandon said: Here are my suggestions: Go through all the FCP issues and get them sorted out. Be sure to remove the NerdPack plugin as it is not compatible with 6.11. You have some disk mapping issues.

Done, except for migrating to 6.12. I see far too many reports of issues with it in these forums, and I can't afford to deal with that right now.

Quote: Find out why your UPS keeps losing communications.

It's not connected. I went ahead and turned off the daemon.

Quote: Run Unraid without any Docker containers running and see if the log messages and filling of the log stops.

This is not really an option. It takes the better part of 10 days for the issue to manifest after a reboot, and the primary purpose of this server is to run a number of services using Docker. I might as well just have the server turned off for 9 days. Is there any way at all to investigate whether Docker is the problem without rendering my server useless for a week?

I also went ahead and rebooted so that at least my VM can run. I will post here again if the issue recurs.
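One low-impact way to check whether a container is involved, without stopping Docker, is to look up the PID named in the messages while they are appearing. This is only a sketch and was not suggested in the thread. It assumes the PID is still alive, and it assumes Docker's cgroup paths contain `docker/<id>` or `docker-<id>.scope`, which depends on the cgroup version and driver in use. The function names are hypothetical.

```python
import re

# Assumption: a Docker container's cgroup path embeds its 64-character hex ID
# as "docker/<id>" or "docker-<id>.scope".
CONTAINER_ID_RE = re.compile(r"docker[/-]([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text):
    """Return the container ID found in /proc/<pid>/cgroup text, or None."""
    match = CONTAINER_ID_RE.search(cgroup_text)
    return match.group(1) if match else None


def describe_pid(pid):
    """Return (command line, container ID or None) for a running PID."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        raw = handle.read().replace(b"\0", b" ")
    with open(f"/proc/{pid}/cgroup") as handle:
        container = container_id_from_cgroup(handle.read())
    return raw.decode(errors="replace").strip(), container


# Demo on made-up cgroup text; the ID below is a placeholder, not from the thread.
example_id = "0123456789abcdef" * 4
print(container_id_from_cgroup(f"12:memory:/docker/{example_id}\n") == example_id)
print(container_id_from_cgroup("0::/init.scope\n"))
```

On the server, `describe_pid(10916)` would be the call for the PID in the original excerpt, though the PID will differ after a reboot. A returned container ID can be matched against the full IDs listed by `docker ps --no-trunc`; a result of `None` suggests the process is running directly on the host.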