mtftl Posted July 26

This is my first issue in over two years that I can't figure out, so hopefully I am sharing the right details.

Situation: my Unraid 6.12.10 server becomes unresponsive after 1-2 days. The GUI eventually gives 503 errors, and the only "resolution" is to power cycle the box. This is the first instability in ~2 years of operation behind a UPS. It's been rock solid, with only an occasional clean shutdown/power up during bad storms.

This started within a day of a change I made to my Jellyfin Docker config. I have a single remote SMB share attached to it, and for the first time I was given a warning that I should use slave mode. I made that change, and a day later the server went unresponsive for the first time ever. After repeated crashes I've kept the container off, but that hasn't fixed anything.

I've been through the logs and either missed or couldn't find anything amiss, other than an SMTP auth error for notifications that I have since fixed. In case it was a Docker issue, I deleted docker.img and rebuilt it, adding back all previous apps using the recommended add container feature.

Today, for the first time in over a month of these crashes, I actually caught an error message in the GUI saying the box had run out of memory and was killing low priority processes (before today it had always gone down silently). I managed to generate the attached diagnostics with the last gasp of my server before it went unresponsive again.

If anyone can see what might be going on, I'd be greatly appreciative. I can't imagine it was that Docker config; it was just spooky that my server went from 99.9999... reliability to broken every day or two starting the next day. I was thinking it had to be hardware, but I can't find what is failing, and the fact that I got an out of memory warning today instead of the server just going down has me completely confused.

Thanks so much.

tower-diagnostics-20240726-0858.zip
JorgeB Posted July 26

The server is running out of memory (OOM). It appears to be a container issue, with a container spawning endless processes. If you suspect Jellyfin is the problem, see if this helps with that container: https://forums.unraid.net/bug-reports/stable-releases/61210-cannot-fork-resource-temporarily-unavailable-r3020/?do=findComment&comment=28505

If not, I recommend starting the containers one by one until you find the culprit.
mtftl Posted July 26

Thanks, Jorge. I'll give that a shot. Since this is the first time I've recorded the out of memory error, is there any chance that it "failed silently" the other times, in a way that I would not see in the logs?
JorgeB Posted July 26

If it's a Docker fork bomb issue, the server usually crashes before there are any OOM issues.
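One way to check for a container spawning endless processes, as suggested above, is to compare per-container process counts while Docker is running. The following is a minimal sketch, not something taken from the linked report: it assumes the Docker CLI is available and uses `docker stats --no-stream --format "{{.Name}} {{.PIDs}}"` to take one snapshot. The function names and the threshold of 500 are illustrative assumptions; pick a threshold that suits your own containers.

```python
import subprocess


def parse_pid_counts(stats_output):
    """Parse 'name pids' lines into a dict of container name -> PID count."""
    counts = {}
    for line in stats_output.splitlines():
        parts = line.split()
        # Skip blank lines and containers that report '--' instead of a number.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def suspicious(counts, threshold=500):
    """Return names whose PID count exceeds the threshold, highest count first.

    The default threshold is an arbitrary assumption, not a Docker or Unraid value.
    """
    over = [name for name, n in counts.items() if n > threshold]
    return sorted(over, key=lambda name: counts[name], reverse=True)


def read_docker_stats():
    """Take a single snapshot with the Docker CLI (requires Docker to be running)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.PIDs}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `suspicious(parse_pid_counts(read_docker_stats()))` every few minutes and logging the result would show whether one container's process count keeps climbing before a crash.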
mtftl Posted July 28

Still testing, but I just had my first unresponsive crash since making the change to Jellyfin, so that didn't fix it on its own. It's going to be a long testing cycle since each attempt takes a couple of days, but I'll have to go through my containers one by one.
mtftl Posted July 30

Another 2 days, another crash. I do not have logs from before the event, but I have now turned on mirroring the syslog to the flash again. No out of memory error this time, but I did get an email alert that the appdata backup failed, along with docker stop reporting an error:

Event: Appdata Backup
Subject: [AppdataBackup] Error!
Description: Please check the backup log!
Importance: alert

docker stop variant was unsuccessful as well! Docker said:

I'm now planning to leave Docker disabled to see if this fixes things.
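With the syslog mirrored to flash, the saved copy can be searched after a crash for signs that the kernel's OOM killer ran, which would help answer whether earlier crashes "failed silently". This is a rough sketch under two assumptions: that the kernel logs lines containing `invoked oom-killer` and `Out of memory: Killed process <pid> (<name>)` (the exact wording varies between kernel versions), and that you pass in the lines of whatever file your mirrored syslog ends up in. The function names are made up for illustration.

```python
import re

# Patterns assumed from typical Linux kernel OOM messages; wording can differ by kernel version.
OOM_INVOKED = re.compile(r"invoked oom-killer")
OOM_KILLED = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def find_oom_events(lines):
    """Return (line_number, text) for every line that looks like an OOM-killer entry."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_INVOKED.search(line) or OOM_KILLED.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def killed_processes(lines):
    """Return the names of processes the kernel reports killing, in log order."""
    names = []
    for line in lines:
        match = OOM_KILLED.search(line)
        if match:
            names.append(match.group(2))
    return names
```

To use it, open the mirrored syslog, read it with `readlines()`, and pass the result to both functions. An empty result after a crash would suggest the box went down before the OOM killer logged anything.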
mtftl Posted August 3

After turning Docker off completely, I'm at about 4 days of uptime, the first time I have gone beyond 2 since this started.

Assuming there is something wrong with Docker or an app, what should my next move be?

- I can move data and reformat my cache drive (where appdata is located).
- I already deleted and remade my docker.img, so I doubt doing that again will help.
- I can spend weeks enabling single Docker services and waiting for a crash. But what can I do if I find one is breaking things?
- I can upgrade or downgrade Unraid. Oddly, it's been nothing but problems with this version, despite it containing a fix seemingly related to my mounted remote SMB share.

I'm still baffled, since nothing in my logs or diagnostics seems to show anything wrong.
JonathanM Posted August 3

1 hour ago, mtftl said: I can spend weeks enabling single docker services and wait for a crash. But what can I do if I find one is breaking things?

It shouldn't take weeks if it crashes within 2 days. Enable half of your normally running containers. If it crashes, divide those in half. If it doesn't, disable that half and enable the other set. I recommend printing a list, noting the start time of each container and the crash times, and keeping track of which containers were running at each crash. It shouldn't take more than a few cycles to narrow it down, unless it's a combination of containers that only crash when they interact, or you have hundreds of containers.

The bonus is that you get to continue using critical containers.
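The halving approach described above can be sketched as a small planning helper. This is only an illustration: it assumes exactly one container is responsible and that surviving an observation window (here, a bit over 2 days) without a crash clears the enabled group. The `crashes_with` callback is hypothetical and stands in for you actually running that group and waiting.

```python
def split_half(candidates):
    """Split the remaining candidates into the group to enable and the group to hold back."""
    mid = (len(candidates) + 1) // 2
    return candidates[:mid], candidates[mid:]


def find_culprit(containers, crashes_with):
    """Narrow down a single misbehaving container by repeated halving.

    crashes_with(enabled) must return True if the server crashed while only
    the containers in 'enabled' were running. Returns (culprit, rounds).
    """
    candidates = list(containers)
    if not candidates:
        raise ValueError("need at least one container to test")
    rounds = 0
    while len(candidates) > 1:
        enabled, held_back = split_half(candidates)
        rounds += 1
        # A crash means the culprit is in the enabled half; otherwise it is in the other half.
        candidates = enabled if crashes_with(enabled) else held_back
    return candidates[0], rounds
```

With 8 containers this takes 3 rounds, so roughly a week of 2-day cycles rather than testing each container separately. The last candidate is inferred rather than observed crashing, so it is worth one more run with only that container enabled to confirm it.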
mtftl Posted August 8

I remain baffled. If Docker stays off, everything is okay; if any Docker container is up, the system crashes after 2 days, almost to the minute.

I'm now able to see alerts related to the appdata backup failing. This time it was specific to a backup file, not the docker error from before:

Event: Appdata Backup
Subject: [AppdataBackup] Error!
Description: Please check the backup log!
Importance: alert

tar verification failed! Tar said: tar: Removing leading `/' from hard link targets;
mnt/user/appdata/pbs/logs/tasks/DF/UPID\:Tower\:00000009\:0AA590DF\:0000000F\:65D05DDB\:backup\:......\:: Contents differ

It seems like something in my appdata may be corrupted. In the absence of other ideas, I guess I will try moving appdata to the array and back to see if that cleans it up. The cache drive seems okay with everything else, which has me confused.