dukiethecorgi Posted March 9, 2017
Serving up files OK, but the web interface is very slow. Logs show call traces and out-of-memory errors. Suggestions? Reboot and hope for the best? Logs attached: tower-diagnostics-20170308-2237.zip
dukiethecorgi Posted March 9, 2017 (Author)
Rebooting made everything work fine again, but I'm really curious why I had all those errors. Can someone take a look and explain what was happening?
RobJ Posted March 10, 2017
This looks nearly identical to the issues that @al_uk had recently; it looks like a huge memory leak. You had run fine without issues for quite a few days. Then, during a Mover session, with movie files being moved (probably very large ones), java, nginx, and php all report "page allocation stalls" of 10 to 12 seconds. The stalls continue and get longer, then there's an OOM, with the largest java process getting killed, which allowed the system to continue for about 12.5 minutes without trouble. Then it apparently runs out of free memory again, and the stalls are nearly continuous and keep getting longer, over 30 seconds each. Then the Mover finishes, and there's no more trouble. (But I would strongly recommend you reboot after that, which I believe you did.)
I don't know for sure, but I *think* the stalls are garbage collection related, which would implicate java. I don't know if you can run without whatever is java-based for a week or 2, but that might end up being what you have to do.
al_uk's post, problems, and syslog look much like yours. My response follows that one. I don't know exactly what the solution is here, but it would be good to hear from al_uk what is working better for him.
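For anyone wanting to check their own syslog for the same pattern, here is a minimal Python sketch that counts the stall and OOM messages described above. It assumes the kernel writes lines containing "page allocation stalls for <N>ms" and "Out of memory: Kill process <pid> (<name>)", which is how I understand kernels of this era to log them; the sample lines and the function name `summarize_syslog` are hypothetical and are not taken from the attached diagnostics.

```python
import re

# Assumed message shapes (not copied from the poster's syslog):
#   "<process>: page allocation stalls for <N>ms, ..."
#   "Out of memory: Kill process <pid> (<name>) ..."
STALL_RE = re.compile(r"(\S+): page allocation stalls for (\d+)ms")
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\)")


def summarize_syslog(lines):
    """Count stall reports, find the longest stall, and list OOM-killed processes."""
    stalls = []      # (process name, stall length in ms)
    oom_kills = []   # names of processes killed by the OOM killer
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stalls.append((match.group(1), int(match.group(2))))
            continue
        match = OOM_RE.search(line)
        if match:
            oom_kills.append(match.group(2))
    longest = max((ms for _, ms in stalls), default=0)
    return {
        "stall_count": len(stalls),
        "longest_stall_ms": longest,
        "oom_kills": oom_kills,
    }


# Hypothetical sample lines, for illustration only.
sample = [
    "Mar  8 20:01:12 Tower kernel: java: page allocation stalls for 10404ms, order:0",
    "Mar  8 20:05:40 Tower kernel: nginx: page allocation stalls for 31200ms, order:0",
    "Mar  8 20:06:02 Tower kernel: Out of memory: Kill process 4321 (java) score 97",
]
summary = summarize_syslog(sample)
print(summary)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize_syslog(open("syslog.txt"))`, where the path is whatever your diagnostics zip contains.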
al_uk Posted March 10, 2017
Hello. I started a new post for my problems here. During the problems, cadvisor was saying that my "hot" memory was close to my total memory. The problems remained even when running no VMs or dockers. I then had a bit of a cleanup and removed plugins such as cache dirs and file integrity, and the problems went away. I also fixed all the problems shown by the "fix common errors". Now I am down to about 30% hot, which is normal. I have 64GB RAM. I have had problems with Crashplan in the past, but that wasn't the issue this time.
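If you don't have cadvisor running, a rough stand-in is to compare `MemAvailable` with `MemTotal` from `/proc/meminfo`. The Python sketch below does that. Note the assumption: this is not the same number as cadvisor's "hot" memory, only a loosely comparable "how much RAM is not readily available" figure, and the function names and sample values are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def percent_unavailable(meminfo):
    """Share of total RAM that the kernel does not report as available, in percent."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


# Hypothetical values, chosen to land near the "about 30%" mentioned above.
sample_text = """MemTotal:       64000000 kB
MemFree:         2000000 kB
MemAvailable:   44800000 kB
"""
info = parse_meminfo(sample_text)
pct = percent_unavailable(info)
print(f"{pct:.1f}% of RAM is not available")
```

On a live system you would feed it the real file: `parse_meminfo(open("/proc/meminfo").read())`. Watching this figure creep up over a few days would be one way to see a leak coming before the stalls start.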
dukiethecorgi Posted March 10, 2017 (Author, edited)
First, rebooting the server seemed to fix the problem for the time being. I did disable the cache dirs plugin, which had a noticeable effect on CPU usage, not so much on memory. Just prior to this happening, I added quite a few large (>12GB) video files to the server. I noticed the cache disk nearly filled. Mover is set to run every couple of hours, so perhaps the mover had issues with the big files. I'm also reconsidering caching my media share, since those files are mostly read and rarely written.
I appreciate the responses.
Edited March 10, 2017 by dukiethecorgi
RobJ Posted March 10, 2017
6 hours ago, dukiethecorgi said: "Mover is set to run every couple of hours, so perhaps the mover had issues with the big files. I'm also reconsidering caching my media share, since those files are mostly read and rarely write"
According to your syslog, your Mover is set to run every hour! As you are already reconsidering whether to cache it or not, I agree it might be a good idea to not cache the largest files for now, until this issue is fully resolved.
In both of your cases, I believe it took 3 or more days before the issue came to a head. I suspect therefore you could get away with rebooting every 2 or 3 days and not see the problem. Obviously, that's an undesirable workaround, but hopefully it's only temporary, until someone spots what needs to change.
dukiethecorgi Posted March 19, 2017 (Author)
Thanks, RobJ.
I changed the mover to once a day, removed some plugins, stopped caching files, and now only use the cache drive for docker storage. After all that, it still grinds to a stop every 3-4 days. I'm seriously considering pulling all the data off and switching to another OS; this system just isn't usable as it exists.
Is there any progress on finding the root cause of this problem? The forum has quite a few posts by people who are seeing call traces and an unresponsive GUI, so surely I'm not the only one seeing this problem. Could I roll back to an earlier version that doesn't have this issue?
RobJ Posted March 19, 2017
28 minutes ago, dukiethecorgi said: "Is there any progress to finding the root cause of this problem? The forum has quite a few posts by people that are seeing call traces and unresponsive GUI, surely I'm not the only one seeing this problem. Could I roll back to an earlier version that doesn't have this issue?"
You can roll back to 6.2.4 from the LimeTech download page; just replace the bz* files and reboot. However, users are reporting that the Docker support in 6.3 is not backward compatible with 6.2, so you will have to redo the Docker setup. I imagine you won't have to redo the Dockers themselves, just the main docker config and docker.img.
It will be interesting to see if an earlier kernel works better for you.
dukiethecorgi Posted March 19, 2017 (Author)
That's certainly worth a try. The docker containers are trivial to recreate. I'll report back in a few days.
dukiethecorgi Posted April 8, 2017 (Author)
Just wanted to update: after upgrading to 6.3.3, I have not had any issues with call traces or out-of-memory errors.