March 9, 2017 Serving up files is OK, but the web interface is very slow. Logs show call traces and out-of-memory errors. Suggestions? Reboot and hope for the best? Logs attached: tower-diagnostics-20170308-2237.zip
March 9, 2017 Author Rebooting made everything work fine again, but I'm really curious why I had all those errors. Can someone take a look and explain what was happening?
March 10, 2017 This looks nearly identical to the issues that @al_uk had recently, and it looks like a huge memory leak. You had run fine without issues for quite a few days. Then, during a Mover session with movie files being moved (probably very large ones), java, nginx, and php all report "page allocation stalls" of 10 to 12 seconds. That continues and the stalls get longer, then there's an OOM, with the largest java process getting killed. That allowed the system to continue for about 12.5 minutes without trouble, then it apparently runs out of free memory again, and the stalls are nearly continuous and keep getting longer, over 30 seconds each. Then the Mover finishes, and there's no more trouble. (But I would strongly recommend you reboot after that, which I believe you did.) I don't know for sure, but I *think* the stalls are garbage-collection related, which would implicate java. I don't know if you can run without whatever is java-based for a week or two, but that might end up being what you have to do. al_uk's post, problems, and syslog look much like yours; my response follows that one. I don't know exactly what the solution is here, but it would be good to hear from al_uk about what is working better for him.
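For anyone wanting to check their own syslog for the same pattern (stalls that keep getting longer, with OOM kills in between), here is a minimal Python sketch. It is not a tool from this thread. The regular expressions assume the wording the 4.9-era kernel uses for these messages, and the sample lines, host name, PIDs, and timings are hypothetical, made up only to show the shape of the output.

```python
import re

# Assumed message wording (kernel 4.9 era); adjust if your syslog differs.
STALL_RE = re.compile(r"(\S+): page allocation stalls for (\d+)ms")
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \((\S+)\)")


def summarize(lines):
    """Collect stall durations per process and the list of OOM kills."""
    stalls = {}     # process name -> list of stall durations in ms
    oom_kills = []  # (pid, process name) in the order they appear
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = OOM_RE.search(line)
        if m:
            oom_kills.append((int(m.group(1)), m.group(2)))
    return stalls, oom_kills


# Hypothetical sample lines, NOT taken from the attached diagnostics.
SAMPLE = [
    "Mar  8 21:40:01 Tower kernel: java: page allocation stalls for 10012ms, order:0, mode:0x24201ca",
    "Mar  8 21:40:13 Tower kernel: nginx: page allocation stalls for 12004ms, order:0, mode:0x24201ca",
    "Mar  8 21:41:02 Tower kernel: Out of memory: Kill process 4321 (java) score 512 or sacrifice child",
    "Mar  8 21:55:30 Tower kernel: php: page allocation stalls for 31250ms, order:0, mode:0x24201ca",
]

if __name__ == "__main__":
    stalls, oom_kills = summarize(SAMPLE)
    for name, durations in sorted(stalls.items()):
        print(f"{name}: {len(durations)} stall(s), longest {max(durations)} ms")
    for pid, name in oom_kills:
        print(f"OOM kill: {name} (pid {pid})")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize(open("syslog.txt"))`, where `syslog.txt` stands for whatever file name your diagnostics zip uses.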
March 10, 2017 Hello. I started a new post for my problems here. During the problems, cadvisor was saying that my "hot" memory was close to my total memory. The problems remained even when running no VMs or dockers. I then had a bit of a cleanup and removed plugins such as cache dirs and file integrity, and the problems went away. I also fixed all the problems shown by the "fix common errors". Now I am down to about 30% hot, which is normal. I have 64GB RAM. I have had problems with Crashplan in the past, but that wasn't the issue this time.
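If cadvisor isn't installed, a rough stand-in is to compute how much memory is in use from `/proc/meminfo`. The sketch below is only an approximation: it uses `MemTotal` and `MemAvailable`, which is not the same calculation as cadvisor's "hot" figure, and the sample numbers are hypothetical values chosen to land on 30%.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def percent_in_use(values):
    """Share of RAM not reported as available, as a percentage."""
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


# Hypothetical sample, NOT taken from anyone's server in this thread.
SAMPLE = """MemTotal:       65536000 kB
MemFree:         1200000 kB
MemAvailable:   45875200 kB
"""

if __name__ == "__main__":
    print(f"{percent_in_use(parse_meminfo(SAMPLE)):.1f}% in use")
```

On a live Linux box, replace `SAMPLE` with `open("/proc/meminfo").read()`. A figure that climbs steadily over several days and never drops back would be consistent with the leak described above.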
March 10, 2017 Author First, rebooting the server seemed to fix the problem for the time being. I did disable the cache dirs plugin, which had a noticeable effect on CPU usage, but not so much on memory. Just prior to this happening, I added quite a few large (>12GB) video files to the server. I noticed the cache disk nearly filled. Mover is set to run every couple of hours, so perhaps the mover had issues with the big files. I'm also reconsidering caching my media share, since those files are mostly read and rarely written. I appreciate the responses. Edited March 10, 2017 by dukiethecorgi
March 10, 2017 6 hours ago, dukiethecorgi said: Mover is set to run every couple of hours, so perhaps the mover had issues with the big files. I'm also reconsidering caching my media share, since those files are mostly read and rarely written. According to your syslog, your Mover is set to run every hour! As you are already reconsidering whether to cache the media share, I agree it might be a good idea not to cache the largest files until this issue is fully resolved. In both of your cases, I believe it took 3 or more days before the issue came to a head. I suspect, therefore, that you could get away with rebooting every 2 or 3 days and not see the problem. Obviously, that's an undesirable workaround, but hopefully it's only temporary, until someone spots what needs to change.
March 19, 2017 Author Thanks, RobJ. I changed the mover to once a day, removed some plugins, stopped caching files, and now only use the cache drive for docker storage. After all that, it still grinds to a stop every 3-4 days. I'm seriously considering pulling all the data off and switching to another OS; this system just isn't usable as it exists. Is there any progress on finding the root cause of this problem? The forum has quite a few posts by people who are seeing call traces and an unresponsive GUI, so surely I'm not the only one seeing this problem. Could I roll back to an earlier version that doesn't have this issue?
March 19, 2017 28 minutes ago, dukiethecorgi said: Is there any progress on finding the root cause of this problem? The forum has quite a few posts by people who are seeing call traces and an unresponsive GUI, so surely I'm not the only one seeing this problem. Could I roll back to an earlier version that doesn't have this issue? You can roll back to 6.2.4 from the LimeTech download page; just replace the bz* files and reboot. However, users are reporting that the Docker support in 6.3 is not backward compatible with 6.2, so you will have to redo the Docker setup. I imagine you won't have to redo the Dockers themselves, just the main docker config and docker.img. It will be interesting to see if an earlier kernel works better for you.
March 19, 2017 Author That's certainly worth a try. The docker containers are trivial to recreate. I'll report back in a few days.
April 8, 2017 Author Just wanted to update: after upgrading to 6.3.3, I have not had any issues with call traces or out-of-memory errors.
This topic is now archived and is closed to further replies.