kl0wn Posted October 23, 2018

Every morning my processor goes to 100% and just hangs there for hours until I reboot the server. This was never an issue before and is now a daily thing... the logs are giving me jack for troubleshooting. Any thoughts?

Oct 22 20:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 20:53:04 Tower sshd[13608]: SSH: Server;Ltype: Kex;Remote: 192.168.2.109-10539;Enc: [email protected];MAC: <implicit>;Comp: [email protected]
Oct 22 21:00:01 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 22:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 23:00:10 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:01 Tower Plugin Auto Update: Checking for available plugin updates
Oct 23 00:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:28 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Oct 23 01:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 02:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 03:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 04:00:14 Tower root: /mnt/cache: 8 GiB (8576638976 bytes) trimmed
Oct 23 04:00:14 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 05:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 06:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 07:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 08:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 09:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 10:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Delarius Posted October 23, 2018

If you're able to log in while the CPU is spiked, open a terminal (however you normally do this) and type:

top

This should show you the command that's using up your CPU cycles, along with its PID number (press q to exit). You can gather further information by checking the running processes. To see the exact command line behind the PID you found in top:

ps aux | grep PID# | grep -v grep

Example:

ps aux | grep 1234 | grep -v grep

That lists running processes, filters on the PID you found in top, and the second grep strips out the grep process itself, which would otherwise match. This should give you some ideas as to what is causing your CPU spikes.

Del
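A quicker variant of the same idea (my addition, not from Del's post): let ps sort by CPU for you instead of eyeballing top. This assumes the procps version of ps, which Linux distros including Unraid normally ship:

```shell
# Show the header plus the five processes using the most CPU.
# --sort=-%cpu is a procps ps option (not available in busybox ps).
ps aux --sort=-%cpu | head -n 6
```

From there you already have the full command line, so no second grep pass is needed.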
kl0wn Posted October 23, 2018

I forgot to mention that I did log in to check top, and nothing was jumping out at me as unusually high, which makes this thing even more confusing. I did notice an IOWAIT the first time I had to bounce the box, which leads me to believe that some I/O operations are hanging, causing the kernel to go crazy. This did not happen prior to 6.6, so I'm wondering what changes were made that could cause this. Here is a screen of top when everything is normal, which is basically a mirror image of what it looks like when things are going haywire...
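As a sanity check on the IOWAIT theory, the cumulative iowait counter can be pulled straight from /proc/stat (generic Linux sketch, nothing Unraid-specific; it's an average since boot, so for a live view run vmstat 1 and watch the "wa" column instead):

```shell
# The aggregate "cpu" line in /proc/stat lists cumulative jiffies:
# user nice system idle iowait irq softirq ... ($6 is iowait).
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i;
             printf "iowait since boot: %.1f%%\n", 100*$6/t}' /proc/stat
```

A percentage that climbs sharply between two samples during the morning hang would point at stalled I/O rather than a runaway process.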
cybrnook Posted October 23, 2018

That Plex Transcoder doesn't pop out at you? Memory looks a bit low too.
kl0wn Posted October 23, 2018

The Transcoder is going to fluctuate all day long, but you're right, 344% is a bit much haha. I've played around with Docker pinning, but Plex seems to leak into other cores/hyperthreads regardless of what is set. I'll see what the Transcoder shows the next time this happens, but I have 6-7 streams (some of those being transcodes) running every night with no issues. This is only happening in the morning, so it would be nice to see more verbose log output to identify what is kicking off or possibly causing this to happen.
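For reference, hard pinning (as opposed to the GUI's pinning UI) is usually done by passing Docker's cpuset flag in the container template's Extra Parameters field. This is a config fragment with example core numbers, not a tested setting from this thread:

```
--cpuset-cpus="2,3"
```

--cpuset-cpus constrains which cores the container's threads may be scheduled on, so a transcoder pinned this way can't spill onto other cores; if Plex still shows load elsewhere, that load is coming from something outside the container.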
kl0wn Posted October 23, 2018

[netdata screenshot]
kl0wn Posted October 23, 2018

There it is... top alongside a screen of Unraid showing 100%
kl0wn Posted October 23, 2018

I'm starting to think this has to do with Mover causing the IOWAIT. I changed it to run every 4 hours rather than every hour and enabled logging. I'll report back with what I find. If anyone has other ideas, please let me know.

EDIT: I found that my pihole docker, whose share was set to ONLY use the cache drive, somehow had files living on every disk in my environment... not sure how that's possible, but it happened. I set the share to Cache: Prefer --> ran Mover --> all files were moved back to the cache. I then switched the share back to Cache: Only --> invoked Mover --> no crazy spike in CPU. My theory is that when Mover was invoked it was touching all of the drives, thus causing the IOWAIT. I may be totally wrong, but it's the best I've got for now.
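If anyone else wants to check for the same thing, here's a rough sketch for listing files of a supposedly cache-only share that are sitting on the array disks. The function name and the test hook for the mount root are my own; on a real Unraid box the root is /mnt, where each data disk mounts as /mnt/diskN, and "pihole" is the share from this thread:

```shell
# find_strays SHARE [ROOT]: print files for SHARE that live under ROOT/disk*
# (the array disks) instead of the cache drive. ROOT defaults to /mnt.
find_strays() {
  share="$1"; root="${2:-/mnt}"
  for d in "$root"/disk*/; do
    [ -d "$d$share" ] && find "$d$share" -type f
  done
  return 0
}

find_strays pihole
```

Empty output means the share really does live only on the cache; any paths printed are the stragglers Mover will grind over.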
cybrnook Posted October 23, 2018

Just out of the gate, I think your system is underpowered for what you're doing. If your CPU sits in the 100% usage range and memory is floating at 98% consumed, you are maxed out. And then when mover kicks in, you are likely starving other processes of either CPU or memory, and they are likely dying off. I would venture to say it's time for a bigger boat. 🙂 No offense intended.
kl0wn Posted October 23, 2018

No offense taken, my friend lol. I know that I DEFINITELY need a better/beefier box, but it's just not in the cards right now. I could up the RAM, but I don't want to dump funds into an old box that will eventually be upgraded to a platform that won't even support the RAM from this one. After a reboot, my memory is at 37%, so something was definitely hung. I do, however, plan to up the size of my cache drive so I can just kick off Mover every morning at, say, 2 AM rather than having it run every hour. Thanks for the input, bud.
kl0wn Posted October 27, 2018

The issue popped up again, so I submitted a bug report and rolled back to 6.5.3. Everything is now stable... so it's definitely something going on with that version. I'll hang out in 6.5.3 land.