6.6.3 - Processor Stuck @ 100% Every Morning

kl0wn · October 23, 2018

Every morning my processor goes to 100% and just hangs there for hours until I reboot the server. This was never an issue before and is now a daily thing. The logs are giving me jack for troubleshooting. Any thoughts?

Oct 22 20:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 20:53:04 Tower sshd[13608]: SSH: Server;Ltype: Kex;Remote: 192.168.2.109-10539;Enc: [email protected];MAC: <implicit>;Comp: [email protected]
Oct 22 21:00:01 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 22:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 23:00:10 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:01 Tower Plugin Auto Update: Checking for available plugin updates
Oct 23 00:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:28 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Oct 23 01:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 02:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 03:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 04:00:14 Tower root: /mnt/cache: 8 GiB (8576638976 bytes) trimmed
Oct 23 04:00:14 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 05:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 06:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 07:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 08:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 09:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 10:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Delarius · October 23, 2018

If you're able to log in while the CPU is spiked, open a terminal (however you normally do this) and type:

top

This should show you the command that's using up your CPU cycles (press q to exit). You should be able to see its PID. You can gather further information by checking the running processes. To see the exact command that's using the CPU, based on what you find in top:

ps aux | grep PID# | grep -v grep

example: ps aux | grep 1234 | grep -v grep

That lists the running processes and filters them using the PID you found in top. The second grep just strips out the grep process itself, which would otherwise match its own command line.
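A slightly tighter variant, as a sketch: `ps -p` selects the process by PID directly, so there is no grep to filter out and no false match when the same digits show up elsewhere in a line. The function name `show_pid` is made up for illustration; the column names are the standard POSIX ones.

```shell
#!/bin/sh
# show_pid PID  (hypothetical helper name)
# Print a header plus one line of details for the given PID.
show_pid() {
    ps -p "$1" -o pid,ppid,pcpu,vsz,etime,args
}

# Example: inspect the current shell itself.
show_pid "$$"
```

Swap `"$$"` for the PID you saw in top, e.g. `show_pid 1234`.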

This should give you some ideas as to what is causing your cpu spikes.

Del

kl0wn · October 23, 2018

I forgot to mention I did log in to check top, and nothing was jumping out at me as unusually high, which makes this even more confusing. I did notice IOWAIT the first time I had to bounce the box, which leads me to believe some I/O operations are hanging and causing the kernel to go crazy. This did not happen prior to 6.6, so I'm wondering what changes were made that could cause this. Here is a screenshot of top when everything is normal, which is basically a mirror image of what it looks like when things are going haywire...
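If I/O wait is the suspect, one thing worth checking is which processes are sitting in uninterruptible sleep (state `D`), since those are usually blocked on disk and won't show up as high %CPU in top's process list. The `wa` figure in top's `%Cpu(s)` line is the I/O wait share. A minimal sketch, assuming a procps-style `ps` that accepts the `stat` column; `filter_blocked` is a made-up helper name:

```shell
#!/bin/sh
# filter_blocked  (hypothetical helper name)
# Reads ps-style output on stdin and keeps the header line plus any
# row whose second column (STAT) starts with D (uninterruptible sleep).
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example: show processes currently blocked, most likely on I/O.
ps -eo pid,stat,args | filter_blocked
```

Seeing only the header back means nothing was blocked at that instant, so it may take a few runs during a spike to catch anything.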

cybrnook · October 23, 2018

That Plex Transcoder doesn't pop out at you? Memory looks a bit low too.

Edited October 23, 2018 by cybrnook

kl0wn · October 23, 2018

The Transcoder is going to fluctuate all day long, but you're right, 344% is a bit much haha. I've played around with Docker CPU pinning, but Plex seems to leak into other cores/HT regardless of what is set. I'll see what the Transcoder shows the next time this happens, but I have 6-7 streams (some of those being transcodes) running every night with no issues. This is only happening in the morning, so it would be nice to see more verbose log output to identify what is kicking off or possibly causing this.
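In the absence of more verbose system logs, one option is to record your own snapshots and look back at them after the next spike. A rough sketch, not anything built into Unraid: `snapshot` is a hypothetical helper that appends a timestamp and the ten busiest processes by CPU to a log file of your choosing.

```shell
#!/bin/sh
# snapshot LOGFILE  (hypothetical helper name)
# Append a timestamp, then the ten processes with the highest %CPU.
# The "name=" form of -o suppresses the header so sort sees only data rows.
snapshot() {
    {
        date '+%Y-%m-%d %H:%M:%S'
        ps -e -o pid= -o pcpu= -o etime= -o args= | sort -k2,2nr | head -n 10
    } >> "$1"
}

# Example: take one snapshot every 5 minutes (path is an assumption).
# while true; do snapshot /tmp/cpu-snapshots.log; sleep 300; done
```

Note that the %CPU reported by `ps` is averaged over each process's lifetime, so this is a coarse view compared with watching top live.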

kl0wn · October 23, 2018

[Attached image]

Edited October 23, 2018 by kl0wn
added netdata screen

kl0wn · October 23, 2018

There it is... top alongside a screenshot of Unraid showing 100%.

kl0wn · October 23, 2018

I'm starting to think this has to do with Mover causing the IOWAIT. I changed it to run every 4 hours rather than every hour and enabled logging. I'll report back with what I find. If anyone has other ideas, please let me know.

EDIT: I found that my pihole docker, which was writing to a share set to use ONLY the cache drive, somehow had files living on every disk in my environment. Not sure how that's possible, but it happened. I set the share to Cache Prefer --> ran Mover --> all files were moved back to cache. I then switched the share back to Cache Only --> invoked Mover --> no crazy spike in CPU. My theory is that when Mover was invoked it was touching all of the drives, thus causing the IOWAIT. I may be totally wrong but it's the best I've got for now.
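For anyone wanting to check the same thing, here is a small sketch that reports where a share's files physically live. `share_locations` is a made-up helper name, and the `/mnt/cache` and `/mnt/disk*` paths in the example are assumptions about the usual Unraid mount layout, so adjust them to your system.

```shell
#!/bin/sh
# share_locations SHARE DIR...  (hypothetical helper name)
# For each DIR that contains a SHARE directory, print its path and the
# number of regular files underneath it. DIRs without the share are skipped.
share_locations() {
    share=$1
    shift
    for dir in "$@"; do
        target="$dir/$share"
        [ -d "$target" ] || continue
        count=$(find "$target" -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$target" "$count"
    done
}

# Example (share name and mount points are assumptions):
# share_locations pihole /mnt/cache /mnt/disk*
```

For a cache-only share, any line other than the cache path would indicate stray files on the array disks.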

Edited October 23, 2018 by kl0wn

cybrnook · October 23, 2018

Just out of the gate, I think your system is underpowered for what you're doing. If your CPU sits in the 100% usage range and memory is floating at 98% consumed, you are maxed out. And then if Mover kicks in, you are likely starving other processes of either CPU or memory, and they are likely dying off.

I would venture more to say it's time for a bigger boat. 🙂 No offense intended.

Edited October 23, 2018 by cybrnook

kl0wn · October 23, 2018

No offense taken my friend lol. I know that I DEFINITELY need a better/beefier box but it's just not in the cards right now. I could up the RAM but I don't want to dump funds into an old box that will eventually be upgraded to a platform that won't even support the RAM from this one. After reboot, my memory is at 37% so something was definitely hung. I do however plan to up the size of my Cache drive, that way I can just kick off Mover every morning at say 2AM rather than having it run every hour. Thanks for the input bud.

kl0wn · October 27, 2018

The issue popped up again, so I submitted a bug report and rolled back to 6.5.3. Everything is now stable, so it's definitely something going on with 6.6.3. I'll hang out in 6.5.3 land.
