kl0wn

6.6.3 - Processor Stuck @ 100% Every Morning


Every morning my processor goes to 100% and just hangs there for hours until I reboot the server. This was never an issue before and is now a daily thing. The logs are giving me jack for troubleshooting. Any thoughts?

 

Oct 22 20:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 20:53:04 Tower sshd[13608]: SSH: Server;Ltype: Kex;Remote: 192.168.2.109-10539;Enc: chacha20-poly1305@openssh.com;MAC: <implicit>;Comp: zlib@openssh.com
Oct 22 21:00:01 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 22:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 22 23:00:10 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:01 Tower Plugin Auto Update: Checking for available plugin updates
Oct 23 00:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 00:00:28 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Oct 23 01:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 02:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 03:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 04:00:14 Tower root: /mnt/cache: 8 GiB (8576638976 bytes) trimmed
Oct 23 04:00:14 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 05:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 06:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 07:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 08:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 09:00:16 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Oct 23 10:00:07 Tower crond[1673]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null


If you're able to log in while the CPU is spiked, open a terminal (however you normally do this) and type:

top

This should show you the command that's using up your CPU cycles (press q to exit). You should be able to see its PID. You can gather further information by checking the running processes. To see the exact command that's using up the CPU, based on the PID you find in top:

ps aux | grep PID# | grep -v grep

example: ps aux | grep 1234 | grep -v grep

That lists the running processes and filters them using the PID you found in top. The second grep just strips out the grep process itself, which would otherwise match its own command line.

This should give you some ideas as to what is causing your cpu spikes.
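One caveat with the grep approach: it matches the number anywhere on the line, so 1234 can also hit another process's arguments or memory columns. A minimal sketch of a stricter lookup, assuming a ps that supports -p and -o (procps does); show_pid is a made-up helper name, not something that ships with the system:

```shell
# Hypothetical helper: print details for exactly one PID.
# "ps -p" selects by process ID only, so there is nothing to filter
# out afterwards and no stray matches in other columns.
show_pid() {
    ps -p "$1" -o pid,user,pcpu,pmem,args
}

# Usage (1234 is a placeholder for the PID you saw in top):
#   show_pid 1234
```

Running show_pid "$$" inspects the current shell, which is a quick way to check that it works on your box.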

Del

 


I forgot to mention I did log in to check top and nothing was jumping out at me as unusually high, which makes this even more confusing. I did notice IOWAIT the first time I had to bounce the box, which leads me to believe that some I/O operations are hanging and causing the kernel to go crazy. This did not happen prior to 6.6, so I'm wondering what changes were made that could cause this. Here is a screenshot of top when everything is normal, which is basically a mirror image of what it looks like when things are going haywire:

 

image.thumb.png.e805b4171ca72f501b6d524d1c42c698.png
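For anyone wanting to put a number on the IOWAIT rather than eyeballing it, here is a rough sketch. It assumes a standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal) and a ps that accepts the state keyword. Both function names are made up for illustration.

```shell
# Print the share of CPU time spent in iowait since boot, as a percentage.
# Reads the aggregate "cpu" line of a /proc/stat-format file; field 6 is
# iowait. Guest fields (10, 11) are skipped because the kernel already
# counts them inside user and nice.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }' "${1:-/proc/stat}"
}

# List processes in uninterruptible sleep (state D), which is usually
# what a task blocked on disk I/O looks like.
list_blocked() {
    ps -eo state,pid,comm | awk '$1 ~ /^D/'
}
```

Note that iowait_pct is an average since boot, so it is only a coarse indicator. Running list_blocked while the box is hung is more likely to point at the culprit, since tasks stuck in D state use almost no CPU and will not show up near the top of top.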

 

 


That Plex Transcoder doesn't pop out at you? Memory looks a bit low too.

Edited by cybrnook


The Transcoder is going to fluctuate all day long, but you're right, 344% is a bit much haha. I've played around with Docker CPU pinning, but Plex seems to leak onto other cores/hyperthreads regardless of what is set. I'll see what the Transcoder shows the next time this happens, but I have 6-7 streams (some of those being transcodes) running every night with no issues. This is only happening in the morning, so it would be nice to see more verbose log output to identify what is kicking off or possibly causing this.


I'm starting to think this has to do with Mover causing the IOWAIT. I changed Mover to run every 4 hours rather than every hour, and enabled logging. I'll report back with what I find. If anyone has other ideas, please let me know.

 

EDIT: I found that my pihole docker, which was writing to a share that was set to use ONLY the cache drive, somehow had files living on every disk in my environment. Not sure how that's possible, but it happened. I set the share to Cache Prefer, ran Mover, and all files were moved back to cache. I then switched the share back to Cache Only and invoked Mover again: no crazy spike in CPU. My theory is that when Mover was invoked it was touching all of the drives, thus causing the IOWAIT. I may be totally wrong, but it's the best I've got for now.
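In case it helps anyone checking for the same thing, this is roughly how you can see which disks hold files for a share. It's only a sketch: it relies on Unraid mounting the array disks as /mnt/disk1, /mnt/disk2, ... and the cache as /mnt/cache. The function name and the share name "pihole" are placeholders, so use whatever your share is actually called.

```shell
# Hypothetical helper: for a given share name, list every disk (and the
# cache) that has a directory for it, with a count of the files inside.
# The second argument overrides the mount root and defaults to /mnt.
share_locations() {
    share=$1
    root=${2:-/mnt}
    for dir in "$root"/disk* "$root"/cache; do
        if [ -d "$dir/$share" ]; then
            count=$(find "$dir/$share" -type f | wc -l | tr -d ' ')
            echo "$dir/$share: $count files"
        fi
    done
}

# Usage ("pihole" is a placeholder share name):
#   share_locations pihole
```

For a cache-only share you would expect a single line for /mnt/cache. Any /mnt/diskN lines mean files have ended up on the array.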

Edited by kl0wn


Just out of the gate, I think your system is underpowered for what you're doing. If your CPU sits in the 100% usage range and memory is floating at 98% consumed, you are maxed out. And then if Mover kicks in, you are likely starving other processes of either CPU or memory, and they are likely dying off.

 

I would venture more to say it's time for a bigger boat. 🙂 No offense intended.

Edited by cybrnook


No offense taken my friend lol. I know that I DEFINITELY need a better/beefier box, but it's just not in the cards right now. I could up the RAM, but I don't want to dump funds into an old box that will eventually be upgraded to a platform that won't even support the RAM from this one. After a reboot, my memory is at 37%, so something was definitely hung. I do however plan to up the size of my cache drive, so that I can just kick off Mover every morning at, say, 2 AM rather than having it run every hour. Thanks for the input bud.


The issue popped up again, so I submitted a bug report and rolled back to 6.5.3. Everything is now stable, so it's definitely something going on with 6.6.3. I'll hang out in 6.5.3 land.

