Unraid 6.8.3: Random bouts of 100% CPU usage, making server useless

June 12, 2020

Hey guys, this is the second time this has happened this week. The wife and kids are watching Plex and suddenly it starts to stutter and eventually stops playing. I remote into the server from work and see 100% CPU usage on all cores of my Ryzen 5 2600.

The first time, I assumed the parity check that was running was causing the issue, so I stopped it, rescheduled it for later, and after a reboot everything was fine. But this time there was no parity check running and the mover wasn't running, so I'm not sure what's causing this. I decided NOT to reboot this time; instead I downloaded the diagnostics (attached) and let it run its course. It did eventually stop and go back to normal CPU usage... but this shouldn't happen to begin with, and I don't know what's causing it.

UPDATE: Wife just told me it's been fine all day until around 2:00-2:30 this afternoon. It's currently 5:13 where I am as I'm typing this.

serverus-diagnostics-20200612-1702.zip

June 12, 2020

The "top" log has all of your CPU usage in it. You can also run "top" in the terminal to view it live the next time your CPU is maxed out. Here is the top of your CPU usage listing:


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 8279 root      20   0 2978932 399240 127800 S  43.8   1.2 913:52.27 Web Conte+
 8143 root      20   0 4245108   1.8g 153688 S  18.8   5.8 226:54.19 firefox
 8326 root       0 -20       0      0      0 R  18.8   0.0  58:38.44 loop2
   10 root      20   0       0      0      0 I   6.2   0.0   6:17.75 rcu_sched
  855 root      20   0       0      0      0 S   6.2   0.0  75:47.48 kswapd0
 3740 root      20   0       0      0      0 I   6.2   0.0   0:49.65 kworker/u+
 6814 root      20   0  187516  64380  50764 S   6.2   0.2  98:08.30 Xorg
 7865 root      20   0       0      0      0 I   6.2   0.0   0:01.89 kworker/u+
15190 root      20   0    6788   3092   2180 R   6.2   0.0   0:00.01 top
15471 root      20   0 1631980 832512      0 S   6.2   2.5  59:29.26 xteve
20238 root      20   0   81144  11156   3840 D   6.2   0.0  18:19.88 Xvfb
31898 root      20   0       0      0      0 I   6.2   0.0   0:17.14 kworker/u+

The COMMAND column on the far right shows what is using the CPU.
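
Several rows in a listing like this can belong to the same program (the kworker threads, for example), so it can help to add them up per command. The following is only a rough sketch: the function name sum_cpu_by_command is made up for illustration, and it assumes the column layout shown above (%CPU in column 9, COMMAND from column 12 onward).

```shell
# Sum the %CPU column of top's batch output per command name,
# highest total first. Reads top-style text on stdin.
sum_cpu_by_command() {
    awk '
        $1 == "PID" { in_table = 1; next }   # header row marks the start of the process table
        in_table && NF >= 12 {
            cmd = $12
            for (i = 13; i <= NF; i++) cmd = cmd " " $i   # command names may contain spaces
            total[cmd] += $9
        }
        END { for (cmd in total) printf "%.1f %s\n", total[cmd], cmd }
    ' | LC_ALL=C sort -rn
}

# Example use on a live system:
#   top -b -n 1 | sum_cpu_by_command | head
```

Fed the rows from the listing above, the three kworker/u+ entries would be reported as a single line.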

June 13, 2020

Author

Hmm, OK. I just SSH'd in and ran "top". I'm seeing several things with usage in the 20s, but I'm not sure what to do about any of them. They don't appear to be any of the containers I have running. One of them is Firefox, could that be because I'm booted in GUI mode?

Interestingly enough, if I run htop instead, my CPU usage looks normal. But that doesn't change the fact that something is clearly wrong: my server is so overloaded that I'm going to have to reboot soon or no one can watch TV. I really need to figure out why this keeps happening. It's so bad that not only is Plex not working, but the Unraid web UI is barely responsive.

June 13, 2020

Community Expert

6 hours ago, relink said:

One of them is Firefox, could that be because I'm booted in GUI mode?

Yes, but the main culprit appears to be this one:

nobody   20324 47.4  3.5 10036260 1181400 ?    SNsl Jun09 2054:52  |   |       |   \_ /usr/local/crashplan/bin/CrashPlanService

Try shutting down CrashPlan when it happens again.

June 13, 2020

Author

I believe you may have been right. I had to try something, so I just started shutting down containers until my CPU usage dropped. When I saw how big a difference CrashPlan made, I pinned it to a single core. Now that single core is maxed out, and the rest of my CPU looks normal.

But I had this problem once before with CrashPlan, probably close to 2 years ago. I fixed it back then and it has not been an issue since. Any idea why it would suddenly become a problem again?

June 13, 2020

Community Expert

1 hour ago, relink said:

Any idea why it would suddenly become a problem again?

Sorry, no, but you can ask on the Crashplan support thread.

June 13, 2020

Author

I signed up for Backblaze B2 and the problem just magically went away. lol

This was not the only issue I've had with CrashPlan, but it will be the last. Thank you for your help.

June 16, 2020

Author

OK, I guess I spoke too soon. The issue crept back up within the last hour. My son was watching a movie when I noticed it had stopped playing, and when I checked the server, sure enough: 100% usage on all cores. I attached an updated diag.

Here's the kicker though: I went into the CPU pinning screen and set every single container and VM to a specific number of cores, and there is not one single thing I have running on here that is able to use all of the CPU. Most things are limited to 2-4 threads; Plex has the most at 10 out of 12.

Luckily I have learned that stopping and restarting the array seems to fix the issue, so at least I don't have to do a full reboot. But I have to get this fixed. Unfortunately I'm not sure what's causing it, especially since "top" and "htop" don't appear to be showing the whole picture.

serverus-diagnostics-20200615-2106.zip

June 16, 2020

Community Expert

Firefox is again one of the worst; have you tried not booting in GUI mode?

June 16, 2020

Author

5 hours ago, johnnie.black said:

Firefox is again one of the worst; have you tried not booting in GUI mode?

I haven't, only because I heavily use GUI mode. But I suppose the next time this happens it wouldn't hurt to reboot without it.

June 16, 2020

8 minutes ago, relink said:

I haven't, only because I heavily use GUI mode.

What do you use the GUI for? It's supposed to be used only for server management; it's not meant for general website browsing.

June 16, 2020

Author

Server management. It's just the primary way I manage my server when I'm at home. Plus I keep the Unraid dashboard up 24/7 so I can see what's going on at a glance.

June 20, 2020

Author

OK, so yesterday I rebooted without GUI mode, and today, just now, it happened again. I still cannot figure out what's causing this, but when it happens everything grinds to a halt.

Attached updated diag.

serverus-diagnostics-20200619-2042.zip

June 20, 2020

Community Expert

You could try disabling all Docker containers and letting it run for a few days; if all is OK, start re-enabling them one by one.

June 22, 2020

Author

On 6/20/2020 at 2:41 AM, johnnie.black said:

You could try disabling all Docker containers and letting it run for a few days; if all is OK, start re-enabling them one by one.

Ouch. There must be a better way to find out what's causing this.

Is there not a more accurate task manager that could show what's causing 100% CPU usage? Also, the last time around I noticed near 100% RAM usage too.
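
For a per-container rather than per-process view, Docker's own `docker stats` command reports CPU and memory for each running container. A minimal sketch, assuming the Docker CLI is reachable from the Unraid terminal; the helper name rank_containers is made up for illustration:

```shell
# Rank lines of the form "CPU% MEM% NAME" by CPU, highest first.
# $1 = how many lines to keep (default 5). Reads the lines on stdin.
rank_containers() {
    LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Example use on a live system (one snapshot, no continuous refresh):
#   docker stats --no-stream --format '{{.CPUPerc}} {{.MemPerc}} {{.Name}}' | rank_containers
```

This only covers containers, so host processes, VMs and kernel threads would still need top or htop.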

June 26, 2020

Hi there,

Saw your email to support and wanted to chime in on your thread here. Unfortunately, johnnie.black is right: you're going to need to take the "one at a time" approach to figure out the root cause. The main problem is that there wasn't some "event" prior to these issues that we can point to. Everything was fine until it wasn't. When issues like that happen, 99 times out of 100 it's because of something amiss with the hardware or a plugin/container update that broke something. Do you have your containers set to auto-update, or do you update them manually?

You can absolutely check out htop from the command line (just type htop in a terminal session) to see more detailed process reporting, but even then you will likely still have to resort to shutting down all your containers, letting the system run for a while to see if the CPU usage still spikes randomly, and if not, slowly turning containers back on one by one until you find the culprit. I wish I had better advice for you, but again, when issues come out of nowhere like this and there wasn't some event right before they manifested, there is just no other way to narrow it down.
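
The one-at-a-time approach could be scripted roughly as follows. This is a sketch under assumptions, not an official Unraid procedure: the function names and the list file are made up, the DOCKER variable exists only so the commands can be dry-run with echo, and containers set to auto-start in the Unraid Docker settings may come back on their own after an array restart.

```shell
DOCKER=${DOCKER:-docker}   # set DOCKER=echo to print the commands instead of running them

# Save the names of all running containers (one per line) to $1, then stop them.
stop_all() {
    list=$1
    $DOCKER ps --format '{{.Names}}' > "$list"
    while read -r name; do
        $DOCKER stop "$name" < /dev/null
    done < "$list"
}

# Start the first container still in the list $1 and remove it from the list.
start_next() {
    list=$1
    name=$(head -n 1 "$list")
    [ -n "$name" ] || { echo "nothing left to start"; return 0; }
    $DOCKER start "$name"
    tail -n +2 "$list" > "$list.tmp" && mv "$list.tmp" "$list"
}

# Example use (the list path is hypothetical):
#   stop_all /mnt/user/system/stopped.txt
#   ...wait a few days, then run once per test period:
#   start_next /mnt/user/system/stopped.txt
```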

June 26, 2020

Author

As of the crash yesterday, I now only have the bare essential containers running and no VMs. If I can go a few days without another issue then I will start re-enabling things. If I crash again, then I will disable all containers and see what happens.

The part that I find confusing about this is that there is not a single container or VM in my system that has access to all CPU threads. Plex has access to the most and even it's capped at 10 out of 12, and everything else is limited to between 2 and 4.
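
Pinning only restricts the processes inside each container or VM; kernel threads and processes running directly on the host are not covered by it. One way to check that a pin actually took effect is to read the kernel's view of it from /proc. A small sketch; the function name cpus_allowed and the container name plex are placeholders:

```shell
# Print the CPU list a process is allowed to run on.
# $1 = path to a /proc/<pid>/status file (defaults to the current process).
cpus_allowed() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${1:-/proc/self/status}"
}

# Example use for a container's main process:
#   pid=$(docker inspect --format '{{.State.Pid}}' plex)
#   cpus_allowed "/proc/$pid/status"
```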

June 29, 2020

I haven't read the thread fully, so apologies for that, but I'm curious:

Have you seen a 100% CPU crash from top/htop, or just from the GUI?

I ask because the GUI also takes into account iowait in the CPU usage. This will spike any time the system is waiting on I/O (ie, disks), so I'm wondering if you've got a dodgy HBA or similar causing crazy latency on your disks. This can look like high CPU, because you'll see the graphs max out, and everything will slow to a crawl, but it's actually just that nothing can pull the data it needs.
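
To separate real CPU work from time spent waiting on disks, the counters in /proc/stat can be sampled twice and compared. A minimal sketch; the function name cpu_split is made up, and it relies on the standard field order of the first "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal):

```shell
# Print busy%, iowait% and idle% between two "cpu" summary lines from /proc/stat.
# $1 = first sample, $2 = second sample taken a few seconds later.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
        NR == 2 {
            for (i = 2; i <= 9; i++) { d[i] = $i - first[i]; total += d[i] }
            if (total == 0) { print "no change between samples"; exit }
            busy = total - d[5] - d[6]        # everything except idle (5) and iowait (6)
            printf "busy %.1f%% iowait %.1f%% idle %.1f%%\n",
                   100 * busy / total, 100 * d[6] / total, 100 * d[5] / total
        }'
}

# Example use on a live system:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat); cpu_split "$a" "$b"
```

A large iowait share with a small busy share would point at storage latency rather than a runaway process.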

June 29, 2020

Try this:

# screen (install it from nerdools if you don't have it)

#screen

#cd /

# while true; do date >> cpu.log; ps -eo comm,pcpu | awk 'NR > 1 && $NF > 0' >> cpu.log; echo "do a little dance, get down tonight"; sleep 1; done &

If the server dies and you reboot, get back to it and do

# cd /

# tail -f cpu.log

# cat cpu.log | more   (then look for the app taking the most CPU)

But the idea of elimination as suggested here is the way to go: turn everything off and then turn each docker/container/VM on one at a time.
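
Two caveats apply to a logging loop like this. The %CPU that ps reports is CPU time divided by how long the process has been running, so short spikes get averaged away. And as far as I know Unraid runs its root filesystem from RAM, so a log written under / would not survive a reboot. A hedged variant that stores timestamped snapshots of top at a path you choose; the function name log_snapshot and the example path are assumptions:

```shell
# Append a timestamp line plus the first N lines of stdin to a log file.
# $1 = log file, $2 = number of lines to keep per snapshot (default 15).
log_snapshot() {
    logfile=$1
    lines=${2:-15}
    {
        printf '=== %s ===\n' "$(date '+%F %T')"
        head -n "$lines"
    } >> "$logfile"
}

# Example use inside screen (pick a path on persistent storage; this one is hypothetical):
#   while true; do top -b -n 1 | log_snapshot /mnt/user/system/cpu.log; sleep 5; done
```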

Edited June 29, 2020 by johnwhicker

June 30, 2020

Author

So I think I managed to catch things as they were falling apart this time. It seems that the issue is coming from running out of RAM. I don't know how Unraid handles that. Does it have a swap file? If so, where is it?

Anyway, I immediately SSH'd into Unraid and ran htop, and simply didn't see anything using that much RAM; same when running top. Despite this, even with all containers and VMs stopped, the RAM usage never dropped below 54%. After restarting the array with all my main containers running, I haven't gone over 19% RAM usage.
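
Memory that no process owns (page cache, tmpfs files, kernel slab) does not show up in the per-process columns of top or htop, but it is listed in /proc/meminfo, and the SwapTotal line there shows whether any swap is configured at all. A rough sketch for reading it; the function name mem_breakdown is made up:

```shell
# Summarise the non-process memory counters from a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo).
mem_breakdown() {
    awk '
        { sub(/:$/, "", $1); kb[$1] = $2 }    # "MemTotal:  123 kB" -> kb["MemTotal"] = 123
        END {
            total = kb["MemTotal"]
            if (total == 0) { print "MemTotal not found"; exit 1 }
            n = split("MemAvailable Cached Shmem Slab", keys, " ")
            for (i = 1; i <= n; i++)
                printf "%s %d MiB (%.1f%% of RAM)\n",
                       keys[i], kb[keys[i]] / 1024, 100 * kb[keys[i]] / total
            printf "SwapTotal %d MiB\n", kb["SwapTotal"] / 1024
        }' "${1:-/proc/meminfo}"
}

# Example use on a live system:
#   mem_breakdown
```

A large Shmem figure may indicate files piling up on a RAM-backed filesystem, while a large Slab figure points at kernel caches.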

I have attached 2 diags this time. The first one is from before I restarted the array, with everything stopped except pihole and unbound. The other is from after restarting the array, with my main containers running.

serverus-diagnostics-20200629-2017.zip serverus-diagnostics-20200629-2013.zip

June 30, 2020

Relink,

Hello... I'm not going to be a big help to you here... I can only share what happens with my system... I have the ryzen 2300g and 8 GB of RAM...

I only run dockers on it... and after nearly a year of running without issues I started to notice that my RAM usage was 80%+. I would shut down/reboot my dockers and it would bring things back in line to about 50%... and within a few days it would be back up to 80%...

So I don't know if it's some kind of memory creep in Unraid or not... but I just elected to buy more RAM, and since DDR4 prices have dropped so much I bought 16 GB more...

So what you're describing is a bit more extreme than my situation, but I hope this helps.

June 30, 2020

Author

14 hours ago, -Daedalus said:

Have you seen a 100% CPU crash from top/htop, or just from the GUI?

This is exactly what I see. I only see the 100% usage in the GUI. In htop everything looks normal. But that still doesn't stop Docker from becoming completely unresponsive.

I actually have had an issue with either my HBA or expander, I'm not sure which. But it's an issue I've had for quite a while now, and the problem I'm having now is fairly new. Anyway, any time I reboot my Unraid server I generally have to reboot at least 1-2 more times to get all my disks to show up. On the first boot I'm guaranteed to have several disks missing from the array. However, once I get all the disks to show up again, everything has always seemed to run OK.

June 30, 2020

Author

I'm checking up on my server this morning and I'm already seeing RAM usage getting up to 72%; however, htop shows the process using the most RAM is Plex at only 8.7%, and the Plex dashboard confirms this number. CPU usage is between 20-30%, which for the current load is only slightly above average and isn't anything that would freak me out.

July 1, 2020

Author

So I've been going through every single line and setting on every single page of my Unraid server, trying to see if anything jumps out at me. One thing did: I have a plugin installed called "Dynamix Cache Directories". I don't remember if this comes with Unraid or if I installed it. Anyway, I read up on what it does and decided to try disabling it. Also, this is by far the oldest plugin on my system; its most current version is shown as "2018.12.04".

Since disabling it, which was only 2 days ago, I haven't crashed, and I've had RAM usage in the 50% range instead of 80+%, and CPU usage seems to be staying around or under 20%.

July 6, 2020

Author

I'm beginning to think it may be related to some disk problems I have been having. I have looked through my syslog server and see pretty consistent CRC errors from all of my drives. So I have all new cables on the way for my HBA and SAS expander.
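
For SATA drives, link-level CRC errors are also counted in SMART attribute 199 (UDMA_CRC_Error_Count), which makes it possible to compare the count before and after a cable swap. The raw value is cumulative, so what matters is whether it keeps rising. A small sketch, assuming smartctl is available and prints its usual attribute table; the function name crc_error_count is made up, and drives behind some controllers may need an extra -d option:

```shell
# Extract the raw value of SMART attribute 199 from "smartctl -A" output on stdin.
# Prints "n/a" if the drive does not report that attribute.
crc_error_count() {
    awk '$1 == 199 { print $10; found = 1 }
         END { if (!found) print "n/a" }'
}

# Example use on a live system (covers /dev/sda through /dev/sdz only):
#   for d in /dev/sd?; do printf '%s ' "$d"; smartctl -A "$d" | crc_error_count; done
```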

I noticed a crash happened a couple of minutes after adding a new series to Sonarr, so it locked up just as the new episodes began flooding into the array. That's what it seemed like, anyway.

Cables will be here Wednesday; I guess I'll see what happens.
