100% CPU in Dashboard and unresponsive, but top shows otherwise

boyd91 · December 30, 2018

Sometimes all of my services running in Docker containers become totally unresponsive and time out on every HTTP request. Only the Unraid GUI still works as fast as always, except for the Docker page, which takes very long to load. It does load eventually, and no single container is using much CPU or memory. I've experienced this 4-6 times already, and only a reboot fixes it.

When this happens I can still log in to the Unraid web GUI without any trouble. The dashboard bars show 100% CPU and memory usage on all my cores. But when I run `top` in the terminal, I see very different (and normal) CPU and memory usage.

I was hoping to be able to provide some diagnostics but it happens only every few months or so (will do it the next time it happens). I did look at the syslog last time and there was nothing unusual to see, only the timeouts that are caused by the unresponsiveness of the containers.

Does anyone recognize these symptoms? Or can anyone give me advice on how to pinpoint the issue, because this is seriously hurting the wife acceptance factor, considering Home Assistant is running on it.

MulletBoy · January 6, 2019

I'm seeing similar symptoms: extremely high CPU usage in the Unraid dashboard, but very normal CPU usage when I run 'top' or 'htop' in the terminal.

Restarts didn't fix it for me, and upgrading to the latest Unraid did not fix it either (I went from 6.6.3 to 6.6.6).

I am running standard stuff: plex, sonarr, couch potato, sabnzbd, letsencrypt (with some wordpress sites), mysql, mariadb, nextcloud, rutorrent, unifi, muximux.

I have found the culprit in my case to be couchpotato. It is erroneously running the mover on my completed torrents folder on repeat: it copies completed torrent xyz to the library location, leaving a copy in the completed folder (as it should, for seeding), but then does not remember that it has already processed that file and just does it again and again, cycling through all the files in the folder.

I haven't figured out what is wrong with couchpotato or how to fix it yet, but at least I know what the issue is, and disabling the couchpotato docker for now has stopped the super high CPU usage.

Johan76 · January 14, 2019

Have you found out anything else about this?

I have the EXACT same problem as you describe. Sometimes, however, it resolves itself.

I don't need to reboot; I can actually go into settings and disable Docker. This forces everything to stop, and then I can start Docker again and it works normally.

This is really annoying and I am not sure how to check what is causing it.

I am not running Couchpotato but Radarr (and Sonarr). I have not made this connection. I do sometimes have files moving around for what seems like forever, but they don't use all this CPU when that has been occurring.

The most annoying thing is that Plex is unresponsive, so movies for the kids do not work (if I am not home to "fix" it).

-------------------------

OK. Thanks for the tip. I logged into the console and killed off the Sonarr docker with the docker command.

CPU dropped down to normal use in a few minutes (I guess once all processes belonging to Sonarr were killed).

I will try leaving Sonarr disabled for the moment.

Edited January 14, 2019 by Johan76
New test - killing off containers

Zonediver · January 14, 2019

Diagnostics???

Logs???

Something else???

Edited January 14, 2019 by Zonediver

Johan76 · January 14, 2019

2 hours ago, Zonediver said:

Diagnostics???

Logs???

Something else???

Good point!

Been at 100% CPU all day. Around 17:00 today I killed the Sonarr docker and everything went back to normal.

I think the logs go back a while, so they should cover the period when the problems occurred.

Let me know if you need something else.

nas2-diagnostics-20190114-1843.zip

Zonediver · January 14, 2019

4 hours ago, Johan76 said:

Good point!

The diagnostics file is "always" important so the specialists can analyse and see what happens 😉

Edited January 14, 2019 by Zonediver

zyrmpg · June 3, 2019

Did you guys happen to figure this out? I've been seeing this exact same problem for a week now. It's been so frustrating!

Timbiotic · November 12, 2019

Same thing today on the latest Unraid. Any answers?

Attaching diags before rebooting

lillis.69.mu-diagnostics-20191112-1454.zip

Edited November 12, 2019 by Timbiotic

Timbiotic · November 12, 2019

and top screenshot and gui

glennv · November 12, 2019

Although you may think top shows no activity, check the CPU wait. It's at around 50%, indicating the CPU is waiting on something (typically I/O). So it's in line with what the GUI is showing, namely about 50% load (2 of the 4 cores are in wait state).

Timbiotic · November 12, 2019

Seeing that, as I cannot reboot: Plex and Duplicati won't shut down, so I will probably have to fat finger it after work.

Timbiotic · November 12, 2019

is the wait the "wa" sorry dont use linux that often

glennv · November 12, 2019

3 minutes ago, Timbiotic said:

is the wait the "wa" sorry dont use linux that often

yes

CPU usage is always split into user/system/wait/idle. Together they add up to ~100% (ignoring the other, smaller indicators).

Meaning that x% of the time the CPU is either serving the user, busy with internal system activities, waiting for something, or idle doing nothing.
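
For anyone who wants to see that split outside of top, here is a minimal Python sketch that computes it from the aggregate `cpu` line of `/proc/stat`. The function names are made up for illustration, and it only reflects the idea described above; whether the Unraid dashboard counts wait time as busy is an assumption based on this thread, not something confirmed here.

```python
import time
from typing import Dict

# Counter names for the first eight values on the aggregate "cpu" line
# of /proc/stat, in the order the kernel reports them.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_breakdown(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Return the percentage of time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS if k in before and k in after}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample_breakdown(interval: float = 1.0, path: str = "/proc/stat") -> Dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart, and return the split."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return cpu_breakdown(first, second)
```

Calling `sample_breakdown()` on the server while the problem is occurring should show a large `iowait` share if the CPU is mostly waiting rather than working.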

Edited November 12, 2019 by glennv

Timbiotic · November 12, 2019

Can I find out what it's waiting on and kill it? I hate having to fat finger it. All dockers are stopped except Plex and Duplicati, and I can't stop them from the command line.

glennv · November 12, 2019

Did you try killing them like normal Linux processes, with kill pid or kill -9 pid?

If they won't die with this, it's difficult, as they are likely completely hanging.

You can also see this effect when a process is hanging on a hard NFS mount that is not there anymore. Typically only a reboot can kill these sessions. You don't happen to have any mounted external shares that they could be hanging on?

Timbiotic · November 12, 2019

I killed dockerd. How can I see what specifically it is waiting on? I also unmounted some external unassigned disks.

Dissones4U · November 12, 2019

1 hour ago, Timbiotic said:

how can i see what specific is it waiting on

Have you tried: ~~ps l~~ (try `ps -x` instead; this gives a better list, with the current state of each process and its PID)

Anything waiting should be in the (uninterruptible) D state, I think... all of my processes are in the (interruptible) S sleep state.
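
As a rough illustration of the same check without ps, the Python sketch below walks `/proc` and reports processes whose state is D. The function names are hypothetical, and it only lists the stuck processes; it does not say what they are waiting on.

```python
import os
from typing import List, Tuple


def parse_stat(text: str) -> Tuple[int, str, str]:
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, command) for every process currently in the D state."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the file could not be parsed
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Running `find_uninterruptible()` a few times in a row helps separate processes that are briefly in D during normal disk access from ones that stay stuck there.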

Edited November 12, 2019 by Dissones4U
corrected

glennv · November 12, 2019

If killing the specific docker still did not clear the iowait situation, then it's not so simple to find the culprit.

Check this for some ideas on how to approach it: https://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/

The commands/tools required for deeper troubleshooting can be installed using the nerdtools plugin. iotop, for example, is in there and may be useful.

It's not a simple problem that can easily be identified remotely.

edit: iostat is part of the sysstat package of nerdtools

Edited November 12, 2019 by glennv

Osiris · February 1, 2021

I experience the EXACT same behaviour.

All docker containers are unresponsive, but still running.

seth-diagnostics-20210201-0154.zip

Solved it by killing 2 factorio-docker containers (kill -9 on the processes involved) & 2 vms (windows console & ubuntu idle test system).

Other containers became available again.

Edited February 1, 2021 by Osiris

DrSpaldo · July 9

This is an old post and may be unrelated. But in case people google this like I did and come across it: if you are using rsync to transfer, check that you are not using the compress flag (-z), as this will cause this exact behaviour. I was getting unresponsive dockers and total CPU usage reported within the Unraid GUI, but little usage in htop. It turned out that after I stopped the rsync transfer and resumed it without the compression flag, the issue was fixed.
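
If rsync is launched from a script, a small check like the Python sketch below can flag the compress option before a transfer starts. This is only an illustration: the function name is made up, the paths in the usage note are placeholders, and it only recognises `-z` and `--compress` (including `z` inside a combined short flag such as `-avz`). It does not handle short options with an attached value or other compression-related options.

```python
from typing import Sequence


def uses_compression(args: Sequence[str]) -> bool:
    """Return True if an rsync argument list enables -z / --compress."""
    for arg in args:
        if arg == "--":
            break  # everything after "--" is a path, not an option
        if arg == "--compress":
            return True
        # A combined short-option cluster such as "-avz" also enables it.
        if arg.startswith("-") and not arg.startswith("--") and "z" in arg[1:]:
            return True
    return False
```

For example, `uses_compression(["rsync", "-avz", "/src/", "/dst/"])` returns `True`, while the same list with `-av` returns `False`.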
