• [6.7.0-rc7] extreme high cpu usage in dashboard but not top/htop


    DaLeberkasPepi
    • Minor

I don't know why, but I sometimes get really high CPU usage in the dashboard, which I can't track in top, htop, or any other monitoring tool I could think of (netdata/cadvisor).

     

The problem is that it doesn't seem to be only a visual anomaly: when the CPU usage hovers between 75-100% like here, the Docker page "hangs" for a few seconds, and the Docker containers also seem to hang every few seconds.

     

Does anybody have a way to track down which process produces this high CPU usage? How does the dashboard determine the CPU usage?

     

Thanks in advance for your help!

[Screenshots: dashboard showing the high CPU usage]




    User Feedback

    Recommended Comments

I've read that post before creating mine. But for me it seems like a different issue, because it's not only a graphical bug but a misbehavior that affects the web UI, Docker containers, etc.

     

What I found was that after updating a container the CPU usage was normal again, but that container had a max CPU usage of 5%, so it can't really have been the culprit.

     

I've read that Squid had the same behavior, which he fixed by switching a Docker container from linuxserver.io to binhex (I believe Sonarr). The thing is, I don't even have that Sonarr container installed.

     

The weirdest thing is that I can't catch that CPU usage anywhere, but it certainly affects the server performance anyway...

    Link to comment

This also looks similar to this issue, which I opened a while ago. It was redirected into the "virtualizing" sub-forum because my Unraid runs under ESXi, but my sense was, and still is, that it is unrelated to the virtualization. In my case, Unraid has one Docker container (and obviously no VMs).

     

    Can you check whether, during the few seconds this is happening, "top" reports high CPU in IO wait (third line from top, "wa" percentage)? This might further indicate similarity.

     

    I feel we keep seeing reports about recent versions of Unraid getting into a few seconds (5-15) of 100% CPU usage where everything locks up, and then it goes back to normal.
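A minimal way to capture the "wa" figure without watching top interactively is to sample /proc/stat directly (a sketch; field positions follow the standard Linux aggregate cpu line: user, nice, system, idle, iowait, ...):

```shell
# Sample the aggregate "cpu" line of /proc/stat twice, one second apart,
# and print the share of that interval spent in iowait (field 6).
s1=$(grep '^cpu ' /proc/stat)
sleep 1
s2=$(grep '^cpu ' /proc/stat)
printf '%s\n%s\n' "$s1" "$s2" | awk '
  NR == 1 { for (i = 2; i <= NF; i++) t1 += $i; w1 = $6 }
  NR == 2 { for (i = 2; i <= NF; i++) t2 += $i; w2 = $6
            printf "iowait: %.1f%%\n", 100 * (w2 - w1) / (t2 - t1) }'
```

Running that in a loop during one of the "storms" would log the iowait spike even when no single process stands out in top.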

    Link to comment

I forgot to check yesterday, but I had that problem again before updating to rc6. I checked System Stats yesterday, and the near-100% CPU usage was nowhere to be seen there. So at the moment the only real indicator of high CPU usage, and of the hangs in anything related to Docker, is the web UI dashboard.

     

The weird thing is that even in the dashboard everything loads normally; only the Docker "widget" takes multiple seconds to show in those cases. The same with the dedicated Docker page.

     

I haven't yet found a proper way to reproduce it.

     

I did update to rc6; maybe it's fixed with this.

    Link to comment
    On 3/26/2019 at 2:25 PM, doron said:

This also looks similar to this issue, which I opened a while ago. It was redirected into the "virtualizing" sub-forum because my Unraid runs under ESXi, but my sense was, and still is, that it is unrelated to the virtualization. In my case, Unraid has one Docker container (and obviously no VMs).

     

    Can you check whether, during the few seconds this is happening, "top" reports high CPU in IO wait (third line from top, "wa" percentage)? This might further indicate similarity.

     

    I feel we keep seeing reports about recent versions of Unraid getting into a few seconds (5-15) of 100% CPU usage where everything locks up, and then it goes back to normal.

Yes, indeed, when this happens I also have a high "wa" number; up to 82 is what I've seen so far.

     

And it happened again with rc6 right now. I've got the feeling that it has something to do with the mover moving files off the cache drive to the array while duplicate accesses them.

     

Is there a way to monitor CPU usage of the mover process? Because even in System Stats the high CPU usage isn't present; only in the dashboard can the CPU usage be seen. And obviously you notice it, with all the Docker containers running slow or halting for a few seconds.

     

And when I have these symptoms I have high CPU usage between 95-100% on all cores, not just one core pegged at 100% the whole time, so I guess it's not related to @jbartlett's problem.
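One way to sample the CPU usage of a single process such as mover is ps (a sketch; the pgrep pattern "mover" is an assumption about the script's process name and may need adjusting for the actual Unraid mover):

```shell
# Print PID, current CPU%, and command for the oldest process matching
# "mover", if one is running. The "mover" pattern is an assumption.
pid=$(pgrep -of mover)
if [ -n "$pid" ]; then
  ps -o pid=,%cpu=,comm= -p "$pid"
else
  echo "mover is not running"
fi
```

Run it in a loop (or under `watch`) during a mover run to see whether the script itself, rather than kernel I/O, accounts for the load.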

    Link to comment
    1 hour ago, DaLeberkasPepi said:

And it happened again with rc6 right now. I've got the feeling that it has something to do with the mover moving files off the cache drive to the array while duplicate accesses them.

     

    It might; however, I don't even have mover running (and no cache drive configured).

     

    1 hour ago, DaLeberkasPepi said:

Is there a way to monitor CPU usage of the mover process? Because even in System Stats the high CPU usage isn't present; only in the dashboard can the CPU usage be seen. And obviously you notice it, with all the Docker containers running slow or halting for a few seconds.

     

And when I have these symptoms I have high CPU usage between 95-100% on all cores, not just one core pegged at 100% the whole time, so I guess it's not related to @jbartlett's problem.

     

Well, you do say that you have high I/O wait. That's also counted as CPU usage; it adds on top of the user and system times.

     

    I, too, see this happen on all available cores.

    Link to comment

    I don't think it has anything to do with actual CPU usage. My guess is that there's a bug in the data gathering/display formatting process.

    Link to comment
    10 hours ago, jbartlett said:

    I don't think it has anything to do with actual CPU usage. My guess is that there's a bug in the data gathering/display formatting process.

    At least in my case it appears to be very real. All clients actively accessing the server's shares freeze for the duration of the "storm", which ranges from 5-12 seconds. The server itself remains accessible over CLI and web GUI, while showing 100% CPU (e.g. using "top"), with most of the usage in "I/O wait".

    Once the graphic display cools down from red (both CPUs at 100%) to normal (1%-5% on each), they all return to normal operation.

    Link to comment
    13 hours ago, jbartlett said:

    I don't think it has anything to do with actual CPU usage. My guess is that there's a bug in the data gathering/display formatting process.

The CPU calculation includes the iowait component as part of the load percentage.

    Not a bug, but by design.

    Link to comment

@bonienl thanks, that explains a lot.

     

I've got the same problem right now again, and it looks like it could have something to do with my btrfs cache pool?
[Screenshot]

    Edited by DaLeberkasPepi
    Link to comment

I have the same problem, but my system is much less powerful, and it takes so long for the state to normalize again that I'd rather restart the server so I can use it again. And I use a stable release.

So far I don't know what triggers this behavior, so I'm following this thread and hoping someone gets to the bottom of it.

    Link to comment

Are you also using a btrfs cache? If so, what configuration are you using?

I use two SSDs, in RAID 0 for data and RAID 1 for metadata and the rest.

    Link to comment

    Yes, I am using 2x 120GB SSDs as my cache pool, formatted in btrfs. And an array of 9x4TB HDDs.

    Edited by pappaq
    Link to comment

I am noticing that the dashboard is correct and top is wrong. For example: I am running mprime, which is maxing out my machine, and the dashboard is correct, while top shows CPU usage at <3% busy.

    Link to comment
    44 minutes ago, jjslegacy said:

I am noticing that the dashboard is correct and top is wrong. For example: I am running mprime, which is maxing out my machine, and the dashboard is correct, while top shows CPU usage at <3% busy.

    This sounds a bit odd - can you post a screenshot of top at the time of the stress?

    Link to comment

I had some other stuff running, so it's not perfect, but you can see mprime using 400% CPU and top apparently reporting it wrong in the totals.

[Screenshot: top output while mprime is running]

    Link to comment
    6 minutes ago, jjslegacy said:

I had some other stuff running, so it's not perfect, but you can see mprime using 400% CPU and top apparently reporting it wrong in the totals.

     

    How many cores do you have on this machine?

    Link to comment
    2 minutes ago, jjslegacy said:

    4 cores 8 threads

Which would be your answer right there. The per-process lines in top show %CPU in single-core units, so 400% makes sense. The summary line sums it all and scales it to the total of all available logical processors. So the ~80% you see on the summary line (your process is "niced", so it appears under ni) is 80% of your total processor capacity.

Specifically, the ~50% you see in the "ni" part is in fact your 400% from the per-process line (4 out of 8 logical CPUs).
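Spelled out with the numbers from this exchange:

```shell
# A process at 400% in top's per-process (single-core) units, on a machine
# with 8 logical CPUs, shows up as 50% on top's summary line.
per_process=400
logical_cpus=8
echo "summary-line share: $(( per_process / logical_cpus ))%"
```

(top's "Irix mode" toggle, the I key, switches the per-process column between the two conventions.)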

    Link to comment

It's just happening again...

I tried the command from the following Stack Overflow post in the console, since iotop and latencytop are not installed on Unraid:

    while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done

    Source: https://stackoverflow.com/a/12126674

     

And it gives me this as a result: left side is standard top, right side is the command from Stack Overflow. From there it seems the bulk of the iowait indeed comes from kworker/btrfs background processes.
[Screenshot: top alongside the D-state process list]
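A variant of that one-liner which also prints the kernel function each uninterruptible (D-state) task is blocked in can help attribute the iowait to a subsystem such as btrfs (a sketch; the wchan column width is arbitrary):

```shell
# Every second, list tasks in uninterruptible sleep ("D" state) together
# with their wchan, the kernel symbol they are currently blocked in.
while true; do
  date
  ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
  sleep 1
done
```

If the wchan entries cluster around btrfs functions during a storm, that would support the cache-pool theory.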

     

Edit: and I'm using rc7 at the moment.

    Edited by DaLeberkasPepi
    specified current running os version
    Link to comment

I disabled the old deprecated UniFi Controller container by linuxserver.io, and the high CPU and RAM usage was gone within 5 seconds after that... maybe that did the trick. Maybe it will reoccur; I will report back if that happens.

    Edited by pappaq
    Link to comment



