100% CPU Usage - lsof


boozel


Hi

 

I have a Windows 10 VM configured and it worked fine in the past. Now, however, when it runs, the remaining CPUs on my Unraid server that are not allocated to the VM go to 100%. If I manage to force stop the VM, this goes back to normal. I had assumed it was an issue with the VM, but I'm not sure why a VM problem would affect host CPUs that were not allocated to it.

 

I recently had an uncontrolled shutdown. I booted up the server and left it to run its parity check overnight. The next day I saw the CPU was pegged at 100% (with the VM off) and the parity check had an estimated duration of 20 days. After stopping the parity check, the CPU was still stuck at 100% a few hours later.

 

In both cases, when the CPU is at 100% I have run htop and noticed there are multiple instances of

lsof -Owl /mnt/Disk1 /mnt/Disk2

using a lot of the CPU. Can anyone help me see what might be happening here? I suspect it's something related to my cache disk, but I ran SMART diagnostics and there were no issues.

 

Thanks,

htop.png

  • 8 months later...

I just found that the WebUI makes requests to wss://xxxxxxxxx.unraid.net/sub/cpuload,update1,update2,update3,wireguard

update1 returns something like

 

image.png.e6cec8d355c34fa4b41f2ddfcaa037a6.png

The percentage numbers match the Memory part of the Dashboard, and the RPM numbers match the Airflow tab. The 0s are the stream counts (last column) of the Shares tab of the Dashboard.

  • 1 month later...

Did you figure out what "feature" causes this? I see an lsof task running every 5-10 seconds as a process spawned by update1. I have made sure I closed all WebUI interfaces (closed browser completely, made sure no processes still running) but they persist. This is the command that keeps running

 

sh -c LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"'|awk -F/ '{print $4}'

 

I don't have any Apple devices, so whilst mine's not pegging the CPU at 100% for extended periods, it's not necessary. It just runs at 100% for maybe a second each time. It seems to look for hidden files left by Macs and then print them somewhere. Seems kind of pointless to me.

 

I see a sleep 2, so perhaps running every two seconds:

 ps aux | grep ls[o]f -A5 -B5
root     17957  0.0  0.1  88744 29040 ?        SL   Jul03  24:23 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/wg_poller
root     17960  0.0  0.1  88744 29000 ?        SL   Jul03   6:40 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1
root     17963  0.0  0.1  88944 29716 ?        SL   Jul03  34:08 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2
root     17966  0.1  0.1  88884 29408 ?        SL   Jul03  64:21 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3
root     18095  0.0  0.0      0     0 ?        I    17:31   0:00 [kworker/2:0]
root     18149  0.0  0.0   3936  2984 ?        S    17:31   0:00 sh -c LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"'|awk -F/ '{print $4}'
root     18150 13.0  0.0   5340  3292 ?        S    17:31   0:00 lsof -Owl /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4 /mnt/disk5
root     18151  0.0  0.0   8376  2560 ?        S    17:31   0:00 awk /^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"
root     18152  0.0  0.0   8244  2496 ?        S    17:31   0:00 awk -F/ {print $4}
root     18155  0.0  0.0   2464   732 ?        S    17:31   0:00 sleep 2
root     18156  0.0  0.0   4860  2908 pts/27   R+   17:31   0:00 ps aux
root     18157  0.0  0.0   3980  2228 pts/27   S+   17:31   0:00 grep ls[o]f -A5 -B5

 

Edited by Shonky

Did some more digging and found the code in /usr/local/emhttp/webGui/nchan/update_1

 

#!/usr/bin/php -q
<?PHP
/* Copyright 2005-2021, Lime Technology
 * Copyright 2012-2021, Bergware International.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License version 2,
 * as published by the Free Software Foundation.
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 */
?>
<?
$docroot = '/usr/local/emhttp';
$varroot = '/var/local/emhttp';
require_once "$docroot/webGui/include/publish.php";

while (true) {
  unset($memory,$sys,$rpms,$lsof);
  exec("grep -Po '^Mem(Total|Available):\s+\K\d+' /proc/meminfo",$memory);
  exec("df /boot /var/log /var/lib/docker|grep -Po '\d+%'",$sys);
  exec("sensors -uA 2>/dev/null|grep -Po 'fan\d_input: \K\d+'",$rpms);
  $info = max(round((1-$memory[1]/$memory[0])*100),0)."%\0".implode("\0",$sys);
  $rpms = count($rpms) ? implode(" RPM\0",$rpms).' RPM' : '';

  $names = array_keys((array)parse_ini_file("$varroot/shares.ini"));
  exec("LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && \$0!~/\.AppleD(B|ouble)/ && \$5==\"REG\"'|awk -F/ '{print \$4}'",$lsof);
  $counts = array_count_values($lsof); $count = [];
  foreach ($names as $name) $count[] = $counts[$name] ?? 0;
  $count = implode("\0",$count);

  publish('update1', "$info\1$rpms\1$count");
  sleep(5);
}
?>
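
For reference, the publish('update1', ...) call above packs three groups into one message: the groups are separated by byte \1 and the values inside a group by byte \0. A minimal shell sketch for making such a payload readable follows. The helper name decode_update1 is made up for illustration, and the sample values are invented and shortened, not captured from a real server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Unraid): make an update1 payload readable.
# Groups (memory/filesystem %, fan RPMs, per-share stream counts) are
# separated by byte \001, values inside a group by byte \000.
decode_update1() {
  # \001 -> newline (one group per line), \000 -> tab (one value per column)
  tr '\001\000' '\n\t'
}

# Invented sample: memory 37%, one filesystem at 12%, two fans, two shares.
printf '%s\000%s\001%s\000%s\001%s\000%s' \
  '37%' '12%' '950 RPM' '1100 RPM' '0' '2' | decode_update1
echo
```

The third line of the decoded output is the per-share stream count, which is the only part of the message that needs lsof.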

 

My regex skills aren't the best, so perhaps it's excluding the Apple files and just counting open files? But why does it even do this if there's no WebGui reading it?

  • 3 weeks later...

Sorry, but you didn't find anything more than we had already discovered. The regex is used to count the open files (streams) shown in the dashboard, excluding Apple-related ones. The thing is that Unraid should stop the process when the websocket connection has no more subscribers (something like that, I'm no expert). That's not what happens.
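
To make the filter concrete, here is a small sketch that runs the same two awk conditions over lsof-style text from stdin, so it can be tried without scanning the array. The function name count_streams and the sample lines are invented for illustration. The real script does the per-share tally in PHP with array_count_values; the awk tally here is a stand-in for that step.

```shell
#!/bin/sh
# Made-up helper: count open regular files per share from lsof-style input.
# First awk: keep lines for the shfs command whose TYPE column ($5) is REG,
#            skipping .AppleDB / .AppleDouble metadata paths.
# Second awk: split on "/", so $2=mnt, $3=diskN, $4=share name; tally per share.
count_streams() {
  awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"' |
    awk -F/ '{n[$4]++} END {for (s in n) print s, n[s]}' |
    sort
}

# Invented sample, columns: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
sample='shfs 1234 0 10r REG 9,1 1048576 101 /mnt/disk1/media/movie.mkv
shfs 1234 0 11r REG 9,1 2048 102 /mnt/disk1/media/.AppleDouble/movie.mkv
shfs 1234 0 12r DIR 9,2 4096 103 /mnt/disk2/backups
shfs 1234 0 13w REG 9,2 512 104 /mnt/disk2/backups/a.tar
smbd 999 0 14r REG 9,1 100 105 /mnt/disk1/media/song.mp3
shfs 1234 0 15r REG 9,1 100 106 /mnt/disk1/media/song.mp3'

printf '%s\n' "$sample" | count_streams
```

Of the six sample lines, only three survive the filter: the .AppleDouble path, the directory entry and the smbd line are dropped. The filtering itself is cheap; the expensive part on a real server is lsof walking every open file on every disk.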

  • 2 months later...

Hi all,
I have Unraid on very old and weak hardware and the "lsof" problem is bothering me too. I tried deleting the line that invokes this command from the "/usr/local/emhttp/webGui/nchan/update_1" file, but I think the web server needs to be restarted or something, because it changes nothing.

 

I came up with the following solution to stop lsof using my old CPU: just kill it and watch what breaks.

pkill -f "/usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1"

So far everything works fine. I have not spotted any difference.

 

Another observation that I made:

If you reboot the NAS and do not log in to the WebUI, there is no periodic lsof process and no "/usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1" process either.

root@box3:~# ps -aux | grep update
root      7345  0.0  0.0   4048  2288 pts/0    S+   16:29   0:00 grep update

 

As soon as you log in and open the "Dashboard" tab, all the update processes come up and never stop.

root@box3:~# ps -aux | grep update
root      7390  1.5  0.1  91896 28628 ?        SL   16:29   0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1
root      7392  2.5  0.1  92140 29196 ?        SL   16:29   0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2
root      7394  2.0  0.1  92076 29472 ?        SL   16:29   0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3
root      7489  0.0  0.0   4048  2228 pts/0    S+   16:29   0:00 grep update

 

Now you can kill just "update_1", which invokes the lsof command, by its ID, or kill all /usr/bin/php processes with killall /usr/bin/php, and they will not come back until a reboot.
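
As a rough sketch of the "kill only update_1" option, the filter below picks out just that poller's PID from a process listing and leaves update_2, update_3 and anything else alone. The function name poller_pids is made up, and the sample lines are modelled on the ps output above rather than taken from a live system.

```shell
#!/bin/sh
# Made-up helper: read "PID COMMAND ARGS..." lines on stdin and print the PIDs
# of PHP processes whose last argument is the nchan update_1 script.
poller_pids() {
  awk '$2 == "/usr/bin/php" && $NF ~ /\/nchan\/update_1$/ {print $1}'
}

# Sample lines modelled on the ps output above.
printf '%s\n' \
  '7390 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1' \
  '7392 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2' \
  '7489 grep update' | poller_pids

# On a live server the same filter could feed kill (an untested assumption):
#   ps -eo pid=,args= | poller_pids | xargs -r kill
```

Matching on the last argument instead of grepping for "update" avoids catching the grep process itself or the other pollers.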

 

So I came up with this "user script" running once a day:

image.thumb.png.43c2c3ed4dbe0b4601fb845db5861468.png

Edited by mchus
  • 2 weeks later...
3 hours ago, Seanco said:

I was looking at this exact issue on my Unraid server and saw the command "lsof -Owl /mnt/disk1 ..." (using htop) using 100% of a single core. The solution for me was to remove 2 plugins, "File Activity" and "Open Files". CPU usage is back down to normal.

What I found is not related to these plugins (which I don't even have). It's related to the Streams section of the Unraid Dashboard page. The 6.11.4 update should have fixed this by stopping background processes when all browsers are closed (but I'm not sure it's really fixed).

  • 6 months later...

I still have this issue on 6.12.0-RC6 🤔

 

image.thumb.png.b67ea16edacb9fcd492042e4f1f9c942.png

 

Also, I've seen the dashboard taking its sweet time to show my started containers and memory usage
image.thumb.png.2f5006752e548fdce53661b8a4acf11d.png

 

so my guess is the issue is indeed linked to the dashboard

Edit:
Okay, just tried to restart. Doesn't work either. I can't reach the dashboard anymore and can't connect through SSH, only the already opened web terminal.

image.thumb.png.45fca40ff81974d63d0372c6aac18c49.png

Edited by hoschy
20 hours ago, hoschy said:

so my guess is the issue is indeed linked to the dashboard

Sure it is. I already showed that it's related to the "Stream count" in the "Shares" box in the Dashboard. But it should stop when the WebUI is closed (displaying another tab like Main or Docker doesn't stop it; you really need to close all the Unraid tabs).

Edited by dada051
  • 6 months later...

image.png.610ab24dc00bad5a1faf8ce953862c88.png

 

I still have the problem. lsof uses a lot of CPU for long stretches (even with a 5950X, 64 GB and only 5 disks). I think it was gone for a while, but it's back even on 6.12.6. Very annoying. It's even more annoying because I never see anything other than 0 in the stream counts in the dashboard. @limetech, if you don't want to remove that feature (which I personally don't care about), maybe add an option to deactivate it.


Was getting sick of seeing a CPU spike to 60%+ on lsof. I edited /usr/local/emhttp/plugins/dynamix/nchan/update_1 and commented out the following lines

 

#  exec("LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && \$0!~/\.AppleD(B|ouble)/ && \$5==\"REG\"'|awk -F/ '{print \$4}'",$lsof);
#  $counts = array_count_values($lsof); $count = [];
#  foreach ($names as $name) $count[] = $counts[$name] ?? 0;
#  $count = implode("\0",$count);
  $count = "0";

 

I did this rather than killing the entire script. This way the percentages and fan speeds on the dashboard still work, but the CPU doesn't spike looking for share streams.

 

5 minute load average dropped from about 1.3 to 0.9 and 15 minute average from about 1.8 to 1.2, so this definitely makes a difference.
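
For anyone who wants to reapply an edit like this without hand-editing, here is a rough sed-based sketch. It is a variant, not the exact edit above: instead of commenting out four lines it replaces only the lsof exec line with an empty array, so the following lines still run and every share reports 0. The function name disable_stream_count is made up, and the match assumes the exec line still looks like the one quoted in this thread.

```shell
#!/bin/sh
# Made-up helper: swap the lsof exec line in update_1 for an empty result.
# array_count_values([]) then yields no counts, so each share falls back to 0,
# while the memory, filesystem and fan values are left untouched.
disable_stream_count() {
  file=$1
  # Keep a backup, then write the result back into the original file so its
  # permissions stay as they were.
  cp -p "$file" "$file.bak" &&
    sed 's|^\( *\)exec("LANG=.*lsof -Owl.*$|\1$lsof = []; // stream count disabled|' \
      "$file.bak" > "$file"
}

# Example call (path as given in the post above; try it on a copy first):
#   disable_stream_count /usr/local/emhttp/plugins/dynamix/nchan/update_1
```

Two caveats, both assumptions on my part: an update_1 process that is already running has probably loaded the old code, so it likely has to be killed (for example with the pkill command earlier in the thread) and the Dashboard reopened before the change shows. And since Unraid, as far as I understand, unpacks its system files into RAM at boot, the edit would most likely need reapplying after every reboot.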
