boozel Posted September 27, 2021
Hi, I have a Windows 10 VM configured and it worked fine in the past. Now, however, when it runs, the remaining CPUs on my Unraid server that are not allocated to the VM go to 100%. If I manage to force stop the VM, this goes back to normal. I had assumed it was an issue with the VM, but I'm not sure why a VM problem would impact host CPUs that were not allocated to it.

I recently had an uncontrolled shutdown. I booted up the server and left it to run its parity check overnight. The next day I saw the CPU was pegged at 100% (with the VM off) and the parity check had an estimated duration of 20 days. After I stopped the parity check, the CPU was still stuck at 100% a few hours later.

In both cases, when the CPU is at 100%, I have run htop and noticed multiple instances of lsof -Owl /mnt/Disk1 /mnt/Disk2 using a lot of the CPU.

Can anyone help me see what might be happening here? I suspect it's something related to my cache disk, but I ran SMART diagnostics and there were no issues.

Thanks,
dada051 Posted May 30, 2022
Same problem here. I found that the lsof command is started from this file: /usr/local/emhttp/plugins/dynamix/nchan/update_1

I think it's related to the WebUI, but I'm not sure.
dada051 Posted May 30, 2022
I just found that the WebUI makes requests to wss://xxxxxxxxx.unraid.net/sub/cpuload,update1,update2,update3,wireguard

In what update1 returns, the percentage numbers match the Memory part of the Dashboard and the RPM numbers match the Airflow tab. The 0s are the stream counts (last column) of the Shares tab of the Dashboard.
Shonky Posted July 29, 2022 (edited)
Did you figure out what "feature" causes this? I see an lsof task running every 5-10 seconds as a process spawned by update_1. I have made sure I closed all WebUI interfaces (closed the browser completely, made sure no processes were still running), but they persist.

This is the command that keeps running:

    sh -c LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"'|awk -F/ '{print $4}'

I don't have any Apple devices, so whilst mine's not pegging the CPU at 100% for extended periods, it's not necessary. It just runs at 100% for maybe a second each time. It seems to be looking for hidden files left by Macs and then printing them somewhere. Seems kind of pointless to me.

I see a sleep 2, so perhaps it is running every two seconds:

    ps aux | grep ls[o]f -A5 -B5
    root 17957 0.0 0.1 88744 29040 ? SL Jul03 24:23 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/wg_poller
    root 17960 0.0 0.1 88744 29000 ? SL Jul03 6:40 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1
    root 17963 0.0 0.1 88944 29716 ? SL Jul03 34:08 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2
    root 17966 0.1 0.1 88884 29408 ? SL Jul03 64:21 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3
    root 18095 0.0 0.0 0 0 ? I 17:31 0:00 [kworker/2:0]
    root 18149 0.0 0.0 3936 2984 ? S 17:31 0:00 sh -c LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"'|awk -F/ '{print $4}'
    root 18150 13.0 0.0 5340 3292 ? S 17:31 0:00 lsof -Owl /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4 /mnt/disk5
    root 18151 0.0 0.0 8376 2560 ? S 17:31 0:00 awk /^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"
    root 18152 0.0 0.0 8244 2496 ? S 17:31 0:00 awk -F/ {print $4}
    root 18155 0.0 0.0 2464 732 ? S 17:31 0:00 sleep 2
    root 18156 0.0 0.0 4860 2908 pts/27 R+ 17:31 0:00 ps aux
    root 18157 0.0 0.0 3980 2228 pts/27 S+ 17:31 0:00 grep ls[o]f -A5 -B5
Shonky Posted July 29, 2022
Did some more digging and found the code in /usr/local/emhttp/webGui/nchan/update_1:

    #!/usr/bin/php -q
    <?PHP
    /* Copyright 2005-2021, Lime Technology
     * Copyright 2012-2021, Bergware International.
     *
     * This program is free software; you can redistribute it and/or
     * modify it under the terms of the GNU General Public License version 2,
     * as published by the Free Software Foundation.
     *
     * The above copyright notice and this permission notice shall be included in
     * all copies or substantial portions of the Software.
     */
    ?>
    <?
    $docroot = '/usr/local/emhttp';
    $varroot = '/var/local/emhttp';
    require_once "$docroot/webGui/include/publish.php";

    while (true) {
      unset($memory,$sys,$rpms,$lsof);
      exec("grep -Po '^Mem(Total|Available):\s+\K\d+' /proc/meminfo",$memory);
      exec("df /boot /var/log /var/lib/docker|grep -Po '\d+%'",$sys);
      exec("sensors -uA 2>/dev/null|grep -Po 'fan\d_input: \K\d+'",$rpms);
      $info = max(round((1-$memory[1]/$memory[0])*100),0)."%\0".implode("\0",$sys);
      $rpms = count($rpms) ? implode(" RPM\0",$rpms).' RPM' : '';
      $names = array_keys((array)parse_ini_file("$varroot/shares.ini"));
      exec("LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && \$0!~/\.AppleD(B|ouble)/ && \$5==\"REG\"'|awk -F/ '{print \$4}'",$lsof);
      $counts = array_count_values($lsof); $count = [];
      foreach ($names as $name) $count[] = $counts[$name] ?? 0;
      $count = implode("\0",$count);
      publish('update1', "$info\1$rpms\1$count");
      sleep(5);
    }
    ?>

My regex skills aren't the best, so perhaps it is excluding the Apple files and just counting open files? But why does it even do this if there's no WebGui reading it?
dada051 Posted August 17, 2022
Sorry, but you didn't find anything more than we had already discovered. The regex is used to count the number of open files (streams) shown in the Dashboard, excluding the Apple-related ones. The thing is that Unraid should stop the process when the websocket connection has no more subscribers (more or less, I'm no expert). That's not what happens.
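To make the filter discussed above concrete, here is a minimal sketch that runs the same two awk stages from update_1 over canned input. The sample lines are invented for illustration (they are not real lsof output from anyone's server), `count_streams` and `sample_lsof` are hypothetical helper names, and `sort | uniq -c` only stands in for PHP's array_count_values.

```shell
#!/bin/sh
# Sketch under stated assumptions: the input below is made-up, lsof-style text.

count_streams() {
  # stdin columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
  # Stage 1: keep regular files (TYPE == REG) held open by shfs,
  #          skipping .AppleDB / .AppleDouble paths.
  # Stage 2: split on "/" so that field 4 of /mnt/diskN/<share>/... is the share.
  awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"' |
    awk -F/ '{print $4}' |
    sort | uniq -c | awk '{print $2 "=" $1}'
}

sample_lsof() {
  # Hypothetical sample data, for illustration only.
  cat <<'EOF'
COMMAND  PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
shfs    4321    0  10r  REG    9,1   104857  101 /mnt/disk1/Media/film.mkv
shfs    4321    0  11r  REG    9,2   204800  102 /mnt/disk2/Media/show.mkv
shfs    4321    0  12r  REG    9,2     1024  106 /mnt/disk2/Docs/b.txt
shfs    4321    0  13r  REG    9,1     4096  103 /mnt/disk1/Backups/.AppleDouble/x
shfs    4321    0  14r  DIR    9,1     4096  104 /mnt/disk1/Backups
smbd    5555    0  15r  REG    9,1     8192  105 /mnt/disk1/Docs/a.txt
EOF
}

sample_lsof | count_streams
```

On this sample only three lines survive the filter, so the sketch prints one count per share (Docs=1, Media=2). Share names containing spaces would break the last awk stage of the sketch; the PHP code does not have that limitation because it counts whole output lines.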
mchus Posted November 9, 2022 (edited)
Hi all, I have Unraid on very old and weak hardware and the "lsof" problem is bothering me too. I tried to delete the line that invokes this command from the "/usr/local/emhttp/webGui/nchan/update_1" file, but I think the web server needs to be restarted or something, because it changed nothing.

I came up with the following solution to stop lsof from using my old CPU: just kill it and watch what breaks.

    pkill -f "/usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1"

So far everything works fine. I have not spotted any difference.

Another observation I made: if you reboot the NAS and do not log in to the WebUI, there is no periodic lsof process and no "/usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1" process either.

    root@box3:~# ps -aux | grep update
    root 7345 0.0 0.0 4048 2288 pts/0 S+ 16:29 0:00 grep update

As soon as you log in and open the "Dashboard" tab, all the update processes come up and never stop.

    root@box3:~# ps -aux | grep update
    root 7390 1.5 0.1 91896 28628 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1
    root 7392 2.5 0.1 92140 29196 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2
    root 7394 2.0 0.1 92076 29472 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3
    root 7489 0.0 0.0 4048 2228 pts/0 S+ 16:29 0:00 grep update

Now you can kill just "update_1", which invokes the lsof command, by its process ID, or kill all /usr/bin/php processes with killall /usr/bin/php, and they will not come back until a reboot.

So I came up with a "user script" that runs once a day.
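The script itself did not survive in the post above, so the following is only a hypothetical sketch of what such a once-a-day user script could look like, not mchus's original. It reuses the pkill command quoted in the post; the `stop_update1` function and the `DRY_RUN` switch are assumptions added here so the sketch does nothing destructive by default.

```shell
#!/bin/bash
# Hypothetical sketch only: NOT the script from the post, which was not preserved.

# Command line of the poller that spawns lsof, as shown in the ps output above.
PATTERN='/usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1'

stop_update1() {
  # Dry run unless DRY_RUN=0: print the command instead of killing anything.
  if [ "${DRY_RUN:-1}" != "0" ]; then
    echo "pkill -f $PATTERN"
    return 0
  fi
  # pkill exits non-zero when no process matched; treat that as "nothing to do".
  pkill -f "$PATTERN" || true
}

stop_update1
```

Setting DRY_RUN=0 makes the sketch actually send the signal. Since update_1 is also what publishes the memory percentage and fan speeds in the code shown earlier in the thread, those Dashboard values would presumably stop refreshing as well while the poller is dead.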
Seanco Posted November 22, 2022
I was looking at this exact issue on my Unraid server and saw the command "lsof -Owl /mnt/disk1 ..." (using htop) using 100% of a single core. The solution for me was to remove 2 plugins, "File Activity" and "Open Files". CPU usage is back down to normal.
dada051 Posted November 22, 2022
3 hours ago, Seanco said:

    I was looking at this exact issue on my Unraid server and saw the command "lsof -Owl /mnt/disk1 ..." (using htop) using 100% of a single core. The solution for me was to remove 2 plugins, "File Activity" and "Open Files". CPU usage is back down to normal.

What I found is not related to these plugins (which I don't even have). It's related to the streams section of the Unraid Dashboard page. The 6.11.4 update should have fixed this by stopping the background processes when all browsers are closed (though I'm not sure it's really fixed).
dada051 Posted November 22, 2022
Just tested, and it seems to be working well. I no longer see "lsof" in htop after I close my browser.
hoschy Posted May 29, 2023 (edited)
I still have this issue on 6.12.0-RC6 🤔 I've also seen the dashboard taking its sweet time to show my started containers and memory usage, so my guess is the issue is indeed linked to the dashboard.

Edit: okay, I just tried to restart. That doesn't work either. I can't reach the dashboard anymore and can't connect through SSH, only through the already opened web terminal.
dada051 Posted May 30, 2023 (edited)
20 hours ago, hoschy said:

    so my guess is the issue is indeed linked to the dashboard

Sure it is. I already showed that it's related to the "Stream count" in the "Shares" box of the Dashboard. But it should stop when the WebUI is closed. Displaying another tab such as Main or Docker doesn't stop it; you really need to close all of the Unraid tabs.
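One way to check whether the pollers are really gone after closing every Unraid tab is to look for the nchan scripts in the process list. The sketch below does that on text read from stdin; `list_pollers` and `sample_ps` are hypothetical helper names, and the sample lines are copied from the ps output mchus posted earlier in the thread.

```shell
#!/bin/sh
# Sketch: list nchan publisher scripts found in `ps aux` output read from stdin.

list_pollers() {
  # In `ps aux` output, field 2 is the PID and field 11 is the start of the command.
  # Keep php processes whose last argument is a script under an nchan/ directory,
  # then print the PID and the script's base name.
  awk '$11 ~ /php$/ && $NF ~ /\/nchan\// { n = split($NF, p, "/"); print $2, p[n] }'
}

sample_ps() {
  # Lines taken from the ps output quoted earlier in this thread.
  cat <<'EOF'
root 7390 1.5 0.1 91896 28628 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1
root 7392 2.5 0.1 92140 29196 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2
root 7394 2.0 0.1 92076 29472 ? SL 16:29 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3
root 7489 0.0 0.0 4048 2228 pts/0 S+ 16:29 0:00 grep update
EOF
}

sample_ps | list_pollers
```

On a live server the equivalent would be `ps aux | list_pollers`. Empty output after closing all tabs would be consistent with the pollers having stopped; a remaining update_1 entry would suggest the lsof loop is still running.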
dada051 Posted December 21, 2023
I still have the problem. lsof uses a lot of CPU for long stretches (even with a 5950X, 64GB and only 5 disks). I think it went away for a while, but even with 6.12.6 it's here. Very annoying. It's even more annoying because I never see anything other than 0 in the stream counts on the Dashboard.

@limetech if you don't want to remove that feature (which I personally don't care about), maybe add an option to deactivate it.
grants169 Posted December 26, 2023
I was getting sick of seeing a CPU spike to +60% on lsof. I edited /usr/local/emhttp/plugins/dynamix/nchan/update_1, commented out the following lines, and set $count to "0":

    # exec("LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && \$0!~/\.AppleD(B|ouble)/ && \$5==\"REG\"'|awk -F/ '{print \$4}'",$lsof);
    # $counts = array_count_values($lsof); $count = [];
    # foreach ($names as $name) $count[] = $counts[$name] ?? 0;
    # $count = implode("\0",$count);
    $count = "0";

I did this rather than killing the entire script. This way the percentages and fan speeds on the dashboard still work, but the CPU doesn't spike while looking for share streams. The 5 minute load average dropped from about 1.3 to 0.9 and the 15 minute average from about 1.8 to 1.2, so this definitely makes a difference.
dopeytree Posted April 15 (edited)
@limetech can we please backport this LSOF cpu spike fix above?