-C- Posted September 3, 2023

I was updating some Docker containers through the Docker GUI page when the page froze. I checked top via SSH and got this:

    top - 17:58:46 up 1 day, 21:02,  1 user,  load average: 53.92, 53.54, 53.07
    Tasks: 1107 total,   3 running, 1104 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.9 us,  2.6 sy,  0.0 ni, 54.9 id, 41.4 wa,  0.0 hi,  0.2 si,  0.0 st
    MiB Mem :  31872.3 total,   5800.0 free,  11092.2 used,  14980.1 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19566.8 avail Mem

      PID USER     PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
    24398 root     20   0   34316  32632  1960 R 21.5  0.1   0:05.21 find
    12074 root     20   0  974656 409920   532 S 10.9  1.3 215:55.56 shfs
    18749 nobody   20   0  455592 102776 80756 S  5.3  0.3   0:02.04 php-fpm82
    21905 nobody   20   0  386252  77244 61756 R  4.6  0.2   0:00.31 php-fpm82
    18604 nobody   20   0  455788 105656 83604 S  4.0  0.3   0:02.86 php-fpm82
    11495 root      0 -20       0      0     0 S  3.3  0.0  41:37.86 z_rd_int_0
    11496 root      0 -20       0      0     0 S  3.3  0.0  41:39.55 z_rd_int_1
    11497 root      0 -20       0      0     0 S  3.3  0.0  41:38.24 z_rd_int_2
     7491 nobody   20   0 2711124 259320 24452 S  1.7  0.8   0:11.99 mariadbd

The thing is, I'm partway through an array rebuild after replacing a failed HDD. Normally I'd restart the server when the GUI goes snafu, but in this case, is it safe to do so? (I have the Parity Check Tuning plugin installed.) Or is there a CLI command I can try to bring things back?
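The %Cpu(s) line above is worth noting: 41.4 wa against 54.9 id means the CPU is mostly idle and the load is coming from tasks blocked on I/O. Processes in uninterruptible sleep (state D) count toward the load average even though they use no CPU, so listing them usually points at the stuck subsystem. A minimal sketch, using only standard Linux /proc (nothing Unraid-specific; the function name is made up for illustration):

```shell
# List processes currently in uninterruptible sleep (state D) — these push
# the load average up without using any CPU, matching the high-load,
# high-%wa, mostly-idle pattern above. Pure /proc, no procps required.
list_dstate() {
    local s pid comm state rest
    for s in /proc/[0-9]*/stat; do
        # /proc/PID/stat fields: pid (comm) state ...
        # caveat: a comm containing spaces will mis-parse in this sketch
        read -r pid comm state rest < "$s" 2>/dev/null || continue
        if [ "$state" = "D" ]; then
            echo "$pid $comm"
        fi
    done
}

list_dstate
```

On a healthy box this prints nothing; during an I/O stall like the one above you would expect to see the blocked processes (e.g. shfs or the md rebuild threads) listed here.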
-C- Posted September 3, 2023 (edited)

I checked the logs and found the crash happened around here:

    Sep  3 17:00:54 Tower webGUI: Successful login user root from 192.168.34.42
    Sep  3 17:01:25 Tower php-fpm[7836]: [WARNING] [pool www] server reached max_children setting (50), consider raising it

Digging further into that error, I found a post in which the poster traced the issue to the GPU Statistics plugin. I had installed that plugin just a couple of days ago, so it seems likely to be the cause of my problem too. I successfully removed it via the CLI with:

    plugin remove gpustat.plg

...but after a few minutes the system load remains high and there's still no GUI. Looks like a reboot's my only option, but:

    Status: Parity Sync/Data Rebuild (65.3% completed)
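For anyone digging through their own logs for the same symptom, a small helper along these lines pulls the max_children warnings out with surrounding context. The function name is made up, and the default path assumes the syslog lives at /var/log/syslog; adjust for your system:

```shell
# Grep a syslog for php-fpm "max_children" warnings with two lines of
# context on either side, so the triggering event just before it is visible.
# Default log path is an assumption; pass another path if yours differs.
find_maxchildren() {
    local log=${1:-/var/log/syslog}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -B2 -A2 'max_children' "$log"
}
```

Called with no argument it reads /var/log/syslog; pass a rotated log (e.g. `find_maxchildren /var/log/syslog.1`) to search older entries.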
-C- Posted September 3, 2023

Load is still climbing:

    load average: 57.57, 57.49, 57.00

Looks like it could be related to Docker:

    root@Tower:/mnt/user/system# umount /var/lib/docker
    umount: /var/lib/docker: target is busy.

The parity rebuild is going much slower than it should, which I'm guessing is down to the high load, so I ran parity.check stop. After doing so the GUI loads fine, but I'm getting a "Retry unmounting user share(s)" message in the GUI footer. I tried a reboot but it hung. Via SSH I tried stopping the Docker service, but it doesn't seem to be that:

    root@Tower:/mnt/disks# umount /var/lib/docker
    umount: /var/lib/docker: not mounted.

I left it (I wasn't sure what else to try) and eventually it restarted; things seem to be back to normal. I've now discovered that the Parity Check Tuning plugin doesn't/can't continue a Parity Sync/Data Rebuild in the same way that it can a correcting parity check, so it's back to the beginning with that. I'm going to avoid touching anything until the rebuild's finished.
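On the "target is busy" error: the usual tools for finding what pins a mount are `fuser -vm /var/lib/docker` or `lsof +f -- /var/lib/docker`, if they're installed. When they aren't, a rough pure-shell stand-in (hypothetical function name, plain /proc scanning, substring matching only) looks like this:

```shell
# Find processes keeping a mount point busy by scanning /proc for any
# cwd, root, or open file descriptor that resolves under it. A crude
# stand-in for `fuser -vm MNT` / `lsof +f -- MNT`; matches by path
# prefix, so e.g. /var/lib/docker2 would also match /var/lib/docker.
mount_users() {
    local mnt=$1 p pid
    for p in /proc/[0-9]*; do
        pid=${p#/proc/}
        if ls -l "$p/cwd" "$p/root" "$p"/fd 2>/dev/null | grep -q -- "-> $mnt"; then
            echo "$pid $(cat "$p/comm" 2>/dev/null)"
        fi
    done
}

mount_users /var/lib/docker
```

Anything it prints is a candidate to stop (or whose container to stop) before retrying the umount.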
-C- Posted September 5, 2023

Just tried logging into the Unraid GUI. Load is pegged again:

    top - 12:24:40 up 1 day, 12:32,  1 user,  load average: 52.86, 52.47, 52.31
    Tasks: 1152 total,   1 running, 1151 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  2.9 us,  5.2 sy,  0.0 ni, 82.7 id,  9.1 wa,  0.0 hi,  0.1 si,  0.0 st
    MiB Mem :  31872.3 total,   6157.1 free,  12252.2 used,  13463.0 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  18476.1 avail Mem

      PID USER     PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
    11805 root     20   0  975184 412108   516 S 93.7  1.3 158:06.42 /usr/local/bin/shfs /mnt/user -disks 31 -o default_permissions,allow_other,noatime -o remember=0
    29298 nobody   20   0  226844 109888 32124 S 22.5  0.3  12:33.84 /usr/lib/plexmediaserver/Plex Media Server
    12138 nobody   20   0  386324  70936 55404 S  6.0  0.2   0:00.28 php-fpm: pool www
     9015 root     20   0       0      0     0 S  4.3  0.0 137:11.80 [unraidd0]
    13271 nobody   20   0  386216  65896 50496 S  4.3  0.2   0:00.16 php-fpm: pool www
    22798 nobody   20   0  386744  79348 63384 S  4.3  0.2   0:17.22 php-fpm: pool www
     7495 root     20   0       0      0     0 D  1.0  0.0  26:23.74 [mdrecoveryd]

Here's a list of installed plugins:

    root@Tower:~# ls /var/log/plugins/
    Python3.plg@                 dynamix.cache.dirs.plg@      dynamix.system.temp.plg@  open.files.plg@           unRAIDServer.plg@             zfs.master.plg@
    appdata.backup.plg@          dynamix.file.integrity.plg@  dynamix.unraid.net.plg@   parity.check.tuning.plg@  unassigned.devices-plus.plg@
    community.applications.plg@  dynamix.file.manager.plg@    file.activity.plg@        qnap-ec.plg@              unassigned.devices.plg@
    disklocation-master.plg@     dynamix.s3.sleep.plg@        fix.common.problems.plg@  tips.and.tweaks.plg@      unbalance.plg@
    dynamix.active.streams.plg@  dynamix.system.autofan.plg@  intel-gpu-top.plg@        unRAID6-Sanoid.plg@       user.scripts.plg@

I can access files on the array OK over the network, and the rebuild is still running, albeit very slowly:

    root@Tower:~# parity.check status
    Status: Parity Sync/Data Rebuild (65.2% completed)

Any advice on what I can try to get the load back down?
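For context on what "pegged" means here: a load average is only alarming relative to the number of CPUs, since it counts runnable plus uninterruptible tasks. A quick sketch of that ratio, using only standard Linux interfaces (hypothetical function name):

```shell
# Ratio of the 1-minute load average to the CPU count. A sustained ratio
# well above 1.0 means tasks are queuing for CPU or stuck in I/O wait;
# a load of ~53 on a typical 8-16 core box is a ratio of 3-7x.
load_ratio() {
    local load cores
    read -r load _ < /proc/loadavg
    cores=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo)
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

load_ratio
```

With ~9% wa and ~83% idle in the top output above, the load here is mostly D-state tasks piling up behind shfs rather than raw CPU demand.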
JorgeB Posted September 5, 2023

Probably only a reboot will help.
itimpi Posted September 5, 2023

On 9/4/2023 at 12:10 AM, -C- said:
    I've now discovered that the Parity Tuning plugin doesn't/can't continue a Parity Sync/Data Rebuild in the same way that it can a correcting parity check, so it's back to the beginning with that.

Are you sure? I am sure it used to. I will have to check this out again.
-C- Posted September 5, 2023

2 hours ago, JorgeB said:
    Probably only a reboot will help.

In which case I'm stuck in a loop, for now: I rebooted the first time it happened and everything was stable for a day or so before it happened again, without any reason I can find. What's painful is how slowly the rebuild is going. When I could last access the GUI I was getting around 10-30 MB/s, so I'll likely be stuck without a GUI for at least another day. I've not had a disk fail without warning before, so I've never had to rebuild from parity like this and I'm not sure whether that speed is normal; it's certainly running a lot slower than a correcting check.

38 minutes ago, itimpi said:
    Are you sure? I am sure it used to. I will have to check this out again.

That's what happened when I rebooted partway through the rebuild yesterday. Not sure if that's normal though. I've disabled mover, as the rebuild was stopping for the daily move and not restarting afterwards.
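For a rough sense of how long a rebuild at those speeds takes, back-of-envelope arithmetic is enough: remaining data divided by observed rate. A sketch with illustrative numbers only (the disk size, percentage, and rate below are made up, not read from this system):

```shell
# Rough rebuild ETA in hours from disk size (GB), percent complete, and
# observed rate (MB/s). All inputs are example values for illustration.
rebuild_eta_hours() {
    local size_gb=$1 pct_done=$2 rate_mbs=$3
    awk -v s="$size_gb" -v p="$pct_done" -v r="$rate_mbs" \
        'BEGIN { printf "%.1f\n", s * 1024 * (100 - p) / 100 / r / 3600 }'
}

rebuild_eta_hours 8000 65.2 20   # e.g. an 8 TB disk, 65.2% done, at 20 MB/s
# prints 39.6
```

At 10-30 MB/s the "another day at least" estimate above is plausible; a healthy rebuild on modern HDDs usually runs an order of magnitude faster.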
itimpi Posted September 5, 2023

11 minutes ago, -C- said:
    That's what happened when I rebooted part way through the rebuild yesterday. Not sure if that's normal though.

I wonder if you got an unclean shutdown (or the plugin erroneously thought one had happened), as that would stop anything being restarted.
-C- Posted September 5, 2023

Just now, itimpi said:
    I wonder if you got an unclean shutdown (or the plugin erroneously thought one had happened) as that would stop anything being restarted.

Certainly possible. The system wasn't happy when I rebooted it (which was the reason for the reboot), and it may have killed hung processes in order to reboot; it certainly took longer than usual. (I used powerdown -r to restart, in case that makes any difference.)