WebUI is semi-unresponsive, load average > 400

top - 10:00:03 up 1 day, 11:54,  1 user,  load average: 447.11, 443.73, 433.41
Tasks: 954 total,   2 running, 949 sleeping,   0 stopped,   3 zombie
%Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 15.6 id, 84.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  64418.8 total,    928.1 free,  11473.8 used,  52016.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  51404.9 avail Mem
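
For context on the numbers above: with 84% iowait and almost no user or system CPU time, a load average this high usually means many tasks are stuck in uninterruptible sleep (state D) waiting on I/O, since Linux counts those tasks toward the load average. A minimal sketch, assuming a standard Linux `/proc` filesystem, for listing which processes are blocked. The function names `parse_stat` and `blocked_tasks` are made up for illustration and are not part of any existing tool:

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

This only inspects one entry per process, not every thread, so the count it prints can be lower than the load average. The process names it shows are still a reasonable hint at which device or filesystem the I/O is stuck on.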


Lately, seemingly at random, the server becomes overloaded and stops responding to most of my connections. I can still open the WebUI, however, and some of the content loads (including the attached diagnostics zip).


The Docker and VM views are unresponsive, and the dashboard doesn't load, though I can still see the array in the Main tab. I tried a force shutdown of my GitLab CE virtual machine, but that changed nothing. This all started after I began hosting my own GitLab instance.


If I reboot the machine it runs perfectly well again.




Version 6.8.3


Ryzen 3950X

ASRock Rack X470D4U

64 GB DDR4 ECC

2x 970 EVO 1 TB (cache)
2x 970 EVO 500 GB (1 for Plex, 1 empty)

11x 3/4 TB WD Red for data


Running a bunch of Docker containers, from memory:

Plex (with its own SSD)

Two VMs, both Ubuntu (18.04/20.04):


GitLab CE VM (I ran it in Docker first, then moved it to a VM in case it was causing the high load)

Backup VM (basic VM running a shell script that takes backups of a MySQL server)


I've attached a screenshot of my plugins.



8 minutes ago, johnnie.black said:

There are a lot of checksum errors, which suggests a hardware issue causing data corruption, such as bad RAM; alternatively, the NVMe devices are dropping.



Yeah, I'm currently trying to shut the server down, but it's stuck at "Forcing shutdown". I'm gonna run memtest86+ on it for a while to verify.

