Rodael Posted July 6, 2020

Hello,

top - 10:00:03 up 1 day, 11:54, 1 user, load average: 447.11, 443.73, 433.41
Tasks: 954 total, 2 running, 949 sleeping, 0 stopped, 3 zombie
%Cpu(s): 0.2 us, 0.3 sy, 0.0 ni, 15.6 id, 84.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64418.8 total, 928.1 free, 11473.8 used, 52016.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 51404.9 avail Mem

I have been having some trouble lately, where seemingly at random the server gets overburdened and stops responding to most of my connections. I can however open the webUI and some of the content loads (including the diagnostics zip attached). Docker/VM views are unresponsive, and the dashboard doesn't load. I can see the array in the main tab though.

I tried force shutdown on my Gitlab-CE virtual machine, but that changed nothing. This all happened after I started hosting my own instance. If I reboot the machine it runs perfectly well again.

Edit:

Version 6.8.3

Ryzen 3950x
Asrock Rack x470d4u
64gb ddr4 ECC
2x 970 evo 1tb (cache)
2x 970 evo 500gb (1 for plex, 1 empty)
11x 3/4 tb wd red for data

Running a bunch of dockers, from memory:
NginxProxyManager
Plex (with its own ssd)
QbittorrentVPN
SickChill
Gitlab-Runner

Two VMs, both ubuntu 18.04/20.04:
Gitlab-CE VM (tried running in Docker first, but tried moving it to a VM, in case it was causing the high loads)
Backup VM (basic VM running a shell script taking backups of a mySQL server)

Attached a screenshot of my plugins

ryzen-diagnostics-20200706-0950.zip

Edited July 6, 2020 by Rodael
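A note on reading the top header above: on Linux the load average counts tasks stuck in uninterruptible (usually disk) wait as well as runnable ones, so a load around 447 with 84.0 wa and well under 1% us+sy looks like stalled I/O rather than CPU work. Below is a minimal Python sketch that pulls those numbers out of the header and applies a rough check. The function names and the 50% iowait threshold are my own assumptions for illustration, not anything from Unraid or top itself.

```python
import re

# Summary lines copied from the post above.
TOP_SUMMARY = """\
top - 10:00:03 up 1 day, 11:54, 1 user, load average: 447.11, 443.73, 433.41
%Cpu(s): 0.2 us, 0.3 sy, 0.0 ni, 15.6 id, 84.0 wa, 0.0 hi, 0.0 si, 0.0 st
"""

CPU_FIELD_RE = re.compile(r"([\d.]+)\s+(us|sy|ni|id|wa|hi|si|st)\b")


def parse_top_summary(text):
    """Pull the three load averages and the %Cpu(s) fields out of top's header."""
    load = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    cpu_line = re.search(r"%Cpu\(s\):(.*)", text)
    cpu = {}
    if cpu_line:
        for value, name in CPU_FIELD_RE.findall(cpu_line.group(1)):
            cpu[name] = float(value)
    return {
        "load": tuple(float(x) for x in load.groups()) if load else None,
        "cpu": cpu,
    }


def looks_io_bound(summary, threads, wa_threshold=50.0):
    """Rough heuristic (threshold is an assumption): 1-minute load above the
    hardware thread count while most CPU time is spent in iowait."""
    if summary["load"] is None or "wa" not in summary["cpu"]:
        return False
    return summary["load"][0] > threads and summary["cpu"]["wa"] >= wa_threshold


summary = parse_top_summary(TOP_SUMMARY)
# 32 = hardware threads on a Ryzen 3950X (16 cores with SMT).
print(summary["load"], summary["cpu"].get("wa"), looks_io_bound(summary, threads=32))
```

If this flags the box as I/O bound, the next place to look is usually the storage side (dmesg, drive health) rather than whichever docker or VM happens to be busiest.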
Rodael Posted July 6, 2020

Upon closer inspection, it seems like my second cache drive may be faulty?

dmesg returns this: https://pastebin.com/raw/U1pzFRSW

I snipped when it started spewing errors

Should a faulty drive in a mirrored configuration take down the entire system?
JorgeB Posted July 6, 2020

There are a lot of checksum errors, that suggest a hardware issue causing data corruption, like bad RAM, or the NVMe devices are dropping alternatively.
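One way to tell those two cases apart is to tally the per-device error counters that btrfs prints to the kernel log ("bdev ... errs: wr N, rd N, flush N, corrupt N, gen N"). Here is a minimal Python sketch, assuming the cache pool is btrfs and the log contains lines in that format. The sample lines and device names are made up for illustration and are NOT taken from the pastebin above, and the interpretation in the comments is only a rule of thumb.

```python
import re

# Per-device counter line as btrfs writes it to the kernel log.
BDEV_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")

# Hypothetical sample, for illustration only (not the poster's dmesg).
SAMPLE_LOG = """\
BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 12, rd 3, flush 0, corrupt 0, gen 0
BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 40, rd 9, flush 1, corrupt 0, gen 0
BTRFS warning (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
"""


def latest_counters(log_text):
    """Return the last reported counters per device; the counters are
    cumulative, so later lines simply overwrite earlier ones."""
    counters = {}
    for match in BDEV_RE.finditer(log_text):
        device = match.group(1)
        values = (int(v) for v in match.groups()[1:])
        counters[device] = dict(zip(FIELDS, values))
    return counters


for device, c in sorted(latest_counters(SAMPLE_LOG).items()):
    # Rule of thumb: wr/rd/flush errors hint at a device dropping off the bus,
    # while corrupt counts on otherwise healthy devices hint at bad data
    # reaching the disk (for example from faulty RAM).
    io_errors = c["wr"] + c["rd"] + c["flush"]
    print(device, "io_errors:", io_errors, "corrupt:", c["corrupt"])
```

To use it on a real system, save the output of dmesg to a file and pass its contents to latest_counters in place of SAMPLE_LOG.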
Rodael Posted July 6, 2020

8 minutes ago, johnnie.black said:
There are a lot of checksum errors, that suggest a hardware issue causing data corruption, like bad RAM, or the NVMe devices are dropping alternatively.

Yeah, I'm currently trying to shut the server down, but it's stuck at "Forcing shutdown". I'm gonna run memtest86+ on it for a while to verify.