relink Posted June 12, 2020
Hey guys, this is the second time this has happened this week. The wife and kids are watching Plex and suddenly it starts to stutter and eventually stops playing. I remote into the server from work to see 100% CPU usage on all cores of my Ryzen 5 2600. The first time, I assumed the parity check that was running was causing the issue, so I stopped it and rescheduled it for later, and after a reboot everything was fine. But this time there was no parity check running and the mover wasn't running, so I'm not sure what's causing this. I decided NOT to reboot this time, and instead downloaded the diag (attached) and let it run its course. It did eventually stop and go back to normal CPU usage... but this shouldn't happen to begin with, and idk what's causing it.
UPDATE: Wife just told me it's been fine all day until around 2:00-2:30 this afternoon. It's currently 5:13 where I am as I'm typing this.
serverus-diagnostics-20200612-1702.zip
Nelinski Posted June 12, 2020
The "top" log in the diagnostics has all of your CPU usage in it. You can also run "top" in the terminal to view it live next time you're maxing out your CPU. A copy of the top of your CPU usage:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 8279 root      20   0 2978932 399240 127800 S  43.8   1.2 913:52.27 Web Conte+
 8143 root      20   0 4245108   1.8g 153688 S  18.8   5.8 226:54.19 firefox
 8326 root       0 -20       0      0      0 R  18.8   0.0  58:38.44 loop2
   10 root      20   0       0      0      0 I   6.2   0.0   6:17.75 rcu_sched
  855 root      20   0       0      0      0 S   6.2   0.0  75:47.48 kswapd0
 3740 root      20   0       0      0      0 I   6.2   0.0   0:49.65 kworker/u+
 6814 root      20   0  187516  64380  50764 S   6.2   0.2  98:08.30 Xorg
 7865 root      20   0       0      0      0 I   6.2   0.0   0:01.89 kworker/u+
15190 root      20   0    6788   3092   2180 R   6.2   0.0   0:00.01 top
15471 root      20   0 1631980 832512      0 S   6.2   2.5  59:29.26 xteve
20238 root      20   0   81144  11156   3840 D   6.2   0.0  18:19.88 Xvfb
31898 root      20   0       0      0      0 I   6.2   0.0   0:17.14 kworker/u+
The COMMAND column on the far right shows what is using the CPU.
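For anyone working from a saved top log rather than the live view, the rows can be ranked with a few standard tools. This is only a sketch: the function name busiest_from_top is made up for illustration, and it assumes top's default column layout, where %CPU is the ninth column.

```shell
#!/bin/bash
# Print the busiest processes from saved `top -b` output, highest %CPU first.
# $1: file containing top batch output; $2: number of rows to show (default 10).
# Assumes top's default layout, where %CPU is column 9 (hypothetical helper).
busiest_from_top() {
    local file="$1" count="${2:-10}"
    # Keep only process rows (first field is a numeric PID) and collapse
    # the column padding so that sort sees single-space separated fields.
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 { $1 = $1; print }' "$file" |
        sort -k9,9nr |
        head -n "$count"
}
```

A possible use would be `top -b -n 1 > /tmp/top.txt` followed by `busiest_from_top /tmp/top.txt 5`; the /tmp path is just an example.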
relink (Author) Posted June 13, 2020
Hmm, ok. I just SSH'd in and ran "top", and I'm seeing several things with usage in the 20s, but I'm not sure what to do about any of them. They don't appear to be any of the containers I have running. One of them is Firefox, could that be because I'm booted in GUI mode? Interestingly enough, if I run htop instead, my CPU usage looks normal. But that doesn't change the fact that something is clearly wrong. My server is so overloaded I'm going to have to reboot soon or no one can watch TV. But I really need to figure out why this keeps happening. It's so bad that not only is Plex not working, but the web UI for Unraid is just barely responsive.
JorgeB Posted June 13, 2020
6 hours ago, relink said: One of them is Firefox, could that be because I'm booted in GUI mode?
Yes, but the main culprit appears to be this one:
nobody 20324 47.4 3.5 10036260 1181400 ? SNsl Jun09 2054:52 | | | \_ /usr/local/crashplan/bin/CrashPlanService
Try shutting down CrashPlan when it happens again.
relink (Author) Posted June 13, 2020
I believe you may have been right. I had to try and do something, so I just started shutting down containers until my CPU usage dropped. When I saw how big of a difference CrashPlan made, I pinned it to a single core. Now that single core is maxed out, and the rest of my CPU looks normal. But I had this problem once before with CrashPlan, probably close to 2 years ago. I fixed it back then and it has not been an issue since. Any idea why it would suddenly become a problem again?
JorgeB Posted June 13, 2020
1 hour ago, relink said: Any idea why it would suddenly become a problem again?
Sorry, no, but you can ask on the CrashPlan support thread.
relink (Author) Posted June 13, 2020
I signed up for Backblaze B2 and the problem just magically went away. lol
This was not the only issue I've had with CrashPlan, but it will be the last. Thank you for your help.
relink (Author) Posted June 16, 2020
Ok, I guess I spoke too soon. The issue just crept back up within the last hour. My son was watching a movie and I noticed it just stopped playing, and when I checked the server, sure enough, 100% usage on all cores. I attached an updated diag. Here's the kicker though: I went into the CPU pinning screen and set every single container and VM to a specific number of cores, and there is not one single thing I have running on here that is able to use all the CPU cores. Most things are limited to 2-4 cores; Plex is the most at 10 out of 12 cores. Luckily I have learned that stopping and restarting the array seems to fix the issue, so at least I don't have to perform a full reboot. But I have to get this fixed. Unfortunately I'm not sure what's causing it, especially since "top" and "htop" don't appear to be showing the whole picture.
serverus-diagnostics-20200615-2106.zip
JorgeB Posted June 16, 2020
Firefox is again one of the worst. Have you tried not booting in GUI mode?
relink (Author) Posted June 16, 2020
5 hours ago, johnnie.black said: Firefox is again one of the worst. Have you tried not booting in GUI mode?
I haven't, only because I heavily use GUI mode. But I suppose the next time this happens it wouldn't hurt to reboot without it.
JonathanM Posted June 16, 2020
8 minutes ago, relink said: I haven't, only because I heavily use GUI mode.
What do you use the GUI for? It's supposed to be used only for server management; it's not meant for general website browsing.
relink (Author) Posted June 16, 2020
Server management. It's just the primary way I manage my server when I'm at home. Plus I keep the Unraid dash up 24/7 so I can see what's going on at a glance.
relink (Author) Posted June 20, 2020
Ok, so yesterday I rebooted without GUI mode, and today, just now, it happened again. I still cannot figure out what's causing this, but when it happens everything grinds to a halt. Attached updated diag.
serverus-diagnostics-20200619-2042.zip
JorgeB Posted June 20, 2020
You could try disabling all dockers and letting it run for a few days. If all is OK, then start enabling them one by one.
relink (Author) Posted June 22, 2020
On 6/20/2020 at 2:41 AM, johnnie.black said: You could try disabling all dockers and letting it run for a few days. If all is OK, then start enabling them one by one.
Ouch. There must be a better way to find out what's causing this. Is there not a more accurate task manager that could show what's causing 100% CPU usage? Also, the last time around I noticed near 100% RAM usage too.
jonp Posted June 26, 2020
Hi there,
Saw your email to support and wanted to chime in on your thread here. Unfortunately johnnie.black is right in that you're going to need to take the "one at a time" approach to figure out the root cause. The main problem here is that there wasn't some "event" that occurred prior to these issues that we can point to. Everything was fine until it wasn't. When issues like that happen, 99 times out of 100 it's because of something amiss with the hardware or a plugin/container update that broke something. Do you have your containers set to auto-update or do you manually update them?
You can absolutely check out htop through a command line (just type htop from a terminal session) and see more detailed process reporting, but even then, you will likely still have to resort to shutting down all your containers, letting the system run for a while to see if the CPU usage still spikes at random, and if not, slowly turning containers back on one by one until you find the culprit.
I wish I had better advice for you, but again, when the issues just come out of nowhere like this and there wasn't some event that occurred right before the issues manifested, there is just no other way to narrow it down.
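The one-at-a-time approach described above could be scripted roughly as follows. This is a sketch under assumptions, not an Unraid-specific procedure: it assumes the docker CLI is available on the host, the function names are invented for illustration, and the list file path is whatever you choose to pass in.

```shell
#!/bin/bash
# Record the names of all running containers in file $1 (one per line),
# then stop each of them. Function names here are hypothetical helpers.
stop_all_containers() {
    local list="$1" name
    docker ps --format '{{.Names}}' > "$list"
    while IFS= read -r name; do
        docker stop "$name" < /dev/null
    done < "$list"
}

# Start the first container still listed in $1 and drop it from the list.
# Returns non-zero once the list is empty or the start fails.
start_next_container() {
    local list="$1" name
    name=$(head -n 1 "$list")
    [ -n "$name" ] || return 1
    docker start "$name" || return 1
    tail -n +2 "$list" > "$list.tmp" && mv "$list.tmp" "$list"
}
```

The idea would be to run stop_all_containers once with a file on persistent storage, watch the server for a few days, then call start_next_container with the same file each time you are ready to bring back one more container.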
relink (Author) Posted June 26, 2020
As of the crash yesterday, I now only have the bare essential containers running and no VMs. If I can go a few days without another issue, then I will start re-enabling things. If I crash again, then I will disable all containers and see what happens.
The part that I find confusing about this is that there is not a single container or VM in my system that has access to all CPU threads. Plex has access to the most, and even it's capped at 10 out of 12; everything else is limited to between 2 and 4.
-Daedalus Posted June 29, 2020
I haven't read the thread fully, so apologies for that, but I'm curious: Have you seen a 100% CPU crash from top/htop, or just from the GUI?
I ask because the GUI also counts iowait as CPU usage. This will spike any time the system is waiting on I/O (i.e., disks), so I'm wondering if you've got a dodgy HBA or similar causing crazy latency on your disks. This can look like high CPU, because you'll see the graphs max out and everything will slow to a crawl, but it's actually just that nothing can pull the data it needs.
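One way to check the iowait theory from a terminal is to compare two samples of the aggregate cpu line in /proc/stat (top also shows this figure as "wa" in its %Cpu(s) summary line). The sketch below is illustrative only; the function names are made up, and it sums the first eight counters (user through steal) as the total.

```shell
#!/bin/bash
# Print the percentage of CPU time spent in iowait between two samples of
# the aggregate "cpu" line from /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0; for (i = 2; i <= 9; i++) total[NR] += $i; iow[NR] = $6 }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
        }'
}

# Sample the live counters $1 seconds apart (default 5) and print the result.
sample_iowait() {
    local first second
    first=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    second=$(grep '^cpu ' /proc/stat)
    iowait_percent "$first" "$second"
}
```

Running `sample_iowait 10` while the dashboard shows 100% would give a rough idea of how much of that load is really time spent waiting on disks.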
DivideBy0 Posted June 29, 2020
Try this (install screen from nerdools if you don't have it):
# screen
# cd /
# while true; do ps -eocomm,pcpu | egrep -v ' 0\.0$|%CPU' >> cpu.log; echo "do a little dance, get down tonight"; sleep 1; done &
If the server dies and you reboot, get back in and do:
# cd /
# tail -f cpu.log
# cat cpu.log | more
and look for the app taking the most CPU.
But the idea of elimination as suggested here is the way to go: turn everything off and then turn each docker/container/VM on one at a time.
Edited June 29, 2020 by johnwhicker
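A variant of the logging loop above, sketched with timestamps and a numeric threshold so that entries can be lined up with the time of a crash. Two caveats, both worth verifying on your own system: ps reports %CPU averaged over each process's lifetime, so a short spike from a long-running process shows up only gradually, and Unraid's root filesystem lives in RAM, so the log would need to go somewhere persistent to survive a reboot. The function names and the example log path are assumptions for illustration.

```shell
#!/bin/bash
# Read `ps -eo comm,pcpu` output on stdin and print only the rows whose
# last column (%CPU) is at or above $1 (default 1.0), with a timestamp.
busy_rows() {
    local min="${1:-1.0}" stamp
    stamp=$(date '+%F %T')
    awk -v min="$min" -v stamp="$stamp" \
        '$NF ~ /^[0-9.]+$/ && $NF + 0 >= min + 0 { print stamp, $0 }'
}

# Append one snapshot to log file $1 every $2 seconds (default 5)
# until interrupted.
log_busy() {
    local log="$1" interval="${2:-5}"
    while true; do
        ps -eo comm,pcpu | busy_rows 1.0 >> "$log"
        sleep "$interval"
    done
}
```

For example, `log_busy /boot/cpu.log 5 &` inside a screen session would write to the flash drive; a longer interval keeps the number of writes down.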
relink (Author) Posted June 30, 2020
So I think I managed to catch things as they were falling apart this time. It seems the issue is coming from running out of RAM. I don't know how Unraid handles that. Does it have a swap file? If so, where is it?
Anyway, I immediately SSH'd into Unraid and ran htop, and simply didn't see anything using that much RAM. Same when running top... I just don't see anything using that much RAM. Despite this, even with all containers and VMs stopped, the RAM usage never dropped below 54%. After restarting the array with all my main containers running, I haven't gone over 19% RAM usage.
I have attached 2 diags this time. The first one is from before I restarted the array, with everything stopped except Pi-hole and Unbound. The other is from after restarting the array, with my main containers running.
serverus-diagnostics-20200629-2017.zip serverus-diagnostics-20200629-2013.zip
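One possible reason for RAM usage that no process seems to own is that the per-process figures in top and htop do not include page cache or files sitting in RAM-backed tmpfs filesystems. The sketch below summarises /proc/meminfo to show how much memory is genuinely unavailable and how much is cache or shared memory; the function name and output format are invented for illustration.

```shell
#!/bin/bash
# Summarise /proc/meminfo-style input on stdin: the share of RAM that is
# not available to new workloads, plus page cache and tmpfs/shared memory
# in MiB. Values in /proc/meminfo are reported in kB.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "Shmem:"        { shmem = $2 }
        END {
            if (total == 0) exit 1
            printf "unavailable_pct=%.0f cached_mib=%d shmem_mib=%d\n",
                100 * (total - avail) / total, cached / 1024, shmem / 1024
        }'
}
```

It would be run as `mem_summary < /proc/meminfo`. A large shmem figure points at RAM-backed files, which `df -h -t tmpfs` can help locate; a large cached figure is usually reclaimable and not a problem in itself.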
mathomas3 Posted June 30, 2020
Relink, hello... I'm not going to be a big help to you here. I can only share what happens with my system. I have the Ryzen 2300G and 8 GB of RAM, and I run three dockers only. After nearly a year running without issues, I started to notice that my RAM usage was 80%+. I would shut down/reboot my dockers and it would bring things back in line to about 50%, and within a few days it would be back up to 80%. So I don't know if it's some kind of memory creep in Unraid or not, but I just elected to buy more RAM, and since DDR4 prices have dropped so much I bought 16 GB more.
What you're describing is a bit more extreme than my situation, but I hope it might help.
relink (Author) Posted June 30, 2020
14 hours ago, -Daedalus said: Have you seen a 100% CPU crash from top/htop, or just from the GUI?
This is exactly what I see. I only see the 100% usage in the GUI. In htop everything looks normal. But that still doesn't stop Docker from becoming completely unresponsive.
I actually have had an issue with either my HBA or expander, I'm not sure which. But it's an issue I've had for quite a while now, and the problem I'm having now is fairly new. Anyway, any time I go to reboot my Unraid server, I will generally have to reboot a minimum of 1-2 times to actually get all my disks to show up. On the first boot I'm guaranteed to have several disks missing from the array. However, once I get all the disks to show up again, everything has always seemed to run OK.
relink (Author) Posted June 30, 2020
I'm checking up on my server this morning and I'm already seeing the RAM usage getting up to 72%. However, htop shows the process using the most RAM is Plex at only 8.7%, and the Plex dashboard confirms this number. CPU usage is between 20-30%, which for the current load is only slightly above average, and isn't anything that would freak me out.
relink (Author) Posted July 1, 2020
So I've been going through every single line and setting on every single page of my Unraid server, trying to see if anything jumps out at me. One thing did: I have a plugin installed called "Dynamix Cache Directories". I don't remember if this comes with Unraid or if I installed it. Anyway, I read up on what it does and decided to try disabling it. It is also by far the oldest plugin on my system, with the most current version showing as "2018.12.04".
Since disabling it, which was only 2 days ago, I haven't crashed, I've had RAM usage in the 50% range instead of 80%+, and CPU usage seems to be staying around or under 20%.
relink (Author) Posted July 6, 2020
I'm beginning to think it may be related to some disk problems I have been having. I have looked through my syslog server and see pretty consistent CRC errors from all of my drives, so I have all new cables on the way for my HBA and SAS expander.
I noticed a crash happened a couple of minutes after adding a new series to Sonarr, so it locked up just as the new episodes began flooding into the array. That's what it seemed like, anyway.
Cables will be here Wednesday. I guess I'll see what happens.
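To see whether new cables actually stop the CRC errors, the per-drive counter can be read from SMART data before and after the swap. The following is a sketch that assumes SATA drives reporting SMART attribute 199 (usually named UDMA_CRC_Error_Count) and that smartmontools is installed; the function names are hypothetical, and SAS drives report link errors differently.

```shell
#!/bin/bash
# Read `smartctl -A` output on stdin and print the raw value of SMART
# attribute 199 (the interface CRC error counter). Prints nothing if the
# attribute is not present.
crc_count() {
    awk '$1 == 199 && $2 ~ /CRC/ { print $10 }'
}

# Print the counter for each sdX/sdXX block device (needs root).
crc_report() {
    local dev
    for dev in /dev/sd[a-z] /dev/sd[a-z][a-z]; do
        [ -b "$dev" ] || continue
        printf '%s %s\n' "$dev" "$(smartctl -A "$dev" | crc_count)"
    done
}
```

The counter is cumulative over the drive's life, so the useful signal is whether it keeps climbing after the cables are replaced, not its absolute value.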