Scythe Posted November 9, 2018
I'm trying to get to the bottom of an issue where Unraid seems to get stuck at 100% CPU usage semi-regularly. What's odd is that the Unraid main screen shows 100% CPU usage across all cores while the Dynamix System Stats plugin only seems to show 50% usage. Opening a terminal and running top shows that the CPU usage seems to be coming from 3 main processes: kswapd0, unraidd and loop2. What are the kswapd0 and loop2 processes? Should they be using this much CPU? Can they be safely killed/restarted?
John_M Posted November 9, 2018 (edited)
kswapd0 is the kernel thread that reclaims memory when free RAM runs low, paging out to swap space if any is configured. Since most Unraid users don't have swap space configured I would expect the process to be pretty idle unless the system is short of memory. loop2 refers to the mounting of a disk image. In my case it's the docker.img mounted via loop2 on /var/lib/docker, so it probably is in your case too. More worrying than CPU usage spikes up to 100% is your load average (the top line of top) which, unless you have a lot of processor cores (I mean Threadripper or dual Xeons), is excessive and has been for some time. I have a feeling that the problem might be associated with loop2, which is waiting for disk I/O and also has a very low nice value. That might point to corruption of your docker.img. You also have "only" 4 GB RAM and a few docker containers running, so you'll be getting quite low on memory. Your diagnostics zip would reveal more.
Edited November 9, 2018 by John_M: corrected mount point
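As a rough illustration of why the load average matters more than the CPU percentage here, the sketch below divides the load averages by the number of logical cores. On Linux the load average also counts tasks blocked in uninterruptible disk I/O, which fits the suspicion about loop2 waiting on I/O. The function names and the threshold of 1.0 per core are assumptions chosen for illustration, not an Unraid rule.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return a load average divided by the number of logical cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def is_overloaded(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    """True when the per-core load exceeds the threshold.

    The default threshold of 1.0 is an illustrative assumption, not a hard rule.
    """
    return load_per_core(load_avg, cores) > threshold


# Only report live values when run directly on a platform that has getloadavg().
if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        flag = "high" if is_overloaded(value, cores) else "ok"
        print(f"{label:>6}: load {value:.2f} on {cores} cores ({flag})")
```

A sustained 15-minute figure well above the core count suggests the machine has been backed up for a while, rather than just spiking.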
Scythe (Author) Posted November 9, 2018
It's only an i3 CPU (just running a NAS box with some docker containers for managing content/smart home). Every time I've had this 100% CPU issue it seems to take up to 2-3 hours to clear out and return to normal. RAM-wise, I know I only have 4GB and I do plan on upping this to 8GB, but from what I was seeing in the UI I didn't think I was actually hitting the RAM limit just yet with what was running. I've included my diagnostics below if that helps.
tower-diagnostics-20181110-0041.zip
John_M Posted November 9, 2018
Do the diagnostics you posted cover one of the problem periods? You could try running without any containers for a while as a test.
John_M Posted November 9, 2018
Yes, you're running out of memory:
Nov 7 21:56:39 Tower kernel: Plex Media Scan invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
One thing you might want to read is this:
John_M Posted November 9, 2018
Your syslog is in fact full of OOMs. So try @Frank1940's advice. If that doesn't free up enough RAM you could try running as a plain old NAS without any docker containers running until you can add some more.
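To get a feel for how often the OOM killer fired and which processes triggered it, syslog lines like the one quoted earlier in the thread can be counted per process. This is a minimal sketch: the regular expression is modelled only on that single quoted line, and the helper name count_oom_invocations is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... Tower kernel: Plex Media Scan invoked oom-killer: gfp_mask=..."
# The pattern is an assumption based on the one log line quoted in this thread.
OOM_RE = re.compile(r"kernel: (?P<proc>.+?) invoked oom-killer:")


def count_oom_invocations(lines):
    """Count oom-killer invocations per triggering process name."""
    counts = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts


# Demonstration on the line from the diagnostics plus one unrelated line.
sample = [
    "Nov 7 21:56:39 Tower kernel: Plex Media Scan invoked oom-killer: "
    "gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0",
    "Nov 7 21:56:40 Tower kernel: some unrelated message",
]
for proc, n in count_oom_invocations(sample).most_common():
    print(f"{n:5d}  {proc}")
```

To run it against a real log, pass an open file object (for example the syslog from the diagnostics zip) instead of the sample list. Note that the process that invokes the OOM killer is not necessarily the one using the most memory.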
brmcdonald68s Posted August 20, 2019 (edited)
I have been having issues as well. I have tried a few different things I've found, not that it has done much. I'm running an i7-4790K, not overclocked, and just sitting idle with Plex open I can get anywhere from 50% of my cores, including my Hyper-Threading cores, pegged at 100%. Are 4 cores (8 threads with Hyper-Threading) on an ASUS Sabertooth Z97 not strong enough? I'm fighting CRC errors as well, and I've tried moving and re-seating the SATA connectors on my two Samsung 860 EVO 500GB drives with no luck. I've already gotten one error within 10 minutes of boot-up. I'm going to try narrowing it down to see if it is just on one drive and go from there, but any help with my CPU issues would be appreciated, because I can't even game on it at this point with the CPU spikes.
trurl Posted August 20, 2019
46 minutes ago, brmcdonald68s said: I have been having issues as well.
This thread is several months old. I guess since nobody else is still using it you can have it. Post diagnostics.
xxbigfootxx Posted October 16, 2020
Same here. I've been noticing the server running quite sluggishly, and I even installed more RAM into it. I can see in the logs there are a few errors with a PCIe bus; can someone help with what the exact issue is?
zeus-diagnostics-20201016-0920.zip
Xaero Posted October 16, 2020
35 minutes ago, xxbigfootxx said: Same here. I've been noticing the server running quite sluggishly, and I even installed more RAM into it. I can see in the logs there are a few errors with a PCIe bus; can someone help with what the exact issue is? zeus-diagnostics-20201016-0920.zip
Since all of the errors are with AER and they are all Corrected, it would be safe to disable AER. However, I would not recommend doing so. Instead, since this issue is being triggered when attempting to access the memory-mapped PCI configuration, I would use the kernel option to switch back to legacy PCI configuration. You can do so by adding the following kernel parameter:
pci=nommconf
This will force the machine to ask the device itself for its configuration parameters rather than mapping the device's configuration to a memory address. There's a completely negligible performance difference, and this will keep AER enabled, which can improve stability (for example, if an actual error occurs, AER might be able to correct it on the fly instead of the error resulting in a crash).
xxbigfootxx Posted October 16, 2020
38 minutes ago, Xaero said: I would use the kernel option to switch back to legacy PCI Configuration you can do so by adding the following kernel parameter: pci=nommconf
Would that require shutting down the server and editing the syslinux.cfg file on the USB drive? Sorry, haven't done that before.
trurl Posted October 17, 2020
On 10/15/2020 at 10:43 PM, xxbigfootxx said: editing the syslinux.cfg
Main - Boot Device - Flash - Syslinux Configuration
xxbigfootxx Posted October 18, 2020
14 hours ago, trurl said: Main - Boot Device - Flash - Syslinux Configuration
Under Global config, or Unraid OS?
itimpi Posted October 18, 2020
8 hours ago, xxbigfootxx said: Under Global config, or Unraid OS?
That is off the Main page in the Unraid GUI.
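Once the pci=nommconf parameter suggested above has been added and the server rebooted, one way to confirm that it actually took effect is to look at the command line the running kernel was booted with, which Linux exposes in /proc/cmdline. The sketch below is only an illustration: has_kernel_param is a made-up helper, and it assumes that options for a key such as pci= may be combined in one comma-separated list.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Check whether a kernel command line contains the given parameter.

    For "key=value" parameters the value may be one entry in a
    comma-separated list, e.g. "pci=noaer,nommconf".
    """
    key, sep, value = param.partition("=")
    for token in cmdline.split():
        tok_key, _, tok_value = token.partition("=")
        if tok_key != key:
            continue
        if not sep:
            return True  # bare flag such as "quiet"
        if value in tok_value.split(","):
            return True
    return False


# Only inspect the live system when the file exists (i.e. on Linux).
proc_cmdline = Path("/proc/cmdline")
if proc_cmdline.exists():
    active = has_kernel_param(proc_cmdline.read_text(), "pci=nommconf")
    print("pci=nommconf active:", active)
```

If the check reports False after a reboot, the parameter most likely was not saved on the boot entry that was actually used.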
DaSlinky Posted January 11, 2021 (edited)
I read somewhere that the auto-update in Jackett can cause this. I disabled the auto-update and restarted Jackett, and CPU usage is back down. I have the same issue on occasion. Nothing is assigned to use cpu0.
supervisord.log
Proffles Posted January 11, 2021
Hello, just to chime in: I think I am also occasionally having this issue. The CPU maxes out (not on all cores), docker containers become unresponsive, and memory seems to be all used up. I just reboot when I notice that it's happened, but obviously this is a crappy workaround. Does anyone have a working solution? Cheers!
Soliver84 Posted October 3, 2021
I have the same thing; is there a solution yet? 🤣 This is the reason the device is no longer accessible after a while. First the web interface crashes, then nothing works anymore.
Soliver84 Posted October 3, 2021
The solution: store the Docker image (docker.img) on a local disk or stick. This is a known ZFS bug: https://github.com/openzfs/zfs/issues/11523