March 24, 2025 Hello, I've been an Unraid user for quite a few years. Recently I've had issues with the CPU running away and pegging at 100%, with system load ramping up to over 2000%. I monitor the server with Prometheus and Grafana. Last night (24/03/2025) at 00:16:00 the CPU ran away. This prevents all Docker containers from running and makes it very hard to SSH into the machine. I noticed this at about 05:45 and managed to SSH in. I have learnt from experience that if I restart Docker with /etc/rc.d/rc.docker restart, the system usually recovers. [Image: CPU and RAM usage during this episode] Diagnostic files are also attached. Any help in diagnosing this issue would be very much appreciated. Attachment: tower-diagnostics-20250324-2139.zip
March 25, 2025 If it recovers after a Docker restart, that suggests one of the containers is causing the problem. Try running the server with only half of the containers enabled, then with the other half, and keep drilling down into whichever half still shows the problem.
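To put a number on "keep drilling down": halving isolates one culprit in about log2(n) rounds rather than n. This is a minimal Python sketch under two assumptions: exactly one container causes the runaway, and each round gives a clear yes/no answer. `runaway_with` is a hypothetical stand-in for "run with only these containers enabled and wait to see whether an episode happens"; it is not an Unraid or Docker API.

```python
from typing import Callable, Sequence


def bisection_rounds(n_containers: int) -> int:
    """Worst-case number of halving rounds needed to isolate a single
    culprit among n_containers, i.e. ceil(log2(n)) computed with integers."""
    if n_containers < 1:
        raise ValueError("need at least one container")
    return (n_containers - 1).bit_length()


def find_culprit(containers: Sequence[str],
                 runaway_with: Callable[[Sequence[str]], bool]) -> str:
    """Halve the candidate set until one container is left.

    runaway_with(enabled) is a hypothetical oracle: "run the server with only
    these containers enabled and report whether the CPU runaway happened".
    In practice each call means waiting for (or ruling out) an episode.
    """
    candidates = list(containers)
    if not candidates:
        raise ValueError("need at least one container")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if runaway_with(first_half):
            candidates = first_half        # culprit is in the enabled half
        else:
            candidates = candidates[mid:]  # otherwise it must be in the other half
    return candidates[0]
```

With 30 containers, for example, `bisection_rounds(30)` returns 5, so the worst case is five rounds rather than thirty. The slow part is the wait per round when runaways are infrequent, and if the problem comes from an interaction between containers rather than a single one, halving may not converge.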
March 25, 2025 Plex is most likely doing its nightly scheduled tasks, such as generating thumbnails or detecting credits. I would start there.
March 25, 2025 Author 5 hours ago, JorgeB said: "If it recovers after a docker restart, it suggests one of the containers is causing the problem, try running the server with just half of the containers enabled, if the same, try the other half, then keep drilling down." There must be a more targeted approach than this. With the number of containers I have and the frequency of the lockups, this would take 1.5 to 2 years.
March 25, 2025 Author 11 minutes ago, MowMdown said: "Plex most likely doing its nightly scheduled tasks like generating thumbnails/detecting credits/etc. I would start there." That is a guess. Can I have a more targeted approach, please?
March 25, 2025 You could add some configuration to your Prometheus setup to monitor containers. You could take inspiration from that.
March 25, 2025 Author 1 hour ago, caplam said: "you could add some config around your Prometheus to monitor containers. You could be inspired by that" I do monitor my containers with cAdvisor. Before the CPU runaway, the monitoring shows no telltale signs, but because cAdvisor is itself a container, it goes unresponsive as soon as the issue hits and provides no data.
March 25, 2025 6 hours ago, Keith Ellis said: "That is a guess, can I have a more targeted approach please." Next time it occurs, SSH in, run the top command, and see what shows up at the top of the list. Otherwise, start by turning off or disabling Plex during those hours and see whether the issue persists. There is no magic solution to your issue.
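Because cAdvisor stops reporting once the runaway starts, one option is a small host-side script that reads /proc directly, outside Docker, and attributes each busy process to a container, much like top but with a container column. This is a minimal sketch, assuming Python 3 is available on the Unraid host (it may need to be installed separately), that a container process's /proc/&lt;pid&gt;/cgroup path contains the 64-character container ID in one of the two forms noted in the code, and that the host still schedules the script during an episode. All function names are hypothetical, not part of any Unraid or Docker tool. CPU% is per core, so 100% means one fully busy core.

```python
import os
import re
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumption: a container's cgroup path contains "docker/<64-hex-id>"
# (cgroupfs driver) or "docker-<64-hex-id>.scope" (systemd driver).
_CONTAINER_RE = re.compile(r"docker[-/]([0-9a-f]{64})")

Snapshot = Dict[int, Tuple[str, int, str]]  # pid -> (comm, cpu ticks, container)
Row = Tuple[float, int, str, str]           # (cpu %, pid, comm, container)


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Short (12-char) container ID from /proc/<pid>/cgroup, or None for host processes."""
    m = _CONTAINER_RE.search(cgroup_text)
    return m.group(1)[:12] if m else None


def parse_stat(stat_text: str) -> Tuple[str, int]:
    """(comm, utime + stime in clock ticks) from /proc/<pid>/stat; comm may contain spaces."""
    start, end = stat_text.index("("), stat_text.rindex(")")
    fields = stat_text[end + 1:].split()  # fields[0] is field 3 ("state")
    return stat_text[start + 1:end], int(fields[11]) + int(fields[12])  # fields 14, 15


def snapshot() -> Snapshot:
    """Read cumulative CPU ticks for every PID in /proc (Linux only)."""
    snap: Snapshot = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ticks = parse_stat(f.read())
            with open(f"/proc/{entry}/cgroup") as f:
                cid = container_id_from_cgroup(f.read()) or "host"
        except OSError:
            continue  # process exited between listdir() and open()
        snap[int(entry)] = (comm, ticks, cid)
    return snap


def top_consumers(before: Snapshot, after: Snapshot, interval_s: float,
                  clk_tck: int, n: int = 10) -> List[Row]:
    """Rank PIDs by CPU% over the interval (100.0 = one fully busy core)."""
    rows: List[Row] = []
    for pid, (comm, ticks, cid) in after.items():
        # Skip PIDs that are new, and most reused ones (ticks went backwards).
        if pid in before and ticks >= before[pid][1]:
            pct = 100.0 * (ticks - before[pid][1]) / clk_tck / interval_s
            rows.append((pct, pid, comm, cid))
    rows.sort(reverse=True)
    return rows[:n]


def cpu_by_container(rows: List[Row]) -> Dict[str, float]:
    """Sum per-PID CPU% rows by container ID ("host" for non-container PIDs)."""
    totals: Dict[str, float] = defaultdict(float)
    for pct, _pid, _comm, cid in rows:
        totals[cid] += pct
    return dict(totals)


def sample(interval_s: float = 5.0, n: int = 10) -> List[Row]:
    """Take two snapshots interval_s apart and return the top n CPU consumers."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = snapshot()
    time.sleep(interval_s)
    return top_consumers(before, snapshot(), interval_s, clk_tck, n)
```

One way to use it is to call `sample()` in a loop and append each result with a timestamp to a file on disk, so the data from just before and during an episode survives even when the containers stop responding. The 12-character IDs correspond to the short IDs shown by `docker ps`. For per-container totals rather than the top few processes, pass a large `n`, e.g. `cpu_by_container(sample(n=100000))`.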
March 30, 2025 Author I've tried this, but as the images in my original post show, ALL containers ramp up.