CPU Runaway


Good Evening,

 

About the same time every day, ~2200 CST, my Unraid CPU "runs away". I grabbed diagnostics while the event was occurring, before the server locked up and I no longer could. I also grabbed a few htop screenshots. I don't really understand them, but I would be very appreciative if anyone could help me pinpoint exactly what is causing everything to hard crash at about the same time every day. It happened last night, and today I had MANY of my containers off, thinking one of them was causing the crash. Often it locks up and never recovers without a hard reset.

 

Attached are my diagnostics from while the event was happening, plus various screenshots that hopefully someone will find useful. In the system log I see nothing that "sticks out", but I am far from an expert.

 

SOS

[Attached: four screenshots taken 2023-03-04 between 10:13 PM and 10:23 PM]

 

tower-diagnostics-20230304-2222.zip

  • Author
Mar  5 08:06:17 Tower kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=8acaeec5cfd427a5dc7efe8f924e23706eefe68bf4115f6bfd00aa4b8354dcb6,mems_allowed=0-1,oom_memcg=/docker/8acaeec5cfd427a5dc7efe8f924e23706eefe68bf4115f6bfd00aa4b8354dcb6,task_memcg=/docker/8acaeec5cfd427a5dc7efe8f924e23706eefe68bf4115f6bfd00aa4b8354dcb6,task=nginx,pid=5455,uid=0

Mar  5 08:06:17 Tower kernel: Memory cgroup out of memory: Killed process 5455 (nginx) total-vm:188444kB, anon-rss:90488kB, file-rss:0kB, shmem-rss:344kB, UID:0 pgtables:240kB oom_score_adj:0

Mar  5 08:11:48 Tower webGUI: Successful login user root from 192.168.1.3

Mar  5 08:12:50 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar  5 08:12:50 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Mar  5 08:12:50 Tower kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

Odd, shortly after my comment above it "ran away" again. I have never seen the SCSI ioctl message above before; could it be related?

 

Another diagnostics as it was running away is attached.

tower-diagnostics-20230305-0823.zip

  • Author

Anyone think recreating docker.img might be beneficial?

I am on mobile, so haven't checked your diagnostics 

 

Can you check docker tab, advanced view to see if a specific container is eating up all the CPU? Or type docker stats in a terminal

 

The memory being full is also not good. The first step is to separate the cause and effect. 
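A minimal sketch of the `docker stats` suggestion, in case it helps to have the busiest container sorted to the top. It assumes the docker CLI is on the PATH; the helper names `parse_stats` and `sample` are made up for illustration and are not part of Docker or Unraid.

```python
import subprocess

# Go-template fields understood by `docker stats --format`.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def parse_stats(text):
    """Turn tab-separated `docker stats` output into (name, cpu%, mem%) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, cpu, mem = parts
        try:
            rows.append((name, float(cpu.rstrip("%")), float(mem.rstrip("%"))))
        except ValueError:
            continue  # skip rows whose values are not plain percentages
    # Highest CPU usage first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def sample():
    """Take one snapshot of all running containers (hypothetical helper)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)
```

Calling `sample()` every minute or so around the time the problem usually starts, and appending the result to a file on the flash drive or a share, should leave something to look at even if the server locks up afterwards.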

  • Author
3 hours ago, apandey said:

I am on mobile, so haven't checked your diagnostics 

 

Can you check docker tab, advanced view to see if a specific container is eating up all the CPU? Or type docker stats in a terminal

 

The memory being full is also not good. The first step is to separate the cause and effect. 

Last night I re-created the docker image (and converted it from a file to a folder) and haven't had any issues yet. But we will see. If it happens again, and I'd say there's a good chance, I will run docker stats.

If you do get time to look through the diagnostics and see anything helpful, please let me know.

Edited by blaine07

The main thing I see is constant app crashes due to lack of memory. Are you running some sort of web server that is exposed to other users? Any chance you are getting unexpectedly high traffic?

If it happens again, try to look at docker resource utilization. I have a Grafana dashboard set up, which helps a lot for looking back at trending data.

  • Author
26 minutes ago, apandey said:

The main thing I see is constant app crashes due to lack of memory. Are you running some sort of web server that is exposed to other users? Any chance you are getting unexpectedly high traffic?

If it happens again, try to look at docker resource utilization. I have a Grafana dashboard set up, which helps a lot for looking back at trending data.

I had a GRAV server, but while troubleshooting it should have been "off".

Yeah, once the CPU would run away, though, memory utilization would max out as well.

  • Author
4 hours ago, apandey said:

The main thing I see is constant app crashes due to lack of memory. Are you running some sort of web server that is exposed to other users? Any chance you are getting unexpectedly high traffic?

If it happens again, try to look at docker resource utilization. I have a Grafana dashboard set up, which helps a lot for looking back at trending data.

This happened again this afternoon. Unfortunately it got "too locked up" before I caught it, so I couldn't get any logs.
 

Any other ideas?

I have InfluxDB + Grafana set up to capture metrics from my Unraid server (using Unraid Ultimate Dashboards as a base). That way I can see trending data on CPU, memory, and docker resources, which is useful for seeing what is using resources.

  • Community Expert

First, try to identify what is invoking the OOM killer, possibly a container. Disable all containers and then enable them one by one to see if you can find the culprit. If you find it, limit its resources.

  • Author
1 minute ago, JorgeB said:

First, try to identify what is invoking the OOM killer, possibly a container. Disable all containers and then enable them one by one to see if you can find the culprit. If you find it, limit its resources.

I only have basically a core of containers running - been playing with most not running at all. When I enable containers one by one what exactly am I looking for to determine it’s the culprit? Are we positive it’s one single container? (Sorry, genuinely want to understand)

  • Author
5 minutes ago, JorgeB said:

First, try to identify what is invoking the OOM killer, possibly a container. Disable all containers and then enable them one by one to see if you can find the culprit. If you find it, limit its resources.

I see this container name referenced with OOM. How can I turn this string into exactly which container?

 

 

DBB97E26-04DB-4DDE-9883-87964E867699.jpeg

  • Community Expert
3 minutes ago, blaine07 said:

what exactly am I looking for to determine it’s the culprit?

See if the OOM killer is still invoked, alternatively limit the amount of RAM for all containers.

  • Author
1 minute ago, JorgeB said:

See if the OOM killer is still invoked, alternatively limit the amount of RAM for all containers.

Pardon my idiocracy: how can I see if OOM killer is invoked?

 

When cpu usage “runs away” memory is climbing to 100% too

  • Author

It appears that the string above belongs to NginxProxyManager. How much, and what, should I limit CPU usage to? It's the only time I see OOM, though, and it really wasn't at the time the system "crashed"?

 

And it already has "--memory=1G --no-healthcheck" in extra parameters?
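One way to double-check that the `--memory=1G` extra parameter is really in effect is to compare it with what docker reports for the container. This is only a sketch: `limit_to_bytes` and `applied_limit` are hypothetical helper names, the parser handles just the single-letter b/k/m/g suffixes, and the container name passed in is whatever the container is called on your own docker tab.

```python
import subprocess

# Multipliers for docker-style size suffixes (single letter only in this sketch).
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def limit_to_bytes(value):
    """Convert a size such as '1G' or '512m' to bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)  # a plain number is taken as bytes


def applied_limit(container):
    """Ask docker for the memory limit actually applied; 0 means no limit."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.HostConfig.Memory}}", container],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())
```

If `applied_limit("NginxProxyManager")` (name assumed) returns the same number as `limit_to_bytes("1G")`, the limit is active, and the nginx kills in the syslog would be the container running into that 1G ceiling.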

Edited by blaine07

  • Community Expert

 

 

14 minutes ago, blaine07 said:

Pardon my idiocracy: how can I see if OOM killer is invoked?

Look for lines like this in the syslog:

 

Mar  6 21:36:36 Tower kernel: nginx invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
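For anyone who would rather not scroll the log by hand, here is a rough sketch that pulls out every line of that shape. The function name `oom_invocations` is made up, and the syslog location used in the note below (`/var/log/syslog`) is an assumption about a stock Unraid install.

```python
import re

# Matches e.g. "kernel: nginx invoked oom-killer: ..."
INVOKED = re.compile(r"kernel: (\S+) invoked oom-killer")


def oom_invocations(lines):
    """Return (timestamp, process) for every 'invoked oom-killer' line."""
    hits = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            # The classic syslog timestamp ("Mar  6 21:36:36") is 15 characters.
            hits.append((line[:15], match.group(1)))
    return hits
```

Something like `with open("/var/log/syslog") as f: print(oom_invocations(f))` would then list when the OOM killer fired and which process asked for the memory.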

 

  • Author
3 minutes ago, JorgeB said:

 

 

Look for lines like this in the syslog:

 

Mar  6 21:36:36 Tower kernel: nginx invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0

 

I found that, and the container listed with it (right above it, before the OOM line) is NginxProxyManager. I restricted its CPU cores, but it already has "--memory=1G --no-healthcheck" in extra parameters?

  • Community Expert
10 minutes ago, blaine07 said:

and container listed with it(right above it before OOM) is NginxProxyManager

Where are you seeing that? Usually there's no reference to what's causing the OOM, just to the app that invoked it because it didn't have enough memory.

  • Author
4 minutes ago, JorgeB said:

Where are you seeing that? Usually there's no reference to what's causing the OOM, just to the app that invoked it because it didn't have enough memory.

Well the long string right ABOVE your excerpt: 

thhh.jpeg.e77eb744dd1f99d6509723d39d334d9a.jpeg

 

I took that string and went to "shares", then "appdata", then "system" shares, then clicked "docker", then "docker" again, then "container", and searched for the long string above. From there I went into the corresponding folder and downloaded "hostconfig", which showed that the long string refers to NginxProxyManager. I don't know if that's right, or if it's the culprit, but that's how I arrived at it. I did limit CPU for NPM too, though.
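The same lookup can probably be done from a terminal without digging through the docker folder. A sketch, assuming the docker CLI is available and the container still exists; `container_id` and `container_name` are hypothetical helper names.

```python
import re
import subprocess

# The oom-kill line carries the full container ID after "oom_memcg=/docker/".
MEMCG = re.compile(r"oom_memcg=/docker/([0-9a-f]+)")


def container_id(line):
    """Extract the container ID from an oom-kill syslog line, or None."""
    match = MEMCG.search(line)
    return match.group(1) if match else None


def container_name(cid):
    """Resolve a container ID to its name via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.Name}}", cid],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip().lstrip("/")  # docker prints the name with a leading slash
```

Note that the ID changes whenever a container is re-created, so an ID from an older syslog entry may no longer resolve.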

 

  • Community Expert

I meant where are you seeing that in the syslog?

  • Author
2 minutes ago, JorgeB said:

I meant where are you seeing that in the syslog?

Well, it was:

 

Mar  6 21:36:36 Tower kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=b7bf47074734f898f67616851b6c9c6128f182ef264006024be566416b2d07e1,mems_allowed=0-1,oom_memcg=/docker/b7bf47074734f898f67616851b6c9c6128f182ef264006024be566416b2d07e1,task_memcg=/docker/b7bf47074734f898f67616851b6c9c6128f182ef264006024be566416b2d07e1,task=nginx,pid=25095,uid=0
Mar  6 21:36:36 Tower kernel: Memory cgroup out of memory: Killed process 25095 (nginx) total-vm:274036kB, anon-rss:176240kB, file-rss:4kB, shmem-rss:516kB, UID:0 pgtables:412kB oom_score_adj:0
Mar  6 21:36:38 Tower kernel: oom_reaper: reaped process 25095 (nginx), now anon-rss:0kB, file-rss:0kB, shmem-rss:516kB
Mar  6 21:44:06 Tower root: Fix Common Problems Version 2023.03.04
Mar  6 21:44:08 Tower root: Fix Common Problems: Warning: unRaids built in FTP server is running ** Ignored
Mar  6 21:44:16 Tower root: Fix Common Problems: Error: Out Of Memory errors detected on your server
Mar  6 21:44:29 Tower root: Fix Common Problems: Warning: Wrong DNS entry for host ** Ignored

 


  • Author

Just a few minutes ago it tried to lock up:

Mar  7 06:19:31 Tower kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=0c85bb041886edc37981442c550f8522b2687c54eddc7769fe20d345c7a32c92,mems_allowed=0-1,oom_memcg=/docker/0c85bb041886edc37981442c550f8522b2687c54eddc7769fe20d345c7a32c92,task_memcg=/docker/0c85bb041886edc37981442c550f8522b2687c54eddc7769fe20d345c7a32c92,task=nginx,pid=16299,uid=0
Mar  7 06:19:31 Tower kernel: Memory cgroup out of memory: Killed process 16299 (nginx) total-vm:271632kB, anon-rss:173940kB, file-rss:0kB, shmem-rss:112kB, UID:0 pgtables:424kB oom_score_adj:0

  • Community Expert

That is after the "invoked oom-killer" line, and if I understand correctly, it's the app that was killed, not the app that caused the OOM issue.
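To make the fields in those oom-kill lines easier to read, here is a small sketch that splits out the interesting ones. As far as I know, `constraint=CONSTRAINT_MEMCG` means the kill was triggered by a memory cgroup limit (here the container's own limit) rather than by the whole host running out of RAM, and `oom_memcg` names that cgroup. The helper name `describe_oom_kill` is made up, and the result says nothing about which process inside the container used up the memory.

```python
import re

# Pull selected key=value pairs out of an "oom-kill:" syslog line.
FIELDS = re.compile(r"\b(constraint|oom_memcg|task_memcg|task|pid)=([^,\s]+)")


def describe_oom_kill(line):
    """Return the parsed fields plus a flag for cgroup-limited kills."""
    info = dict(FIELDS.findall(line))
    # True when a memory cgroup limit, not host-wide exhaustion, triggered the kill.
    info["cgroup_limit_hit"] = info.get("constraint") == "CONSTRAINT_MEMCG"
    return info
```

For the lines posted in this thread the flag comes out true and `oom_memcg` equals `task_memcg`, which would be consistent with a container hitting its own memory cap and having one of its nginx workers killed.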
