nginx running out of shared memory

November 25, 2024

I feel like this could be solved by just setting some sort of timeout on nchan subscriptions? Like, if a tab is in the background for more than x minutes, unsubscribe all nchan channels? Maybe with some indicator on the dashboard saying all nchan channels are in hibernation, "click to force a refresh and resubscribe"?

It definitely isn't that simple though. What if they are updating all Docker containers, which might take longer than x? Not as easy to solve as you'd think.

Below is my user script, run at each boot, to band-aid the issue:

#!/bin/bash

# Raise the request body buffer from 32k to 20m.
sed -i "s/client_body_buffer_size 32k;/client_body_buffer_size 20m;/" /etc/nginx/nginx.conf
# Add nchan_shared_memory_size ahead of the servers.conf include, but only if it is not set yet.
if ! grep -q "nchan_shared_memory_size" /etc/nginx/nginx.conf; then
    sed -i "/include \/etc\/nginx\/conf.d\/servers.conf;/i nchan_shared_memory_size 512M;" /etc/nginx/nginx.conf
fi
# Restart nginx so the changes take effect.
/etc/rc.d/rc.nginx restart
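
For anyone adapting the script above, here is a minimal sketch that exercises the same two edits against a throwaway file instead of the live config. The sample file contents are made up for illustration (only the two directives and the include line come from the script), and it assumes GNU sed, which the one-line "i" form needs.

```shell
#!/bin/bash
# Hypothetical stand-in for /etc/nginx/nginx.conf; the real file will differ.
conf=$(mktemp)
printf 'http {\n    client_body_buffer_size 32k;\n    include /etc/nginx/conf.d/servers.conf;\n}\n' > "$conf"

# Same edits as the boot script, but applied to the file passed as $1.
apply_patch() {
    sed -i "s/client_body_buffer_size 32k;/client_body_buffer_size 20m;/" "$1"
    if ! grep -q "nchan_shared_memory_size" "$1"; then
        sed -i "/include \/etc\/nginx\/conf.d\/servers.conf;/i nchan_shared_memory_size 512M;" "$1"
    fi
}

apply_patch "$conf"
apply_patch "$conf"   # second run should change nothing thanks to the grep guard

count=$(grep -c "nchan_shared_memory_size" "$conf")
patched=$(cat "$conf")
rm -f "$conf"

echo "directive occurrences: $count"
printf '%s\n' "$patched"
```

The point of the grep guard is that running the script more than once should not stack up duplicate directives, which is what the occurrence count checks.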

I did notice tools like the dynamix plugins talk A LOT, so I wonder if we should be telling them to chill out a bit? Or at least let the user change the period at which, say, temp readings are communicated, so instead of spamming every few seconds I can just say, hey, once a minute?

I feel like this issue may get worse with all the graph stuff they want to do in 7.

Edited November 25, 2024 by phyzical
code block

December 20, 2024

I'll contribute with my research.

It is clearly the nchan plugin that is causing trouble. Unfortunately it seems to be abandonware, because there are several issues the author of nchan is aware of and has decided not to fix, instead spinning a complete v2 off into new software that is separate from nginx.

I have been looking at two solutions.

1. Can we stop nchan from memory leaking?

2. Can we stop nchan from spamming the system log if memory leaks anyway?

As others have posted, there is nchan_shared_memory_size, which might help. I wanted to replicate the issue quickly for my debugging, so I tried setting the memory really low (2M instead of the default 128M). I left some tabs open overnight and, to my surprise, I have not been getting any nchan memory errors. The nchan author mentions that messages sent to nchan are garbage collected. With 128M, the garbage collector doesn't seem to do its job well, but under the pressure of having only 2M of memory it seems to manage the limited memory more effectively.

Now, 16 hours is perhaps too short to draw any conclusions. I will monitor this more. However, I also wanted to see if I could stop the errors from reaching the system log. The messages are published by the unraid system to the /pub/ endpoint (in particular /etc/cpuload seems to receive high frequency messages even if nothing is using the WebUI). So I changed the nginx config for this location to log into a separate file. I am not sure if the memory errors also get caught and redirected to this new log file, but time will tell.

If others want to also test my solution and have it persist on each reboot, you can add the following lines at the top of /boot/config/go (after which you should reboot for the changes to take effect):

# Fix nchan memory leak
sed -z 's|nchan publishers\n\t#|nchan publishers\n\t#\n\tnchan_shared_memory_size 2M;|' -i /etc/rc.d/rc.nginx
sed -z 's|nchan_publisher;|nchan_publisher;\n\t        error_log /var/log/nginx/nchan_error.log;|' -i /etc/rc.d/rc.nginx
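
To make it easier to see what those two sed lines do, here is a small sketch that runs the same expressions on a sample string rather than on /etc/rc.d/rc.nginx. The sample fragment is hypothetical (I have not copied it from the real rc.nginx), and the -z flag assumes GNU sed: it makes sed treat the whole input as one record so that \n in the pattern can match across lines.

```shell
#!/bin/bash
# Hypothetical fragment standing in for the relevant part of rc.nginx.
sample=$'\t#\n\t# nchan publishers\n\t#\n\tnchan_publisher;\n'

# The two substitutions from the go file, reading stdin instead of editing in place.
patch_rc_nginx() {
    sed -z 's|nchan publishers\n\t#|nchan publishers\n\t#\n\tnchan_shared_memory_size 2M;|' \
        | sed -z 's|nchan_publisher;|nchan_publisher;\n\t        error_log /var/log/nginx/nchan_error.log;|'
}

patched=$(printf '%s' "$sample" | patch_rc_nginx)
printf '%s\n' "$patched"
```

After a reboot, a quick grep for nchan_shared_memory_size and nchan_error.log in /etc/rc.d/rc.nginx should show whether the patterns matched on your version; if the file layout differs, sed will silently change nothing.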

Edited December 20, 2024 by iLaurens

December 20, 2024

33 minutes ago, iLaurens said:

As others have posted, there is nchan_shared_memory_size, which might help. I wanted to replicate the issue quickly for my debugging, so I tried setting the memory really low (2M instead of the default 128M). I left some tabs open overnight and, to my surprise, I have not been getting any nchan memory errors. The nchan author mentions that messages sent to nchan are garbage collected. With 128M, the garbage collector doesn't seem to do its job well, but under the pressure of having only 2M of memory it seems to manage the limited memory more effectively.

That's interesting, please keep us updated

January 21, 2025

Still present in the official 7 release, it appears:

Jan 20 22:29:07 unraid nginx: 2025/01/20 22:29:07 [error] 2979871#2979871: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.
Jan 20 22:29:07 unraid nginx: 2025/01/20 22:29:07 [error] 2979871#2979871: *2647614 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
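
If you want to see which channels are hitting this error and how often, something like the following sketch can summarise the syslog. The function name is made up, and reading from /var/log/syslog is an assumption about where your log lives. As far as I know, the nchan_max_reserved_memory named in the message is the older name for nchan_shared_memory_size.

```shell
#!/bin/bash
# Read syslog lines on stdin and print "<channel> <count>" for every channel
# that hit the out-of-shared-memory error.
nchan_oom_channels() {
    sed -n 's/.*Out of shared memory while allocating channel \([^ ]*\)\. Increase.*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (path is an assumption):
#   nchan_oom_channels < /var/log/syslog
```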

January 21, 2025

Also still having issues in Unraid 7, and have been for the last couple of years.

January 24, 2025

+1

January 26, 2025

Same for me on Unraid 7.0.0

January 26, 2025

It's shameful that this has been going on for YEARS without a proper fix.

unraid-diagnostics-20250126-2144.zip

January 27, 2025

Please open another tab and enter this in URL bar:

tower.local/nchan_stub_status

(of course use your hostname)

It will display something like this:

total published messages: 8625168
stored messages: 11
shared memory used: 72K
shared memory limit: 131072K
channels: 16
subscribers: 3
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

I never see that "shared memory used" value get over 100K.

If you see this number starting to grow, please post this output and let me know what tabs you have open on your server from all sources. thx
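
For anyone who would rather log that value over time than keep refreshing the page, here is a rough sketch. The function names, the one-minute interval and the log path are my own choices, and the URL just reuses the example hostname above. If your WebUI requires a login, curl will probably need a session cookie as well, which this sketch does not handle.

```shell
#!/bin/bash
# Read nchan_stub_status text on stdin and print "shared memory used" in KB.
nchan_shm_used_kb() {
    awk -F': *' '/^shared memory used:/ { sub(/K$/, "", $2); print $2 }'
}

# Append a timestamped reading to a log file once a minute.
# STATUS_URL and LOG_FILE are assumptions; adjust both for your server.
log_nchan_shm() {
    local url="${STATUS_URL:-http://tower.local/nchan_stub_status}"
    while true; do
        printf '%s %sK\n' "$(date '+%F %T')" "$(curl -fsS "$url" | nchan_shm_used_kb)"
        sleep 60
    done >> "${LOG_FILE:-/tmp/nchan_shm.log}"
}
```

Calling log_nchan_shm in a spare terminal (or in the background) gives a simple time series to post alongside the list of open tabs.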

January 27, 2025

7 minutes ago, limetech said:
Please open another tab and enter this in URL bar:

tower.local/nchan_stub_status

(of course use your hostname)

I never see that "shared memory used" value get over 100K.

If you see this number starting to grow, please post this output and let me know what tabs you have open on your server from all sources. thx

Is this what it shouldn't look like?:

[Attached image: image.png]

January 27, 2025

What Unraid version? What tabs do you have open?

January 27, 2025

35 minutes ago, limetech said:

What Unraid version? What tabs do you have open?

Unraid 7, and at the time just the Docker tab, I believe. Don't recall leaving any other windows open on other devices. Not that I recall anyway.

January 27, 2025

Thanks, one more, which browser are you using?

January 27, 2025

38 minutes ago, limetech said:

Thanks, one more, which browser are you using?

Always Chrome; occasionally Safari on iPhone.

January 28, 2025

9 hours ago, limetech said:
Please open another tab and enter this in URL bar:

tower.local/nchan_stub_status

(of course use your hostname)

I never see that "shared memory used" value get over 100K.

If you see this number starting to grow, please post this output and let me know what tabs you have open on your server from all sources. thx

For me, it has happened on the Dashboard in the past.

I would get the error within hours if I left a page open in a browser on the dashboard (with all the GUI).
This was on Unraid 6.12 using Firefox.

I have had the error on 7.0 but am not sure which tab I left open (or on which system), though very likely still in Firefox.

Edited January 28, 2025 by martial

January 28, 2025

We cannot reproduce this issue. One thing we were concentrating on was the Docker page. When you are on this page, an nchan process is started which publishes real-time Docker container usage stats (CPU and memory usage). The process accomplishes this by continuously invoking the 'docker stats' command. You can see this in

/usr/local/emhttp/plugins/dynamix.docker.manager/nchan/docker_load

This info is only displayed in 'Advanced' view, but the process runs whether in 'Basic' or 'Advanced'. When you navigate away from the page, the process is terminated. As a side note, this is something to be aware of: if you leave a browser sitting on the Docker page, that process is running and consuming resources. Maybe not the best design.

Anyway, we tried opening the page in multiple browsers and tabs, switching pages, etc., and could never see any memory leak that starts blowing up shared memory.

January 29, 2025

total published messages: 1520977
stored messages: 18
shared memory used: 12000K
shared memory limit: 131072K
channels: 23
subscribers: 15
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

At least I'm observing a constant increase of "shared memory used:", but I don't know what's causing it.

January 29, 2025

@Jazer, you may want to post the URL you have open and what browser and version you are using. Also, whether that's the only tab you have open that's pointed at your unraid server.

January 29, 2025

total published messages: 1731910
stored messages: 157
shared memory used: 25524K
shared memory limit: 131072K
channels: 14
subscribers: 10
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

Latest FF // 2x Docker Tab (Adv. View) - 20 container running - Folder View plugin

January 29, 2025

2 hours ago, Jazer said:

Latest FF // 2x Docker Tab (Adv. View) - 20 container running - Folder View plugin

Do you recall if the CPU and Memory Load were being displayed and updated approx. every 3 sec on the Docker Tab?

January 30, 2025

total published messages: 1830613
stored messages: 0
shared memory used: 130308K
shared memory limit: 131072K
channels: 0
subscribers: 0
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

Yes - the load was shown. But honestly, I still don't really know what is causing the leak. I tested it with 16x open tabs and could not observe any significant increase for about 2h.

After closing the lid of my laptop for some hours and coming back, I am near the limit. *Maybe* messages/memory are not released after stalled connections, or reconnecting causes it. I'm just guessing.

Log:

Jan 30 07:13:33 Tower nginx: 2025/01/30 07:13:33 [crit] 1257890#1257890: ngx_slab_alloc() failed: no memory
Jan 30 07:13:33 Tower nginx: 2025/01/30 07:13:33 [error] 1257890#1257890: shpool alloc failed
Jan 30 07:13:33 Tower nginx: 2025/01/30 07:13:33 [error] 1257890#1257890: nchan: Out of shared memory while allocating channel /shares. Increase nchan_max_reserved_memory.
Jan 30 07:13:33 Tower nginx: 2025/01/30 07:13:33 [error] 1257890#1257890: *3172828 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"

February 6, 2025

I think I just caught the bug in the act.

Unraid 7.0.0 stable with latest patches.

Firefox 135 (although I'm still pretty sure this is a browser-agnostic bug; I'm reasonably certain that I observed the same thing with Chrome previously).

At least the docker tab was left open last night (not sure if there were others), and I resumed my system from standby (as described several times here, that seems to be the, or at the very least a, trigger).

The UI slowly started to fail and started throwing errors (Community Apps kept randomly failing to communicate with the Unraid server, for example) and is now entirely broken. I can't open the syslog via the UI (the popup window closes instantly and opening the URL directly throws a 502 Bad Gateway).

The interesting thing that I noticed:

The bug isn't caused by nginx eventually running out of shared memory; the bug is causing nginx to run out of shared memory.

nchan_stub_status when I started typing this:

total published messages: 2862057
stored messages: 2
shared memory used: 7356K
shared memory limit: 131072K
channels: 7
subscribers: 2
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

nchan_stub_status a couple of minutes later:

total published messages: 2862831
stored messages: 1
shared memory used: 13204K
shared memory limit: 131072K
channels: 5
subscribers: 1
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

It's currently rapidly filling up shared memory - but the UI is already (mostly) broken.

Another minute later (all tabs closed, except for nchan_stub_status):

total published messages: 2863246
stored messages: 0
shared memory used: 16228K
shared memory limit: 131072K
channels: 6
subscribers: 3
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

It looks like "shared memory used" keeps growing faster than in the beginning.

Less than a minute later:

total published messages: 2863626
stored messages: 1
shared memory used: 18776K
shared memory limit: 131072K
channels: 5
subscribers: 1
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7
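
To put a rough number on the growth between two of these snapshots, here is a small arithmetic sketch. The function name is made up, and dividing memory growth by newly published messages is only a crude indicator, since nothing in the status output says the memory is actually tied to those messages.

```shell
#!/bin/bash
# Usage: kb_per_message MSGS_BEFORE USED_KB_BEFORE MSGS_AFTER USED_KB_AFTER
# Prints the shared memory growth in KB per newly published message.
kb_per_message() {
    awk -v m1="$1" -v u1="$2" -v m2="$3" -v u2="$4" 'BEGIN {
        if (m2 == m1) { print "n/a"; exit }
        printf "%.1f\n", (u2 - u1) / (m2 - m1)
    }'
}

# First two snapshots above: 774 new messages, 5848K more shared memory.
kb_per_message 2862057 7356 2862831 13204
```

For the first two snapshots that works out to roughly 7.6K of extra shared memory per published message, which seems like a lot for small status messages published with buffer_length=1.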

I just SSHed into the server; this is how the syslog looks while it is happening:

Feb  6 08:42:11 Bob nginx: 2025/02/06 08:42:11 [alert] 11264#11264: worker process 3365625 exited on signal 6
Feb  6 08:42:11 Bob nginx: 2025/02/06 08:42:11 [alert] 11264#11264: worker process 3365684 exited on signal 6
Feb  6 08:42:11 Bob nginx: 2025/02/06 08:42:11 [alert] 11264#11264: worker process 3365788 exited on signal 6
Feb  6 08:42:11 Bob nginx: 2025/02/06 08:42:11 [alert] 11264#11264: worker process 3365791 exited on signal 6
Feb  6 08:42:12 Bob nginx: 2025/02/06 08:42:12 [alert] 11264#11264: worker process 3365858 exited on signal 6
Feb  6 08:42:12 Bob nginx: 2025/02/06 08:42:12 [alert] 11264#11264: worker process 3365966 exited on signal 6
Feb  6 08:42:12 Bob nginx: 2025/02/06 08:42:12 [alert] 11264#11264: worker process 3365967 exited on signal 6
Feb  6 08:42:12 Bob nginx: 2025/02/06 08:42:12 [alert] 11264#11264: worker process 3365972 exited on signal 6
Feb  6 08:42:13 Bob nginx: 2025/02/06 08:42:13 [alert] 11264#11264: worker process 3365973 exited on signal 6
Feb  6 08:42:13 Bob nginx: 2025/02/06 08:42:13 [alert] 11264#11264: worker process 3366024 exited on signal 6
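
To quantify how fast the workers are dying, a sketch like this counts the "exited on signal 6" entries per second. The function name is made up, and it assumes the usual syslog layout shown above, where the third whitespace-separated field is the time.

```shell
#!/bin/bash
# Read syslog lines on stdin and print "<HH:MM:SS> <count>" for each second
# in which nginx workers exited on signal 6.
signal6_per_second() {
    grep 'exited on signal 6' | awk '{ print $3 }' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (path is an assumption):
#   signal6_per_second < /var/log/syslog | tail
```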

EDIT:

Ok, some more interesting observations:

Running

/etc/rc.d/rc.nginx restart

did not resolve the bug. It, naturally, reset nginx and freed up shared memory, but it was rapidly filling up again and syslog was still throwing multiple "nginx: worker process ... exited on signal 6" per second.

Closing all open Firefox tabs to Unraid (in fact, all but this forum post) did not resolve the issue.

Simply restarting Firefox made the whole thing stop instantly. No more log entries.

However, this did not clear up nginx shared memory - it's now hovering at the last number where it was when I restarted Firefox:

total published messages: 2707
stored messages: 18
shared memory used: 1404K
shared memory limit: 131072K
channels: 16
subscribers: 7
redis pending commands: 0
redis connected servers: 0
redis unhealthy upstreams: 0
total redis commands sent: 0
total interprocess alerts received: 0
interprocess alerts in transit: 0
interprocess queued alerts: 0
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.3.7

Edited February 6, 2025 by csb
Added more observations

February 6, 2025

Ha! I think I just managed to reproduce/trigger the bug intentionally:

I had two tabs open, Main and Docker in Firefox.

I restarted nginx while they were open and the whole mess is starting all over again instantly. Syslog is filling up with "nginx: worker process ... exited on signal 6" albeit slower than before ("only" about one entry every two seconds) and shared memory used keeps filling up rapidly.

This time, simply closing the tabs made the log entries stop.

Again, shared memory filled up to several megabytes and is showing no signs of freeing up the already used memory. But it's now hovering at the amount where it was when the tabs were closed.

So far, a viable workaround is: close browser, ssh into server, restart nginx, reopen browser after.
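
The restart step of that workaround could be scripted along these lines. This is only a sketch: the URL, the 50% threshold and the function names are assumptions, it cannot close browser tabs for you, and as noted above a restart with stale tabs still open seems to set the whole thing off again.

```shell
#!/bin/bash
# Assumed defaults; neither value comes from this thread.
STATUS_URL="${STATUS_URL:-http://localhost/nchan_stub_status}"
THRESHOLD_PERCENT="${THRESHOLD_PERCENT:-50}"

# Read nchan_stub_status text on stdin; exit 0 when shared memory usage
# is at or above the given percentage of the limit, 1 otherwise.
shm_needs_restart() {
    awk -F': *' -v pct="$1" '
        /^shared memory used:/  { used = $2 + 0 }
        /^shared memory limit:/ { limit = $2 + 0 }
        END { exit !(limit > 0 && used * 100 >= limit * pct) }
    '
}

# Restart nginx only when the threshold is crossed. Close browser tabs first.
restart_if_needed() {
    if curl -fsS "$STATUS_URL" | shm_needs_restart "$THRESHOLD_PERCENT"; then
        /etc/rc.d/rc.nginx restart
    fi
}
```

Nothing runs until restart_if_needed is called, so the decision logic can be tried safely by piping a saved status page into shm_needs_restart.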

Maybe nginx was restarted by the server last night and all hell broke loose when my PC resumed from standby with the old tabs still open? Isn't there something that can be done to prevent this?

diagnostics.zip

Edited February 6, 2025 by csb

February 9, 2025

Is anyone's server rebooting because of this?

February 11, 2025

This is also driving me crazy. It doesn't even fix it if I use
/etc/rc.d/rc.nginx restart
Seems like after 24 hours the dashboard breaks consistently on 7.0. /nchan_stub_status won't load either.
