nginx running out of shared memory



I am still running 6.12.4, and prior to today my uptime was over 2 months. Unfortunately, I have been working with the Unraid server webpage open and forgot to check the logs. The system locked up within 5 hours and I had to hard reboot.

 

Are people still seeing this issue on 6.12.6?

 

 

*** NOTE:
My uptime was only that stable because I change the server webpage tab to unraid.net when I am not actively using the web console. 

Edited by semtex41
adding additional info
  • 1 month later...

Has anyone found a fix for this? This is happening in Unraid 6.12.8. I recently added an "on pool first start" script to increase the size of my /var/log folder to 512 MB, since I have 64 GB of RAM to use, so luckily it didn't crash, but I see easily hundreds of thousands of these errors in my logs.

 

I have noticed over the years that if the dashboard is left open, live polling eventually skyrockets to multiple updates per second. I wonder if this is what's causing it?
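For readers wondering what such a resize script might look like, here is a minimal sketch. It assumes /var/log is its own tmpfs mount and that the script runs as root; it is not necessarily the script used above. The function name `resize_log_tmpfs` and the `DRY_RUN` switch are hypothetical, added only for illustration.

```shell
# Hypothetical helper: grow the tmpfs that backs /var/log.
# Assumption: /var/log is a separate tmpfs mount and the caller is root.
resize_log_tmpfs() {
  size="${1:-512m}"   # new upper limit for the tmpfs, e.g. 512m
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be run.
    echo "mount -o remount,size=${size} /var/log"
  else
    mount -o remount,size="${size}" /var/log
  fi
}
```

Running `DRY_RUN=1 resize_log_tmpfs 512m` prints the command without changing anything. A larger tmpfs only buys time: the nginx errors keep being written, the log just takes longer to fill.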

Edited by JohnnyGrey
28 minutes ago, kaares said:

I've given up on having the page open. I just have btop running in a terminal to keep an eye on it now

I didn't have a clean shutdown the whole first year I had the server running because of this bug.


One thing that I have noticed is that when I access the Dashboard directly through NPM, it sometimes happens.

When I access that same dashboard through a Cloudflare tunnel, I have not seen it happen yet.
 

  • 3 weeks later...
On 3/14/2024 at 6:22 PM, JohnnyGrey said:

Has anyone found a fix for this? This is happening in Unraid 6.12.8. I recently added an "on pool first start" script to increase the size of my /var/log folder to 512 MB, since I have 64 GB of RAM to use, so luckily it didn't crash, but I see easily hundreds of thousands of these errors in my logs.

 

I have noticed over the years that if the dashboard is left open, live polling eventually skyrockets to multiple updates per second. I wonder if this is what's causing it?



I recently upgraded from 6.9.2 to 6.12.8 and started running into a similar issue.

The symptom is that the system becomes unresponsive, and I can't even SSH into it or ping it. Today, after letting it sit for about 3 hours while it was doing a parity check, I tried to load the GUI and got a 404 nginx error. Restarting the syslog service allowed me to load the page correctly. Once the parity check is finished, I plan to downgrade back to 6.9.2, since it was much more stable, and see if the issue persists.

Is there a way to monitor your log folder size? I am never able to see if that's getting too full before I just have an issue - and what script did you use to increase the size of it?
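One possible way to keep an eye on the log folder, sketched under the assumption that /var/log is its own mount so that `df` reports its usage separately. The function names `log_usage_pct` and `check_log_usage` and the 80 percent default are hypothetical choices, not something from this thread.

```shell
# Print how full the filesystem holding a directory is, as a bare number (percent).
log_usage_pct() {
  # df -P guarantees one line per filesystem; column 5 is the capacity, e.g. "42%".
  df -P "${1:-/var/log}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print a status line and return 1 when usage reaches the limit (default 80 percent).
check_log_usage() {
  dir="${1:-/var/log}"
  limit="${2:-80}"
  pct=$(log_usage_pct "$dir")
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $dir is ${pct}% full"
    return 1
  fi
  echo "OK: $dir is ${pct}% full"
}
```

Calling `check_log_usage /var/log 80` from a scheduled script would give an early warning before the log fills up, and `du -h -d 1 /var/log` then shows which subfolder is growing.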

Edited by Varean
On 10/23/2023 at 11:21 PM, semtex41 said:

After another week, I have determined a few things:

 

  • Closing all tabs prevents the errors from building up/cascading. 
  • The browser type doesn't seem to matter. Crashes/logs growing happens with Edge, Chrome, and Firefox. 
  • My appdata backup (which runs on Monday mornings) has been one of the triggers for the nginx errors in the logs. When the tab is open, the log fills up with the errors while the scheduled job is running. I do not blame the plugin, because when the tab (all tabs) are closed, the errors are not generated. Closing the tab today prevented a hard crash like last week, which required a hard shutdown. 
  • This is a webserver based interface. If the primary mechanism for accessing the OS causes the OS to consistently crash, then it is a bug. 

I totally agree and hope Limetech will step in to fix it as soon as possible.

1 hour ago, quack7017 said:

I totally agree and hope Limetech will step in to fix it as soon as possible.

We all agree, but the post you are quoting is from six months ago - do we even know if Limetech is addressing it?  There have been several new releases of unRAID since then.


Similar error messages are filling up the syslog on my server, but not the local log.

 

UNRAID 6.12.10

The Unraid Connect plugin was installed; I have uninstalled it now.
 

root@srv:~# grep -o 'Increase nchan_max_reserved_memory' /mnt/user/system/syslog-127.0.0.1.log | wc -l
74452

root@srv:~# awk -v phrase="Increase nchan_max_reserved_memory" '{count += gsub(phrase, "")} END {print count}' /mnt/user/system/syslog-127.0.0.1.log
74452

root@srv:~# awk -v phrase="Increase nchan_max_reserved_memory" '{count += gsub(phrase, "")} END {print count}' /mnt/user/system/syslog-127.0.0.1.log.1 
261773

root@srv:~# grep -o '"/usr/local/emhttp/us"' /mnt/user/system/syslog-127.0.0.1.log

root@pd-srv:~# du -h /mnt/user/system/syslog-127.0.0.1.log*
48M     /mnt/user/system/syslog-127.0.0.1.log
159M    /mnt/user/system/syslog-127.0.0.1.log.1
548M    /mnt/user/system/syslog-127.0.0.1.log.2
1.6G    /mnt/user/system/syslog-127.0.0.1.log.3
1.6G    /mnt/user/system/syslog-127.0.0.1.log.4

root@srv:~# du -h -d 1 /var/log
0       /var/log/pwfail
16K     /var/log/unraid-api
0       /var/log/preclear
0       /var/log/swtpm
2.5M    /var/log/samba
0       /var/log/plugins
28K     /var/log/pkgtools
0       /var/log/nginx
0       /var/log/nfsd
16K     /var/log/libvirt
3.1M    /var/log
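The per-file counts above can also be collected in one pass over all rotated logs. This is a small sketch using the same `grep -o ... | wc -l` approach; the function name `count_phrase` is hypothetical.

```shell
# Count occurrences of a fixed phrase in each readable file.
# Prints one "count<TAB>file" line per file.
count_phrase() {
  phrase="$1"
  shift
  for f in "$@"; do
    [ -r "$f" ] || continue
    # -o prints every match on its own line, -F treats the phrase literally.
    n=$(grep -oF -- "$phrase" "$f" | wc -l)
    printf '%s\t%s\n' "$((n))" "$f"
  done
}
```

For example: `count_phrase 'Increase nchan_max_reserved_memory' /mnt/user/system/syslog-127.0.0.1.log*`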

 

Edited by pixeldoc81
Added more info.

Hm, I am seeing the same error messages and behavior. In my case, though, it seems to be caused by my Windows VM, which performs some heavily memory-intensive operations. I have assigned it a maximum of 32 GB of memory out of my 128 GB available, and it seems to trigger some leak in the VM / host (?) that consumes all of my host memory, which then leaves nginx unable to allocate any further memory and ultimately crashes my entire Unraid. It is only a theory, since not everybody with this issue is running a Windows VM, but when my Windows VM is not running, all seems fine. The VirtIO drivers are the latest stable version and the machine type is Q35-7.2.

 


 Hello,

 

Just adding to this: I have the same behavior here, using NPM plus (Docker).

The system crashes completely only when I submit credentials for authentication over HTTPS.


For context: I'm running GLPI (a ticketing system) in Docker + MariaDB on a closed network loop that overlays 2 VLANs.

Docker is configured in IPVLAN mode and all my VLANs are defined in the network config (it's quite a complex one).

GLPI is facing the intranet, and it has 2 addresses: one HTTP and one HTTPS.

 

Note that I also have 12 other websites actively being proxied by the dockerized NPM, passing credentials and authentication 18h/24h, 7d/7d, and the phenomenon only occurs with GLPI over HTTPS (that is, when the request goes through the nginx of the dockerized NPM).

When I log in to GLPI with my credentials over HTTP, I have no issues.

Over HTTPS, the credentials go through, then it starts to crumble bit by bit (sometimes very quickly) and the page stops loading (especially the modules querying the DB directly).

 

SOMEHOW, this directly impacts Unraid's nginx, and I have to reboot to regain control of my services.

Sometimes I still get a responsive Unraid UI for a few seconds, but I have never managed to restart all the services before Unraid's nginx freezes completely.

I now try to use the Unraid UI as little as possible, and I tend to do so when few people are working.

It happened recently with another program (also deployed as two Docker containers, 1 app + 1 DB, on separate VLANs).

The logs repeat indefinitely:

 

Mar 19 21:38:23 THEMIS nginx: 2024/03/19 21:38:23 [crit] 16723#16723: ngx_slab_alloc() failed: no memory
Mar 19 21:38:23 THEMIS nginx: 2024/03/19 21:38:23 [error] 16723#16723: shpool alloc failed
Mar 19 21:38:23 THEMIS nginx: 2024/03/19 21:38:23 [error] 16723#16723: nchan: Out of shared memory while allocating message of size 5492. Increase nchan_max_reserved_memory.
Mar 19 21:38:23 THEMIS nginx: 2024/03/19 21:38:23 [error] 16723#16723: *717484 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/notify?buffer_length=1 HTTP/1.1", host: "localhost"
Mar 19 21:38:23 THEMIS nginx: 2024/03/19 21:38:23 [error] 16723#16723: MEMSTORE:00: can't create shared message for channel /notify
Mar 19 21:38:24 THEMIS nginx: 2024/03/19 21:38:24 [crit] 16723#16723: ngx_slab_alloc() failed: no memory
Mar 19 21:38:24 THEMIS nginx: 2024/03/19 21:38:24 [error] 16723#16723: shpool alloc failed
Mar 19 21:38:24 THEMIS nginx: 2024/03/19 21:38:24 [error] 16723#16723: nchan: Out of shared memory while allocating message of size 4753. Increase nchan_max_reserved_memory.
Mar 19 21:38:24 THEMIS nginx: 2024/03/19 21:38:24 [error] 16723#16723: *717490 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Mar 19 21:38:24 THEMIS nginx: 2024/03/19 21:38:24 [error] 16723#16723: MEMSTORE:00: can't create shared message for channel /disks
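To see which nchan channels are hit most often in a log like the one above, the "can't create shared message" lines can be tallied per channel. This is a rough sketch that assumes the channel name is always the last field on the line, as in this excerpt; the function name `nchan_failures_by_channel` is hypothetical.

```shell
# Tally nchan "can't create shared message" errors per channel.
# Reads syslog lines on stdin and prints "count channel", highest count first.
nchan_failures_by_channel() {
  # "can.t" matches the apostrophe without breaking the single-quoted awk program.
  awk '/can.t create shared message for channel/ { count[$NF]++ }
       END { for (ch in count) print count[ch], ch }' | sort -rn
}
```

For example: `nchan_failures_by_channel < /var/log/syslog`, where the path is only an assumption about where the messages end up on a given system.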

 

PS: I'm using a Dell PowerEdge based on a Xeon (8c/16t) with 96 GB of RAM.

Edited by Yonix
adding machine spec