nginx running out of shared memory


6 minutes ago, JorgeB said:

If you restart you lose the setting, please add it after a new boot and re-test.

Will do - it will take a little time due to competing priorities but as soon as I do that I'll post the results. Thanks for the help!

9 hours ago, JorgeB said:

If you restart you lose the setting, please add it after a new boot and re-test.

Think I got it this time:

 

cat /etc/nginx/nginx.conf | grep size
    client_body_buffer_size 32k;
    types_hash_max_size 2048;
    client_max_body_size 20m;
    server_names_hash_bucket_size 128;
    nchan_shared_memory_size 512M;

 

Now just waiting to see what happens.
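
Since the setting is lost on restart, it may be worth confirming the value after each boot. A minimal sketch, assuming the config lives at /etc/nginx/nginx.conf as in the output above; the function name `nchan_shm_size` is made up for illustration:

```shell
# Print the value of nchan_shared_memory_size from an nginx config file.
# The default path is the file grepped above; pass another path as $1 to override.
nchan_shm_size() {
    # awk splits on whitespace, so indentation is ignored; the trailing
    # semicolon is stripped from the value before printing.
    awk '$1 == "nchan_shared_memory_size" { sub(/;$/, "", $2); print $2 }' \
        "${1:-/etc/nginx/nginx.conf}"
}
```

Calling `nchan_shm_size` with no arguments prints the configured size, or nothing if the directive is missing (for example after a reboot reset the file).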

52 minutes ago, JorgeB said:

Could be a hardware or a container issue, does it crash if you leave that container disabled?

It only seems to have issues if I leave the webUI tab open. If I close it and leave it alone, everything is fine.

Edited by furiouslog
3 hours ago, JorgeB said:

I don't think that would be the issue if there's nothing relevant logged in the syslog.

Let me clarify: when I leave the tab open, it starts logging mce errors after a period of time has passed. If I don't open the tab in a browser and leave it alone, it does not log any errors.

 

I also took the server down and ran another memtest on it, and it passed. All of the system diagnostics that I can run from IPMI also pass. It definitely appears to be related to the other problems reported in this thread about keeping a tab open in a browser.


I'll throw my hat into the ring of people who just had "nchan: Out of shared memory....." errors flood their logs (version 6.12.10).
I'm also seeing:

Quote

- nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
- [alert] 47896#47896: *3521146 header already sent while keepalive, client: 10.253.1.4, server: 192.168.11.11:80
- kernel: nginx[47896]: segfault at 0 ip 0000000000000000 sp 00007ffc40c39498 error 14 in nginx[400000+24000] likely on CPU 2 (core 2, socket 0)
- kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
- nginx: 2024/08/14 14:59:29 [alert] 14946#14946: worker process 47896 exited on signal 11
- nginx: 2024/08/14 14:59:29 [crit] 47898#47898: ngx_slab_alloc() failed: no memory
- nginx: 2024/08/14 14:59:29 [error] 47898#47898: shpool alloc failed
- nginx: 2024/08/14 14:59:29 [error] 47898#47898: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.


I can't open the logs or terminal through the main GUI (they open and close right away). It says I'm out of memory, but this system has 384GB of ECC RAM!

I seem to still have SMB access, but the GUI is slow to load locally.

And yes, I have a bunch of tabs open on my PC at home (remote access through WireGuard), as I've been troubleshooting a possible drive issue (and I have been getting MCE errors too, but I thought that was tied to that). I'm going to assume that if I close the open tabs and clear the nginx service (or reboot), those errors will stop? I hope this is fixed in the next release!

Edited by miicar

I've had this happen twice now in the past couple of weeks.

 

Server continues to operate fine in the background - HomeAssistant/Z2M/Zwave/Plex/SWAG/Nextcloud/MariaDB/all other Dockers seemed unaffected, but GUI was missing parts. Issues such as:

  • All array discs, pool ssds, and even flash drive not showing at all under the 'Main' tab
  • All discs/ssds missing from 'Dashboard' tab
  • Other components of 'Dashboard' tab glitched, such as percentage bars/graph for CPU/RAM etc not working
  • 'Shares' tab seemed unaffected
  • reloading either page within Chrome tab didn't work
  • reloading in an incognito Chrome tab didn't work
  • reloading in either Safari (different browser) or mobile Chrome (different IP) didn't work
  • Neither Logs nor Terminal works via the buttons in the GUI (top right buttons)
  • Logs are visible if viewed via Tools>System Log however

The logs showed mostly the same 4-6 messages:

  • ngx_slab_alloc() failed: no memory
  • shpool alloc failed
  • nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
  •  *1798536 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
  • kernel: nginx[25247]: segfault at 0 ip 0000000000000000 sp 00007ffc32198c58 error 14 in nginx[400000+24000] likely on CPU 8 (core 4, socket 0)
  • kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
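
To get a feel for how often the shared-memory error recurs, the matching lines can be counted. A rough sketch; /var/log/syslog is an assumed default path (point it at wherever your log actually is), and the helper name `count_nchan_oom` is hypothetical:

```shell
# Count log lines reporting that nchan ran out of shared memory.
# /var/log/syslog is an assumed default; pass a different file as $1.
count_nchan_oom() {
    # -c prints the number of matching lines; -F treats the pattern as a
    # fixed string rather than a regular expression.
    grep -cF 'nchan: Out of shared memory' "${1:-/var/log/syslog}"
}
```

Running it against a saved copy of the log, for example `count_nchan_oom saved-syslog.txt`, gives a single number that can be compared before and after closing the browser tabs.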

Both times, a full restart resolved the GUI issues, and the repeated errors in the log.

Yes, I do leave 1-2 tabs open with Unraid most of the time... Could this be the main cause?

 

Logs from before restart attached

Unraid Syslog 14.08.24 GUI Freeze

On 4/29/2024 at 1:06 AM, martial said:

Instead of killing all nginx, I was looking for the master process (ie not the ones run by containerd-shim-runc-v2)

So far the simpler way I found is to run:

ps -axfo pid,ppid,uname,cmd | grep nginx | grep -v '\\_'

ie get the hierarchy of processes and remove the ones started by another process.

 

This returns only one value so far.

 

You can then kill the process and

/etc/rc.d/rc.nginx start

it again

Just a quick note: as long as you can SSH into your system, the above trick still works for me when this problem occurs and avoids having to reboot:

1. find the master nginx process' PID (that ps command above)

2. kill -9 PID

3. restart nginx

4. wait a few seconds and your dashboard should be back functional
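
The steps above could be wrapped in a small script. This is only a sketch: it reuses the ps filter and the rc.nginx command from the quoted post, the function names are made up, and it assumes exactly one top-level nginx process is found, so check the PID yourself before letting it kill anything.

```shell
# Read `ps -axfo pid,ppid,uname,cmd` output on stdin and print the PID of
# each nginx process that is not drawn as a child (no "\_" marker).
top_level_nginx_pids() {
    grep nginx | grep -v '\\_' | awk '{ print $1 }'
}

# Kill the top-level nginx process and start nginx again.
# Refuses to act unless exactly one candidate PID is found.
restart_nginx_master() {
    pids=$(ps -axfo pid,ppid,uname,cmd | top_level_nginx_pids)
    if [ "$(printf '%s\n' "$pids" | grep -c .)" -ne 1 ]; then
        echo "expected exactly one nginx master, found: ${pids:-none}" >&2
        return 1
    fi
    kill -9 "$pids"              # step 2 from the list above
    /etc/rc.d/rc.nginx start     # step 3: restart nginx
}
```

Nothing runs until `restart_nginx_master` is called from an SSH session. The guard is there because the post only says the filter returns one value "so far".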

 

  • 4 weeks later...

I cannot believe we are still on this issue after 3 years. Has no one on the team resolved this yet? I have had the same problem for the last 2 years, and resetting the process is only a temporary fix. It's a band-aid on a bullet wound. Also, increasing the memory size is not a solution, since usage will just outgrow the new limit.

nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
Sep 12 04:40:10 Servo nginx: 2024/09/12 04:40:10 [error] 2372#2372: *1895414 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"

 

 


I've also been dealing with this issue as a weekly (sometimes daily) nuisance for quite a while. At this point I've created a user script that runs bi-daily just to prevent this annoyance. Can we PLEASE make this a priority-1 goal for Unraid 6.12.14? It seems so absurdly commonplace, and I can't imagine this is a difficult problem to solve.

