• [6.12] Unraid webui stop responding then Nginx crash


    H3ms
    • Retest Urgent

    Hi,

     

    Another day, another problem haha.

     

    Since updating to 6.12 (rc6 was OK), my WebUI has stopped working correctly.

    The WebUI stops refreshing itself automatically (I need to press F5 to refresh the data on screen).

     

    Then nginx eventually crashes and the interface stops responding (on port 8080 in my case). I have to SSH into the server, find the nginx PID, kill it, and then start nginx again.

     

    I set up a syslog server earlier, but I can't find anything related to this except:

    2023/06/15 20:42:50 [alert] 14427#14427: worker process 15209 exited on signal 6

     

    I've attached the syslog and the diagnostics file.

     

    Thanks in advance.

     

    nas-icarus-diagnostics-20230615-2146.zip syslog





    Recommended Comments



    I see similar behavior, but without nginx going unresponsive. Tested in both Chrome and Firefox: Chrome seems to work for about a minute after each refresh; Firefox stops working after about 30 seconds.


    I just put the server in safe mode to see if it's plugin related.

    But these errors are still there:

    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9184 open socket #13 left in connection 13
    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9186 open socket #15 left in connection 14
    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: aborting
    Jun 16 09:20:01 NAS-ICARUS nginx: 2023/06/16 09:20:01 [alert] 21945#21945: *9533 open socket #15 left in connection 13

     

     


    Updated to 6.12 last night before heading off to bed, and woke up to an unresponsive server (plus everything it was running).


    The WebGUI is completely unresponsive for me now. Everything else is fine. I tried restarting nginx from the CLI with no luck. I'm running a parity check right now, so I don't want to restart.

    • Thanks 1

    Try to find the nginx PID with:

    netstat -nlp | grep PORT_WEBGUI

    You'll get something like this:

    tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp        0      0 IP_VPN:8080           0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp        0      0 IP_SERVER:8080          0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp6       0      0 IPV6:8080 :::*                    LISTEN      3832/nginx: master  
    tcp6       0      0 ::1:8080                :::*                    LISTEN      3832/nginx: master

     

    In my case the PID is 3832.

    Then kill it with kill -9 3832 and restart nginx with: /etc/rc.d/rc.nginx start

     

    With this I can get the WebGUI back for a while.
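    The PID-extraction step above can be scripted. A minimal sketch: the netstat sample line from the post is hard-coded here for illustration, and the kill/restart commands are left as comments so the snippet only demonstrates parsing; paths and port are as quoted in the thread.

```shell
#!/bin/sh
# Sketch: pull the nginx master PID out of netstat output like the sample above.
# On a live server you would feed it:  netstat -nlp | grep ':8080'
sample='tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      3832/nginx: master'

# Field 7 is "PID/program"; split on "/" to keep just the PID.
pid=$(printf '%s\n' "$sample" | awk '/LISTEN/ {split($7, a, "/"); print a[1]; exit}')
echo "$pid"    # prints 3832

# On the server itself you would then run:
#   kill -9 "$pid"
#   /etc/rc.d/rc.nginx start
```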

    Edited by H3ms
    • Like 1
    • Upvote 1

    Yeah, I just had this happen to me overnight after doing nothing. All Dockers and VMs were still working, but I couldn't get to the GUI. I had to kill and restart nginx. Added a diagnostics file...

    daig.zip


    My server showed similar behavior: after it was upgraded last night, the GUI was not accessible in the morning and I was unable to SSH to it. I had to manually restart the server and decided to downgrade until there is more information on what's happening.

     

    Edited by rolan79

    Please retest after booting in safe mode with all Docker containers disabled. If it's OK, start enabling them one by one, leaving each one running long enough to confirm it's OK before enabling the next.


    I've been in safe mode since this morning, and the issue is still there.

    I'll test disabling all my Dockers for a while, then enable them one by one.

     

     

    • Like 1

    You can also list all the containers you are using; if someone else affected runs the same ones, it may help to find out whether affected users have something in common, since AFAIK this is not a general issue.


    So, I started shutting down Dockers.

    I shut down all my Dockers except Plex (the family is watching...).

    These errors are still there:

    Jun 16 20:01:24 NAS-ICARUS nginx: 2023/06/16 20:01:24 [alert] 24044#24044: *255702 open socket #28 left in connection 17
    Jun 16 20:01:24 NAS-ICARUS nginx: 2023/06/16 20:01:24 [alert] 24044#24044: aborting
    Jun 16 20:02:05 NAS-ICARUS nginx: 2023/06/16 20:02:05 [alert] 30711#30711: *256410 open socket #5 left in connection 13
    Jun 16 20:02:05 NAS-ICARUS nginx: 2023/06/16 20:02:05 [alert] 30711#30711: aborting

    I'm in safe mode, and the server will stay like that all night (Plex will be shut down too).

    This is the list of my Dockers:

    • Authelia
    • Overseer
    • Redis
    • homepage
    • mineOS-node (never started)
    • duplicati
    • MariaDB
    • Nginx (swag docker)
    • phpmyadmin (always shut down)
    • meshcentral
    • Vaultwarden
    • Uptimekuma
    • Shinobi-pro-cctv
    • plex
    • plexAnnouncer
    • sonarr
    • flaresolver
    • radarr (twice)
    • prowlar
    • pyload
    • binhex-rtorrentvpn
    • filerun-ofi
    • cloud9
    • speedtest-tracker
    • glances

    I've also uploaded a new diagnostics file.

    nas-icarus-diagnostics-20230616-2010.zip

    2 minutes ago, H3ms said:

    So, I started shutting down dockers.

    If you can, it would be better to reboot with the Docker service disabled and then start the containers one at a time; if the error is already in the log, shutting down the culprit may still leave the errors there.

    • Thanks 1

    Alright.

    So I restarted the server (still in safe mode) and started the array without any Docker or VM.

    The issue is still there, with these errors:

    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: *760 open socket #5 left in connection 16
    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: *762 open socket #10 left in connection 17
    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: aborting

     

    I've uploaded the diagnostics file.

    So there's something else going on... =(

    It will stay like that all night to see if nginx finally crashes.

    nas-icarus-diagnostics-20230616-2344.zip

    Edited by H3ms

    @H3ms

    What are the clients 10.0.0.125 and 10.0.0.184? Do these systems make automated calls to the server?

     

    The "open socket" error message appears when nginx gets more requests than it can handle.
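    To gauge how often this is happening, the alerts can be tallied straight from the syslog. A quick sketch, assuming the messages keep the format quoted earlier in the thread (the heredoc sample stands in for the real syslog):

```shell
#!/bin/sh
# Sketch: count "open socket ... left in connection" alerts.
# Sample lines are copied from the report; on a live server, pipe in the real
# syslog instead:  grep -c 'left in connection' /var/log/syslog
count=$(grep -c 'left in connection' <<'EOF'
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9184 open socket #13 left in connection 13
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9186 open socket #15 left in connection 14
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: aborting
EOF
)
echo "$count"    # prints 2
```

    A rising count per hour would support the "more requests than it can handle" theory; a flat count would point elsewhere.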

     


    Hi,

     

    10.0.0.125 and 10.0.0.184 are the same computer, a MacBook Pro, connected over both Wi-Fi and Ethernet.

     

    nginx did not crash last night; the server is still accessible. It's true that this MacBook is never turned off and its browser is always open.

     

    But I've had it for over a year without ever seeing this error in the logs.

    I don't understand why it appeared between rc7 and rc8.

     

    I will try closing every browser on this computer and only connecting to the Unraid server over SSH, to see if the error still appears in the log.

     

    Thanks for the help.


    I see the same: it gets pretty unresponsive after some time. For example, I can no longer open the syslog, and the dashboard takes very long to load.

    This reminds me of an error Unraid had with nchan on 6.5.x (as far as I remember).


    :o It crashed again (nginx only).

    I looked in the nginx error log and found a lot of:

    2023/06/18 01:16:34 [alert] 14573#14573: worker process 25310 exited on signal 6
    2023/06/18 01:16:34 [alert] 14573#14573: shared memory zone "memstore" was locked by 25310
    ter process /usr/sbin/nginx -c /etc/nginx/nginx.conf: ./nchan-1.3.6/src/store/memory/memstore.c:705: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
    2023/06/18 01:16:34 [alert] 14573#14573: worker process 25351 exited on signal 6
    2023/06/18 01:16:34 [alert] 14573#14573: shared memory zone "memstore" was locked by 25351
    ter process /usr/sbin/nginx -c /etc/nginx/nginx.conf: ./nchan-1.3.6/src/store/memory/memstore.c:705: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
    2023/06/18 01:16:34 [info] 25414#25414: Using 116KiB of shared memory for nchan in /etc/nginx/nginx.conf:161
    2023/06/18 01:16:34 [info] 25414#25414: Using 131072KiB of shared memory for nchan in /etc/nginx/nginx.conf:161

    Maybe you're right; it could be an nchan problem...

    I have plenty of RAM available; I'm at 49% (of 96 GB), so it's not a lack of RAM.
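    The worker PIDs that died on signal 6 can be pulled out of error-log lines like the ones above. A sketch, with one of the quoted lines hard-coded in place of the real error log (the field position is an assumption based on that quoted format):

```shell
#!/bin/sh
# Sketch: extract the worker PID from an nginx "exited on signal 6" alert line.
# On a live server you would read the real nginx error log instead of this sample.
line='2023/06/18 01:16:34 [alert] 14573#14573: worker process 25310 exited on signal 6'

# Fields: date time [alert] master#master: worker process PID exited on signal 6
# so the PID is field 7.
pid=$(printf '%s\n' "$line" | awk '/exited on signal 6/ {print $7}')
echo "$pid"    # prints 25310
```

    Cross-referencing those PIDs with the "shared memory zone \"memstore\" was locked by" lines would confirm whether every crash goes through the same nchan assertion.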


    I'm having a similar issue. My server is available for ~5-6 hours and then nginx crashes. Killing nginx and restarting it returns access to the WebGUI. I have my server behind a reverse proxy so I wonder if the calls from nginx-ingress are what's overwhelming it? But this wasn't an issue in 6.11.5, this just started after the upgrade to 6.12.

     

    I get a few hundred of these:

    Jun 18 18:45:31 <SERVER> nginx: 2023/06/18 18:45:31 [alert] 15440#15440: worker process 18572 exited on signal 6
    Jun 18 18:45:31 <SERVER> nginx: 2023/06/18 18:45:31 [alert] 15440#15440: shared memory zone "memstore" was locked by 18572

    Edited by LilDrunkenSmurf

    I just started having this issue too after moving to 6.12. After about an hour I lose access to the Unraid remote GUI; everything is still functioning correctly in the background.






  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.