• [6.12] Unraid webui stop responding then Nginx crash


    H3ms
    • Retest Urgent

    Hi,

     

    Another day, another problem haha.

     

    Since updating to 6.12 (rc6 was OK), my WebUI has stopped working correctly.

    The WebUI stops refreshing itself automatically (I need to press F5 to refresh the data on screen).

     

    Then nginx eventually crashes and the interface stops responding (on port 8080 in my case). I have to SSH into the server, find the nginx PID, kill it, and then start nginx again.

     

    I set up a syslog server earlier, but I can't find anything related to this except:

    2023/06/15 20:42:50 [alert] 14427#14427: worker process 15209 exited on signal 6

     

    I've attached the syslog and the diagnostics file.

     

    Thanks in advance.

     

    nas-icarus-diagnostics-20230615-2146.zip syslog





    Recommended Comments



    I see similar behavior, but without nginx going unresponsive. Tested in both Chrome and Firefox: Chrome seems to work for about a minute after each refresh; Firefox stops working after about 30 seconds.


    I just put the server in safe mode to see if it's plugin related.

    But these errors are still there:

    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9184 open socket #13 left in connection 13
    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9186 open socket #15 left in connection 14
    Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: aborting
    Jun 16 09:20:01 NAS-ICARUS nginx: 2023/06/16 09:20:01 [alert] 21945#21945: *9533 open socket #15 left in connection 13

     

     


    Updated to 6.12 last night before heading off to bed, and woke up to an unresponsive server (plus everything it was running).


    The WebGUI is completely unresponsive for me now. Everything else is fine. I tried restarting nginx from the CLI with no luck. I'm running a parity check right now, so I don't want to restart.

    • Thanks 1

    Try to find the nginx PID with:

    netstat -nlp | grep PORT_WEBGUI

    You'll get something like this:

    tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp        0      0 IP_VPN:8080           0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp        0      0 IP_SERVER:8080          0.0.0.0:*               LISTEN      3832/nginx: master  
    tcp6       0      0 IPV6:8080 :::*                    LISTEN      3832/nginx: master  
    tcp6       0      0 ::1:8080                :::*                    LISTEN      3832/nginx: master

     

    In my case the PID is 3832.

    Then kill it with kill -9 3832 and restart nginx with: /etc/rc.d/rc.nginx start

     

    With this I can get the WebGUI back for a while.
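    The PID-extraction step above can be scripted. A minimal sketch: the netstat sample line from the post is hard-coded here for illustration, and the kill/restart commands are left as comments so the snippet only demonstrates parsing; paths and port are as quoted in the thread.

```shell
#!/bin/sh
# Sketch: pull the nginx master PID out of netstat output like the sample above.
# On a live server you would feed it:  netstat -nlp | grep ':8080'
sample='tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      3832/nginx: master'

# Field 7 is "PID/program"; split on "/" to keep just the PID.
pid=$(printf '%s\n' "$sample" | awk '/LISTEN/ {split($7, a, "/"); print a[1]; exit}')
echo "$pid"    # prints 3832

# On the server itself you would then run:
#   kill -9 "$pid"
#   /etc/rc.d/rc.nginx start
```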

    Edited by H3ms
    • Like 1
    • Upvote 1

    Yeah, I just had this happen to me overnight after doing nothing. All Dockers and VMs were still working, but I couldn't get to the GUI. I had to kill and restart nginx. Added a diagnostics file...

    daig.zip


    My server showed similar behavior: after it was upgraded last night, the GUI was not accessible in the morning and I was unable to SSH to it. I had to manually restart the server and decided to downgrade until there is more information on what's happening.

     

    Edited by rolan79

    Please retest after booting in safe mode with all Docker containers disabled. If it's OK, start enabling them one by one, leaving each one running long enough to confirm it's OK before enabling the next.


    I've been in safe mode since this morning, and the issue is still there.

    I'll test disabling all my Dockers for a while, then enable them one by one.

     

     

    • Like 1

    You can also list all the containers you are using; if someone else affected runs the same ones, it may help to find out whether affected users have something in common, since AFAIK this is not a general issue.


    So, I started shutting down Dockers.

    I shut down all my Dockers except Plex (the family is watching...).

    These errors are still there:

    Jun 16 20:01:24 NAS-ICARUS nginx: 2023/06/16 20:01:24 [alert] 24044#24044: *255702 open socket #28 left in connection 17
    Jun 16 20:01:24 NAS-ICARUS nginx: 2023/06/16 20:01:24 [alert] 24044#24044: aborting
    Jun 16 20:02:05 NAS-ICARUS nginx: 2023/06/16 20:02:05 [alert] 30711#30711: *256410 open socket #5 left in connection 13
    Jun 16 20:02:05 NAS-ICARUS nginx: 2023/06/16 20:02:05 [alert] 30711#30711: aborting

    I'm in safe mode, and the server will stay like that all night (Plex will be shut down too).

    This is the list of my Dockers:

    • Authelia
    • Overseer
    • Redis
    • homepage
    • mineOS-node (never started)
    • duplicati
    • MariaDB
    • Nginx (swag docker)
    • phpmyadmin (always shut down)
    • meshcentral
    • Vaultwarden
    • Uptimekuma
    • Shinobi-pro-cctv
    • plex
    • plexAnnouncer
    • sonarr
    • flaresolver
    • radarr (twice)
    • prowlar
    • pyload
    • binhex-rtorrentvpn
    • filerun-ofi
    • cloud9
    • speedtest-tracker
    • glances

    I've also uploaded a new diagnostics file.

    nas-icarus-diagnostics-20230616-2010.zip

    2 minutes ago, H3ms said:

    So, I started shutting down dockers.

    If you can, it would be better to reboot with the Docker service disabled and then start the containers one at a time; if the error is already in the log, shutting down the culprit may still leave the errors there.

    • Thanks 1

    Alright.

    So I restarted the server (still in safe mode) and started the array without any Docker or VM.

    The issue is still there, with these errors:

    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: *760 open socket #5 left in connection 16
    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: *762 open socket #10 left in connection 17
    Jun 16 23:41:40 NAS-ICARUS nginx: 2023/06/16 23:41:40 [alert] 27223#27223: aborting

     

    I've uploaded the diagnostics file.

    So there's something else going on... =(

    It will stay like that all night to see if nginx finally crashes.

    nas-icarus-diagnostics-20230616-2344.zip

    Edited by H3ms

    @H3ms

    What are the clients 10.0.0.125 and 10.0.0.184? Do these systems make automated calls to the server?

     

    The "open socket" error message appears when nginx gets more requests than it can handle.
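    To gauge how often this is happening, the alerts can be tallied straight from the syslog. A quick sketch, assuming the messages keep the format quoted earlier in the thread (the heredoc sample stands in for the real syslog):

```shell
#!/bin/sh
# Sketch: count "open socket ... left in connection" alerts.
# Sample lines are copied from the report; on a live server, pipe in the real
# syslog instead:  grep -c 'left in connection' /var/log/syslog
count=$(grep -c 'left in connection' <<'EOF'
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9184 open socket #13 left in connection 13
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: *9186 open socket #15 left in connection 14
Jun 16 09:19:34 NAS-ICARUS nginx: 2023/06/16 09:19:34 [alert] 20181#20181: aborting
EOF
)
echo "$count"    # prints 2
```

    A rising count per hour would support the "more requests than it can handle" theory; a flat count would point elsewhere.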

     


    Hi,

     

    10.0.0.125 and 10.0.0.184 are the same computer, a MacBook Pro, connected over both Wi-Fi and Ethernet.

     

    nginx did not crash last night; the server is still accessible. It's true that this MacBook is never turned off and its browser is always open.

     

    But I've had it for over a year without ever seeing this error in the logs.

    I don't understand why it appeared between rc7 and rc8.

     

    I will try closing every browser on this computer and only connecting to the Unraid server over SSH, to see if the error still appears in the log.

     

    Thanks for the help.


    I see the same: it gets pretty unresponsive after some time. For example, I can no longer open the syslog, and the dashboard takes very long to load.

    This reminds me of an error Unraid had with nchan on 6.5.x (as far as I remember).


    :o It crashed again (nginx only).

    I looked in the nginx error log and found a lot of:

    2023/06/18 01:16:34 [alert] 14573#14573: worker process 25310 exited on signal 6
    2023/06/18 01:16:34 [alert] 14573#14573: shared memory zone "memstore" was locked by 25310
    ter process /usr/sbin/nginx -c /etc/nginx/nginx.conf: ./nchan-1.3.6/src/store/memory/memstore.c:705: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
    2023/06/18 01:16:34 [alert] 14573#14573: worker process 25351 exited on signal 6
    2023/06/18 01:16:34 [alert] 14573#14573: shared memory zone "memstore" was locked by 25351
    ter process /usr/sbin/nginx -c /etc/nginx/nginx.conf: ./nchan-1.3.6/src/store/memory/memstore.c:705: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
    2023/06/18 01:16:34 [info] 25414#25414: Using 116KiB of shared memory for nchan in /etc/nginx/nginx.conf:161
    2023/06/18 01:16:34 [info] 25414#25414: Using 131072KiB of shared memory for nchan in /etc/nginx/nginx.conf:161

    Maybe you're right; it could be an nchan problem...

    I have plenty of RAM available; I'm at 49% (of 96 GB), so it's not a lack of RAM.
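    The worker PIDs that died on signal 6 can be pulled out of error-log lines like the ones above. A sketch, with one of the quoted lines hard-coded in place of the real error log (the field position is an assumption based on that quoted format):

```shell
#!/bin/sh
# Sketch: extract the worker PID from an nginx "exited on signal 6" alert line.
# On a live server you would read the real nginx error log instead of this sample.
line='2023/06/18 01:16:34 [alert] 14573#14573: worker process 25310 exited on signal 6'

# Fields: date time [alert] master#master: worker process PID exited on signal 6
# so the PID is field 7.
pid=$(printf '%s\n' "$line" | awk '/exited on signal 6/ {print $7}')
echo "$pid"    # prints 25310
```

    Cross-referencing those PIDs with the "shared memory zone \"memstore\" was locked by" lines would confirm whether every crash goes through the same nchan assertion.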


    I'm having a similar issue. My server is available for ~5-6 hours and then nginx crashes. Killing nginx and restarting it returns access to the WebGUI. I have my server behind a reverse proxy so I wonder if the calls from nginx-ingress are what's overwhelming it? But this wasn't an issue in 6.11.5, this just started after the upgrade to 6.12.

     

    I get a few hundred of these:

    Jun 18 18:45:31 <SERVER> nginx: 2023/06/18 18:45:31 [alert] 15440#15440: worker process 18572 exited on signal 6
    Jun 18 18:45:31 <SERVER> nginx: 2023/06/18 18:45:31 [alert] 15440#15440: shared memory zone "memstore" was locked by 18572

    Edited by LilDrunkenSmurf

    I just started having this issue too after moving to 6.12. After about an hour I lose access to the Unraid remote GUI; everything is still functioning correctly in the background.






  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.