• [6.12] Unraid webUI stops responding, then nginx crashes


    H3ms
    • Retest Urgent

    Hi,

     

    Another day, another problem haha.

     

    Since the update to 6.12 (RC6 was OK), my webUI has stopped working correctly.

    The webUI stops refreshing itself automatically (I need to press F5 to refresh the data on screen).

     

    Eventually, nginx crashes and the interface stops responding (on port 8080 for me). I have to SSH into the server to find the nginx PID, kill it, and then start nginx again.
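    For reference, a minimal sketch of that manual workaround (using the rc script others mention later in this thread; the PID placeholder is whatever pgrep reports on your system):

    # find the nginx master PID(s)
    pgrep -af 'nginx: master process'
    # kill the master (its workers exit with it), then start nginx again
    kill -9 <PID>
    /etc/rc.d/rc.nginx start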

     

    I set up a syslog server earlier, but I don't find anything relevant to this except:

    2023/06/15 20:42:50 [alert] 14427#14427: worker process 15209 exited on signal 6

     

    I've attached the syslog and the diagnostics file.

     

    Thanks in advance.

     

    nas-icarus-diagnostics-20230615-2146.zip syslog





    Recommended Comments



    I have been watching this thread for a while since I am having the same errors and behaviour. I recently experimented with disabling all plugins via safe mode. All of my Docker containers were still running, everything else was normal, just zero plugins.

     

    I was able to hit 6 days of uptime with no issues. I restarted and turned off safe mode, and the issue came back: one day of uptime before the server was unresponsive.

    dane-diagnostics-20230724-0927.zip

    On 6/16/2023 at 3:54 PM, rolan79 said:

    My server had similar behavior. After it was upgraded last night, the GUI was not accessible in the morning and I was unable to SSH to it. I had to manually restart the server and decided to downgrade until there is more information on what's happening.

    (6.12.3)

    Is it easy to downgrade? It worked fine for me for half a year on 6.11 until I upgraded to 6.12.3; that worked for maybe a day. For me, though, it's not only nginx but all services that crash.

    Lesson learned: if it ain't broken, don't...

    Edited by gombihu
    39 minutes ago, gombihu said:

    Is it easy to downgrade? It worked fine for me for half a year on 6.11 until I upgraded to 6.12.3; that worked for maybe a day. For me, though, it's not only nginx but all services that crash.

    Lesson learned: if it ain't broken, don't...

    I rolled back to 6.12.1 from 6.12.2 and haven't had an issue since, so I'm gonna stay on 6.12.1 until I see this thread die, pretty much lol.

    9 hours ago, gombihu said:

    Is it easy to downgrade?

    It is always easy to manually downgrade (or upgrade) using the manual method described here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.


    Same issue here with Unraid 6.12.3. I can't even run diagnostics from the command line when nginx is in this state; it just freezes. I also see this thousands of times in my nginx error log:

     

    /usr/sbin/nginx -c /etc/nginx/nginx.conf: ./nchan-1.3.6/src/store/memory/memstore.c:705: nchan_store_init_worker: Assertion `procslot_found == 1' failed.
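    To check whether your own log shows the same assertion, something like this should work (assuming the usual nginx error log location of /var/log/nginx/error.log; adjust if yours differs):

    # count occurrences of the nchan worker-init assertion failure
    grep -c 'nchan_store_init_worker' /var/log/nginx/error.log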

     

    Edited by Stubbed6815
    On 7/26/2023 at 8:49 AM, itimpi said:

    It is always easy to manually downgrade (or upgrade) using the manual method described here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.

    It is possible, but none of the Docker containers start on 6.11.5 anymore after the downgrade.


    There's a note in one of the first posts that you have to force-update all your containers after downgrading, due to a change in groups. I downgraded back to 6.11.5, and after doing a force update on all the containers that wouldn't start, everything was fine.

    On 6/23/2023 at 6:35 PM, H3ms said:

    One full week, still no clue?

    It's really annoying to have to restart nginx every day...

    No, one month and still no clue! 😒 It's not only the webUI/nginx that crashes for me but the whole server: not even SSH works, and the CPU is at 100% (at least the fan is on max). It's probably some Docker-related issue. I cannot run diagnostics if I'm not able to reach the server.

    23 hours ago, stanger89 said:

    There's a note in one of the first posts that you have to force-update all your containers after downgrading, due to a change in groups. I downgraded back to 6.11.5, and after doing a force update on all the containers that wouldn't start, everything was fine.

    I haven't found the post you are referring to, but it only partially helps; Deluge, for example, complains that its port is in use when it is not.

    Edited by gombihu

    Can anyone provide guidance on how to work around this issue without shutting down?

    My current solution has been to SSH in, capture diagnostics, and then run "poweroff". Sometimes I'll wait, nothing seems to happen, and I send poweroff again; the machine then seems to shut down way too quickly, and on boot (when I push the power button) it reports an unclean shutdown. I cancel the parity check, and then rinse and repeat in 12-24 hours.

     

    Am I using the wrong command? Is there a better way to get nginx to restart properly so I don't have to do this every day?

    oxygen-diagnostics-20230729-2004.zip


    I've downgraded to 6.11.5, but these commands may be helpful:

     

    to control nginx:

    /etc/rc.d/rc.nginx <start, stop, or restart>

     

    You might have to kill the nginx process:

    ps -aux | grep nginx
    kill -9 <process id of nginx master process and maybe the s6-supervise nginx>

     

    I kept the nginx process stopped (/etc/rc.d/rc.nginx stop) unless I needed to use the web ui, in which case I would start it and immediately stop it after I was done.
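    Putting the above together, a rough sketch of a restart helper (assumptions: timeout, curl and pkill are available, and the webUI answers on localhost port 80; adjust the URL if you use a different port):

    #!/bin/bash
    # Try a graceful restart first; rc.nginx can hang when nginx is wedged, so cap the wait.
    timeout 30 /etc/rc.d/rc.nginx restart
    # If the webUI still does not answer, force-kill the master process(es) and start fresh.
    if ! curl -fs -o /dev/null http://localhost/; then
        pkill -9 -f 'nginx: master process'
        sleep 2
        /etc/rc.d/rc.nginx start
    fi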

    root@oxygen:~# ps -aux | grep nginx
    root      1104  0.0  0.0   7928  5016 ?        Ss   Jul29   0:00 nginx: master process /usr/sbin/nginx
    nobody    1129  0.0  0.0   8520  4852 ?        S    Jul29   0:00 nginx: worker process
    nobody    1130  0.0  0.0   8520  4852 ?        S    Jul29   0:00 nginx: worker process
    nobody    1131  0.0  0.0   8520  4776 ?        S    Jul29   0:00 nginx: worker process
    nobody    1132  0.0  0.0   8520  4780 ?        S    Jul29   0:00 nginx: worker process
    root      9633  0.0  0.0 147024  4016 ?        Ss   Jul29   0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
    root      9634  0.0  0.0 148236  8096 ?        S    Jul29   0:13 nginx: worker process
    root     13064  0.0  0.0   4052  2224 pts/0    S+   13:16   0:00 grep nginx
    root     15478  0.0  0.0    212    20 ?        S    Jul29   0:00 s6-supervise svc-nginx
    root     15826  0.0  0.0   7812  3932 ?        Ss   Jul29   0:00 nginx: master process /usr/sbin/nginx
    nobody   15932  0.0  0.0   8160  2988 ?        S    Jul29   0:00 nginx: worker process
    nobody   15933  0.0  0.0   8160  2160 ?        S    Jul29   0:00 nginx: worker process
    nobody   15934  0.0  0.0   8160  2984 ?        S    Jul29   0:00 nginx: worker process
    nobody   15935  0.0  0.0   8160  2984 ?        S    Jul29   0:00 nginx: worker process
    nobody   21461  0.0  0.0  48488 11428 pts/0    Ss+  Jul29   0:00 nginx: master process nginx
    nobody   26207  0.0  0.0  49152  9440 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26208  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26209  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26210  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26211  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26212  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26214  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26215  0.0  0.0  48724  6464 pts/0    S+   12:20   0:00 nginx: worker process
    nobody   26216  0.0  0.0  47956  6604 pts/0    S+   12:20   0:00 nginx: cache manager process

     

    16 hours ago, srirams said:
    ps -aux | grep nginx
    kill -9 <process id of nginx master process and maybe the s6-supervise nginx>

     

    I kept the nginx process stopped (/etc/rc.d/rc.nginx stop) unless I needed to use the web ui, in which case I would start it and immediately stop it after I was done.

     

    @srirams From the long list of nginx processes I have running, do I need to kill all the ones that say master (1104, 9633, 15826, 19862, 21461)?

     

    /etc/rc.d/rc.nginx stop

    ^ just hangs on "Shutdown Nginx gracefully..."

    1 hour ago, TheDon said:

     

    @srirams From the long list of nginx processes I have running, do I need to kill all the ones that say master (1104, 9633, 15826, 19862, 21461)?

     

     

    That's what I would try... killing the master process should kill all the child processes as well... once that happens, you can try starting the nginx service again.


    Are there any updates on progress toward a fix? Or would it help to gather additional data or run more commands?

     

    Also a bit off-topic, but maybe related, since it is another issue with IPv6:
    The Management Access menu shows the public IPv6 link in this format:

    http://[aaaa:aaaa:aaaa:aaaa::bbb]b/

    When it obviously should be:

    http://[aaaa:aaaa:aaaa:aaaa::bbbb]/

    So maybe some shared component in Unraid that is responsible for parsing IPv6 addresses is broken?


    I upgraded 6.12.2 -> 6.12.3 and the webGUI has been up for over 7 days now. 6.12.2 didn't survive this long even once. It's still too early to say whether the problem is solved, and many other posters wrote that the update didn't solve it, so I'm not too confident yet. The strange thing is that my other Unraid server is still on 6.12.2 and it hasn't crashed once; it has been running for 30 days now.

     

    I'm expecting the webGUI and the whole server to stay up for 365 days as in earlier versions, which is my normal maintenance interval and time for the yearly reboot and dust cleaning. After that I can say the problem is solved 🙂.


    I have been performing downgrades

    [6.12.3] - "current stable", this is where i started.  I dont think my issues started on this version, but its when i started dealing with it

    [6.12.2] - Issue was still present

    [6.12.0] - Issue seem to take longer to present itself, I got to over 19 hours runtime

    [6.11.5] - This broken all of my docker containers from starting automatically, I had to change the network to something else, and then back to correct setting to launch all my docker containers.

    Just booted into 6.11.5, so havent been able to give it a 24hr stability test.


    I just noticed this longer thread, with an issue similar to what I was having with 6.12.x. Adding my $0.02.

    The lockups, at least in my case, seem to be related to disk access. I could lock up my system manually by trying to calculate the size of my user shares in the GUI. The webUI would stop responding and SSH wouldn't connect, resulting in a "broken pipe" error. I could still read the system logs from my other Unraid install using the remote syslog function, so it was receiving connections just fine and some processes were even still running.
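    (If anyone wants to test whether the same lockup reproduces outside the GUI, a share's size can also be summed from the shell; the share name below is just a placeholder:)

    # sum the size of one user share from the CLI
    du -sh /mnt/user/<share-name>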

     

    Secondly, I downgraded to 6.11.5 today and my disk speeds have basically doubled with zero changes in configuration. No lockups as of yet, and calculating user shares goes off without a hitch. I'm not really seeing anything related in the changelog for that version, but I figured I'd add my info in case it helps.

     

    For now downgrading looks to be the path. 

    EDIT: Oh, and I almost forgot. Not only would the server lock up, but any Windows 11 machine accessing SMB shares from it would as well. I have a share mapped, and Explorer would literally just keep crashing and reloading. I would have to hard-shutdown the machine to get it to stop.

    Edited by Mindaboveall

    It has been stable for me over the last couple of days just by setting the IPv6 DNS to static servers, while keeping a dynamic address. So the issue seems to be related to IPv6 DNS rather than IPv6 in general. I hope this helps, and I would really appreciate an official update on this.

    On 8/4/2023 at 10:38 AM, TheDon said:

    I have been performing downgrades

    [6.12.3] - "current stable", this is where i started.  I dont think my issues started on this version, but its when i started dealing with it

    [6.12.2] - Issue was still present

    [6.12.0] - Issue seem to take longer to present itself, I got to over 19 hours runtime

    [6.11.5] - This broken all of my docker containers from starting automatically, I had to change the network to something else, and then back to correct setting to launch all my docker containers.

    Just booted into 6.11.5, so havent been able to give it a 24hr stability test.

     

    The problem still exists in 6.11.5 for me; I'm really not sure what else I can do at this point. On the next GUI crash I can try restarting nginx; that didn't work for me in 6.12.x, but maybe it will now?

    oxygen-diagnostics-20230804-2248.zip


    I've been reading and testing a bit, and it seems to be connected to IPv6. If I use the IPv4 address it's fine.
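    A quick way to compare the two from another machine, in case anyone wants to verify (substitute your own server address; curl's -4/-6 flags force the address family):

    # request the webUI over IPv4, then over IPv6
    curl -4 -I http://<server>/
    curl -6 -I http://<server>/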

    I also noticed that the Management Access page on 6.12.3 has a peculiar issue (screenshot attached):

    The IPv6 address is displayed broken. I'm not sure if this is just cosmetic or goes deeper and bounces you around when using the hostname.


    I have seen a lot of people mention IPv6, but I'm seeing this issue despite having had IPv6 disabled for a long time. When I found this thread and read through it, I went to check whether I had IPv6 enabled, but it was completely disabled, and I think I did that pretty early on.

     

    I am running 6.11.5 now and still can't keep the GUI running for more than 12-24 hours.

     

    I am attempting safe mode right now to see if it helps at all.

    oxygen-diagnostics-20230810-1137.zip syslog-10.0.0.8-20230810.log

    5 hours ago, TheDon said:

     

    I am running 6.11.5 now and still can't keep the GUI running for more than 12-24 hours.

     

    I am attempting safe mode right now to see if it helps at all.

    oxygen-diagnostics-20230810-1137.zip syslog-10.0.0.8-20230810.log

    You can save yourself the time with safe mode. It's broken even on a completely fresh install with absolutely nothing running.

    Stay on 6.11.5 unless you really want exclusive shares or ZFS.

    On 8/10/2023 at 5:59 PM, Mainfrezzer said:

    You can save yourself the time with safe mode. It's broken even on a completely fresh install with absolutely nothing running.

    Stay on 6.11.5 unless you really want exclusive shares or ZFS.

    Actually, @Mainfrezzer, I have achieved 2 days and 10 hours of stability with safe mode on (v6.11.5), so I guess this might mean my issue is plugin-related?






  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.