• [6.12] Unraid webui stop responding then Nginx crash


    H3ms
    • Retest Urgent

    Hi,

     

    Another day, another problem haha.

     

    Since the update to 6.12 (RC6 was OK) my webui stops working correctly.

    The webui stops refreshing itself automatically (I need F5 to refresh the data on screen).

     

    Then finally, nginx crashes and the interface stops responding (on port 8080 for me). I have to SSH into the server to find the nginx PID, kill it and then start nginx.
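    For anyone else stuck at that point, a rough sketch of the SSH recovery steps (the pid-file location and the rc script path are assumptions based on a stock Unraid/Slackware layout — verify them on your own box):

```shell
#!/bin/sh
# Locate the wedged nginx master: prefer the pid file, fall back to pgrep.
pid=$(cat /var/run/nginx.pid 2>/dev/null || pgrep -o -x nginx)

if [ -n "$pid" ]; then
    kill "$pid"               # stop the hung master
    sleep 1
    /etc/rc.d/rc.nginx start  # Slackware-style init script (assumed path)
else
    echo "nginx not running"
fi
```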

     

    I've set up a syslog server earlier, but I don't find anything related to this except:

    2023/06/15 20:42:50 [alert] 14427#14427: worker process 15209 exited on signal 6
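    As a quick triage step, those worker aborts can be counted straight from the log; a sketch (sample lines inlined here — on a live server, point the grep at /var/log/syslog instead):

```shell
#!/bin/sh
# Two sample syslog lines stand in for the real log file.
cat > /tmp/nginx_crashes.sample <<'EOF'
2023/06/15 20:42:50 [alert] 14427#14427: worker process 15209 exited on signal 6
2023/06/15 20:43:12 [alert] 14427#14427: worker process 15344 exited on signal 6
EOF

# Signal 6 is SIGABRT: the worker aborted itself rather than being killed.
grep -c 'exited on signal 6' /tmp/nginx_crashes.sample   # → 2
```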

     

    I attached the syslog and the diag file.

     

    Thx in advance.

     

    nas-icarus-diagnostics-20230615-2146.zip syslog




    User Feedback

    Recommended Comments



    53 minutes ago, SpaceInvader said:

    Another thing I just noticed is that with IPv6 enabled, there is always this /usr/sbin/atd process running an nginx reload every three seconds.

     

    This is triggered because the DHCP client on your system is continuously adding and removing IP addresses.

    It is not yet clear to me why this is happening, though. Still studying the diagnostics to find a clue.
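    If it helps, that churn should show up in the syslog as rapid add/delete pairs from the DHCP client. A sketch (sample lines inlined, with a documentation-prefix address as placeholder; the exact dhcpcd message wording is an assumption — grep your real /var/log/syslog):

```shell
#!/bin/sh
# Sample lines standing in for a real syslog capture.
cat > /tmp/dhcp_churn.sample <<'EOF'
Jun 15 20:42:10 nas dhcpcd[1701]: br0: deleting address 2001:db8::1/128
Jun 15 20:42:13 nas dhcpcd[1701]: br0: adding address 2001:db8::1/128
Jun 15 20:42:16 nas dhcpcd[1701]: br0: deleting address 2001:db8::1/128
EOF

# A count that keeps growing means the client is flapping the address.
grep -Ec 'dhcpcd.*(adding|deleting) address' /tmp/dhcp_churn.sample   # → 3
```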

     

    I have seen similar behavior in other diagnostics (together with failing services), and disabling IPv6 and running IPv4 only solves the problem.

     

     

    2 hours ago, bonienl said:

    @SpaceInvader

    Can you test IPv6 and start the system in safe mode?

     

     

    IPv6 connectivity definitely works in both directions. The NAS has internet access and can be reached from the local network through IPv6. The reason I want to use IPv6 in the first place is that it is more reliable and faster with my provider, since they use DS-Lite.

    Interesting that it is continuously changing IPs; I didn't see that in the log.

    I'll reboot it in safe mode with IPv6 on and will add the diagnostics if it crashes again.


    So it just crashed in safe mode. I also disabled my VPN before booting into safe mode, to exclude that as well.

    The crash was at about 2:48 in the log. At about 2:32 I tried enabling and then disabling NFS again, since it also seems weird that the NFS/RPC stuff attempts to start even though it is disabled in the UI.

    The array was stopped during this.

    The nginx error log is also attached, since I think it is not included in the diag zip and it contains more lines than the syslog.

     

    -------

    I also just did another test with my second network interface disabled and only IPv6 enabled (instead of both). It crashed the same way; see the attached file. There are also a bunch of these messages:

     emhttpd: error: get_limetech_time, 251: Connection timed out (110): -2 (7)

    which seems to be an unrelated bug, with some Unraid server not supporting IPv6.

     

    I'm out of ideas, since I basically disabled everything possible and it still crashes.

     

    nas-diagnostics-20230717-0250.zip nginx_error.log nas-diagnostics-20230717-0412_only_ipv6.zip

    On 6/25/2023 at 10:22 AM, H3ms said:

    No crash since I'm not using the webgui at all...

    But I have a new error appearing a lot:

     [2023/06/25 16:19:13.971533,  0] ../../source3/nmbd/nmbd_packets.c:761(queue_query_name)
    Jun 25 16:19:13 NAS-ICARUS nmbd[21478]:   queue_query_name: interface 2 has NULL IP address !

     

    nas-icarus-diagnostics-20230625-1619.zip

     

    Is there an update on this error? I'm seeing it in my logs; it doesn't look to be critical, just a nuisance?

     

    What is interface 2 - is that ETH1? 

    In my case, ETH1 is bonded to ETH0, so it won't have its own IP address?
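    One way to check is to list the kernel's interface indices against their names (a sketch; note nmbd may number interfaces by its own internal order rather than the kernel ifindex, so treat this as a hint rather than a definitive mapping):

```shell
#!/bin/sh
# Print "<ifindex>: <name>" for every network device the kernel knows about.
for d in /sys/class/net/*; do
    printf '%s: %s\n' "$(cat "$d/ifindex")" "$(basename "$d")"
done
```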

    7 hours ago, SpaceInvader said:

    So it just crashed in safe mode. I also disabled my VPN before booting into safe mode, to exclude that as well.

     

    Thanks for testing

     

    8 hours ago, SpaceInvader said:

    I tried enabling and then disabling NFS again

     

    This is weird in your logs: because NFS is not enabled, it should not get started at all. Yet it does, and it gives lots of errors.

    In my testing enabling / disabling NFS in the GUI gives the correct behavior when starting the system. I never get the errors seen in your log. I checked all your config files and can't find anything wrong. A mystery!

     

    Somehow I think the RPC/NFS errors are related to the NGINX errors; in other words, the source of these errors makes these services fail.

     

    8 hours ago, SpaceInvader said:

    The nginx error log is also attached,

     

    Can you post the content of the file /etc/nginx/conf.d/servers.conf?

     

    8 hours ago, SpaceInvader said:

    which seems to be an unrelated bug,

     

    Yeah, unrelated - it has nothing to do with the problem. The Limetech site doesn't respond over IPv6.

     

    8 hours ago, SpaceInvader said:

    I'm out of ideas,

     

    We keep on investigating this issue.

     

    6 hours ago, coolspot said:

    Is there an update on this error? I'm seeing it in my logs, it doesn't look to be critical, just a nuisance?

     

    This is a bug in netbios. You can ignore the message, or disable netbios to stop it.
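    For reference, the GUI toggle should map to Samba's global setting (an assumed mapping — the supported way on Unraid is the NetBIOS option under the SMB settings page):

```ini
# smb.conf fragment — the Samba equivalent of disabling NetBIOS (assumed mapping)
[global]
    disable netbios = yes
```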

     


    I think I might be seeing a similar issue after updating to 6.12. I've been documenting it in another thread (possibly the wrong place):

     

     


    I haven't updated to 6.12.3 yet, since everything except the webui is working and it's laborious to shut all the services and VMs down and start them up again. The strange thing is that I have another Unraid server with an almost identical config, on the same network as the server whose webui is crashing. I upgraded Unraid on both at the same time, but server 2 has been running without webui crashes since the update.

    The difference between them is that Unraid server 1 is the "main" server with more containers and also VMs, while server 2 is just file storage. I don't know if that has anything to do with the problem.

     

    Unraid server 1 (the one with webui crashing):

    Containers:

    • elasticsearch
    • piwigo
    • endlessh
    • netdata
    • diskover
    • mariadb
    • mysql
    • Plex
    • deluge

    Plugins:

    • community.applications.plg - 2023.07.03  (Up to date)
    • dynamix.active.streams.plg - 2023.02.19  (Up to date)
    • dynamix.cache.dirs.plg - 2023.02.19  (Up to date)
    • dynamix.file.integrity.plg - 2023.03.26  (Up to date)
    • dynamix.system.temp.plg - 2023.02.04b  (Up to date)
    • file.activity.plg - 2023.06.15  (Up to date)
    • fix.common.problems.plg - 2023.04.26  (Up to date)
    • open.files.plg - 2023.06.12  (Up to date)
    • tips.and.tweaks.plg - 2023.07.05  (Up to date)
    • unbalance.plg - v2021.04.21  (Up to date)
    • unRAIDServer.plg - 6.12.2
    • user.scripts.plg - 2023.03.29  (Up to date)

    VMs:

    • 3 VMs running

     

    Unraid2 (stable):

    Containers:

    • netdata

    Plugins:

    • community.applications.plg - 2023.07.21  (Update available: 2023.07.03)
    • dynamix.file.integrity.plg - 2023.03.26  (Up to date)
    • fix.common.problems.plg - 2023.04.26  (Update available: 2023.07.16)
    • unassigned.devices.plg - 2023.07.04  (Update available: 2023.07.16)
    • unassigned.devices-plus.plg - 2023.04.15  (Up to date)
    • unbalance.plg - v2021.04.21  (Up to date)
    • unRAIDServer.plg - 6.12.2
    • user.scripts.plg - 2023.03.29  (Update available: 2023.07.16)

    VMs:

    • No VMs
    8 hours ago, bonienl said:

    Can you post the content of the file /etc/nginx/conf.d/servers.conf?

    The file is attached. Btw, the webui is accessible directly over both IPv6 and IPv4.

     

    That I can't stop RPC/NFS from starting also seems very weird to me. Maybe I'll try a completely clean install on another USB stick later to see if something is going on with my install.

    servers.conf

     

     

    So after testing the fresh install I just created with the USB Creator, I got the exact same issue! The only thing I did was enable IPv4+IPv6 and SSH.

    diagnostics-fresh-install.zip


    I would like to add another data point for this issue.

    I'm new to Unraid and set it up just two weeks ago. I started with 6.12.2. Everything worked great until two days ago.

    The symptom is exactly the same as in this post: the webgui crashed after a few hours of running, and I have to kill nginx and restart it to access the webgui again. More interestingly, I restarted my server the first time this issue occurred; even though the array wasn't running and none of the Dockers and VMs had started, the webgui still crashed.

    After reading through this post, I highly suspect this is related to IPv6, because as far as I remember the last thing I did on my server was set up the qBittorrent docker. One step of that was enabling IPv6 in my Unraid network settings, since the BT tracker I'm using only allows IPv6 connections.

    Let me know if I should attach anything here or make a new post to help solve this issue. Thanks!

    1 hour ago, SpaceInvader said:

    So after testing the fresh install I just created with the USB Creator, I got the exact same issue! The only thing I did was enable IPv4+IPv6 and SSH.

    diagnostics-fresh-install.zip

    I think we end up with the same conclusion. At this point I wouldn't worry about SSH, since I hadn't enabled SSH yet when my first crash happened.

    26 minutes ago, sbihero said:

    I think we end up with the same conclusion. At this point I wouldn't worry about SSH, since I hadn't enabled SSH yet when my first crash happened.

     

    Yeah, the only reason I enabled SSH was to be able to use the diagnostics command and restart nginx without a reboot, since a reboot would lose the syslog.

    The fresh install I just tested also did not have any storage devices set up, and I hadn't even activated the trial (so a bunch of stuff stayed disabled).

     

    --------------

    Big update! I just found that setting the address assignment to static for IPv6 resolves the atd process reloading nginx, and there are no more entries in the nginx error log.

    (screenshot attached: grafik.png)

    The address settings are unmodified from the suggested values.

     

    So this probably means there is an issue with the Unraid DHCP client for IPv6.

    1 hour ago, SpaceInvader said:

    So after testing the fresh install I just created with the USB Creator, I got the exact same issue!

     

    It is not exactly the same. In this fresh install NFS is not started (correct) because it isn't enabled, while your previous log showed NFS getting started (wrong) with lots of errors.

     

    It is really strange that from the moment nginx is started there are errors; I can't explain that.

    IPv4 and IPv6 both look alright.

     

    Don't know if it is related to your ethernet controller card, which is a Realtek 2.5G version (but operating on 1G).

    If you have another NIC, perhaps it is worth a try.

     

    1 minute ago, bonienl said:

    Don't know if it is related to your ethernet controller card, which is a Realtek 2.5G version (but operating on 1G).

    If you have another NIC, perhaps it is worth a try.

    In my other tests with the full OS I actually used the 1Gig NIC (PCI card) to connect to the router. In my usual setup the 2.5Gbit NIC is in bridge mode to my PC. But I tested both individually, with the other disabled.

     

    I don't know if you saw my last message yet, but it seems to be related to the DHCP function specifically.

    34 minutes ago, SpaceInvader said:

    Big update! I just found that setting the address assignment to static for IPv6 resolves the atd process reloading nginx, and there are no more entries in the nginx error log.

     

    Wow, great find. Let me digest this and see if I can come up with a possible solution.

     

    Quote

    I don't know if you saw my last message yet, but it seems to be related to the DHCP function specifically.

     

    Maybe, maybe not, but at least we have a pointer to work on.

    Thx

     

    10 minutes ago, bonienl said:

    @SpaceInvader

    Question: when you change back to DHCP assignment, does the problem come back too?

     

    Yes, it comes back, and when setting it to static again it goes away again.

    I attached a log where I switch from static to automatic at 0:47, which results in the nginx errors again, and then switch back at 0:49.

     

    nas-diagnostics-20230718-0050.zip


    I just had my server go unresponsive. Things had been going fine for several hours; then I started the Jellyfin container and the whole server went tango uniform. As usual for me when my server is in this state, there's no way to recover the diagnostics, but I did capture this on my syslog server:

    Jul 17 20:00:57	unRAID	kern	info	kernel	docker0: port 2(veth52713cd) entered blocking state
    Jul 17 20:00:57	unRAID	kern	info	kernel	docker0: port 2(veth52713cd) entered disabled state
    Jul 17 20:00:57	unRAID	kern	info	kernel	device veth52713cd entered promiscuous mode
    Jul 17 20:01:07	unRAID	kern	info	kernel	eth0: renamed from vethb8bbec1
    Jul 17 20:01:07	unRAID	kern	info	kernel	IPv6: ADDRCONF(NETDEV_CHANGE): veth52713cd: link becomes ready
    Jul 17 20:01:07	unRAID	kern	info	kernel	docker0: port 2(veth52713cd) entered blocking state
    Jul 17 20:01:07	unRAID	kern	info	kernel	docker0: port 2(veth52713cd) entered forwarding state
    Jul 17 20:01:16	unRAID	daemon	warning	php-fpm[7233]	[WARNING] [pool www] child 30889 exited on signal 9 (SIGKILL) after 43.290647 seconds from start
    Jul 17 20:01:18	unRAID	daemon	warning	php-fpm[7233]	[WARNING] [pool www] child 30890 exited on signal 9 (SIGKILL) after 45.306223 seconds from start
    Jul 17 20:01:20	unRAID	daemon	warning	php-fpm[7233]	[WARNING] [pool www] child 30891 exited on signal 9 (SIGKILL) after 47.270700 seconds from start
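    Those php-fpm lines can be summarized with a bit of awk if anyone wants to compare timings (sample lines inlined; the field offsets assume this exact syslog layout):

```shell
#!/bin/sh
# Two sample lines standing in for the syslog server's capture.
cat > /tmp/phpfpm.sample <<'EOF'
Jul 17 20:01:16 unRAID daemon warning php-fpm[7233] [WARNING] [pool www] child 30889 exited on signal 9 (SIGKILL) after 43.290647 seconds from start
Jul 17 20:01:18 unRAID daemon warning php-fpm[7233] [WARNING] [pool www] child 30890 exited on signal 9 (SIGKILL) after 45.306223 seconds from start
EOF

# Signal 9 is SIGKILL: something outside php-fpm killed these children.
awk '/SIGKILL/ { printf "child %s killed after %ss\n", $(NF-10), $(NF-3) }' /tmp/phpfpm.sample
# → child 30889 killed after 43.290647s
# → child 30890 killed after 45.306223s
```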

     

     

    When I get it back up I'll either disable IPv6 entirely or follow SpaceInvader's instructions above and see if it resolves things for me.

     

    -edit

     

    Turns out I already had it set to IPv4 only. I tried changing it to IPv4+IPv6 and setting the IPv6 to static using the settings in SpaceInvader's post, but then I got an error starting Docker about 2a02:908:1060:c0::7c31 being an invalid address. So I was unable to try that.

    19 hours ago, stanger89 said:

    Turns out I already had it set to IPv4 only. I tried changing it to IPv4+IPv6 and setting the IPv6 to static using the settings in SpaceInvader's post, but then I got an error starting Docker about 2a02:908:1060:c0::7c31 being an invalid address. So I was unable to try that.

    You can't set your IPv6 address to the same one as in someone else's post. Set your box to "automatic assignment".


    Unfortunately I've got IPv6 disabled on my router too, so automatic assignment didn't populate with anything.  I've reverted back to 6.11.5 until all this gets sorted out.


    I'm on 6.12.3 and I still get this behavior:

     

    Jul 19 03:37:45 trantor nginx: 2023/07/19 03:37:45 [alert] 30307#30307: worker process 6589 exited on signal 6
    Jul 19 03:37:46 trantor nginx: 2023/07/19 03:37:46 [alert] 30307#30307: worker process 6769 exited on signal 6
    Jul 19 03:37:48 trantor nginx: 2023/07/19 03:37:48 [alert] 30307#30307: worker process 6819 exited on signal 6
    Jul 19 03:37:50 trantor nginx: 2023/07/19 03:37:50 [alert] 30307#30307: worker process 6888 exited on signal 6
    Jul 19 03:37:55 trantor nginx: 2023/07/19 03:37:55 [alert] 30307#30307: worker process 7004 exited on signal 6
    Jul 19 03:37:58 trantor nginx: 2023/07/19 03:37:58 [alert] 30307#30307: worker process 7256 exited on signal 6

     

    Starting a cloudflared docker seems to trigger this, but it doesn't get fixed immediately after stopping the cloudflared docker container either.


    Since setting the IPv6 config to static it has been stable! The NFS service also stopped attempting to start.
    Unfortunately I can't keep it static forever, since my provider semi-regularly changes the assigned prefix.

     

    I think the issue stanger89 has is different, since this did not occur for me.

    On 7/18/2023 at 3:19 AM, stanger89 said:
    Jul 17 20:01:16	unRAID	daemon	warning	php-fpm[7233]	[WARNING] [pool www] child 30889 exited on signal 9 (SIGKILL) after 43.290647 seconds from start

    as opposed to these messages when it crashes for me:

    Quote

    [alert] 5835#5835: worker process 25751 exited on signal 6
    [alert] 5835#5835: shared memory zone "memstore" was locked by 25751

    Also, only the webui is unresponsive; SSH and other services still work after the webui crashes.


    After upgrading to 6.12.3, I still have problems with nginx and IPv6. I'm tired and have decided to give up. I will rebuild my system with TrueNAS this weekend.

    On 7/20/2023 at 1:48 AM, Beermedlar said:

    After upgrading to 6.12.3, I still have problems with nginx and IPv6. I'm tired and have decided to give up. I will rebuild my system with TrueNAS this weekend.

    I've done the same; my patience has been exhausted. Now running version 6.11.5 - so far so good.






  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.