• DNS resolution stopping on 6.12.0-rc5 and rc6


    Kaldek
    • Annoyance

    This issue appears to be occurring every few days.  I can't ping any hostnames from the CLI, and /etc/resolv.conf is blank.  I do not use DHCP for the server address, and my DNS servers are statically assigned.

     

    In addition, when it happens I am unable to reboot the server: the shares never unmount because it constantly tells me /mnt/cache is busy.  There are definitely no clients holding shares open when this happens.

     

    I have attached diagnostics of when the server is working, and will attach again when it next fails.  I have made one change today after the last failure, and that was to disable IPv6.  My dual stack ISP connection isn't always the best when it comes to IPv6 working all the time, so I've disabled IPv6 to see if that helps, since this seems to mainly be a network issue.

     

    unraid-diagnostics-20230521-1953.zip







    Hi. I am also seeing this. The most obvious way to see this is that Plugins are saying status "not available" as well as Dockers' versions saying "not available".

     

    I then went to the console and got:

    root@Fenrir:~# ping google.com
    ping: google.com: Name or service not known
    root@Fenrir:~# 

     

    Other systems on the same network are not having DNS resolution problems.

     

    I don't know if this is related, but this seemed to start AFTER I installed my 1650 Super and the Nvidia Driver Plugin from @ich77.

     

    Edit: The diagnostics were pulled while DNS was failing to resolve. I will try to leave it in this state for a bit. Dockers seem to be working fine, so I will leave them running.

     

     

    fenrir-diagnostics-20230525-1001.zip

    Edited by nblom
    Link to comment

    @nblom Try giving it static DNS servers in Settings - Network Settings.  I'd suggest 208.67.222.222 and 208.67.220.220

     

    @Kaldek Don't use the router as DNS #1.  Use it as #3 and your google ones as 1 & 2

    Link to comment

    @Squid So you want me to stop docker and everything?

    Currently I have a static lease assigned by my router. I use my DNS server from the router as a way to be able to have local DNS resolution.

     

    I can try giving it a straight up static IP.

    Link to comment

    A lot of routers are terrible at handling DNS requests.  But yes to make the change, docker has to be stopped.

    Link to comment
    Just now, Squid said:

    A lot of routers are terrible at handling DNS requests.  But yes to make the change, docker has to be stopped.

    Understood.

     

    I am using OPNsense as my router, with DNS set up as follows:

    AdGuard Home on the router on port 53 --> Unbound (recursive) on port 5353.

     

    So far this is the only issue I have seen, which started between rc5 and rc6.

    Link to comment

    @Squid

    I just attempted to stop docker and it failed. I then went to change Network Settings and the UI still thinks docker is running.

     

    Got this in the logs:

    May 25 10:41:55 Fenrir avahi-daemon[29851]: Service "Fenrir" (/services/sftp-ssh.service) successfully established.
    May 25 10:41:56 Fenrir root: waiting for docker to die ...
    May 25 10:41:57 Fenrir root: waiting for docker to die ...
    May 25 10:41:58 Fenrir root: waiting for docker to die ...
    May 25 10:41:59 Fenrir root: waiting for docker to die ...
    May 25 10:42:00 Fenrir root: waiting for docker to die ...
    May 25 10:42:01 Fenrir root: waiting for docker to die ...
    May 25 10:42:02 Fenrir root: waiting for docker to die ...
    May 25 10:42:03 Fenrir root: waiting for docker to die ...
    May 25 10:42:04 Fenrir root: waiting for docker to die ...
    May 25 10:42:05 Fenrir root: waiting for docker to die ...
    May 25 10:42:06 Fenrir root: waiting for docker to die ...
    May 25 10:42:07 Fenrir root: waiting for docker to die ...
    May 25 10:42:08 Fenrir root: waiting for docker to die ...
    May 25 10:42:09 Fenrir root: waiting for docker to die ...
    May 25 10:42:10 Fenrir root: docker will not die!
    May 25 10:42:10 Fenrir emhttpd: shcmd (419): exit status: 1
    May 25 10:42:10 Fenrir emhttpd: shcmd (420): umount /var/lib/docker
    May 25 10:42:16 Fenrir nmbd[29797]: [2023/05/25 10:42:16.966525,  0] ../../source3/nmbd/nmbd_become_lmb.c:398(become_local_master_stage2)
    May 25 10:42:16 Fenrir nmbd[29797]:   *****
    May 25 10:42:16 Fenrir nmbd[29797]:   
    May 25 10:42:16 Fenrir nmbd[29797]:   Samba name server FENRIR is now a local master browser for workgroup WORKGROUP on subnet 172.28.0.15
    May 25 10:42:16 Fenrir nmbd[29797]:   
    May 25 10:42:16 Fenrir nmbd[29797]:   *****

     

    fenrir-diagnostics-20230525-1044.zip

    Link to comment

    Adding something more here.

    It looks like local DNS resolution stopped working:
     

    root@Fenrir:~# nslookup google.com
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to 127.0.0.1#53: connection refused
    ;; no servers could be reached
    
    
    root@Fenrir:~# nslookup google.com 8.8.8.8
    Server:         8.8.8.8
    Address:        8.8.8.8#53
    
    Non-authoritative answer:
    Name:   google.com
    Address: 74.125.136.102
    Name:   google.com
    Address: 74.125.136.113
    Name:   google.com
    Address: 74.125.136.138
    Name:   google.com
    Address: 74.125.136.100
    Name:   google.com
    Address: 74.125.136.139
    Name:   google.com
    Address: 74.125.136.101
    Name:   google.com
    Address: 2607:f8b0:4002:c09::64
    Name:   google.com
    Address: 2607:f8b0:4002:c09::65
    Name:   google.com
    Address: 2607:f8b0:4002:c09::8a
    Name:   google.com
    Address: 2607:f8b0:4002:c09::8b
    
    root@Fenrir:~# nslookup google.com 
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to 127.0.0.1#53: connection refused
    ;; no servers could be reached
    
    
    root@Fenrir:~# 

     

    Link to comment
    21 minutes ago, nblom said:
    May 25 10:42:09 Fenrir root: waiting for docker to die ...
    May 25 10:42:10 Fenrir root: docker will not die!

    This was a bug, fixed for v6.12-rc7. Docker did still stop, as confirmed by the fact that it unmounted successfully, but the bug may cause other issues, like the GUI network issue.

    Link to comment

    I have rebooted my server and everything is back to "normal".

    I will wait and see if 6.12-rc7 resolves this.

    Link to comment
    7 hours ago, nblom said:

    @Squid

    I just attempted to stop docker and it failed. I then went to change Network Settings and the UI still thinks docker is running.

     


    I can confirm the same behaviour. I also tried to stop the array so I could change DNS settings and see if it forced DNS to come back, but it constantly said that Docker was still running.

    I'm not sure if I said this in my original post either, but it was also impossible to stop the array because Docker would never stop, and as a result it could not unmount the cache pool.

    Link to comment
    8 hours ago, Squid said:

    A lot of routers are terrible at handling DNS requests.  But yes to make the change, docker has to be stopped.

     

    I can say with some confidence this is not a DNS server issue for me.  

    1. Nothing changed except unRAID version
    2. unRAID was always set to use the router, followed by Google for DNS
    3. /etc/resolv.conf shows as blank when the problem occurs
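A blank /etc/resolv.conf would also explain the nslookup output earlier in the thread: with no nameserver entries, the resolver falls back to the local machine, hence the refused connections to 127.0.0.1#53 and ::1#53. The sketch below is one rough way to check for that state the next time it happens. `count_nameservers` is a hypothetical helper name, not part of Unraid, and it assumes a POSIX shell with `grep`.

```shell
#!/bin/sh
# count_nameservers [FILE]
# Hypothetical helper (not an Unraid tool): prints how many "nameserver"
# lines FILE contains. FILE defaults to /etc/resolv.conf.
count_nameservers() {
    file=${1:-/etc/resolv.conf}

    # A missing or unreadable file is reported as zero nameservers.
    if [ ! -r "$file" ]; then
        echo 0
        return 0
    fi

    # Count lines that begin with the keyword "nameserver" followed by
    # whitespace. grep -c prints 0 and exits non-zero when nothing
    # matches, so its exit status is deliberately ignored.
    grep -c -E '^nameserver[[:space:]]' "$file" || true
}
```

Running `count_nameservers` with no argument while DNS is failing should print 0 if resolv.conf really is blank, and a non-zero count once the static DNS servers are back in the file.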
    Link to comment

    FYI, it happened again. DNS resolution just *poof* gone. Server was running normally just like it has for years, up until v6.12-rc6.

    Link to comment
    1 hour ago, nblom said:

    Reverted to RC5 to see if that fixes it. Thanks.


    It's been 7 days for me with no issues, and the only change I made was to disable IPv6.

    User JorgeB above mentioned there is an RC7 floating about that fixes a Docker Bug which could cause the issue. He must be a member of the team as he's a moderator.

    Link to comment

    Well, DNS kept going after having IPv6 disabled but Docker had a fit and lost its mind.  Server reboot required.

     

    I'll just reboot it every couple of days until RC7 is out.

    Link to comment
    5 hours ago, JorgeB said:

    There are some known issues with rc6 and DNS, try rc7 when available, it should fix those.

     

    Will do! Thanks.

    Link to comment

    I have the same issue and I am on 6.11.5 

    Rebooted the server last night and DNS lookups were working fine. Checked this morning, still OK; checked now and it's broken again. I have a static IP for the server and static DNS.

     

    I've had this issue for a while and spent hours on numerous fixes, but nothing works.

     

    Attached diagnostics if anyone has any ideas 🙂

    unraid-diagnostics-20230531-1713.zip

    Link to comment

    I just had it happen again here too. I am back on v6.12-RC5 when it happened. My '/run' had filled up all 32MB.

    Turns out my plex container had filled up the logs with a bunch of stuff related to the Nvidia Drivers. I have now set the extra parameter of '--no-healthcheck' and that seems to have slowed the increase of the logs down.
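For anyone wanting to check the same thing, listing the largest files on the /run filesystem is one way to see what is eating the space. This is only a sketch: `largest_files` is a hypothetical helper name, not an Unraid tool, and it assumes a POSIX shell with the standard `find`, `du`, `sort` and `head` utilities.

```shell
#!/bin/sh
# largest_files [DIR] [COUNT]
# Hypothetical helper (not an Unraid tool): prints the COUNT largest
# regular files under DIR as "<size in KiB> <path>", biggest first.
# DIR defaults to /run and COUNT to 5.
largest_files() {
    dir=${1:-/run}
    count=${2:-5}

    # -xdev keeps find on the filesystem DIR lives on; du -k reports the
    # space each file occupies in KiB; sort -rn puts the largest first.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}
```

Calling `largest_files /run 5` alongside `df -h /run` shows how full the filesystem is and which files account for it.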

     

    Edit:

    I don't think it is directly caused by the Nvidia runtime/drivers; it's more that the healthcheck log lines went from "small" to "huge".

     

    Example of the new log line:
     

    {"level":"info","msg":"Running with config:\n{\n  \"AcceptEnvvarUnprivileged\": true,\n  \"NVIDIAContainerCLIConfig\": {\n    \"Root\": \"\"\n  },\n  \"NVIDIACTKConfig\": {\n    \"Path\": \"nvidia-ctk\"\n  },\n  \"NVIDIAContainerRuntimeConfig\": {\n    \"DebugFilePath\": \"/dev/null\",\n    \"LogLevel\": \"info\",\n    \"Runtimes\": [\n      \"docker-runc\",\n      \"runc\"\n    ],\n    \"Mode\": \"auto\",\n    \"Modes\": {\n      \"CSV\": {\n        \"MountSpecPath\": \"/etc/nvidia-container-runtime/host-files-for-container.d\"\n      },\n      \"CDI\": {\n        \"SpecDirs\": null,\n        \"DefaultKind\": \"nvidia.com/gpu\",\n        \"AnnotationPrefixes\": [\n          \"cdi.k8s.io/\"\n        ]\n      }\n    }\n  },\n  \"NVIDIAContainerRuntimeHookConfig\": {\n    \"SkipModeDetection\": false\n  }\n}","time":"2023-05-31T09:45:06-04:00"}
    {"level":"info","msg":"Using low-level runtime /usr/bin/runc","time":"2023-05-31T09:45:06-04:00"}

     

    Edited by nblom
    Link to comment

    OK, disabling the Docker service instantly fixes my issue. I was unable to ping google.com; after disabling the Docker service I can ping fine.

     

    I see some other threads referring to "Host access to custom networks". I have disabled this option for now; I think I had enabled it for some other Docker containers I was testing that required it. I have enabled Docker again with this option off and DNS is working OK for now. I will see how it goes and report back.

    Edited by witalit
    Link to comment
    6 hours ago, nblom said:

    I just had it happen again here too. I am back on v6.12-RC5 when it happened. My '/run' had filled up all 32MB.

    Turns out my plex container had filled up the logs with a bunch of stuff related to the Nvidia Drivers. I have now set the extra parameter of '--no-healthcheck' and that seems to have slowed the increase of the logs down.

     

    Confirmed; my /run had also just filled up, with the same space consumed by the log.json file used by Plex.

    Edited by Kaldek
    Link to comment
    On 5/31/2023 at 6:06 PM, witalit said:

    OK, disabling the Docker service instantly fixes my issue. I was unable to ping google.com; after disabling the Docker service I can ping fine.

     

    I see some other threads referring to "Host access to custom networks". I have disabled this option for now; I think I had enabled it for some other Docker containers I was testing that required it. I have enabled Docker again with this option off and DNS is working OK for now. I will see how it goes and report back.


    Not a single DNS issue since disabling 'Host access to custom networks' - Is this a known bug? 

     

    Location of setting:


    Settings > Docker

     


    Link to comment

    It seems this is not only an issue under the 6.12 RC versions: I have the same issue under version 6.11.5 since I updated my Nvidia drivers to version 525.116.04.

    Before this version I had no problems. I rebooted the first time it happened and that solved it, but 3 days later the issue is back and I cannot find a clue as to why.

     

    I did check my firewall (where DNS is running) but nothing strange there.

    root@Mindfix4:~# nslookup github.com
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to 127.0.0.1#53: connection refused
    ;; no servers could be reached


    root@Mindfix4:~# nslookup github.com 192.168.66.1
    Server:         192.168.66.1
    Address:        192.168.66.1#53

    Non-authoritative answer:
    Name:   github.com
    Address: 140.82.121.4

     

    Added the diagnostics, maybe that helps in finding the problem.

     

     

    unraid-diagnostics-20230604-1950.zip

    Link to comment
    35 minutes ago, LMD said:

    Seems this is not only an issue under the 6.12 RC versions, but I have the same issue under version 6.11.5 since I updated my NVidia drivers to version: 525.116.04


     

    Did you check whether the setting I mentioned above is enabled?

    Link to comment




