Random loss of DNS



Unraid v6.10-rc5

 

Diagnostics are from a fresh boot so my previous syslog is also attached:

node-diagnostics-20220505-0737.zip

syslog-20220505-073254.txt

 

This issue is not new for me in RC5; it's happened before, but it's been years since it last occurred. It's now happened twice in the last week. DNS is statically configured to use Google, Cloudflare, and Quad9.

 

Out of the blue my server will lose the ability to resolve any host names at all. I don't realize it's happened until I log into Radarr/Sonarr and see errors about all my indexers, download clients, and literally anything else requiring a DNS lookup being unreachable.

 

I console in and confirm that I can't ping any hostname; it fails with a "Name or service not known" error. Pinging internal and external IP addresses works fine. I don't see anything in the syslog to indicate what the issue might be.

 

I can usually fix it with a restart, but it's annoying to have to reboot the entire server just to get DNS back up. I've tried just stopping the array, shuffling the DNS servers, and hitting apply in hopes of reviving DNS, but it generally doesn't work.

 

Is there a way to roll the UnRAID DNS service without having to reboot the entire machine? Or at least a way to verify that the DNS resolver is still running when this issue occurs?

If the DNS service is up and running in UnRAID I may have to investigate issues on the LAN but no other client seems to experience this issue except for my server.

 

Besides the loss of DNS, networking seems entirely unaffected. My WireGuard VPN continues to work, Sonarr and Radarr are still available remotely, and I can access shares and the webui.

Edited by weirdcrap
Link to comment
20 hours ago, weirdcrap said:

Or at least a way to verify that the DNS resolver is still running when this issue occurs?

 

You can use nslookup to troubleshoot

 

e.g. nslookup www.google.com x.x.x.x

 

x.x.x.x can be the actual IP of a DNS server or of a relay (router, private DNS, etc.)
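
A minimal sketch of running that check against several servers in one go, assuming a BIND-style nslookup that exits non-zero when a query fails. The function name check_dns_servers is made up for illustration, and the addresses in the usage comment are just the router and public resolvers mentioned in this thread:

```shell
#!/bin/bash
# Query one hostname against each DNS server given on the command line.
# Prints OK/FAIL per server and returns 1 if any server failed to answer.
# Assumption: nslookup exits non-zero on a failed query (BIND behaviour).
check_dns_servers() {
    local host="$1"; shift
    local server failed=0
    for server in "$@"; do
        if nslookup "$host" "$server" >/dev/null 2>&1; then
            echo "OK   $server"
        else
            echo "FAIL $server"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (addresses are examples taken from this thread):
#   check_dns_servers google.com 192.168.20.254 8.8.8.8 1.1.1.1 9.9.9.9
```

If every server answers here while ping by name still fails, the DNS servers themselves are unlikely to be the culprit.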

Link to comment
On 5/6/2022 at 4:25 AM, Vr2Io said:

 

You can use nslookup to troubleshoot

 

e.g. nslookup www.google.com x.x.x.x

 

x.x.x.x can be the actual IP of a DNS server or of a relay (router, private DNS, etc.)

Duh, why didn't I think of nslookup? I wanted to use dig, but the BIND package seems to have disappeared from NerdPack. I was also hoping for a way to check the literal status of the service, something equivalent to systemctl status ServiceName (I'm mostly a Debian guy).

 

I haven't lost DNS again so far. I also switched my first DNS server to my router rather than an external server. I saw some threads here about DNS issues being resolved by doing that, so I figured why not.

Edited by weirdcrap
Link to comment
  • 3 weeks later...

Alright so this finally happened again. Now on v6.10.1


I ran nslookup against google.com and it returned a proper answer from my LAN router as well as 8.8.8.8:

root@Node:~# ping google.com
ping: google.com: Name or service not known
root@Node:~# nslookup google.com
Server:         192.168.20.254
Address:        192.168.20.254#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.4.206
Name:   google.com
Address: 2607:f8b0:4009:806::200e



root@Node:~# nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.0.174
Name:   google.com
Address: 2607:f8b0:4009:808::200e

root@Node:~# nslookup google.com 192.168.20.254
Server:         192.168.20.254
Address:        192.168.20.254#53

Non-authoritative answer:
Name:   google.com
Address: 172.217.4.206
Name:   google.com
Address: 2607:f8b0:4009:806::200e

root@Node:~# 

 

So then wtf is causing this?

 

Sonarr/Radarr can't reach any of my external resources, I can't ping domain names from the terminal, nothing DNS related seems to work yet nslookup seems to suggest DNS is fine?

Edited by weirdcrap
Link to comment
  • 3 weeks later...
1 hour ago, weirdcrap said:

nslookup returns responses

 

Can you successfully ping the gateway (router) and the internet by IP?

 

I don't have this problem myself, so I can't reproduce it, but the post below may be relevant:

 

https://superuser.com/questions/495759/why-is-ping-unable-to-resolve-a-name-when-nslookup-works-fine

 

I suspect the problem is with your DNS server; maybe try a public DNS server for troubleshooting.

Edited by Vr2Io
Link to comment
19 hours ago, Vr2Io said:

 

Can you successfully ping the gateway (router) and the internet by IP?

When this happens I can reach the gateway and the internet, but by IP only. I can SSH from the affected Unraid server into the gateway by IP, and I can ping outside servers like 8.8.8.8. For all intents and purposes the internet is working, just not name resolution.

Quote

I suspect the problem is with your DNS server; maybe try a public DNS server for troubleshooting.

UnRAID was set to use only external DNS servers (8.8.8.8, 1.1.1.1, 9.9.9.9) when this started occurring. I've tried switching to my router's DNS server for troubleshooting but doing so made no difference. 

 

I looked at that link you've posted but I don't really know how UnRAID's DNS system works. Does nslookup use a different lookup method vs ping in UnRAID? That might explain why nslookup works while a simple ping to an external domain does not.
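
For what it's worth, on a typical glibc-based Linux system the two tools generally do take different paths. nslookup is a standalone DNS client that sends its own queries to the server (it reads /etc/resolv.conf only to pick a default server), while ping resolves names through the C library, which follows the hosts: line in /etc/nsswitch.conf and checks /etc/hosts as well as DNS. Whether Unraid deviates from this is not confirmed anywhere in this thread. A rough sketch for comparing the two paths; compare_resolvers is a hypothetical helper name, and getent ahosts stands in for the lookup ping does internally:

```shell
#!/bin/bash
# Resolve one hostname two ways and report which path works.
# Assumptions: getent (glibc) and a BIND-style nslookup are installed.
compare_resolvers() {
    local host="$1"
    # getent ahosts goes through the C library resolver, as ping does.
    if getent ahosts "$host" >/dev/null 2>&1; then
        echo "libc path (used by ping): OK"
    else
        echo "libc path (used by ping): FAIL"
    fi
    # nslookup builds and sends the DNS query itself.
    if nslookup "$host" >/dev/null 2>&1; then
        echo "direct DNS query (nslookup): OK"
    else
        echo "direct DNS query (nslookup): FAIL"
    fi
}

# Usage:
#   compare_resolvers google.com
```

If the first line reports FAIL while the second reports OK, the DNS servers are probably fine, and the hosts: line in /etc/nsswitch.conf and the contents of /etc/resolv.conf are the next things worth checking.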

Edited by weirdcrap
Link to comment
  • 1 month later...

Just lost DNS again. Still no idea why nslookup works but things like ping, checking for plugin/Docker updates, etc. all fail.

 

Interestingly, I regained DNS functionality this time without having to restart or do anything other than wait. Very strange...

Edited by weirdcrap
Link to comment
57 minutes ago, Spike87 said:

Hello,

 

I have nearly the same problem. From time to time, most of my UnRAID server is unable to connect to the internet.

I can SSH into UnRAID and successfully ping or nslookup a WAN IP or DNS name, but checking for plugin/Docker updates fails and the Home Assistant Docker container in host mode is unreachable.

Restarting the Docker Service resolves the problem.

 

Any Ideas?

nas-diagnostics-20220725-2108.zip

Unfortunately no, I'm at a loss with my own issue already lol. I'd take a look at what has already been suggested in this thread to see if you can narrow your issue down further.

 

So when you lose connectivity you're still able to resolve hostnames via ping? I'm not able to resolve any hostnames at all EXCEPT with nslookup. It seems to be the only thing still capable of resolving domain names when I run into whatever is causing this issue...

 

This doesn't seem to be a common problem, I've only found a few threads on it that sound similar enough to my issue and unfortunately none of them have yielded any further clues as to what's wrong.

 

This server is getting rebuilt with all new hardware next month so I'm trying to just keep it coasting until then, see if the new hardware magically resolves any of the issues I've been having.

Edited by weirdcrap
Link to comment
On 7/26/2022 at 4:40 AM, weirdcrap said:

This server is getting rebuilt with all new hardware next month so I'm trying to just keep it coasting until then, see if the new hardware magically resolves any of the issues I've been having.

The problem doesn't look hardware related. Anyway, please try two things:

 

1. Check whether ping -4 google.com succeeds

2. Disable NetBIOS in the SMB settings

 

image.png.a46de4e60e2c5b8e99b4ce59e4999970.png

Link to comment

For anyone that's having this problem, do this in your console:

cat /etc/resolv.conf

 

If you are having this problem, the console will not give any output. This should not happen if you have DNS servers set under your Network settings.
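
A small sketch of that check which lists the configured nameservers instead of dumping the whole file, and returns a non-zero status when there are none. list_nameservers is a hypothetical helper name; the file path defaults to /etc/resolv.conf:

```shell
#!/bin/bash
# Print the nameserver addresses found in a resolv.conf-style file.
# Returns 1 (with a message on stderr) when the file is empty, missing,
# or has no nameserver lines.
list_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    local servers
    servers="$(awk '$1 == "nameserver" { print $2 }' "$file" 2>/dev/null)"
    if [ -z "$servers" ]; then
        echo "no nameserver entries in $file" >&2
        return 1
    fi
    echo "$servers"
}

# Usage:
#   list_nameservers                      # checks /etc/resolv.conf
#   list_nameservers /path/to/other/file  # checks a copy
```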

 

After some investigation, I found that even if I manually edited this file to have the nameservers I want, as soon as I turn on Docker, "resolv.conf" gets cleared IMMEDIATELY and no further edit can be done on this file without it being cleared by Docker/System again.

 

I am having this problem CONSISTENTLY and Unraid is basically unusable at this point.

 

Related to this, DHCP is very wonky too.

 

I wish I could revert to a version that's not plagued by so many network problems.

 

Unraid staff, if you can see this, please prioritize this fix.

Link to comment

@Quick_FOX my resolv.conf is intact with Docker running.

 

@Squid I can give the RC a try. It's difficult for me to diagnose as it seems to happen randomly. Maybe once a month or so.

 

I will also disable NetBIOS as suggested.

 

EDIT: Waiting on trying the RC until NerdPack gets updated as I need several of the tools it offers.
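
Because the failure only shows up about once a month, one option might be to record the resolver state on a schedule so there is a timestamp to line up against the syslog. A rough sketch, assuming getent and a BIND-style nslookup are available; log_dns_state and the log path in the usage comment are hypothetical. getent ahosts goes through the C library resolver (the path ping uses), while nslookup queries the DNS server itself:

```shell
#!/bin/bash
# Append one timestamped line saying whether each lookup path works.
# Arguments: hostname to resolve, log file to append to.
log_dns_state() {
    local host="$1" logfile="$2"
    local libc=FAIL direct=FAIL
    getent ahosts "$host" >/dev/null 2>&1 && libc=OK
    nslookup "$host" >/dev/null 2>&1 && direct=OK
    echo "$(date '+%Y-%m-%d %H:%M:%S') host=$host libc=$libc nslookup=$direct" >> "$logfile"
}

# Usage, e.g. from a cron job every few minutes (path is an example):
#   log_dns_state google.com /var/log/dns-state.log
```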

Edited by weirdcrap
Link to comment
  • 2 months later...

I'm still having issues with this. NetBIOS is off and I'm on the latest 6.11.0.

 

ping -4 google.com worked, but so did ping google.com immediately after, so I assume the issue just fixed itself again. It doesn't seem to be as permanent as it used to be. If I just wait 5 minutes, DNS starts working again.

Edited by weirdcrap
Link to comment

When DNS goes out ping -4 google.com does NOT work. I've still got no clue what's causing this.

 

root@Node:~# nslookup google.com
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.32.14
Name:   google.com
Address: 2607:f8b0:4009:80a::200e

root@Node:~# ping -4 google.com
ping: google.com: Name or service not known
root@Node:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.20.254 icmp_seq=1 Destination Host Unreachable
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=12.7 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=12.5 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, +1 errors, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 12.488/12.617/12.746/0.129 ms
root@Node:~# ping -4 google.com
ping: google.com: Name or service not known
root@Node:~# ping -4 google.com
ping: google.com: Name or service not known

 

Link to comment
29 minutes ago, Vr2Io said:

Has the main network changed from 192.168.1.x to 192.168.20.x?

 

You have two Unraid servers, "node" and "void". Does the same problem occur on both?

That destination host unreachable line is a new symptom, it was never there before I upgraded to 6.11

 

NODE is on 192.168.20.x; its router is at 192.168.20.254

 

VOID is on 192.168.1.x

 

I do not have any issues with DNS on VOID. It's an entirely separate network with its own ISP, router, switches, etc.

Edited by weirdcrap
Link to comment

In your first post, the diagnostics were named NODE with IP 192.168.1.x; anyway, this seems a bit confusing.

 

15 minutes ago, weirdcrap said:

It's an entirely separate network with its own ISP, router, switches, etc.

 

Could you also try swapping in the router from the problem-free network to check whether this is router related? Or simply move the troubled server onto that network to verify.

Edited by Vr2Io
Link to comment
25 minutes ago, Vr2Io said:

In your first post, the diagnostics were named NODE with IP 192.168.1.x; anyway, this seems a bit confusing.

This is incorrect, NODE has never been on 192.168.1.x. I just double checked the first diagnostic file I posted to be sure. From Network.cfg:

IPADDR[0]="192.168.20.249"
NETMASK[0]="255.255.255.0"
GATEWAY[0]="192.168.20.254"
DNS_SERVER1="8.8.8.8"
DNS_SERVER2="1.1.1.1"
DNS_SERVER3="9.9.9.9"

 

25 minutes ago, Vr2Io said:

Could you also try swapping in the router from the problem-free network to check whether this is router related?

Unfortunately this is not possible. NODE is hosted at a friend's IT business 4 hours away (they've got fiber, I don't). I can't ask them to tear down their business network just for me to troubleshoot my Plex server. Completely swapping out their router is out of the question.

 

I can see about procuring my own router to send over there and splitting the fiber demarc between their router and my own. But that will take some time to get set up if they'll allow it.

 

In the meantime I can share what I know about the network:

 

The router is a Zyxel USG110. None of the USG services (IDS, AV, Content Filter, etc.) are enabled, so they shouldn't be causing me any grief here. I'm in the business's internal VLAN and have my own public static IP. The router doesn't handle DHCP/DNS directly for this VLAN; that's done by a Windows AD server. I don't pull DHCP or DNS from their AD server anyway, since I've got my IP and DNS servers statically set in UnRAID.

 

I've asked everyone who works there on multiple occasions and no one else has DNS resolution issues, just me. I am the only Linux user in their network though.

Edited by weirdcrap
Link to comment
17 minutes ago, weirdcrap said:

I just double checked the first diagnostic file I posted to be sure. From Network.cfg:

I must have opened a different diagnostics file by mistake.

 

20 minutes ago, weirdcrap said:

Unfortunately this is not possible. NODE is hosted at a friend's IT business 4 hours away (they've got fiber, I don't). I can't ask them to tear down their business network just for me to troubleshoot my plex server. Completely swapping out their router is out of the question.

 


Well noted.

Link to comment
9 minutes ago, Vr2Io said:

I must have opened a different diagnostics file by mistake.

 

Well noted.

The USG110 is a few versions behind on firmware, so I've asked them to update it if possible (assuming it isn't held back for a reason).

 

If that doesn't fix it I'll move forward with getting my own router installed (if possible) unless anyone has any other ideas.

Edited by weirdcrap
Link to comment
Just now, weirdcrap said:

If that doesn't fix it I'll move forward with getting my own router installed (if possible) unless someone else thinks of something.

Agreed, I also think it could be network related (the router). By the way, I have never had DNS problems with Unraid, but that doesn't prove the problem isn't in Unraid either.

Link to comment
  • 2 weeks later...

Even with the router on the latest firmware I have yet again lost DNS but nslookup continues to work.

 

image.png.2a81caadc6e88378247046b10a3733a5.png

 

I'm going to have to install my own router apparently as I don't know what else to do to troubleshoot this. I'm sick and tired of my server just randomly losing the ability to resolve ANY DNS names via normal means.

Edited by weirdcrap
Link to comment

So, when this issue occurs, you can perfectly ping google.com from the host, but the docker containers lose access to the web?
When this occurs, could you do 

docker exec -it containername /bin/bash -c 'cat /etc/resolv.conf'

for your failing containers.

 

edit: never mind. Saw your router issue.
You're not running OpenWrt with your own dhcpd and bind/named?

Edited by Osiris
Link to comment
