Feb 5 11:01:35 mail named[994]: error (host unreachable) resolving 'sbs-music.com/NS/IN': 50.56.243.69#53
In essence, the name server for this domain (50.56.243.69) is unreachable from the DNS server used by the mail server. Querying the server manually, I get:
$ host -t ns sbs-music.com
Host sbs-music.com not found: 2(SERVFAIL)
Note the status: it's SERVFAIL. As a result, the mail server treats this as a temporary error and retries later, with the same result. Trying this on another host (one that uses a different DNS server) I get:
$ host -t ns sbs-music.com
Host sbs-music.com not found: 3(NXDOMAIN)
Well, this time it tells me that there is no such domain. An error like this would tell the mail server to give up and return an error response.
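The difference between the two answers matters because a mail server maps the DNS response code to a delivery decision. Here is a minimal Python sketch of that mapping, assuming the usual convention that SERVFAIL is treated as transient and NXDOMAIN as permanent. The function name delivery_action and the returned strings are hypothetical and not taken from any particular mail server:

```python
# DNS response codes as printed by host, e.g. "2(SERVFAIL)" and "3(NXDOMAIN)".
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def delivery_action(rcode: int) -> str:
    """Hypothetical helper: what a mail server would do for a given rcode."""
    name = RCODES.get(rcode, "OTHER")
    if name == "NXDOMAIN":
        return "bounce"    # the domain does not exist: permanent failure
    if name == "SERVFAIL":
        return "defer"     # the resolver got no usable answer: retry later
    if name == "NOERROR":
        return "deliver"
    return "defer"         # assumption: be conservative for anything else


print(delivery_action(2))
print(delivery_action(3))
```

With SERVFAIL the message stays in the queue and is retried, which is exactly the loop the mail server is stuck in here.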
So, why is there a discrepancy between the two? Using tcpdump in the first case (i.e. when we get SERVFAIL), the following requests and responses are exchanged (slightly edited for readability):
192.168.x.y.51892 > 206.72.97.238.53: 39504 [1au] NS? sbs-music.com. (42)
206.72.97.238.53 > 192.168.x.y.51892: 39504 4/0/5 NS ns4.shepherdhosting.com., NS ns1.shepherdhosting.com., NS ns2.shepherdhosting.com., NS ns3.shepherdhosting.com. (194)
192.168.x.y.10749 > 50.56.243.69.53: 40104 [1au] NS? sbs-music.com. (42)
timeout
192.168.x.y.63081 > 206.72.97.238.53: 56636 [1au] NS? sbs-music.com. (42)
206.72.97.238.53 > 192.168.x.y.63081: 56636 4/0/5 NS ns1.shepherdhosting.com., NS ns2.shepherdhosting.com., NS ns3.shepherdhosting.com., NS ns4.shepherdhosting.com. (194)
192.168.x.y.31948 > 50.56.243.69.53: 27220 [1au] NS? sbs-music.com. (42)
timeout
So, let me interpret this trace. The first query goes to IP address 206.72.97.238 and asks for the name servers of the domain sbs-music.com. Doing a reverse DNS query on that address, we get:
# host 206.72.97.238
238.97.72.206.in-addr.arpa domain name pointer sh214.shepherdhosting.com.
So, it's some hosting provider. The response says that the name servers for the domain are ns1.shepherdhosting.com through ns4.shepherdhosting.com. Our DNS server chose ns4.shepherdhosting.com, with IP address 50.56.243.69, and queried it for the sbs-music.com domain. This query timed out. So our DNS server queried 206.72.97.238 for the name servers again, received the same list, and again queried ns4, which again didn't answer.
Let us manually try some other server. So, querying ns1.shepherdhosting.com we get:
# host -t ns sbs-music.com. 206.72.97.238
Using domain server:
Name: 206.72.97.238
Address: 206.72.97.238#53
Aliases:
sbs-music.com name server ns2.shepherdhosting.com.
sbs-music.com name server ns3.shepherdhosting.com.
sbs-music.com name server ns4.shepherdhosting.com.
sbs-music.com name server ns1.shepherdhosting.com.
This is the same response we saw in the first part of the trace. Trying ns2:
$ host -t ns sbs-music.com. 206.72.100.134
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
Well, ns2 doesn't respond. Neither does ns3, nor, as was already obvious, ns4. What does the trace look like (again, slightly edited):
12:20:22.532877 IP 192.168.x.y.34364 > 206.72.100.134.53: 31539+ NS? sbs-music.com. (31)
12:20:27.532864 IP 192.168.x.y.34364 > 206.72.100.134.53: 31539+ NS? sbs-music.com. (31)
12:20:32.533445 IP 192.168.x.y.45322 > 206.72.100.134.53: 5827+ NS? sbs-music.com. (31)
12:20:52.733389 IP 206.72.100.134.53 > 192.168.x.y.34364: 31539 ServFail 0/0/0 (31)
12:20:52.733410 IP 192.168.x.y > 206.72.100.134: ICMP 192.168.x.y udp port 34364 unreachable, length 67
12:20:52.734042 IP 206.72.100.134.53 > 192.168.x.y.45322: 5827 ServFail 0/0/0 (31)
12:20:52.734053 IP 192.168.x.y > 206.72.100.134: ICMP 192.168.x.y udp port 45322 unreachable, length 67
Now, we have an interesting situation here. First, the DNS server for sbs-music.com takes a significant time to answer (about 30 seconds after the first query), and when it does answer, our local DNS isn't listening any more (thus those ICMP error messages). But, in the end, the local DNS correctly concludes that something is wrong with the name servers for that domain.
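To put a number on how late those ServFail answers arrive, the query IDs in the trace can be paired up. The following Python sketch does that for the tcpdump lines above. The regular expression is my own assumption about this (edited) output format, and response_delays is a hypothetical helper name:

```python
import re
from datetime import datetime

TRACE = """\
12:20:22.532877 IP 192.168.x.y.34364 > 206.72.100.134.53: 31539+ NS? sbs-music.com. (31)
12:20:27.532864 IP 192.168.x.y.34364 > 206.72.100.134.53: 31539+ NS? sbs-music.com. (31)
12:20:32.533445 IP 192.168.x.y.45322 > 206.72.100.134.53: 5827+ NS? sbs-music.com. (31)
12:20:52.733389 IP 206.72.100.134.53 > 192.168.x.y.34364: 31539 ServFail 0/0/0 (31)
12:20:52.733410 IP 192.168.x.y > 206.72.100.134: ICMP 192.168.x.y udp port 34364 unreachable, length 67
12:20:52.734042 IP 206.72.100.134.53 > 192.168.x.y.45322: 5827 ServFail 0/0/0 (31)
"""

# Timestamp, "src > dst:", DNS query ID (optionally followed by "+"), the rest.
LINE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) IP \S+ > \S+: (\d+)\+? (.*)$")


def response_delays(trace: str) -> dict:
    """Seconds between the first query with a given ID and its response."""
    first_query = {}
    delays = {}
    for line in trace.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # ICMP lines carry no DNS query ID and are skipped
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        qid = int(m.group(2))
        what = m.group(3).split()[0]
        if what.endswith("?"):            # "NS?" marks a query
            first_query.setdefault(qid, ts)
        elif qid in first_query:          # anything else is a response
            delays[qid] = (ts - first_query[qid]).total_seconds()
    return delays


for qid, seconds in response_delays(TRACE).items():
    print(qid, round(seconds, 1))
```

For query 31539 this gives a bit over 30 seconds, long after the client stopped waiting.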
The final piece of the puzzle comes from querying a com name server for the sbs-music.com domain:
$ host -t ns sbs-music.com l.gtld-servers.net.
Using domain server:
Name: l.gtld-servers.net.
Address: 192.41.162.30#53
Aliases:
sbs-music.com has no NS record
Obviously, this domain was removed. It is now clear that the sbs-music.com domain existed for some time, and that the DNS server that produces the error cached the IP addresses of its name servers. If it were to query the com name servers again, it would receive an NXDOMAIN error and properly notify the mail server.
To see the entries currently cached by the BIND name server, use the following command:
rndc dumpdb
Then look for the file cache_dump.db in /var/named/data (or in /var/named/chroot/var/named/data if you are running BIND in a chroot). It is a plain text file that you can inspect with a text editor, less, or something similar. In my case it contained the following lines:
; glue
sbs-music.com. 172669 NS ns1.shepherdhosting.com.
172669 NS ns2.shepherdhosting.com.
172669 NS ns3.shepherdhosting.com.
172669 NS ns4.shepherdhosting.com.
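Searching a large cache dump by eye is tedious, so here is a small Python sketch that pulls the cached NS records for one domain out of such a file. It assumes the layout shown above, where a line that starts with a TTL inherits the owner name from the previous record. The name cached_ns is hypothetical, and the optional IN class column is an assumption to cover dumps that include it:

```python
DUMP = """\
; glue
sbs-music.com. 172669 NS ns1.shepherdhosting.com.
172669 NS ns2.shepherdhosting.com.
172669 NS ns3.shepherdhosting.com.
172669 NS ns4.shepherdhosting.com.
"""


def cached_ns(dump: str, domain: str) -> list:
    """Return (name server, TTL in seconds) pairs cached for domain."""
    owner = None
    found = []
    for line in dump.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(";"):
            continue  # blank line or comment such as "; glue"
        if not fields[0].isdigit():
            # The line names its owner; later lines without one inherit it.
            owner = fields[0]
            fields = fields[1:]
        if fields and fields[0].isdigit():
            ttl, rest = int(fields[0]), fields[1:]
            if rest and rest[0] == "IN":   # assumption: class may be present
                rest = rest[1:]
            if len(rest) >= 2 and rest[0] == "NS" and owner == domain:
                found.append((rest[1], ttl))
    return found


for server, ttl in cached_ns(DUMP, "sbs-music.com."):
    print(server, ttl)
```

One detail worth noticing: if the TTL in the dump is the remaining lifetime, and the records came from the com delegation with its 172800-second TTL (visible in the dig output further down), then 172669 means they were fetched only about two minutes before the dump, so they are hardly stale leftovers.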
To flush a single entry, use the following command:
rndc flushname sbs-music.com internal
This removes sbs-music.com from the cache of the internal view (I configured views so that the server behaves differently depending on who asks it). Yet, this didn't help. Then I tried to flush everything in the internal view using:
rndc flush internal
But this, while it helped, didn't actually solve the problem. Namely, looking at the packet trace, it turns out that the BIND server learns from b.gtld-servers.net. that the given domain doesn't exist, and yet from somewhere it pulls the old IP address 206.72.97.238?!
So, I finally decided to look into the log, and there are a lot of the following messages:
error (FORMERR) resolving 'sbs-music.com/NS/IN': 206.72.97.238#53
along with the one that triggered all this. Ok, I also tried to upgrade BIND, but no luck: still SERVFAIL errors.
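When the log is full of such messages, it helps to count them per error reason and per remote server. The Python sketch below does this for the two message shapes quoted in this post. The sample input repeats the FORMERR line purely for illustration, and both the pattern and the name count_errors are my own assumptions rather than anything BIND provides:

```python
import re
from collections import Counter

# Sample built from the two messages quoted in this post; the FORMERR line
# is repeated only to make the counting visible.
LOG = """\
Feb 5 11:01:35 mail named[994]: error (host unreachable) resolving 'sbs-music.com/NS/IN': 50.56.243.69#53
error (FORMERR) resolving 'sbs-music.com/NS/IN': 206.72.97.238#53
error (FORMERR) resolving 'sbs-music.com/NS/IN': 206.72.97.238#53
"""

PATTERN = re.compile(
    r"error \((?P<reason>[^)]+)\) resolving "
    r"'(?P<name>[^/]+)/(?P<rtype>[^/]+)/(?P<rclass>[^']+)': "
    r"(?P<server>[0-9a-fA-F.:]+)#(?P<port>\d+)"
)


def count_errors(log: str) -> Counter:
    """Count resolver errors per (reason, remote server address)."""
    counts = Counter()
    for line in log.splitlines():
        m = PATTERN.search(line)
        if m:
            counts[(m.group("reason"), m.group("server"))] += 1
    return counts


for (reason, server), n in count_errors(LOG).most_common():
    print(n, reason, server)
```

Grouping by server makes it easy to see that the unreachable errors come from ns4 while the FORMERR ones come from ns1.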
Now, it's time for the heavy artillery, i.e. Wireshark. So I saved a packet trace and loaded it into Wireshark. Guess what! Wireshark crashed on the requests sent by the local DNS server!?
Ok, after a bit more fiddling I realised that some com name servers do know about this domain. But note the difference between the output of the host command (which I used previously) and the nslookup command:
$ nslookup -type=ns sbs-music.com 192.5.6.30
Server: 192.5.6.30
Address: 192.5.6.30#53
Non-authoritative answer:
*** Can't find sbs-music.com: No answer
Authoritative answers can be found from:
sbs-music.com nameserver = ns1.shepherdhosting.com.
sbs-music.com nameserver = ns2.shepherdhosting.com.
sbs-music.com nameserver = ns3.shepherdhosting.com.
sbs-music.com nameserver = ns4.shepherdhosting.com.
ns1.shepherdhosting.com internet address = 206.72.97.238
ns2.shepherdhosting.com internet address = 206.72.100.134
ns3.shepherdhosting.com internet address = 206.72.97.237
ns4.shepherdhosting.com internet address = 50.56.243.69
Ok, if this com server tells me that there is no such domain, why is it then pointing me to those name servers? The dig command is a bit more informative:
$ dig @192.5.6.30 sbs-music.com
; <<>> DiG 9.9.2-P1-RedHat-9.9.2-6.P1.fc18 <<>> @192.5.6.30 sbs-music.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64382
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 5
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;sbs-music.com. IN A
;; AUTHORITY SECTION:
sbs-music.com. 172800 IN NS ns1.shepherdhosting.com.
sbs-music.com. 172800 IN NS ns2.shepherdhosting.com.
sbs-music.com. 172800 IN NS ns3.shepherdhosting.com.
sbs-music.com. 172800 IN NS ns4.shepherdhosting.com.
;; ADDITIONAL SECTION:
ns1.shepherdhosting.com. 172800 IN A 206.72.97.238
ns2.shepherdhosting.com. 172800 IN A 206.72.100.134
ns3.shepherdhosting.com. 172800 IN A 206.72.97.237
ns4.shepherdhosting.com. 172800 IN A 50.56.243.69
xxx
;; Query time: 149 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Tue Feb 5 13:49:22 2013
;; MSG SIZE rcvd: 194
What a mess?!
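One way to read this output is as a referral: the status is NOERROR, the answer section is empty, the aa flag is missing, and the NS records sit in the authority section. A parent zone answers like this for a delegated name, which would also explain why host and nslookup report no answer while still listing the name servers. The Python sketch below encodes that reading of the dig header. The function name classify and its category labels are hypothetical, and the rules are a simplification rather than a full DNS response classifier:

```python
import re

HEADER = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64382
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 5
"""


def classify(dig_output: str) -> str:
    """Rough classification of a response from its dig header lines."""
    status = re.search(r"status: (\w+)", dig_output).group(1)
    counts = re.search(r"ANSWER: (\d+), AUTHORITY: (\d+)", dig_output)
    answer, authority = int(counts.group(1)), int(counts.group(2))
    flags = re.search(r"flags: ([a-z ]*);", dig_output).group(1).split()

    if status == "NXDOMAIN":
        return "name does not exist"
    if status != "NOERROR":
        return "error: " + status
    if answer > 0:
        return "answer"
    if authority > 0 and "aa" not in flags:
        # Simplification: a non-authoritative reply with only authority
        # records is treated as "ask these servers instead".
        return "referral"
    return "no data"


print(classify(HEADER))
```

So the com server is not saying that the domain is gone; it is handing the question over to the shepherdhosting.com servers, and those are the ones that fail.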
Then, I googled a bit to find why BIND is returning SERVFAIL instead of NXDOMAIN. This is something interesting that I found:
- BIND could have returned SERVFAIL instead of NXDOMAIN responses for nonexistent resource records from the unsigned child zone if the parent zone was signed. (BZ#643012)
Looking at how different BIND versions behave I get the following results:
- bind-9.3.6-20.P1.el5_8.6 and bind-9.9.2-6.P1.fc18.x86_64 return NXDOMAIN.
- bind-9.8.2-0.10.rc1.el6_3.6.x86_64 and bind-9.8.2-0.10.rc1.el6_3.5.x86_64 return SERVFAIL.
Ok, let me conclude. The problem is that the name servers for the domain sbs-music.com aren't correctly configured, while the domain itself is registered with the com name servers. Different BIND versions react to this differently. So, there are two possibilities from here:
- Somehow persuade BIND to return NXDOMAIN instead of SERVFAIL.
- Find what is causing queries for this domain in the first place.