[19654] in Athena Bugs
Re: name resolution on dialups
daemon@ATHENA.MIT.EDU (Greg Hudson)
Sat Aug 18 14:30:14 2001
Message-Id: <200108181830.OAA19635@egyptian-gods.MIT.EDU>
To: John Carr <jfc@MIT.EDU>
cc: bugs@MIT.EDU, bug-dialup@MIT.EDU
In-Reply-To: Your message of "Sat, 18 Aug 2001 13:28:14 EDT."
<200108181728.NAA07094@contents-vnder-pressvre.mit.edu>
Date: Sat, 18 Aug 2001 14:30:10 -0400
From: Greg Hudson <ghudson@MIT.EDU>
Hi. Kolya has looked into this with a packet tracer. We believe we
have traced the problem to the following causes:
* Some domains use name servers that live outside the domain
itself. For instance, if you query a GTLD server for
amelsrl.com, you are referred to ns1.cws.net and ns2.cws.net
as the authoritative servers.
* The BIND 8 named does not trust the glue records in the
additional section of such a response, because .net is a
different domain from .com and cache poisoning could result
if the .com servers were treated as credible for .net
records. (BIND 8 is not smart enough to use glue records
without caching them, nor smart enough to notice that the
".com server" it got the answer from was really a GTLD
server which is credible for all domains.) So BIND feels
the need to go out and look up ns1.cws.net or ns2.cws.net
(assuming it doesn't yet have cached records for those
hosts) before continuing a query for amelsrl.com.
* When named does a glue query like this, it drops the
original query on the floor, hoping that the client will
retransmit after named has the glue records in its cache.
* When you ask a resolver library for "foo.bar", it will first
try "foo.bar." and then "foo.bar.mit.edu.", assuming that
ndots is not set in resolv.conf to 1 or less (where 1 is the
number of dots in foo.bar).
* Here is the new part: the Solaris 8 resolver interleaves the
queries for "foo.bar.mit.edu." with the queries for
"foo.bar.". So before retransmitting the query for
"foo.bar.", the resolver will look up "foo.bar.mit.edu.",
get an NXDOMAIN, and return that result to the user.
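
To make the sequence above concrete, here is a toy model in Python.
It is only a sketch, not BIND 8 or Solaris 8 code: ToyCachingServer,
the two resolver functions, the retry count, and the 192.0.2.1 address
are all made up for illustration, and it assumes the glue is cached by
the time the next query for the same name arrives.

```python
# Toy model only. Every name here is a hypothetical stand-in; none of
# this is taken from BIND 8 or the Solaris 8 resolver.

class ToyCachingServer:
    """Drops the first query for a name whose glue is not yet cached."""

    def __init__(self, zone, needs_glue):
        self.zone = zone                    # names that really exist
        self.needs_glue = set(needs_glue)   # names still missing glue

    def query(self, name):
        if name in self.needs_glue:
            # Start the glue lookup and drop the original query.
            # Assumption: the glue is cached before the next query.
            self.needs_glue.discard(name)
            return None                     # client sees a timeout
        return self.zone.get(name, "NXDOMAIN")


def candidates(name, search_domain):
    """As-is name first, then the search-list name."""
    return [name + ".", name + "." + search_domain + "."]


def sequential_resolver(server, name, search_domain, retries=2):
    """Retransmit each candidate before moving on to the next one."""
    for cand in candidates(name, search_domain):
        for _ in range(retries):
            reply = server.query(cand)
            if reply is None:
                continue                    # timeout: retransmit
            if reply != "NXDOMAIN":
                return reply
            break                           # NXDOMAIN: next candidate
    return "NXDOMAIN"


def interleaved_resolver(server, name, search_domain, retries=2):
    """Cycle through all candidates once per round."""
    for _ in range(retries):
        for cand in candidates(name, search_domain):
            reply = server.query(cand)
            if reply is not None:
                return reply                # first definite reply wins
    return "TIMEOUT"


if __name__ == "__main__":
    zone = {"amelsrl.com.": "192.0.2.1"}    # placeholder address
    for resolver in (sequential_resolver, interleaved_resolver):
        server = ToyCachingServer(zone, ["amelsrl.com."])
        print(resolver.__name__, resolver(server, "amelsrl.com", "mit.edu"))
```

In this model the sequential resolver retransmits "amelsrl.com." and
gets the address on the second try, while the interleaved one asks for
"amelsrl.com.mit.edu." in between and hands back that NXDOMAIN.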
Quite the comedy of errors. BIND 9 might improve the BIND part of
this lossage (dnscache definitely would, but it has license issues),
but we are not prepared to replace the caching resolver at this point
in the release (except by making small changes). We have one idea for
solving this problem on the Solaris resolver side without source code
modification (setting ndots to 1 or 2), but it's not a great answer: a
value of 1 might piss off users accustomed to hostnames like "foo.lcs",
and a value of 2 won't help for domains with only one dot.