[19654] in Athena Bugs

Re: name resolution on dialups

daemon@ATHENA.MIT.EDU (Greg Hudson)
Sat Aug 18 14:30:14 2001

Message-Id: <200108181830.OAA19635@egyptian-gods.MIT.EDU>
To: John Carr <jfc@MIT.EDU>
cc: bugs@MIT.EDU, bug-dialup@MIT.EDU
In-Reply-To: Your message of "Sat, 18 Aug 2001 13:28:14 EDT."
             <200108181728.NAA07094@contents-vnder-pressvre.mit.edu> 
Date: Sat, 18 Aug 2001 14:30:10 -0400
From: Greg Hudson <ghudson@MIT.EDU>

Hi.  Kolya has looked into this with a packet tracer.  We believe we
have traced the problem to the following causes:

	* Some domains use name servers that live outside the domain
	  they serve.  For instance, if you query a GTLD server for
	  amelsrl.com, you are referred to ns1.cws.net and ns2.cws.net
	  as the authoritative servers.

	* The BIND 8 named does not trust the glue records in the
	  additional section of such a response, because .net is a
	  different domain from .com and cache poisoning could result
	  if the .com servers were treated as credible for .net
	  records.  (BIND 8 is not smart enough to use glue records
	  without caching them, nor smart enough to notice that the
	  ".com server" it got the answer from was really a GTLD
	  server which is credible for all domains.)  So BIND feels
	  the need to go out and look up ns1.cws.net or ns2.cws.net
	  (assuming it doesn't yet have cached records for those
	  hosts) before continuing a query for amelsrl.com.

	* When named does a glue query like this, it drops the
	  original query on the floor, hoping that the client will
	  retransmit after named has the glue records in its cache.

	* When you ask a resolver library for "foo.bar", it will first
	  try "foo.bar." and then "foo.bar.mit.edu.", assuming that
	  ndots is not set in resolv.conf to 1 or less (where 1 is the
	  number of dots in foo.bar).

	* Here is the new part: the Solaris 8 resolver interleaves the
	  queries for "foo.bar.mit.edu." with the queries for
	  "foo.bar.".  So before retransmitting the query for
	  "foo.bar.", the resolver will look up "foo.bar.mit.edu.",
	  get an NXDOMAIN, and return that result to the user.
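
The interaction in the last three points can be sketched as a toy
simulation.  This is only an illustrative model, not real DNS code:
ToyNamed, resolve_sequential, resolve_interleaved, make_server, and the
192.0.2.1 address are all made up for the example, and the
"interleaved" resolver is a simplified guess at the behavior described
above rather than the actual Solaris 8 logic.

```python
# Toy model of the failure sequence described above; not real DNS code.
# All names and the address below are hypothetical.

NXDOMAIN = "NXDOMAIN"
DROPPED = None  # stands for "named silently discarded the query"


class ToyNamed:
    """Caching server that drops a query while it fetches glue records."""

    def __init__(self, zone, needs_glue):
        self.zone = dict(zone)              # name -> address
        self.needs_glue = set(needs_glue)   # names whose glue is not cached yet

    def query(self, name):
        if name in self.needs_glue:
            # The first query triggers the glue lookup and is dropped;
            # a later retransmission finds the glue in the cache.
            self.needs_glue.discard(name)
            return DROPPED
        return self.zone.get(name, NXDOMAIN)


def resolve_sequential(server, candidates, retries=2):
    """Retransmit each candidate before moving on to the next one."""
    for name in candidates:
        for _ in range(retries):
            answer = server.query(name)
            if answer is DROPPED:
                continue            # no reply: retransmit the same name
            if answer != NXDOMAIN:
                return answer
            break                   # definite NXDOMAIN: try the next name
    return NXDOMAIN


def resolve_interleaved(server, candidates, retries=2):
    """Send every candidate once per round; return the first reply seen."""
    for _ in range(retries):
        for name in candidates:
            answer = server.query(name)
            if answer is not DROPPED:
                return answer       # may be an NXDOMAIN for the wrong name
    return NXDOMAIN


def make_server():
    # Fresh server each time: amelsrl.com. exists but needs a glue lookup.
    return ToyNamed({"amelsrl.com.": "192.0.2.1"}, {"amelsrl.com."})


candidates = ["amelsrl.com.", "amelsrl.com.mit.edu."]
print(resolve_sequential(make_server(), candidates))
print(resolve_interleaved(make_server(), candidates))
```

In this model the sequential resolver gets the address on its
retransmission, while the interleaved one returns the NXDOMAIN for
"amelsrl.com.mit.edu." before it ever retries "amelsrl.com.".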

Quite the comedy of errors.  BIND 9 might improve the BIND part of
this lossage (dnscache definitely would, but it has license issues),
but we are not prepared to replace the caching resolver at this point
in the release (except by making small changes).  We have one idea for
solving this problem on the Solaris resolver side without source code
modification (setting ndots to 1 or 2), but it's not a great answer--a
value of 1 might piss off users used to using "foo.lcs" as hostnames,
and a value of 2 won't help for domains with only one dot.
