[11083] in North American Network Operators' Group

Re: NSI bulletin 097-004 | Root Server Problems

daemon@ATHENA.MIT.EDU (Greg A. Woods)
Mon Jul 21 00:55:21 1997

Date: Mon, 21 Jul 1997 00:47:06 -0400 (EDT)
From: woods@most.weird.com (Greg A. Woods)
To: David Holtzman <dholtz@internic.net>
Cc: nanog@merit.edu, namedroppers@internic.net
Reply-To: woods@weird.com (Greg A. Woods)

> Date: Thu, 17 Jul 1997 22:52:18 +0500 (GMT)
> From: David Holtzman <dholtz@internic.net>
> To: nanog@merit.edu
> Subject: NSI bulletin 097-004 | Root Server Problems
> Resent-Date: Thu, 17 Jul 1997 14:42:42 -0400 (EDT)
> 
> On Wednesday night, July 16, during the computer-generation of the
> Internet top-level domain zone files, an Ingres database failure resulted 
> in corrupt .COM and .NET zone files.  Despite alarms raised by Network 
> Solutions' quality assurance schemes, at approximately 2:30 a.m. (Eastern 
> Time), a system administrator released the zone file without regenerating the
> file and verifying its integrity.  Network Solutions corrected the
> problem and reissued the zone file by 6:30 a.m. (Eastern Time).  
> 
> Thank you.
> David H. Holtzman
> Sr VP Engineering, Network Solutions
> dholtz@internic.net

So, if the new zone files were reissued at 06:30 EDT and take about an
hour to download, why were some root servers still handing out bad data
many hours later (at least one until about 14:00 EDT)?  The particular
server I'm thinking of, though not in the Eastern time zone, appears to
have a 24x7 NOC nearby, and in theory could have been prepared to
reload as quickly as anyone.

This may be just a coincidence, but it was about an hour after I
e-mailed and telephoned them that they finally had the right data in
place.  Unfortunately, finding the right contact was not trivial: the
listed contact person's voice mailbox was full, his operator had no
idea whom else I could speak to, and the only NOC numbers listed are a
1-800 number (which doesn't work from outside the USA) and a FAX.  The
NOC person I finally reached by telephone didn't even seem fully aware
that they ran a root nameserver for the Internet.  He did know that
e-mail was bouncing, and indeed I didn't expect them to be able to
answer my e-mail if they were relying on their own root server....

Worst of all, though, they left the errant server online, handing out
NXDOMAIN replies to anyone who asked, while they downloaded the
corrected zone files.  I hope this is not standard operating procedure
for a root server, or at least that it won't be from now on.

What annoys me most is that I received no notification of any problem
from any of the internic.net mailing lists.  I probably should
subscribe to nanog, but I'd have thought the above announcement would
be posted to namedroppers, or maybe even rs-info, as soon as the
mailers had enough trustworthy DNS data to deliver it.  There was
nothing in http://rs.internic.net/announcements/ either, except for
drivel about "maintaining high customer service levels," and there
still isn't (though I suppose this event wasn't exactly "good PR").

What are the current procedures for announcing such problems to more
than just the root operators themselves?

-- 
							Greg A. Woods

+1 416 443-1734      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>