[2632] in Release_7.7_team

Re: Please strongly consider backing out the zephyr servers

daemon@ATHENA.MIT.EDU (Greg Hudson)
Tue Mar 6 19:33:34 2001

Message-Id: <200103070033.TAA30308@egyptian-gods.MIT.EDU>
To: "Jeffrey I. Schiller" <jis@MIT.EDU>
cc: Garry Zacheiss <zacheiss@MIT.EDU>, "Susan S. Minai-Azary" <azary@MIT.EDU>,
        Greg Hudson <ghudson@MIT.EDU>, John Hawkinson <jhawk@MIT.EDU>,
        release-team@MIT.EDU, op@MIT.EDU, winzephyr-release@MIT.EDU
In-Reply-To: Your message of "Tue, 06 Mar 2001 18:43:41 EST."
             <20010306184341.C1894@mit.edu> 
Date: Tue, 06 Mar 2001 19:33:19 -0500
From: Greg Hudson <ghudson@MIT.EDU>

> That the customer is a "tester" is besides the point. There is an
> implied understanding with testers (generically, not just here at
> MIT) that software being tested may have bugs. However this
> agreement does not usually extend to "we can pull the service out
> from underneath you" unless there is an agreement that states this.

It is my understanding that winzephyr has never progressed beyond
"alpha," and that the canonical understanding (not just at MIT) of
"alpha" software is that it might have bugs serious enough to totally
break operation under some circumstances.

> You cannot make changes to the wire protocol without massive
> coordination. Welcome to the Real World.

There was, actually, a lot of coordination, including deployment to
non-Unix clients and a six-year period during which people running old
code were able to subscribe but not send authentic messages.

And it would have been sufficient except for a little bug that hasn't
yet been diagnosed.  Winzephyr switched to the new checksum code in
1998 (although I was temporarily fooled into believing that it hadn't
been, because that was an obvious explanation).  An analogous situation
would be a new version of Kerberos doing stricter checking on client
requests to close a security hole, and finding out upon deployment
that some alpha clients failed the checks because of a bug.

> What I saw was an extended debate about who was responsible for
> fixing what... while the customers were not working.

Perhaps you had information I did not see, but I don't think we had
seen any evidence that any customers were seriously inconvenienced or
unable to work (or even unable to use zephyr) because of this problem.
(Of course I would not advocate waiting for such information to arrive
in the case of an in-production service.)

This does not in any way imply that your argument has no merit.  I
think there was a real judgment call to be made and one can
legitimately argue in either direction.  I think part of the
frustration here is that the groups with responsibility for the
software and services in question were not allowed to make the
judgment call for themselves, nor was the decision made through the
reporting paths of those groups.
