[3303] in Release_Engineering
zwgc problems
daemon@ATHENA.MIT.EDU (ghudson@MIT.EDU)
Sat Aug 27 18:17:20 1994
From: ghudson@MIT.EDU
Date: Sat, 27 Aug 94 18:17:14 -0400
To: rel-eng@MIT.EDU
Cc: ops@MIT.EDU, marc@MIT.EDU

There has been a serious problem with zwgc this morning. The symptoms
are that you start up zwgc, become locatable and zwritable, but do not
receive messages. After several minutes, you receive a whole bunch of
zephyrgrams in a burst. At this point, you may continue to receive
zephyrgrams, or you may find that the servers have flushed your
locations and subscriptions.

The cause of the problem is related to the name servers, not the
zephyr servers. At startup, zwgc pretends to have received a packet
from address 0.0.0.0. In the process of decoding the packet, zwgc
does a gethostbyaddr() of 0.0.0.0, which can take several minutes to
complete (in a test case, it took exactly one minute, but in zwgc the
delay seems to vary from invocation to invocation and from platform to
platform).

The name server problem is still under investigation. Regardless of
how it is resolved, however, zwgc should probably not be doing a
gethostbyaddr() on the address of a faked packet. To correct this,
change the following lines in decode_notice() in zwgc's notice.c:

    fromhost = gethostbyaddr(&(notice->z_sender_addr), sizeof(struct in_addr),
                             AF_INET);
    var_set_variable("fromhost", fromhost ? fromhost->h_name :
                     inet_ntoa(notice->z_sender_addr));

to:

    if (notice->z_sender_addr.s_addr != 0) {
        fromhost = gethostbyaddr(&(notice->z_sender_addr),
                                 sizeof(struct in_addr), AF_INET);
        var_set_variable("fromhost", fromhost ? fromhost->h_name :
                         inet_ntoa(notice->z_sender_addr));
    } else {
        var_set_variable("fromhost", "localhost");
    }

(There are other possible fixes with similar effects.) This is a
kludge, but in my opinion a necessary one, given the poor design
decision of faking a packet to oneself. Zephyr is full of routines
which fake other routines into taking appropriate action even though
they were intended for different purposes; this is a poor way of
reusing code.
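
For anyone who wants to try the guard outside of zwgc, here is a
minimal standalone sketch of the same check. The helper name
sender_hostname() is hypothetical and does not exist in the zephyr
sources; it only mirrors the logic of the change above, returning the
string that would be stored in "fromhost" instead of calling
var_set_variable().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/*
 * Hypothetical helper (not part of zwgc): pick the host name to report
 * for a sender address.  An all-zero address is assumed to mean a
 * faked local packet, so no name server lookup is attempted for it.
 */
const char *sender_hostname(struct in_addr addr)
{
    struct hostent *host;

    /* 0.0.0.0: skip the reverse lookup, which may block for minutes. */
    if (addr.s_addr == 0)
        return "localhost";

    /* Real sender: try a reverse lookup, fall back to dotted quad. */
    host = gethostbyaddr((const void *) &addr, sizeof(addr), AF_INET);
    return host ? host->h_name : inet_ntoa(addr);
}
```

Calling sender_hostname() with a zeroed struct in_addr returns
"localhost" without touching the resolver; any other address goes
through gethostbyaddr() as before. Note that the returned pointer may
refer to static storage owned by the resolver or by inet_ntoa(), so it
should be copied if it needs to survive a later lookup.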