[1015] in Zephyr_Bugs
zephyr: a few bugs & quick fixes
daemon@ATHENA.MIT.EDU (Brian Boylston USG)
Mon Apr 30 09:01:08 2001
Date: Mon, 30 Apr 2001 09:00:34 -0400 (EDT)
From: Brian Boylston USG <boylston@zk3.dec.com>
Reply-To: Brian.Boylston@compaq.com
To: zephyr-bugs@mit.edu
Message-ID: <Pine.OSF.4.03.10104271710570.618259-100000@wasted.zk3.dec.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
zephyr-bugs,
I understand that Zephyr is no longer under active development and that
there will be no more public releases. However, I have encountered several
bugs and have fixes for them that I thought you might be interested in, at
least for amusement.
I am working from the sources in zephyr-20000421.tar.gz on alpha running
tru64 unix. I emphasize that these changes have not been extensively
tested in my environment and have not been at all tested in other
environments.
The bugs and fixes:
1. 64-bit problem with definition of Code_t in zephyr.h.
I noticed that the error codes generated by the et package are of type
'long' while the definition of Code_t (the return type for the zephyr
library routines) is of type 'int'. On my machine, at least, this
causes clients to report false or incorrect errors.
From h/zephyr/zephyr.h, lines 109-110:
/* Function return code */
typedef int Code_t;
The following change has corrected the problem for me:
/* Function return code */
typedef long Code_t;
2. Non-unique unique ids generated in Z_SendFragmentedNotice().
My host manager was generating syslog entries like:
Hey! This packet isn't in my queue!
when I was zwriting large (> 1K) notices. I tracked this down to a
problem with using gettimeofday() to generate uids for notice
fragments: fast machines often return the same times for sequential
calls to gettimeofday()!
The non-unique uids confuse the host manager, causing it not to ACK
client packets. The clients report that the host manager is not
responding.
From lib/Zinternal.c, lines 872-885:
while (offset < notice->z_message_len || !notice->z_message_len) {
(void) sprintf(multi, "%d/%d", offset, notice->z_message_len);
partnotice.z_multinotice = multi;
if (offset > 0) {
(void) gettimeofday(&partnotice.z_uid.tv,
(struct timezone *)0);
partnotice.z_uid.tv.tv_sec =
htonl((u_long) partnotice.z_uid.tv.tv_sec);
partnotice.z_uid.tv.tv_usec =
htonl((u_long) partnotice.z_uid.tv.tv_usec);
(void) memcpy((char *)&partnotice.z_uid.zuid_addr, &__My_addr,
sizeof(__My_addr));
}
message_len = min(notice->z_message_len-offset, fragsize);
The following change has eliminated the problem for me, though it does
not address the fundamental problem of using gettimeofday() to
distinguish packets from a given host (consider two processes each
generating uids on an SMP machine):
while (offset < notice->z_message_len || !notice->z_message_len) {
(void) sprintf(multi, "%d/%d", offset, notice->z_message_len);
partnotice.z_multinotice = multi;
if (offset > 0) {
partnotice.z_uid.tv.tv_usec =
ntohl((u_long)partnotice.z_uid.tv.tv_usec);
partnotice.z_uid.tv.tv_usec++;
if( partnotice.z_uid.tv.tv_usec >= 1000000 ) {
partnotice.z_uid.tv.tv_sec =
ntohl((u_long)partnotice.z_uid.tv.tv_sec);
partnotice.z_uid.tv.tv_sec++;
partnotice.z_uid.tv.tv_sec =
htonl((u_long)partnotice.z_uid.tv.tv_sec);
partnotice.z_uid.tv.tv_usec = 0;
}
partnotice.z_uid.tv.tv_usec =
htonl((u_long)partnotice.z_uid.tv.tv_usec);
}
message_len = min(notice->z_message_len-offset, fragsize);
This change guarantees that the uid is unique for all fragments in a
given notice.
3. Delayed packet processing in zhm with DEBUG defined.
While investigating the previous bug, I encountered another one after
#defining DEBUG for the zhm. In zhm's main loop in zhm/zhm.c, the
ZPending() in line 225:
DPR2("Pending = %d\n", ZPending());
may cause additional packets to be read from the socket and queued up.
However, the loop only processes one queued packet before waiting for
more packets to arrive from the network. This can result in very
delayed processing of queued packets. For a fix, I surrounded the
contents of the if(...) clause in lines 185-238 with a do...while:
if (count > 0) {
do {
.
.
.
} while( ZQLength() > 0 );
}
4. Bad memmove() length parameter in server/dispatch.c
I discovered this upon code inspection. I have to believe that this
can cause some problems though I haven't seen any that I can directly
attribute to it.
In server/dispatch.c, lines 927-933:
for (i = 0; i < num_hosts; i++) {
if (hosts[i].s_addr == who->sin_addr.s_addr) {
memmove(&hosts[i], &hosts[i + 1], num_hosts - (i + 1));
num_hosts--;
break;
}
}
Note that hosts[] is an array of struct in_addrs. Thus, the call to
memmove() only copies a fraction of the necessary data. I suggest the
following change to the call to memmove():
memmove(&hosts[i], &hosts[i + 1], (num_hosts - (i + 1)) * sizeof(struct in_addr));
Lemme know if you have questions, comments, etc.
Thanks,
Brian Boylston