[3232] in Release_Engineering
Patch for zwrite
daemon@ATHENA.MIT.EDU (ghudson@MIT.EDU)
Thu Jun 16 21:13:13 1994
From: ghudson@MIT.EDU
Date: Thu, 16 Jun 1994 21:13:02 -0400
To: rel-eng@MIT.EDU
Cc: tytso@MIT.EDU, probe@MIT.EDU, cfields@MIT.EDU
Problem
-------
The Athena 7.7 zwrite hangs when you send a broadcast zephyrgram.
Solution
--------
Back out revision 1.42, going back to revision 1.41, i.e.:
23c23
> static char rcsid_zwrite_c[] = "$Id: zwrite.c,v 1.42 94/04/30 18:33:03 probe Exp $";
---
< static char rcsid_zwrite_c[] = "$Id: zwrite.c,v 1.41 1993/11/21 03:34:27 probe Exp $";
230c230
> if (quiet || !nrecips)
---
< if (quiet)
Explanation
-----------
This is a little complicated. Around line 230, we have:
    if (!nocheck && nrecips)
        send_off(&notice, 0);
    if (quiet || !nrecips)
        notice.z_kind = UNACKED; /* change for real sending */
    [... lots of code ...]
    send_off(&notice, 1);
That is, notice.z_kind is set to UNACKED *after* the call to
send_off(&notice, 0) and before the call to send_off(&notice, 1).
This is important below. The declaration for send_off() is:
    send_off(notice, real)
        ZNotice_t *notice;
        int real;
send_off() then goes into a loop over the recipients (or a loop that
executes once, if there are no recipients, as is the case with a
broadcast zephyrgram) containing (among other things) the code:
    if ((retval = ZSendNotice(notice, auth)) != ZERR_NONE) {
        (void) sprintf(bfr, "while sending notice to %s",
                       nrecips?notice->z_recipient:inst);
        com_err(whoami, retval, bfr);
        break;
    }
    if (quiet && real) {
        if (nrecips)
            continue; /* next! */
        else
            break; /* no more */
    }
    if ((retval = ZIfNotice(&retnotice, (struct sockaddr_in *) 0,
                            ZCompareUIDPred,
                            (char *)&notice->z_uid)) !=
        ZERR_NONE) {
        ZFreeNotice(&retnotice);
        (void) sprintf(bfr, "while waiting for acknowledgement for %s",
                       nrecips?notice->z_recipient:inst);
        com_err(whoami, retval, bfr);
        continue;
    }
ZIfNotice() waits for an ack from the host manager. If quiet is set,
then the loop avoids calling ZIfNotice() and goes to the next loop
iteration. However, if !nrecips, then there is no such check,
ZIfNotice() gets called anyway, and zwrite hangs waiting for an ack
from the host manager that isn't forthcoming.
The dependency on real is there because when send_off() is called with
real set to 0, notice.z_kind is still ACKED even if quiet is true.
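
The interaction between the z_kind assignment and the `quiet && real'
check can be written out as a small model. This is only a sketch of
the control flow described above, not code from zwrite: the enum, the
function names, and the r1_42 flag are made up for illustration, and
it assumes (as the description above implies) that no ack reaches the
client for a notice sent UNACKED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two z_kind values that matter here. */
enum kind { KIND_ACKED, KIND_UNACKED };

/* z_kind in effect during a send_off() call.  The check call
 * (real == 0) runs before z_kind is changed, so it is always ACKED. */
static enum kind kind_for_call(bool r1_42, bool quiet, int nrecips, bool real)
{
    assert(nrecips >= 0);
    if (!real)
        return KIND_ACKED;
    if (r1_42)      /* r1.42: if (quiet || !nrecips) */
        return (quiet || nrecips == 0) ? KIND_UNACKED : KIND_ACKED;
    return quiet ? KIND_UNACKED : KIND_ACKED;   /* r1.41: if (quiet) */
}

/* The loop skips ZIfNotice() only when `quiet && real' holds. */
static bool waits_for_ack(bool quiet, bool real)
{
    return !(quiet && real);
}

/* A call hangs if it waits for an ack on a notice sent UNACKED. */
static bool call_hangs(bool r1_42, bool quiet, int nrecips, bool real)
{
    return waits_for_ack(quiet, real)
        && kind_for_call(r1_42, quiet, nrecips, real) == KIND_UNACKED;
}
```

In this model the only hanging case is call_hangs(true, false, 0, true),
i.e. a non-quiet broadcast under r1.42; with r1_42 false no combination
hangs.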
Rationale for fix
-----------------
Sending broadcast zephyrgrams unacked won't do anything to reduce load
on the server; it only affects whether the host manager sends an ack
to the client. In fact, there is no code in the server to support not
sending such an ack; you can set notice.z_kind to UNSAFE or
NUCLEAR_WAR or whatever, and the server will still send an ack to the
host manager.
You could change the `if (quiet && real)' check to `if (real && (quiet
|| !nrecips))' and fix the hanging problem, but that would mean not
telling people who send broadcast zephyrgrams whether they succeeded
or not, for no good reason.
According to Timo's thesis, less than 3% of incoming notices to the
zephyr servers are actually notices, so trying to convince the server
not to send acks for broadcast zephyrgrams isn't likely to help the
servers out any. In my opinion, r1.42 was a very misguided change.
There also seems to be some misconception about how the server acks
notices. Under no circumstances will the server inform the sending
client that the receiving client has acked the zephyrgram. Some
people thought that the server would wait for an ack from a single
client (regardless of the number of recipients) and then ack to the
sending client. I think the confusion stems from the following server
code (at the bottom of sendit(), in dispatch.c):
    if ((clientlist = subscr_match_list(notice)) != NULLZCLT) {
        for (ptr = clientlist->q_forw;
             ptr != clientlist;
             ptr = ptr->q_forw) {
            /* for each client who gets this notice,
               send it along */
            xmit(notice, &(ptr->zclt_client->zct_sin), auth,
                 ptr->zclt_client);
            if (!acked) {
                acked = 1;
                ack(notice, who);
            }
        }
        subscr_free_list(clientlist);
    }
    if (!acked)
        nack(notice, who);
xmit() is just a sending routine, and nothing is checking it for
errors (the Zephyr server code does not make use of an
exception-handling facility). Thus, the ack is *always* sent if there
is at least one entry in the subscription list. You will also note
that there is no code here that checks whether notice.z_kind is ACKED,
UNACKED, UNSAFE, or whatever; ack() and nack() don't check that field
either.
(When the server receives a CLIENTACK notice, all it does is cancel
its unacked packet entry; it never sends anything off to the client
which sent the notice being acked.)
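
To make the ack/nack behavior concrete, here is a stripped-down model
of the tail of sendit() quoted above. It is a sketch, not server code:
struct reply_counts and model_sendit() are hypothetical names, the
subscription list is reduced to a count, and z_kind is passed in only
to show that nothing looks at it.

```c
#include <assert.h>

/* Hypothetical tally of what the modelled sendit() tail would do. */
struct reply_counts {
    int xmits;  /* copies sent to subscribed clients */
    int acks;   /* acks sent back toward the sender */
    int nacks;  /* nacks sent back toward the sender */
};

/* nsubs stands in for the length of the list returned by
 * subscr_match_list(); z_kind is deliberately ignored, as in the
 * quoted code. */
static struct reply_counts model_sendit(int nsubs, int z_kind)
{
    struct reply_counts r = { 0, 0, 0 };
    int acked = 0;
    int i;

    assert(nsubs >= 0);
    (void) z_kind;              /* never consulted */

    for (i = 0; i < nsubs; i++) {
        r.xmits++;              /* xmit(): result is not checked */
        if (!acked) {
            acked = 1;
            r.acks++;           /* ack() after the first xmit() only */
        }
    }
    if (!acked)
        r.nacks++;              /* no subscribers matched */
    return r;
}
```

With any nsubs >= 1 the model yields exactly one ack and no nack,
whatever z_kind is; with nsubs == 0 it yields one nack. No step
depends on anything a receiving client does.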
--GBH