[2268] in SIPB-AFS-requests
Re: Status of the cell, and possible plans to revert to 3.2
daemon@ATHENA.MIT.EDU (Greg Hudson)
Tue Jan 30 02:47:09 1996
To: jweiss@MIT.EDU
Cc: sipb-afsreq@MIT.EDU
In-Reply-To: Your message of "Tue, 30 Jan 1996 01:50:49 EST."
<9601300650.AA25445@w20-spare-dec.MIT.EDU>
Date: Tue, 30 Jan 1996 02:46:26 EST
From: Greg Hudson <ghudson@MIT.EDU>
> cd /
> /usr/afs/bin/bosserver < /dev/null >& /dev/null &
> This is how I was told to start it the first time I dealt with the
> dying bosserver problem, and am somewhat superstitious about the I/O
> redirection.
Actually, I think this fully explains the problems we're seeing.
When I upgraded the cell, and every time we recovered from the
dying-server problem, we ran "/usr/afs/bin/bosserver &" without I/O
redirection. Then we logged out, and the tty the bosserver was
associated with went away. The next time the bosserver tries to write
to that tty, it dies (on a SIGTTOU, I think). This doesn't happen
on normal startup, because the tty associated with the bosserver
(/dev/console) doesn't go away.
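
For illustration only, here is a minimal Python sketch of the same idea
as the redirected start quoted above: launch a server so that none of
its standard descriptors point at the login tty. The names
start_detached and PROBE are hypothetical, and start_new_session goes a
step beyond the quoted csh command, which only redirects I/O.

```python
import subprocess
import sys


def start_detached(argv):
    """Start argv with no ties to the caller's terminal (sketch)."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,    # like "< /dev/null"
        stdout=subprocess.DEVNULL,   # like ">& /dev/null" (stdout part)
        stderr=subprocess.DEVNULL,   # like ">& /dev/null" (stderr part)
        cwd="/",                     # like "cd /"
        start_new_session=True,      # assumption: also call setsid() in the child
    )


# Hypothetical self-check: the child exits 0 only if none of its standard
# descriptors is a tty and it leads its own session.
PROBE = [
    sys.executable,
    "-c",
    "import os, sys; "
    "sys.exit(int(any(os.isatty(fd) for fd in (0, 1, 2)) "
    "or os.getsid(0) != os.getpid()))",
]

# Usage (not run here): start_detached(["/usr/afs/bin/bosserver"])
```

Started this way, the process never holds a descriptor for the login
tty, so logging out should not leave it writing to a terminal that no
longer exists.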
If this theory is correct, rosebud's bosserver should not die, but we
should expect ronald-ann's to die in the near future.
Incidentally, this is another instance of AFS sucking. Unix servers
traditionally background themselves, disassociate themselves from the
tty, and log errors using syslog() or some form of file logging rather
than output to stdout or stderr. Thus they don't have this problem.
Since bosserver doesn't background itself, I should have thought of
the tty problem much earlier.
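
As a rough sketch of the traditional pattern described above (and not
of anything bosserver itself does), a server can detach itself along
these lines. The function name run_as_daemon and the syslog tag
"exampled" are made up for the example. Unlike a real daemon, the
original process returns to its caller instead of exiting, which keeps
the sketch easy to try out.

```python
import os
import syslog


def run_as_daemon(main):
    """Run main() in a detached grandchild; the caller returns at once."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)           # reap the short-lived first child
        return
    try:
        os.setsid()                  # new session, no controlling tty
        if os.fork() > 0:
            os._exit(0)              # session leader exits immediately
        os.chdir("/")                # don't pin whatever directory we started in
        fd = os.open(os.devnull, os.O_RDWR)
        for std in (0, 1, 2):        # point stdin, stdout, stderr at /dev/null
            os.dup2(fd, std)
        if fd > 2:
            os.close(fd)
        # Errors go to syslog rather than stdout or stderr.
        syslog.openlog("exampled", syslog.LOG_PID, syslog.LOG_DAEMON)
        syslog.syslog(syslog.LOG_INFO, "started")
        main()
    finally:
        os._exit(0)                  # never fall back into the caller's code
```

The second fork leaves the surviving process as a non-leader in its new
session, so it cannot pick up a controlling tty again by accident when
it opens a terminal device.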
Updated plan:
* If we see the failure mode on ronald-ann, reboot the machine
  (just in case something else is wrong).
* If we see the failure mode on rosebud again, then my theory
  is shot to hell and we should follow the old plan, subject
  to jweiss's and kcr's addendums.