[16436] in Athena Bugs
Re: dm failed to clean up
daemon@ATHENA.MIT.EDU (Greg Hudson)
Wed Oct 21 17:07:34 1998
Date: Wed, 21 Oct 1998 17:07:29 -0400
From: Greg Hudson <ghudson@MIT.EDU>
To: Greg Hudson <ghudson@MIT.EDU>
Cc: bugs@MIT.EDU, marc@MIT.EDU
In-Reply-To: "[16433] in Athena Bugs"
I've diagnosed this problem and fixed two bugs relating to it (as well
as a more minor bug which was probably not relevant). The fixes
should be in 8.2.14.
Here is what is going on, using the debugging syslogs to indicate
chronology:
Oct 20 19:19:50 x15-cruise-basselope.mit.edu dm: Starting console
Oct 20 19:19:50 x15-cruise-basselope.mit.edu dm: Starting xlogin, try #1
Oct 20 19:19:50 x15-cruise-basselope.mit.edu dm: Received SIGUSR1; setting login_running.
At this point, the unrelated bug crops up and Marc doesn't get any
windows (because his login is blocked writing to stdout). So he
SIGTERMs the console process:
Oct 20 19:31:50 x15-cruise-basselope.mit.edu dm: Received SIGCHLD for consolepid (28139), status 15
Oct 20 19:31:50 x15-cruise-basselope.mit.edu dm: Starting console
Now Marc's login process unblocks and runs config_console:
Oct 20 19:31:50 x15-cruise-basselope.mit.edu dm: Received SIGCHLD for consolepid (28348), status 0
Oct 20 19:31:50 x15-cruise-basselope.mit.edu dm: Starting console
Despite the syslog, dm didn't actually start a console process.
Instead, start_console() decided, "damn, console exited twice in three
seconds, it must be a lost cause," set the console_failed flag, and
returned without starting a console process. This is bug #1; dm
should never decide that console is a lost cause because it exited
with status 0. Marc's login continues:
Oct 20 19:32:02 x15-cruise-basselope.mit.edu /usr/athena/bin/get_message[28365]: GMS client started...
Oct 20 19:32:02 x15-cruise-basselope.mit.edu /usr/athena/bin/get_message[28365]: GMS not showing.
Oct 20 19:32:02 x15-cruise-basselope.mit.edu dm: Starting console
Bug #2 is that nothing actually keeps dm from starting a new console
process after it has set the console_failed flag. So even though dm
has given up on console, it still starts a new one.
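To make the two bugs concrete, here is a minimal sketch of the
intended behavior. It is not the actual dm source: only the
console_failed flag and start_console() are named above, and the
struct, note_console_exit(), the three-second constant, and the
"starts" counter (standing in for the real fork/exec) are hypothetical.
It assumes the logged "status" is a raw wait status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

#define QUICK_EXIT_SECS 3   /* window taken from "twice in three seconds" */

/* Hypothetical bookkeeping; only console_failed is named in the report. */
struct console_state {
    bool console_failed;    /* set once dm gives up on console */
    bool have_bad_exit;     /* an abnormal console exit has been seen */
    time_t last_bad_exit;   /* when that abnormal exit happened */
    int starts;             /* stand-in for actually forking console */
};

/* Bug #1: a clean exit (status 0) never counts toward giving up. */
void note_console_exit(struct console_state *cs, int wait_status, time_t now)
{
    assert(cs != NULL);
    if (WIFEXITED(wait_status) && WEXITSTATUS(wait_status) == 0)
        return;
    /* Two abnormal exits within the window: treat console as a lost cause. */
    if (cs->have_bad_exit && now - cs->last_bad_exit <= QUICK_EXIT_SECS)
        cs->console_failed = true;
    cs->have_bad_exit = true;
    cs->last_bad_exit = now;
}

/* Bug #2: once console_failed is set, refuse to start another console. */
bool start_console(struct console_state *cs)
{
    assert(cs != NULL);
    if (cs->console_failed)
        return false;
    cs->starts++;           /* the real code would fork and exec here */
    return true;
}
```

With this logic, the SIGTERM (status 15) followed by the status 0 exit
in the same second would not set console_failed, and a later start
request could not create a second reader once the flag is set.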
At this point, dm is reading from consoletty because the
console_failed flag is set, and the console process is also reading
from consoletty. The result is that dm winds up blocking on a read()
of consoletty (the console process having snatched the input away
before dm could read it). dm receives a couple of signals (a leftover
SIGALRM and a SIGCHLD when Marc logs out), but does not act on them.
Oct 20 19:32:50 x15-cruise-basselope.mit.edu dm: Received SIGALRM.
Oct 21 04:04:04 x15-cruise-basselope.mit.edu dm: Received SIGCHLD for loginpid (28140), status 0
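The hang above is the general hazard of two processes reading the same
descriptor: input that looked available can be gone by the time read()
is called, and a blocking read() then waits indefinitely. The sketch
below shows one common defense, a non-blocking read that reports "no
data" instead of waiting. It is not the fix described in this message
(which addresses bugs #1 and #2); set_nonblocking(), read_available(),
and READ_WOULD_BLOCK are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_WOULD_BLOCK (-2)   /* hypothetical sentinel: no data right now */

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Read whatever is available on a non-blocking fd.
 * Returns the byte count, 0 at end of file, -1 on error, or
 * READ_WOULD_BLOCK if another reader already consumed the input.
 */
ssize_t read_available(int fd, char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return READ_WOULD_BLOCK;
    return n;
}
```

A caller that gets READ_WOULD_BLOCK can return to its main loop and
handle pending signals rather than sitting in read().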
The next morning, Jeremy Daniel arrived to find a sticky note on xcb
saying "Log me out if you can." He proceeded to kill Marc's user
processes. At some point in this process, he managed to unstick dm
(probably by generating some consoletty output which the console
process didn't grab first), at which point dm noticed that xlogin
wasn't running and quit.
Oct 21 08:44:49 x15-cruise-basselope.mit.edu dm: login_running=0, x_running=1, quitting