[4458] in testers
Re: On the Linux autoupdate problem
daemon@ATHENA.MIT.EDU (Greg Hudson)
Thu Jun 15 11:45:48 2000
Date: Thu, 15 Jun 2000 11:45:35 -0400
Message-Id: <200006151545.LAA01860@egyptian-gods.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: tb@MIT.EDU (Thomas Bushnell, BSG)
Cc: Greg Hudson <ghudson@MIT.EDU>, testers@MIT.EDU
In-Reply-To: <u1hbt1ad1th.fsf@pusey.mit.edu>
> oxyacetylene seems to be running 8.3, and we knew there were some
> problems. This problem is one I hadn't imagined, but given that
> utmp handling in 8.4 is basically Totally Different, I'm not sure
> it's a good data point, except that it would explain the difficulty
> the cluster workstations have.
Well, we still have the problem under 8.4. astrophel, an 8.4.2
machine, failed to autoupdate to 8.4.3 because of a stale utmp entry
which doesn't show up in /usr/bin/finger or w.
Upon investigation, the most likely culprit in this case seems to be
an xterm which didn't die properly. I can reproduce the condition by
running xterm and sending it a kill -9. Stale entries like the one on
astrophel can be missing from /usr/bin/finger or w because:

  * /usr/bin/finger and w both ignore entries if ut_user doesn't
    exist in the passwd file.

  * w also ignores entries if ut_pid doesn't exist in the
    process table.

(So if I start xterm and kill -9 it, I get a stale utmp entry under my
own username, which is in the passwd file. It shows up in
/usr/athena/bin/finger and /usr/bin/finger, but not in w. Yay
consistency.)
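
The two filtering rules above can be modeled in a few lines of C. This
is only a sketch of the behavior described in this message, not the
actual source of finger or w; the function names (user_in_passwd,
pid_in_process_table, finger_would_show, w_would_show) are made up for
illustration, and the restriction to USER_PROCESS entries is an
assumption.

```c
#include <errno.h>
#include <pwd.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <utmp.h>

/* Rule 1: is ut_user a known account?  ut_user is a fixed-size field
 * that need not be NUL-terminated, so copy it before getpwnam(). */
static int user_in_passwd(const struct utmp *ut)
{
    char name[sizeof(ut->ut_user) + 1];

    memcpy(name, ut->ut_user, sizeof(ut->ut_user));
    name[sizeof(ut->ut_user)] = '\0';
    return getpwnam(name) != NULL;
}

/* Rule 2: does ut_pid name a live process?  kill() with signal 0 sends
 * nothing; EPERM still means the process exists. */
static int pid_in_process_table(const struct utmp *ut)
{
    if (ut->ut_pid <= 0)
        return 0;
    return kill(ut->ut_pid, 0) == 0 || errno == EPERM;
}

/* Hypothetical model of /usr/bin/finger: rule 1 only. */
static int finger_would_show(const struct utmp *ut)
{
    return ut->ut_type == USER_PROCESS && user_in_passwd(ut);
}

/* Hypothetical model of w: rule 1 and rule 2. */
static int w_would_show(const struct utmp *ut)
{
    return finger_would_show(ut) && pid_in_process_table(ut);
}
```

Under this model, an entry whose user is in the passwd file but whose
pid is gone passes finger_would_show() and fails w_would_show(), which
matches the xterm experiment.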
Solaris has a daemon which runs around cleaning up stale utmp entries
(basically changes USER_PROCESS to DEAD_PROCESS if the pid no longer
exists). Perhaps Linux needs something similar. In the meantime, I
think we can solve the important part of the problem by checking for
the existence of ut_pid in xlogin's utmp checking routines.
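
A minimal sketch of what such a check might look like follows. xlogin's
real utmp routines are not shown in this message, so the function names
here (pid_exists, entry_is_live, someone_logged_in) are hypothetical,
and the code assumes the glibc getutent() interface.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <utmp.h>

/* Return 1 if a process with this pid exists, 0 otherwise. */
static int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return 0;
    /* Signal 0 delivers nothing; it only performs the checks. */
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM: the process exists but we may not signal it. */
    return errno == EPERM;
}

/* Return 1 if the entry is a login session whose process is alive. */
static int entry_is_live(const struct utmp *ut)
{
    return ut->ut_type == USER_PROCESS && pid_exists(ut->ut_pid);
}

/* Return 1 if any utmp entry is a live login, 0 if the machine looks
 * idle.  Stale USER_PROCESS entries are skipped instead of counted. */
static int someone_logged_in(void)
{
    struct utmp *ut;
    int live = 0;

    setutent();
    while (!live && (ut = getutent()) != NULL)
        live = entry_is_live(ut);
    endutent();
    return live;
}
```

Because pids are eventually reused, a stale entry can still look live
if an unrelated process later gets the same pid, so this is a heuristic
and not a guarantee.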
Following is some data leading up to my conclusions. First, the
indication of the bogus entry I found on astrophel:

  astrophel% /usr/athena/bin/finger
  Local:
  Login     Name          TTY    Idle  When        Office
  alexp     ???
  ghudson   Greg Hudson   p1           Thu 10:59   E40-342D x3-0825

  astrophel% w
  11:16am up 6 days, 12:34, 2 users, load average: 1.00, 1.00, 1.00
  USER      TTY      FROM              LOGIN@   IDLE   JCPU   PCPU   WHAT
  ghudson   pts/1    egyptian-gods.mi  10:59am  0.00s  0.24s  0.01s  w

  astrophel% /usr/bin/finger
  Login     Name          Tty    Idle  Login Time    Office  Office Phone
  ghudson   Greg Hudson   pts/1        Jun 15 10:59  (egyptian-gods.mit.edu)

I dumped the utmp file (using /mit/ghudson/tmp/dumputmp.c) and the two
alexp entries were:

  alexp 047,051,000,000 pts/3 25398 DEAD_PROCESS 000000,00 960852641
  alexp 047,052,000,000 pts/4 24650 USER_PROCESS 000000,00 960834566

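
For readers without access to dumputmp.c, a dump in roughly this shape
can be produced with the glibc getutent() interface. The sketch below
is not the original program. The field order is inferred from the two
lines above, and reading the "000000,00" column as ut_exit's
e_termination and e_exit is a guess; format_entry, type_name and
dump_utmp are hypothetical names.

```c
#include <stdio.h>
#include <string.h>
#include <utmp.h>

/* Map ut_type to its symbolic name. */
static const char *type_name(short type)
{
    switch (type) {
    case EMPTY:         return "EMPTY";
    case RUN_LVL:       return "RUN_LVL";
    case BOOT_TIME:     return "BOOT_TIME";
    case NEW_TIME:      return "NEW_TIME";
    case OLD_TIME:      return "OLD_TIME";
    case INIT_PROCESS:  return "INIT_PROCESS";
    case LOGIN_PROCESS: return "LOGIN_PROCESS";
    case USER_PROCESS:  return "USER_PROCESS";
    case DEAD_PROCESS:  return "DEAD_PROCESS";
    default:            return "UNKNOWN";
    }
}

/* Format one entry as: user, ut_id bytes in octal, line, pid, type,
 * exit fields (assumed meaning), and timestamp in seconds. */
static int format_entry(const struct utmp *ut, char *buf, size_t len)
{
    const unsigned char *id = (const unsigned char *)ut->ut_id;

    /* ut_user and ut_line may lack a NUL, so bound them with %.*s. */
    return snprintf(buf, len,
                    "%.*s %03o,%03o,%03o,%03o %.*s %ld %s %06o,%02o %ld",
                    (int)sizeof(ut->ut_user), ut->ut_user,
                    (unsigned)id[0], (unsigned)id[1],
                    (unsigned)id[2], (unsigned)id[3],
                    (int)sizeof(ut->ut_line), ut->ut_line,
                    (long)ut->ut_pid, type_name(ut->ut_type),
                    (unsigned)(unsigned short)ut->ut_exit.e_termination,
                    (unsigned)(unsigned short)ut->ut_exit.e_exit,
                    (long)ut->ut_tv.tv_sec);
}

/* Print every entry in the current utmp file, one per line. */
static void dump_utmp(FILE *out)
{
    struct utmp *ut;
    char buf[256];

    setutent();
    while ((ut = getutent()) != NULL) {
        format_entry(ut, buf, sizeof(buf));
        fprintf(out, "%s\n", buf);
    }
    endutent();
}
```

Calling dump_utmp(stdout) walks the default utmp file; utmpname() can
point the same loop at a different file first.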
The most recent "last" line for alexp on pts/4 is:

  alexp pts/4 Mon Jun 12 14:29 still logged in

which suggests that this is a simple case of something failing to
clean up utmp. Since no remote host is listed in the initial wtmp
entry (sshd, rlogind, and telnetd all put in a remote host), and the
pty is pts/4, the culprit would seem to be xterm.