[4458] in testers

Re: On the Linux autoupdate problem

daemon@ATHENA.MIT.EDU (Greg Hudson)
Thu Jun 15 11:45:48 2000

Date: Thu, 15 Jun 2000 11:45:35 -0400
Message-Id: <200006151545.LAA01860@egyptian-gods.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: tb@MIT.EDU (Thomas Bushnell, BSG)
Cc: Greg Hudson <ghudson@MIT.EDU>, testers@MIT.EDU
In-Reply-To: <u1hbt1ad1th.fsf@pusey.mit.edu>

> oxyacetylene seems to be running 8.3, and we knew there were some
> problems.  This problem is one I hadn't imagined, but given that
> utmp handling in 8.4 is basically Totally Different, I'm not sure
> it's a good data point, except that it would explain the difficulty
> the cluster workstations have.

Well, we still have the problem under 8.4.  astrophel, an 8.4.2
machine, failed to autoupdate to 8.4.3 because of a stale utmp entry
which doesn't show up in /usr/bin/finger or w.

Upon investigation, the most likely culprit in this case seems to be
an xterm which didn't die properly.  I can reproduce the condition by
running xterm and sending it a kill -9.  These entries don't show up
in /usr/bin/finger or w because:

	* /usr/bin/finger and w both ignore entries if ut_user doesn't
	  exist in the passwd file.
	* w also ignores entries if ut_pid doesn't exist in the
	  process table.

(So if I start xterm and kill -9 it, I get a stale utmp entry which
shows up in /usr/athena/bin/finger and /usr/bin/finger, since I am in
the passwd file, but not in w, since the pid is gone.  Yay
consistency.)

Solaris has a daemon which runs around cleaning up stale utmp entries
(basically changes USER_PROCESS to DEAD_PROCESS if the pid no longer
exists).  Perhaps Linux needs something similar.  In the meantime, I
think we can solve the important part of the problem by having
xlogin's utmp checking routines verify that the process named by
ut_pid still exists.
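
A minimal sketch of that kind of check, assuming glibc's <utmp.h>
interface (setutent, getutent, endutent) and using kill() with signal
0 to probe for the process.  The function names are hypothetical and
this is not xlogin's actual code; it only reads utmp and reports what
it finds, whereas a cleanup daemon would also rewrite the entry as
DEAD_PROCESS.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <utmp.h>

/* Return 1 if a process with this pid exists, 0 otherwise. */
int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return 0;
    /* Signal 0 does the existence and permission checks only. */
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM means the process exists but belongs to someone else. */
    return errno == EPERM;
}

/* Return 1 if the entry claims a login whose process is gone. */
int entry_is_stale(const struct utmp *ut)
{
    return ut->ut_type == USER_PROCESS && !pid_exists(ut->ut_pid);
}

/* Print every stale USER_PROCESS entry and return how many were seen. */
int report_stale_entries(void)
{
    struct utmp *ut;
    int stale = 0;

    setutent();
    while ((ut = getutent()) != NULL) {
        if (entry_is_stale(ut)) {
            /* ut_user and ut_line need not be NUL-terminated. */
            printf("stale: %.*s on %.*s (pid %ld)\n",
                   (int) sizeof(ut->ut_user), ut->ut_user,
                   (int) sizeof(ut->ut_line), ut->ut_line,
                   (long) ut->ut_pid);
            stale++;
        }
    }
    endutent();
    return stale;
}
```

A login-counting routine could skip any entry for which
entry_is_stale() returns 1, which would have ignored the alexp entry
on pts/4 if pid 24650 was indeed gone.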

Following is some data leading up to my conclusions.  First, the
indication of the bogus entry I found on astrophel:

astrophel% /usr/athena/bin/finger

Local:
Login       Name               TTY Idle When        Office
alexp                 ???
ghudson  Greg Hudson           p1       Thu 10:59   E40-342D      x3-0825
astrophel% w
 11:16am  up 6 days, 12:34,  2 users,  load average: 1.00, 1.00, 1.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
ghudson  pts/1    egyptian-gods.mi 10:59am  0.00s  0.24s  0.01s  w 
astrophel% /usr/bin/finger
Login     Name          Tty      Idle  Login Time   Office     Office Phone
ghudson   Greg Hudson   pts/1          Jun 15 10:59 (egyptian-gods.mit.edu)

I dumped the utmp file (using /mit/ghudson/tmp/dumputmp.c) and the two
alexp entries were:

alexp    047,051,000,000 pts/3        25398 DEAD_PROCESS  000000,00 960852641
alexp    047,052,000,000 pts/4        24650 USER_PROCESS  000000,00 960834566

The most recent "last" line for alexp on pts/4 is:

alexp    pts/4                         Mon Jun 12 14:29   still logged in   

which suggests that this is a simple case of something failing to
clean up utmp.  Since no remote host is listed in the initial wtmp
entry (sshd, rlogind, and telnetd all put in a remote host), and the
pty is pts/4, the culprit would seem to be xterm.
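
As a cross-check, the last column of the dump is the entry's time in
seconds since the epoch, and the value on the pts/4 line, 960834566,
is 18:29:26 UTC on Monday, June 12, 2000, which is 14:29 EDT and so
matches the "last" line.  A small sketch of the conversion follows;
the helper name is hypothetical and it is not part of dumputmp.c.

```c
#include <stdio.h>
#include <time.h>

/* Format a utmp timestamp (seconds since the epoch) as UTC text. */
void format_utc(time_t when, char *buf, size_t len)
{
    struct tm *tm = gmtime(&when);

    /* UTC keeps the result independent of the local time zone. */
    strftime(buf, len, "%a %b %d %H:%M:%S %Y UTC", tm);
}
```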
