[2237] in Release_7.7_team

Heads up! AFS problem with IRIX 6.5.7 is of significant impact.

daemon@ATHENA.MIT.EDU (Bill Cattey)
Wed May 3 17:16:26 2000

Message-ID: <gt49QO0GgE6e0t_040@mit.edu>
Date: Wed,  3 May 2000 17:16:10 -0400 (EDT)
From: Bill Cattey <wdc@MIT.EDU>
To: owls@MIT.EDU
CC: release-team@MIT.EDU, brianb@sgi.com, jasonuhl@csd.sgi.com, rbasch@MIT.EDU,
        rar@MIT.EDU

I believe this was mentioned informally at a recent owls meeting.
(I am copying some of our friends at SGI so they know what is being
said about this situation.)  I wanted to give an update:

A couple of weeks ago, Bob Basch got a report of an SGI system hanging
when it took the IRIX 6.5.7-based Athena update.  He isolated the fault:
on Indys that have a Rev 1 R5000 CPU chip, the new version of IRIX,
6.5.7, can copy programs out of AFS but cannot execute them.

Jason Uhlenkott of SGI has been very helpful in moving our understanding
of this situation forward, and in helping find a solution.  Our current
understanding is that there was a bug in the Rev 1 R5000 CPU chip that
was worked around in the UNIX kernel.  IRIX 6.5.7 incorporated a
modification to that work-around to enable some other third-party
filesystem to work.  It was a surprise to everyone that this broke AFS.

Garry Zacheiss has used Athena athinfo to identify all systems that
either Moira or the Athena hardware database thinks are Indys and that
identify themselves over the net as having Rev 1 CPU chips: a total of
89.

At this point we have five options:

1. Obtain a remedy for this problem from SGI before June 1 so we can
integrate and test it in time for the July 15 roll-out of Athena 8.4.
Bob and Jason are hard at work on this option.

2. Obtain a remedy for this problem from Transarc before June 1 so we
can integrate and test it in time for the July 15 roll-out of Athena
8.4.  Bob is putting together a trouble report to Transarc to get them
into the picture.

3. Field upgrade the CPU chips of at least 89 systems.

4. Do not do an IRIX 6.5.7-based Athena release.  (This would mean that
known bug fixes would not go to the field, including fixes for random
hangups we get on 300 MHz O2s owing to lack of support for that
configuration under IRIX 6.5.3.)

5. Disable the offending CPU chip work-around code completely.  (We know
how to do this now, but it would mean that random applications would
fail in random ways in the field.)

It is my hope that either Transarc or SGI comes through with a remedy
in time.  We are pursuing a couple of possibilities with SGI, including
backing out just the code introduced in 6.5.7 for that other
third-party filesystem.

If neither option 1 nor option 2 pans out, I guess it falls to the Owls
team to decide whether we exercise option 3, 4, or 5.

The release team will keep everyone informed of progress, as is our
usual practice in situations like this.

-wdc
