[4894] in testers
Semi-obscure AFS problem - SparcClassic and Sparcstation-20
daemon@ATHENA.MIT.EDU (Mitchell E Berger)
Mon Jun 25 20:32:12 2001
Message-Id: <200106260032.UAA25865@byte-me.mit.edu>
To: testers@MIT.EDU
Date: Mon, 25 Jun 2001 20:32:09 -0400
From: Mitchell E Berger <mitchb@MIT.EDU>

Preface: I understand that these are unsupported machine types, one more
than the other, but I think there are some Sparcstation-20s still in active
use and their owners might want/try to upgrade them to 9.0.

In attempting to coerce my SparcClassic into running Athena 9.0, I've
stumbled across a problem that likely hasn't been seen yet. After booting
into the miniroot, everything goes fine through the point at which AFS starts.
Then a kernel panic happens. After much debugging, I've discovered that
the panic occurs when you first try to read a file out of AFS (note that
traversing AFS space and using ls -l in it works fine, though). Information
on the bug I believe this to be can be found here:
http://www.transarc.ibm.com/Support/afs/readmes/afs36.patch2.readme.html
(Search for "12595", which is the defect number.)
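
The distinction above (listing and stat'ing work, the first data read
panics) can be sketched as a small probe. This is only an illustration,
not what I actually ran during debugging: the function name probe() and
its return format are made up for the example, and it assumes a Python
interpreter is available on the machine, which a miniroot may not have.
It announces each stage before doing it, so on a machine with the bug the
last line printed shows which operation triggered the panic.

```python
import os
import stat
import sys


def probe(directory):
    """Traverse a directory the way `ls -l` would, then read one file.

    Hypothetical helper: listing and lstat only touch metadata, while the
    final read is the first operation that fetches file data.
    """
    results = {"listed": 0, "statted": 0, "read_from": None, "read_bytes": None}

    print(f"stage 1: listing {directory}", file=sys.stderr, flush=True)
    names = sorted(os.listdir(directory))
    results["listed"] = len(names)

    first_file = None
    for name in names:
        path = os.path.join(directory, name)
        print(f"stage 2: lstat {path}", file=sys.stderr, flush=True)
        info = os.lstat(path)  # metadata only, like ls -l
        results["statted"] += 1
        if first_file is None and stat.S_ISREG(info.st_mode):
            first_file = path

    if first_file is not None:
        print(f"stage 3: reading {first_file}", file=sys.stderr, flush=True)
        with open(first_file, "rb") as handle:
            data = handle.read(1)  # first actual data fetch
        results["read_from"] = first_file
        results["read_bytes"] = len(data)

    return results
```

Calling probe() on any directory in AFS that contains at least one regular
file would exercise all three stages; which directory to use is left to
the reader.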
It is listed as specific to Solaris 8 and Sparcstation 20s, which is likely
why we haven't heard about it before. It wouldn't shock me if I were the
first to try running Solaris 8 and Transarc AFS 3.6 on a SparcClassic, so that
may be why the Classic isn't mentioned - I think kolya knows of an architectural
similarity between it and the SS20. This bug was fixed in Transarc's AFS
3.6 Patch 2, but we're currently running Patch 1 (jweiss thinks this may have
been because the initial Patch 2 had problems and was withdrawn).

I tried hacking on the installation: I essentially ran the setup-swap-boot
procedure step by step by hand, replacing the Transarc 3.6 Patch 1 binaries
in the miniroot with OpenAFS 1.0.4 binaries before booting into it. As of a
couple of hours ago, my Classic had finished installing packages and was on
to patches. If those binaries are believed to be stable and reliable, since
we're already using OpenAFS on Linux, perhaps we should consider moving to them
on Sun4 since they seem to have more up-to-date bugfixes than what we're using
now and will probably allow more machines to run Athena, even though they're
not supported.

I was concerned some SS20s might be in the early cluster and get bitten tonight,
but checking that with moira and athinfo didn't turn up any definite SS20s.
There was one SparcClassic (dry-cooked-sliced-beef) that's still listed as
being in E40 (not used recently), and three early machines failed to ping or
have model information in moira: florey-old, mozart, and suan-la-chow-show.

Mitch