[2954] in Release_Engineering
Eros down - looks bad
daemon@ATHENA.MIT.EDU (epeisach@Athena.MIT.EDU)
Sun Dec 27 21:33:55 1992
From: epeisach@Athena.MIT.EDU
Date: Sun, 27 Dec 92 21:33:36 -0500
To: op@Athena.MIT.EDU
Cc: builder@Athena.MIT.EDU, rel-eng@Athena.MIT.EDU
Eros is down and out again......
*eros.mit.edu 12/26 12:53 timeout 11/25 11:11-11/25 11:46 timeout
Dev 21,26 is rz3c - which is one of the R squared drives. I do not know
which one it is, or whether it is the same one that failed last time.
I came into MIT tonight to try and deal with the problem but was locked
out. (No one's fault really). As Richard is out of town, I am willing to
come by tomorrow and diagnose the failure and the impact and determine a
possible course of action. Meanwhile access to the source tree is
partial. (i.e. release.74 probably available).
I will give someone a call tomorrow when I figure out my schedule.
Ezra
----------------------------------------
Diagnosis: (from DSLOGGER reports)
The relevant code from the kernel gfs_bio.c:
    /* CJXXX - this must mean something to somebody */
    if ((unsigned)blkno >= 1 << (sizeof(int)*NBBY-DEV_BSHIFT)) { /* XXX */
        mprintf("getblk: invalid blkno %d, dev %d,%d",
            blkno, major(dev), minor(dev));
        if (gp)
            mprintf(" gp 0x%x number %d mode %o\n",
                gp, gp->g_number, gp->g_mode);
        else
            mprintf(" gp NULL\n");
        blkno = 1 << ((sizeof(int)*NBBY-DEV_BSHIFT) + 1);
    }
I wrote a small program to test what the maximum block number could be
and came up with 1<<23, or 8388608.
A 2 gig drive should have a maximum of approximately half of that....
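
The arithmetic above can be reproduced with a short C sketch. It assumes a
32-bit int, NBBY of 8 and DEV_BSHIFT of 9 (512-byte device blocks); these
values are assumptions and were not read from the eros kernel headers. The
function names getblk_limit and drive_blocks are hypothetical, and this is
not the original test program.

```c
#include <stdint.h>

/* Assumed values; the kernel headers on eros were not consulted. */
#define NBBY       8   /* bits per byte */
#define DEV_BSHIFT 9   /* log2 of an assumed 512-byte device block */
#define INT_BYTES  4   /* assumed sizeof(int) on the machine in question */

/* First block number the quoted check rejects: 1 << (32 - 9) = 1 << 23. */
uint32_t getblk_limit(void)
{
    return (uint32_t)1 << (INT_BYTES * NBBY - DEV_BSHIFT);
}

/* Number of device blocks on a drive of the given size in bytes. */
uint64_t drive_blocks(uint64_t bytes)
{
    return bytes >> DEV_BSHIFT;
}
```

Under these assumptions getblk_limit() is 8388608, and
drive_blocks(2ULL << 30) for a 2 gig drive is 4194304, i.e. half the limit.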
I believe the code fragment is from either alloc or realloccg (based on
gp being NULL).
If I had seen a filesystem or out-of-gnodes error listed for eros, I could
see a kernel "bug" that would cause an error.
The interesting fact is that blkno -2 is requested.... There is code in
afs_istuff.c that sets the gid to -2 for these vice-inodes.... It merits
some thought.....
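
As a sanity check on why blkno -2 trips the test in the quoted fragment,
here is a minimal sketch of the same comparison in isolation. It assumes a
32-bit int and DEV_BSHIFT of 9; blkno_is_invalid is a hypothetical name,
not a kernel function.

```c
#include <stdint.h>

/* Assumed values; not taken from the eros kernel headers. */
#define INT_BITS   32  /* sizeof(int)*NBBY with a 32-bit int */
#define DEV_BSHIFT 9   /* log2 of an assumed 512-byte device block */

/*
 * Mirrors the comparison in the quoted getblk fragment.
 * Returns nonzero if blkno would be reported as invalid.
 * The cast to unsigned turns any negative blkno into a huge value,
 * so every negative block number fails the check.
 */
int blkno_is_invalid(int32_t blkno)
{
    return (uint32_t)blkno >= ((uint32_t)1 << (INT_BITS - DEV_BSHIFT));
}
```

With these assumptions -2 becomes 4294967294 after the cast and is
rejected, while any block number from 0 through 8388607 passes.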
-----------------
From Euterpe's logs on Saturday (12/26 before it died)
12:52:31
getblk: invalid blkno -2, dev 21,26
On Thursday (12/24) many errors were logged of the following nature:
MESSAGE getblk: invalid blkno 1936946000, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1936946000, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 2065077968, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1824861568, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 89352624, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -760681200, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 104024592, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 89352624, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1404996368, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 100166064, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -121786864, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 100166064, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 2111204624, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -121730288, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 89352624, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 961616944, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 995718160, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 171105136, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 990223136, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -1736537264, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 2113035312, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 91380784, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -1078258048, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1681612016, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -658166400, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -382593152, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -1330157536, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -772706528, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1786785360, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -1299032064, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1936946000, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1936946000, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -603979712, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 2021993872, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -760680880, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno -1736537264, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 1404939728, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 633380016, dev 21,26
MESSAGE gp NULL
MESSAGE getblk: invalid blkno 633380016, dev 21,26
MESSAGE getblk: invalid blkno -1179010624, dev 21,26
MESSAGE getblk: invalid blkno 1001266032, dev 21,26
MESSAGE getblk: invalid blkno -521884144, dev 21,26
MESSAGE getblk: invalid blkno -1259185040, dev 21,26
MESSAGE getblk: invalid blkno 1917479072, dev 21,26
MESSAGE getblk: invalid blkno 1136896160, dev 21,26
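
For sorting through a log like the one above, a minimal sketch of a line
parser follows. It relies only on the message format shown in the mprintf
call and on the "MESSAGE " tag seen in these logs; the struct and function
names are hypothetical.

```c
#include <stdio.h>

/* Fields of one "getblk: invalid blkno" log line. */
struct getblk_err {
    int blkno;
    int dev_major;
    int dev_minor;
};

/*
 * Parse one log line, with or without the leading "MESSAGE " tag.
 * Returns 1 and fills *out on success; returns 0 and leaves *out
 * untouched for any other line (for example "MESSAGE gp NULL").
 */
int parse_getblk_line(const char *line, struct getblk_err *out)
{
    int b, ma, mi;

    if (sscanf(line, "MESSAGE getblk: invalid blkno %d, dev %d,%d",
               &b, &ma, &mi) != 3 &&
        sscanf(line, "getblk: invalid blkno %d, dev %d,%d",
               &b, &ma, &mi) != 3)
        return 0;

    out->blkno = b;
    out->dev_major = ma;
    out->dev_minor = mi;
    return 1;
}
```

Feeding each log line through parse_getblk_line makes it straightforward
to count repeated block numbers or to separate negative values from
positive ones.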