[2954] in Release_Engineering


Eros down - looks bad

daemon@ATHENA.MIT.EDU (epeisach@Athena.MIT.EDU)
Sun Dec 27 21:33:55 1992

From: epeisach@Athena.MIT.EDU
Date: Sun, 27 Dec 92 21:33:36 -0500
To: op@Athena.MIT.EDU
Cc: builder@Athena.MIT.EDU, rel-eng@Athena.MIT.EDU



Eros is down and out again......

*eros.mit.edu     12/26 12:53  timeout      11/25 11:11-11/25 11:46 timeout

Dev 21,26 is rz3c, which is one of the R squared drives. I do not know
which one it is, or whether it is the same one that failed last time.


I came into MIT tonight to try and deal with the problem but was locked
out. (No one's fault really). As Richard is out of town, I am willing to
come by tomorrow and diagnose the failure and the impact and determine a
possible course of action. Meanwhile, access to the source tree is
partial (i.e. release.74 is probably available).

I will give someone a call tomorrow when I figure out my schedule.

	Ezra



----------------------------------------
Diagnosis: (from DSLOGGER reports)



The relevant code from the kernel gfs_bio.c:

	/* CJXXX - this must mean something to somebody */
	if ((unsigned)blkno >= 1 << (sizeof(int)*NBBY-DEV_BSHIFT)) { /* XXX */
		mprintf("getblk: invalid blkno %d, dev %d,%d",
		       blkno, major(dev), minor(dev));
		if (gp)
			mprintf(" gp 0x%x number %d mode %o\n",
				gp, gp->g_number, gp->g_mode);
		else
			mprintf(" gp NULL\n");
		blkno = 1 << ((sizeof(int)*NBBY-DEV_BSHIFT) + 1);
	}
	
I wrote a small program to test what the maximum block number could be
and came up with 1<<23, or 8388608

A 2 gig drive should have a maximum of approx half of that....
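
The arithmetic above can be reproduced with a small sketch along these
lines. It is not the original test program. It assumes NBBY is 8 and
DEV_BSHIFT is 9 (512-byte device blocks), the usual values in
BSD-derived kernels, and the names getblk_limit, getblk_clamp and
device_blocks are made up for illustration.

```c
#include <assert.h>

/* Assumed values, redefined locally so the sketch builds anywhere:
 * NBBY = bits per byte, DEV_BSHIFT = log2 of the device block size. */
#define SKETCH_NBBY       8
#define SKETCH_DEV_BSHIFT 9

/* Threshold from the getblk() check: 1 << (sizeof(int)*NBBY - DEV_BSHIFT).
 * int_bytes stands in for sizeof(int) on the kernel's machine. */
unsigned long getblk_limit(unsigned int_bytes)
{
    unsigned bits = int_bytes * SKETCH_NBBY;

    /* Keep the shift count in a range that is valid for unsigned long. */
    assert(bits > SKETCH_DEV_BSHIFT && bits - SKETCH_DEV_BSHIFT < 32);
    return 1UL << (bits - SKETCH_DEV_BSHIFT);
}

/* Value the kernel substitutes for a rejected blkno: one more shift. */
unsigned long getblk_clamp(unsigned int_bytes)
{
    return getblk_limit(int_bytes) << 1;
}

/* Number of device blocks on a drive of the given size in bytes. */
unsigned long device_blocks(unsigned long long bytes)
{
    return (unsigned long)(bytes >> SKETCH_DEV_BSHIFT);
}
```

With 4-byte ints, getblk_limit(4) is 8388608 (1<<23), and
device_blocks(2ULL << 30) is 4194304 for a 2 gig drive, which is half
the limit.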

I believe the code fragment is from either alloc or realloccg. (based on
gp being NULL). 

If I had seen a filesystem or out-of-gnodes error listed for eros, I could
see a kernel "bug" that would cause this error.

The interesting fact is that blkno -2 is requested.... There is code in
afs_istuff.c that sets the gid to -2 for these vice-inodes.... It merits
some thought.....
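
To see why a blkno of -2 trips the check, here is a rough sketch of the
comparison. It assumes 32-bit ints and the 1<<23 limit worked out
above, and it models the (unsigned) cast by masking to 32 bits. The
names as_unsigned32, blkno_rejected and sketch_selfcheck are
hypothetical, not kernel functions.

```c
#include <assert.h>

/* Limit from the getblk() check, assuming 32-bit int and DEV_BSHIFT 9. */
#define SKETCH_BLKNO_LIMIT (1UL << 23)

/* Model the kernel's (unsigned)blkno cast for a 32-bit int. */
unsigned long as_unsigned32(long blkno)
{
    return (unsigned long)blkno & 0xFFFFFFFFUL;
}

/* Nonzero if the check would print "getblk: invalid blkno". */
int blkno_rejected(long blkno)
{
    return as_unsigned32(blkno) >= SKETCH_BLKNO_LIMIT;
}

/* A few spot checks using values taken from the logs. */
void sketch_selfcheck(void)
{
    assert(blkno_rejected(-2L));         /* Saturday's entry */
    assert(blkno_rejected(89352624L));   /* one of Thursday's entries */
    assert(!blkno_rejected(4194303L));   /* last block of a 2 gig drive */
}
```

Under these assumptions, -2 becomes 4294967294 once cast to unsigned,
so any negative block number fails the test, as does every positive
value in the logs.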






-----------------

From Euterpe's logs on Saturday (12/26 before it died)

12:52:31
getblk: invalid blkno -2, dev 21,26 


On Thursday (12/24), many errors of the following nature were logged:
MESSAGE                getblk: invalid blkno 1936946000, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1936946000, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 2065077968, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1824861568, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 89352624, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -760681200, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 104024592, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 89352624, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1404996368, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 100166064, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -121786864, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 100166064, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 2111204624, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -121730288, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 89352624, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 961616944, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 995718160, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 171105136, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 990223136, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -1736537264, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 2113035312, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 91380784, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -1078258048, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1681612016, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -658166400, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -382593152, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -1330157536, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -772706528, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1786785360, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -1299032064, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1936946000, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1936946000, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -603979712, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 2021993872, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -760680880, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno -1736537264, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 1404939728, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 633380016, dev 21,26
MESSAGE                 gp NULL
MESSAGE                getblk: invalid blkno 633380016, dev 21,26
MESSAGE                getblk: invalid blkno -1179010624, dev 21,26
MESSAGE                getblk: invalid blkno 1001266032, dev 21,26
MESSAGE                getblk: invalid blkno -521884144, dev 21,26
MESSAGE                getblk: invalid blkno -1259185040, dev 21,26
MESSAGE                getblk: invalid blkno 1917479072, dev 21,26
MESSAGE                getblk: invalid blkno 1136896160, dev 21,26

