[570] in arla-drinkers

RE: frequent cache corruption with arla 0.21 on linux 2.2.1

daemon@ATHENA.MIT.EDU (Neulinger, Nathan R.)
Thu Feb 4 10:31:21 1999

From owner-arla-drinkers@stacken.kth.se Thu Feb 04 15:31:20 1999
Return-Path: <owner-arla-drinkers@stacken.kth.se>
Delivered-To: arla-drinkers-mtg@bloom-picayune.mit.edu
Received: (qmail 19091 invoked from network); 4 Feb 1999 15:31:19 -0000
Received: from unknown (HELO sundance.stacken.kth.se) (130.237.234.41)
  by bloom-picayune.mit.edu with SMTP; 4 Feb 1999 15:31:19 -0000
Received: (from majordom@localhost)
	by sundance.stacken.kth.se (8.8.8/8.8.8) id QAA00213
	for arla-drinkers-list; Thu, 4 Feb 1999 16:25:55 +0100 (MET)
Received: from umr.edu (hermes.cc.umr.edu [131.151.1.68])
	by sundance.stacken.kth.se (8.8.8/8.8.8) with ESMTP id QAA00204
	for <arla-drinkers@stacken.kth.se>; Thu, 4 Feb 1999 16:25:47 +0100 (MET)
Received: from umr-mail01.cc.umr.edu (umr-mail01.cc.umr.edu [131.151.37.121]) via ESMTP by hermes.cc.umr.edu (8.8.7/R.4.20) id JAA18793; Thu, 4 Feb 1999 09:25:45 -0600 (CST)
Received: by umr-mail01.cc.umr.edu with Internet Mail Service (5.5.2232.9)
	id <D9V03ZPJ>; Thu, 4 Feb 1999 09:25:44 -0600
Message-ID: <9DA8D24B915BD1118911006094516EAF019C7ECA@umr-mail02.cc.umr.edu>
From: "Neulinger, Nathan R." <nneul@umr.edu>
To: "'arla-drinkers@stacken.kth.se'" <arla-drinkers@stacken.kth.se>
Subject: RE: frequent cache corruption with arla 0.21 on linux 2.2.1
Date: Thu, 4 Feb 1999 09:25:44 -0600 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2232.9)
Content-Type: text/plain;
	charset="ISO-8859-1"
Sender: owner-arla-drinkers@stacken.kth.se
Precedence: bulk

> cd $objdir/tests
> make
> WORKDIR=/afs/my.cell/i/have/lots/of/space/here ./run-tests -all

The tests that run "sh <whatever flags> /path/to/compiled/executable"
will not work (at least on Linux). If the target is a binary, you have to
use sh -c; otherwise sh tries to read the file in as a script.
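
A minimal sketch of the difference, assuming /bin/ls as a stand-in for a
compiled test program (the real test binaries are not shown here, and the
variable names are only illustrative):

```sh
#!/bin/sh
# Stand-in for a compiled test executable (assumption, not an arla test).
BIN=/bin/ls

# Without -c, sh treats its first operand as a script file and tries to
# read the binary as shell commands. The exact error varies by shell.
if sh "$BIN" >/dev/null 2>&1; then
    plain_status=ok
else
    plain_status=failed
fi

# With -c, the argument is a command line, so the binary is executed.
if sh -c "$BIN" >/dev/null 2>&1; then
    dash_c_status=ok
else
    dash_c_status=failed
fi

echo "sh BIN:    $plain_status"
echo "sh -c BIN: $dash_c_status"
```

Any flags passed to sh still have to come before -c, since the command
string must directly follow it.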

-----

I've been having some really serious stability problems with arla over the
past few days.

This is with kernel 2.2.1, arla 0.21+setgroups, egcs/pgcc 1.1.1 build (no
optimization on arla).

Here are a few examples:
	1) I created a file, and it shows up immediately, but ls reports "no
such file or directory" for it. (Same as if you mkm a non-existent
volume.) The problem is that the file can't be removed or recreated.

	2) "sigpending lied" - I am getting so many of these messages that it
is becoming a performance problem. The login time for an AFS account is
noticeably longer when these are occurring.

	3) Losing contact with file servers during heavy afs activity
(compiles). 

	4) Complete lockups and oopses: In some cases, the lockups appear to
be an extreme case of "sigpending lied". In others, they are oopses.

Here's one of them after ksymoops:

Unable to handle kernel NULL pointer dereference at virtual address 00000400
current->tss.cr3 = 0f2f8000, %cr3 = 0f2f8000
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<d08251fc>]
EFLAGS: 00010202
eax: 00000400   ebx: cd79cd30   ecx: cd6bfa1c   edx: 00000000
esi: 00000000   edi: cd6cd8f0   ebp: cedaa6b0   esp: cf2fbf20
ds: 0018   es: 0018   ss: 0018
Process arlad (pid: 355, process nr: 23, stackpage=cf2fb000)
Stack: 00000000 cf1ca014 cd6cd8b8 cd6cd8f0 d082527c cd79cd50 00000000 d08252c7
       cd713020 cef60000 00000000 d0823c79 00000000 cef60000 0000001c cef60000
       0000001c d08235a9 00000000 cef60000 0000001c cf1ca000 ffffffea cf3cfb64
Call Trace: [<d082527c>] [<d08252c7>] [<d0823c79>] [<d08235a9>] [<c0124d56>] [<d0823540>] [<c01089dc>] [<c010002b>]
Code: c7 00 00 00 00 00 8b 44 24 10 83 c0 30 39 c5 0f 85 3b ff ff

>>EIP: d08251fc <clear_all_childs+120/158>
Trace: d082527c <xfs_invalid_xnode+48/50>
Trace: d08252c7 <xfs_message_invalidnode+43/60>
Trace: d0823c79 <xfs_message_receive+101/130>
Trace: d08235a9 <xfs_devwrite+69/e0>
Trace: c0124d56 <sys_write+ea/110>
Trace: d0823540 <xfs_devwrite+0/e0>
Trace: c01089dc <system_call+34/38>
Trace: c010002b <startup_32+2b/11e>
Code:  d08251fc <clear_all_childs+120/158>     00000000 <_EIP>:
Code:  d08251fc <clear_all_childs+120/158>        0:    c7 00 00 00 00  movl   $0x0,(%eax)
Code:  d0825201 <clear_all_childs+125/158>        5:    00
Code:  d0825202 <clear_all_childs+126/158>        6:    8b 44 24 10     movl   0x10(%esp,1),%eax
Code:  d0825206 <clear_all_childs+12a/158>        a:    83 c0 30        addl   $0x30,%eax
Code:  d0825209 <clear_all_childs+12d/158>        d:    39 c5           cmpl   %eax,%ebp
Code:  d082520b <clear_all_childs+12f/158>        f:    0f 85 3b ff ff  jne    ffff50 <_EIP+0xffff50> d182514c <END_OF_CODE+ffa810/????>
Code:  d0825210 <clear_all_childs+134/158>       14:    00
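
For reading the annotations above: ksymoops prints each address as
<symbol+offset/size>, both in hex. A small sketch of the arithmetic for the
faulting EIP, using only numbers from the dump (the variable names are just
illustrative):

```sh
#!/bin/sh
# Values copied from ">>EIP: d08251fc <clear_all_childs+120/158>".
eip=0xd08251fc
offset=0x120   # how far into clear_all_childs the fault occurred
size=0x158     # total length of clear_all_childs

# Start of the function = EIP minus the offset.
func_start=$(printf '%x' $(( $eip - $offset )))
# End of the function = start plus the size.
func_end=$(printf '%x' $(( $eip - $offset + $size )))

echo "clear_all_childs spans $func_start..$func_end"   # d08250dc..d0825234
```

The faulting instruction is the movl $0x0,(%eax) at that EIP, and eax holds
00000400, which matches the reported fault address.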




-- Nathan
