[670] in arla-drinkers


Re: shared libs in AFS

daemon@ATHENA.MIT.EDU (Magnus Ahltorp)
Fri Mar 5 17:15:44 1999

From owner-arla-drinkers@stacken.kth.se Fri Mar 05 22:15:42 1999
Return-Path: <owner-arla-drinkers@stacken.kth.se>
Delivered-To: arla-drinkers-mtg@bloom-picayune.mit.edu
Received: (qmail 9700 invoked from network); 5 Mar 1999 22:15:41 -0000
Received: from unknown (HELO sundance.stacken.kth.se) (130.237.234.41)
  by bloom-picayune.mit.edu with SMTP; 5 Mar 1999 22:15:40 -0000
Received: (from majordom@localhost)
	by sundance.stacken.kth.se (8.8.8/8.8.8) id XAA26079
	for arla-drinkers-list; Fri, 5 Mar 1999 23:09:45 +0100 (MET)
Received: from squid.pdc.kth.se (squid.pdc.kth.se [130.237.221.65])
	by sundance.stacken.kth.se (8.8.8/8.8.8) with ESMTP id XAA26075
	for <arla-drinkers@stacken.kth.se>; Fri, 5 Mar 1999 23:09:41 +0100 (MET)
Received: (from d95-mah@localhost)
	by squid.pdc.kth.se (8.8.7/8.8.7) id XAA03711;
	Fri, 5 Mar 1999 23:09:22 +0100 (MET)
To: Dave Morrison <dave@bnl.gov>
Cc: arla-drinkers <arla-drinkers@stacken.kth.se>
Subject: Re: shared libs in AFS
References: <36DFF8E9.84D3B9AE@bnl.gov>
From: Magnus Ahltorp <map@stacken.kth.se>
Date: 05 Mar 1999 23:09:22 +0100
In-Reply-To: Dave Morrison's message of "Fri, 05 Mar 1999 10:31:53 -0500"
Message-ID: <ixdsobjmvvx.fsf@squid.pdc.kth.se>
Lines: 40
X-Mailer: Gnus v5.6.45/Emacs 19.34
Sender: owner-arla-drinkers@stacken.kth.se
Precedence: bulk

> When arla is first started on the machine, all is well, and we can
> see and load the libraries.  After a time (few hours or so), an odd
> thing happens - I can still ls each of the shared libraries, I can
> cat each of them to /dev/null, and I can cp each out of AFS.  If I
> retrieve one of the libraries from AFS by way of a different machine
> running Transarc's AFS and compare it to one retrieved using arla
> they seem to be byte for byte identical.  However, running ldd on
> our main application reports that some of the shared libraries can't
> be found.

This may well be an arla problem. I haven't dug into how reference
counts are handled when files are mapped. It should work, but I have
never tested whether it does. I will try some experiments, and I will
come back with results.

It does look like a case where arla has thrown a file away: the file
can still be seen in the normal ways, but an old, bogus copy is still
referenced somewhere. It could be a Linux problem, but I would not say
so before I have eliminated all arla suspicions.
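
One way to probe this suspicion from user space is to compare what
ordinary reads return with what a memory mapping of the same file
shows. The following is only a minimal sketch, assuming Python is
available on the affected machine. The function name first_mismatch
and the chunk size are made up for illustration, and this is not what
the dynamic linker itself does, so a clean result would not prove that
mapping works correctly.

```python
import mmap
import os
import sys


def first_mismatch(path, chunk=1 << 16):
    """Return the first offset where read() and mmap() disagree, or None.

    Hypothetical helper: it only compares two user-space views of one file.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return None  # mmap refuses empty files; nothing to compare
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            offset = 0
            while offset < size:
                via_read = f.read(chunk)
                if not via_read:
                    return offset  # read() hit end of file early
                via_map = m[offset:offset + len(via_read)]
                if via_read != via_map:
                    # Narrow the difference down to a single byte offset.
                    for i, (a, b) in enumerate(zip(via_read, via_map)):
                        if a != b:
                            return offset + i
                    return offset + min(len(via_read), len(via_map))
                offset += len(via_read)
    return None


if __name__ == "__main__":
    # Paths of the suspect libraries are given on the command line.
    for path in sys.argv[1:]:
        where = first_mismatch(path)
        if where is None:
            print(path, "read and mmap views agree")
        else:
            print(path, "views differ at offset", where)
```

Running it on the libraries that ldd cannot find, at a time when the
problem is showing, would say whether the mapped view and the read
view of those files have diverged.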

> If anyone has tips on how I could diagnose the problem, how I could
> collect good debug info or something, I'd be very grateful.

These are not things I expect you to do, but if any of this feels like
something you (or anyone else) would be able to do, I would be most
grateful:

1) Debugging the dynamic linking process to find out what data it
really looks at would be most helpful, but I don't have the slightest
clue as to how this might be done. Running ldd under strace might
work, but strace only shows system calls, so you won't catch the
accesses to mapped memory.

2) A behaviour analysis of the whole system when this happens. xfs
debugging might help here. The key points are when xfs_d_delete is
called (especially when it says "no references, tell arlad") and when
a new file is requested through xfs calling arlad with "getdata" and
arlad responds with "installdata".
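
For point 2, once the debug output has been captured, the key points
can be picked out mechanically. This is a rough sketch that assumes
the output has been saved to a plain text file with one message per
line, and that the messages contain the literal strings mentioned
above. The real format of the xfs and arlad debug messages may well
differ, and the labels and function names here are made up for
illustration.

```python
import sys

# (label, text to look for). The texts are the ones named in point 2;
# the labels are hypothetical.
KEY_EVENTS = (
    ("d_delete", "xfs_d_delete"),
    ("tell_arlad", "no references, tell arlad"),
    ("getdata", "getdata"),
    ("installdata", "installdata"),
)


def classify(line):
    """Return the labels of all key events mentioned on one log line."""
    return [label for label, needle in KEY_EVENTS if needle in line]


def timeline(lines):
    """Return (line number, labels, text) for every line with a key event."""
    found = []
    for number, line in enumerate(lines, start=1):
        labels = classify(line)
        if labels:
            found.append((number, labels, line.rstrip("\n")))
    return found


if __name__ == "__main__":
    # Captured log files are given on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for number, labels, text in timeline(log):
                print(f"{path}:{number} [{','.join(labels)}] {text}")
```

The interesting part of the resulting list would be whether a
"tell arlad" event for a file is later followed by a getdata and
installdata pair for the same file before the libraries go missing.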

/Magnus
map@stacken.kth.se
