[57] in DeathTongue Changes
Re: Possible AFS file corruption...
daemon@ATHENA.MIT.EDU (Richard Basch)
Wed Mar 24 06:55:07 1993
Date: Wed, 24 Mar 93 06:54:09
From: probe@MIT.EDU (Richard Basch)
To: eichin@Athena.MIT.EDU ("Mark W. Eichin")
Cc: bug-sparc@MIT.EDU, licks.discuss@charon.MIT.EDU
   From: eichin@Athena.MIT.EDU ("Mark W. Eichin")
   Message-Id: <9303240540.AA19731@tsx-11.MIT.EDU>

   I just found an RMAIL folder (which I had saved messages to from DT)
   with blocks of NULs scattered about it. The file itself is only 55K
   long.
   This may have been due to the local disk on DT being short on space
   (/var filled up more than once, though I don't think anything else
   did) but I'd suggest that people keep an eye out...

Certainly, if /var fills up, that can cause cache corruption and
problems with re-writing the file, in exactly the manner you describe,
in fact. The AFS client code doesn't deal well with the local disk
cache filling up (this is supposedly being fixed in AFS 3.3).
I looked around on deathtongue, and found that the cache size given in
the "cacheinfo" file is dangerously close to the partition limits.
If a lot more is added to /var on deathtongue, the cacheinfo file should
be updated again. In the meantime, I have lowered the cache size to 30000.
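
To illustrate the kind of check described above, here is a minimal Python sketch (not the procedure actually used on deathtongue). It assumes the usual one-line cacheinfo layout of `mountpoint:cachedir:size`, with the size in 1K blocks. The names `parse_cacheinfo`, `partition_kb`, `is_too_close`, the example path `/usr/vice/cache`, and the 85% threshold are all hypothetical choices made for the example.

```python
import os
from typing import NamedTuple


class CacheInfo(NamedTuple):
    mount_point: str  # where AFS is mounted, e.g. /afs
    cache_dir: str    # local directory holding the disk cache
    cache_kb: int     # configured cache size in 1K blocks


def parse_cacheinfo(line: str) -> CacheInfo:
    """Parse one 'mountpoint:cachedir:size' line (assumed layout)."""
    mount_point, cache_dir, size = line.strip().split(":")
    return CacheInfo(mount_point, cache_dir, int(size))


def partition_kb(path: str) -> int:
    """Total size, in 1K blocks, of the partition that holds `path`."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize // 1024


def is_too_close(info: CacheInfo, total_kb: int, limit: float = 0.85) -> bool:
    """True if the configured cache exceeds `limit` of the partition.

    The 0.85 default is an arbitrary illustrative threshold, not a
    value recommended by AFS or by Athena.
    """
    return info.cache_kb > total_kb * limit


if __name__ == "__main__":
    # Example values only; a real check would read the cacheinfo file.
    info = parse_cacheinfo("/afs:/usr/vice/cache:30000")
    print(info, is_too_close(info, partition_kb("/")))
```

A 30000-block cache on a 32000-block partition would be flagged by this check, while the same cache on a 60000-block partition would not.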
I know that the AFS cache starts behaving incorrectly at the 100% mark,
even if there is a 10% overflow allowance in the filesystem. The Athena
kernel fixes this problem, but I know the problem still existed in
Transarc's AFS 3.2. (The Athena version fails to do the right thing
when no blocks are left in the partition, but the Transarc version fails
in this case also, if it doesn't fail first at the 100% mark.)
Unfortunately, since the cache starts failing at the 100% mark, it may
be hard to correlate this with the syslogs on the machine: those messages
are only logged when no blocks are available.
Problems for "bug-sparc":
- The cachesize is not dynamically determined at bootup, as is done
  on all of the other platforms. I suggest using the algorithm in
  use on the VAX/RT/DECstation, where a separate cache partition is
  not in use. Overflowing a cache partition is KNOWN to cause problems
  with AFS data integrity (on all platforms).
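
For illustration only, a boot-time size calculation might look like the Python sketch below. This is not the VAX/RT/DECstation algorithm, whose details are not given here. It simply assumes the cache should take a fixed fraction of the space currently available on the partition, capped at a maximum. The names `available_kb`, `pick_cache_kb`, `format_cacheinfo`, the 50% fraction, and the 100000-block cap are all hypothetical.

```python
import os


def available_kb(path: str) -> int:
    """Space available to unprivileged users on `path`'s partition, in 1K blocks."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // 1024


def pick_cache_kb(avail_kb: int, use_fraction: float = 0.5,
                  max_kb: int = 100000) -> int:
    """Choose a cache size as a fraction of available space, capped at max_kb.

    Both defaults are illustrative assumptions, not Athena's values.
    """
    return min(int(avail_kb * use_fraction), max_kb)


def format_cacheinfo(mount_point: str, cache_dir: str, cache_kb: int) -> str:
    """Build a 'mountpoint:cachedir:size' line (assumed cacheinfo layout)."""
    return f"{mount_point}:{cache_dir}:{cache_kb}"


if __name__ == "__main__":
    # Example paths only; a boot script would use the machine's real cache directory.
    size = pick_cache_kb(available_kb("/"))
    print(format_cacheinfo("/afs", "/usr/vice/cache", size))
```

Sizing from the space actually available at boot keeps the configured cache below the partition limit even after /var has grown, which is the failure mode described above.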
-Richard