[8510] in Info-AFS_Redistribution

Re: Sun's 'logging' mount option -- our findings -- RUN AWAY

daemon@ATHENA.MIT.EDU (Jeff Blaine)
Wed Feb 21 10:30:44 2001

Date: Wed, 21 Feb 2001 10:22:59 -0500
From: Jeff Blaine <jblaine@linus.mitre.org>
To: Harald Barth <haba@pdc.kth.se>
Cc: info-afs@transarc.com
Message-Id: <419715204.982750979@jblaine-pc.mitre.org>
In-Reply-To: <20010221113046M.haba@pdc.kth.se>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

> your story is quite similar to what we experienced in the
> stacken.kth.se cell. It cost some nights, and I got quite upset.
> Fortunately we only had one file server enabled and it was not as
> severe. I "only" had badly corrupted volumes, but no salvager
> segfaults.

Agh!  Mail the list next time with your findings, pleeeeeeease :)

> 3) The salvager is not able to clean out broken backup volumes all the
> time. The only way to "fix" these is to "vos zap -force" them.

Haha, it's as if you were watching over my shoulder over the last 4 days!
That we have the _same_ exact symptoms proves to me that this was not
any sort of fluke on our part.  I hope anyone reading this with logging
turned on has good backups.

> 4) vos backup is not able to overwrite such a corrupted backup volume.
> You have to be very observant to check that you really get backup
> volumes for all volumes when you do vos backupsys.

In our case, the output of our vos backupsys is emailed to us.  The
errors got shrugged off for 3 days.  Lesson learned.  Stupid, stupid,
stupid.

> 5) "vos backup volumename ; vos dump volumename.backup" seems to be a
> reasonable check that your volumes are better again after you tried to
> cure them with the salvager.

The more general problem we had was that with a corrupt backup volume in
place, cloning or re-cloning of the RW is not possible.  This causes the
problem you mentioned above (vos backup) and is also found when trying to
do a 'vos release'.
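
For anyone wanting to run Harald's check from point 5 over more than a
handful of volumes, here is a rough sketch of it as a shell function.  It
just runs the two commands he quoted and reports which step failed.  The
function name check_volume and the VOS override variable are made up for
illustration, and it assumes that a corrupt volume makes vos exit non-zero,
which you should confirm on your own cell before trusting it.

```shell
#!/bin/sh
# VOS can be overridden (e.g. with a stub) so the function can be tried
# without touching a live cell.  This variable is an assumption of the sketch.
VOS="${VOS:-vos}"

# check_volume NAME
# Re-clone the backup volume, then dump the clone and throw the data away.
# Prints "OK NAME" and returns 0 only if both steps exit successfully.
check_volume() {
    vol="$1"
    if ! "$VOS" backup "$vol" >/dev/null 2>&1; then
        echo "FAIL backup $vol"
        return 1
    fi
    # The dump is written to stdout, so discard it; we only want the status.
    if ! "$VOS" dump "$vol.backup" >/dev/null 2>&1; then
        echo "FAIL dump $vol"
        return 1
    fi
    echo "OK $vol"
}
```

Called as "check_volume some.volume" for each volume in turn, anything that
prints FAIL is a candidate for another salvage or a 'vos zap -force' of the
backup volume.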

> I don't understand why the annoying overwrite of log files hasn't been
> fixed a long time ago.  We can afford a big /usr/afs/logs. Write all
> log files with date (SalvageLog.20010221.111059.log or something).
> Never overwrite logfiles.

At someone else's prodding, I looked into this further.  The person pointed
out that once salvager is entirely done, /usr/afs/logs/SalvageLog contains
everything that was in the SalvageLog.NNN files.  Good.

Should you run the salvager again, at all, SalvageLog is overwritten and
you lose this data.  Horrible.
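
Until that is fixed, a workaround is to copy the finished log aside under
a dated name before starting another salvage, along the lines Harald
suggested.  A minimal sketch follows; the function name save_salvage_log is
made up, and the default directory is simply the /usr/afs/logs path
mentioned above.

```shell
#!/bin/sh
# save_salvage_log [LOGDIR]
# Copy LOGDIR/SalvageLog to LOGDIR/SalvageLog.YYYYMMDD.HHMMSS.log and print
# the name of the copy.  Does nothing (and succeeds) if there is no log yet.
save_salvage_log() {
    logdir="${1:-/usr/afs/logs}"
    src="$logdir/SalvageLog"
    [ -f "$src" ] || return 0
    dst="$logdir/SalvageLog.$(date +%Y%m%d.%H%M%S).log"
    # -p keeps the original modification time on the copy.
    cp -p "$src" "$dst" && echo "$dst"
}
```

Run "save_salvage_log" by hand (or from whatever wrapper you use to start
the salvager) before each new run, so the previous results survive.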

