[1174] in NetBSD-Development


AFS server discoveries, client bugs

daemon@ATHENA.MIT.EDU (Greg Hudson)
Wed Jan 3 05:36:29 1996

Date: Wed, 3 Jan 1996 05:35:49 -0500
From: Greg Hudson <ghudson@MIT.EDU>
To: sipb-afsreq@MIT.EDU, netbsd-afs@MIT.EDU
Cc: linux-dev@MIT.EDU

linux-dev people: due to a vos bug we discovered tonight affecting
little-endian clients, the volume linux.slackware.nb was destroyed.
The "nb" stands for "no backup."  Read on for details if you're
interested.  I have deleted the VLDB entry; I will recreate the volume
and take other steps to fix the problem tomorrow.

We reordered the SCSI chain on ronald-ann, and, to my surprise, it
seems to have worked for the moment; volume creations on ronald-ann
only take a couple of seconds now.  We'll have to try again in a few
days to see if it's a problem that occurs over time.

After testing volume creations, I attempted to move linux.slackware.nb
onto ronald-ann2:/vicepg.  This move failed immediately because it was
unable to create the destination volume due to a turd left over on
ronald-ann2:/vicepg from the move that failed yesterday.  (I apologise
for the lack of forethought in this regard.)  Due to a bug, vos
decided to clean up the source partition rather than the target
partition, and therefore nuked the source linux.slackware.nb volume.

After reproducing the problem in the zone cell, we discovered that the
bug was a failure to convert the address of a VLDB reply from host
byte order to network byte order before comparing it to another
network-byte-order address.  (Technically, comparisons in network byte
order probably aren't guaranteed to work, but this is how the AFS code
does things and I'm not prepared to change it.)  As a result, vos
concluded that the volume was no longer located on the source
partition, and cleaned up the source partition.  I fixed the bug by
applying the following patch:

	*** 1.2 1994/07/24 01:38:35
	--- vsprocs.c   1996/01/03 10:09:23
	***************
	*** 1534,1539 ****
	--- 1534,1540 ----
	                fflush(STDOUT);
	                goto done;
	        }
	+       MapHostToNetwork(&entry);

	        /* is this a new RW entry? */
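
For anyone who wants to see why only little-endian clients are bitten,
here is a minimal sketch of the comparison going wrong.  This is not
the vos code: same_server_buggy and same_server_fixed are hypothetical
names made up for illustration, and the real code compares fields of a
VLDB entry rather than bare integers.  The only real call used is
htonl() from <arpa/inet.h>.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Illustration only; these helpers are hypothetical and do not appear
 * in vsprocs.c.
 *
 * Buggy comparison: one address is still in host byte order, the other
 * is already in network byte order.  On a big-endian host the two
 * orders coincide, so this happens to work.  On a little-endian host
 * the bytes are reversed and the comparison fails for any address
 * that is not a byte-wise palindrome.
 */
int same_server_buggy(uint32_t host_order_addr, uint32_t net_order_addr)
{
    return host_order_addr == net_order_addr;
}

/*
 * Fixed comparison: convert the host-order address to network byte
 * order first, so both sides use the same representation.
 */
int same_server_fixed(uint32_t host_order_addr, uint32_t net_order_addr)
{
    return htonl(host_order_addr) == net_order_addr;
}
```

With a host-order address such as 0x12000001 and its htonl() result,
same_server_fixed reports a match on any machine, while
same_server_buggy reports a mismatch on a little-endian one.  That
mismatch is what led vos to believe the volume was no longer on the
source partition.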

Since this bug can cause serious data loss, it should probably be
forwarded to MIT and Transarc.  I will do that tomorrow, hopefully, if
no one else does.

There is still a turd on ronald-ann2:/vicepg, and I think the volume
header will have to be deleted from /vicepg by hand (salvages have not
removed it).

