[1657] in SIPB_Linux_Development

Linux-AFS Cache Corruption Testing

daemon@ATHENA.MIT.EDU (Derek Atkins)
Thu Apr 17 18:55:56 1997

To: nygren@MIT.EDU, tytso@MIT.EDU
Cc: linux-dev@MIT.EDU, linux-afs-bugs@MIT.EDU
From: Derek Atkins <warlord@MIT.EDU>
Date: 17 Apr 1997 18:55:32 -0400

I've been testing the Linux-AFS Cache Corruption patches on
Cutter-John by trying to thrash the AFS Cache.  I've been running the
following:

	(attach source; cd /source; tar cf /dev/null .)
	(cd /afs/sipb/project/afs/src; tar cf /dev/null .)
	(attach sipb-athena; cd /mit/sipb-athena; tar cf /dev/null .)
	(cd /afs/sipb/project/afs/src/sipb-3.3a/src; cvs -q diff -c)
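
The commands above can be wrapped in a loop so that the cache is
exercised repeatedly.  This is only a sketch: the function names
thrash_dir and thrash_loop are made up for illustration, and the
directories in the usage comment are the ones listed above.  The
archive is piped through cat because current GNU tar documents that it
minimizes I/O when the archive itself is /dev/null; whether the tar on
the test machine behaved that way is not known.

```shell
#!/bin/sh
# Hypothetical helpers; only the "tar everything to /dev/null" idea
# comes from the message itself.

# Read everything under directory $1 and throw the archive away.
# Piping through cat keeps tar from seeing /dev/null as its archive.
thrash_dir() {
    (cd "$1" && tar cf - . | cat > /dev/null)
}

# Run $1 passes over each remaining directory argument.
# Prints one line per pass; returns non-zero if any directory
# could not be entered.
thrash_loop() {
    passes=$1; shift
    rc=0
    n=1
    while [ "$n" -le "$passes" ]; do
        for dir in "$@"; do
            thrash_dir "$dir" || { echo "pass $n: could not enter $dir"; rc=1; }
        done
        echo "pass $n done"
        n=$((n + 1))
    done
    return $rc
}

# Example (after attaching the lockers):
#   thrash_loop 10 /source /afs/sipb/project/afs/src /mit/sipb-athena
```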

So far in my testing, I have received a grand total of zero
syslog messages.  This means that none of the failure modes
that the debugging printk's check for is being hit.  However,
the patches also make changes that have no associated printk,
so I consider this test inconclusive.
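
One way to check for those messages is to count the log lines that
match the printk strings in the patch below.  This is a sketch under
assumptions: count_afs_debug is a hypothetical name, and
/var/log/messages is only a guess at where syslog puts kernel messages
on a given machine.

```shell
#!/bin/sh
# Count log lines produced by the five debug printk's in the patch.
# $1 is the log file to scan; /var/log/messages is an assumed default.
# Prints the count; fails only if grep itself reports an error.
count_afs_debug() {
    log=${1:-/var/log/messages}
    grep -c -E 'VFS: (would have had consistency problem in lock_inode|inode already locked in (clear_inode|write_inode|iput)|inode locked in get_empty_inode)' "$log" \
        || [ $? -eq 1 ]   # grep exits 1 when nothing matched; that is not an error here
}

# Example:
#   count_afs_debug /var/log/messages
```

A count of zero says only that none of the instrumented paths fired,
not that the cache is intact.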

I have not noticed any cache corruption (which doesn't mean
it isn't happening).  It's possible that the patches change
some timing, or that one of the changes we don't syslog is
fixing the problem.  For example, it could be that the extra
inode ref count held in clear_inode() is solving the problem.
I don't know for sure.

In any case, I encourage people to apply this patch to their
machines.  It should apply cleanly to most 2.0.x systems.
This patch is a combination of Ted's and Erik's patches,
and is based on Linux 2.0.18.  Please test it and let me
know how it goes.  If I don't hear any responses, I'm going
to let people on the linux-announce and linux-help (and maybe
even the linux-afs) lists know about it.
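
For anyone applying the patch: the diff names fs/inode.c relative to
the top of the kernel tree, so patch should be run from there with -p0.
The sketch below assumes GNU patch (for --dry-run); the function name
apply_kernel_patch and both arguments are hypothetical.

```shell
#!/bin/sh
# Apply a diff whose paths are relative to the top of the kernel tree,
# after a dry run so that a reject leaves the tree untouched.
# $1 = kernel source directory (e.g. /usr/src/linux, an assumption)
# $2 = absolute path of the file holding the saved diff
apply_kernel_patch() {
    tree=$1
    diff_file=$2
    (cd "$tree" &&
     patch -p0 --dry-run < "$diff_file" > /dev/null &&
     patch -p0 < "$diff_file")
}

# Example:
#   apply_kernel_patch /usr/src/linux /tmp/afs-cache.diff
```

After patching, the kernel has to be rebuilt and rebooted before the
new printk's can show up in syslog.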

-derek

--- fs/inode.c.orig	Wed Jul 24 00:03:07 1996
+++ fs/inode.c	Thu Apr 17 13:18:50 1997
@@ -13,6 +13,7 @@
 #include <asm/system.h>
 
 #define NR_IHASH 512
+#define AFS_CACHE_CORRUPTION_DEBUG
 
 /*
  * Be VERY careful when you access the inode hash table. There
@@ -147,7 +148,17 @@
 
 static inline void lock_inode(struct inode * inode)
 {
-	wait_on_inode(inode);
+#ifdef AFS_CACHE_CORRUPTION_DEBUG
+  if (inode->i_lock) 
+    {
+      wait_on_inode(inode);
+      if (inode->i_lock)
+	printk("VFS: would have had consistency problem in lock_inode dev %s nr %lu\n",
+	       kdevname(inode->i_dev), inode->i_ino);
+    }
+#endif
+	while (inode->i_lock)  /* I'm pretty sure this while isn't needed */
+		wait_on_inode(inode);
 	inode->i_lock = 1;
 }
 
@@ -173,8 +184,14 @@
 {
 	struct wait_queue * wait;
 
+	inode->i_count++;
+#ifdef AFS_CACHE_CORRUPTION_DEBUG
+	if (inode->i_lock)
+	  printk("VFS: inode already locked in clear_inode with dev %s nr %lu\n",
+		 kdevname(inode->i_dev), inode->i_ino);
+#endif
+	lock_inode(inode);
 	truncate_inode_pages(inode, 0);
-	wait_on_inode(inode);
 	if (IS_WRITABLE(inode)) {
 		if (inode->i_sb && inode->i_sb->dq_op)
 			inode->i_sb->dq_op->drop(inode);
@@ -182,10 +199,12 @@
 	remove_inode_hash(inode);
 	remove_inode_free(inode);
 	wait = ((volatile struct inode *) inode)->i_wait;
+	inode->i_count--;
 	if (inode->i_count)
 		nr_free_inodes++;
 	memset(inode,0,sizeof(*inode));
 	((volatile struct inode *) inode)->i_wait = wait;
+	wake_up(&inode->i_wait);  /* The memset clears the lock (???) */
 	insert_inode_free(inode);
 }
 
@@ -251,7 +270,12 @@
 		inode->i_dirt = 0;
 		return;
 	}
-	inode->i_lock = 1;	
+#ifdef AFS_CACHE_CORRUPTION_DEBUG
+	if (inode->i_lock)
+	  printk("VFS: inode already locked in write_inode dev %s nr %lu\n",
+	       kdevname(inode->i_dev), inode->i_ino);
+#endif
+	inode->i_lock = 1;/* lock_inode would be cleaner but not needed here */
 	inode->i_sb->s_op->write_inode(inode);
 	unlock_inode(inode);
 }
@@ -457,7 +481,12 @@
 			/* Here we can sleep also. Let's do it again
 			 * Dmitry Gorodchanin 02/11/96 
 			 */
-			inode->i_lock = 1;
+#ifdef AFS_CACHE_CORRUPTION_DEBUG
+		        if (inode->i_lock)
+			  printk("VFS: inode already locked in iput dev %s nr %lu\n",
+				 kdevname(inode->i_dev), inode->i_ino);
+#endif
+			lock_inode(inode);  /* I bet the problem was here */
 			inode->i_sb->dq_op->drop(inode);
 			unlock_inode(inode);
 			goto repeat;
@@ -512,6 +541,13 @@
 		sleep_on(&inode_wait);
 		goto repeat;
 	}
+found_good:  /* The poor location of this could also have caused problems */
+#ifdef AFS_CACHE_CORRUPTION_DEBUG
+	if (best->i_lock)
+	  printk("VFS: inode locked in get_empty_inode dev %s nr %lu\n",
+		 kdevname(best->i_dev), best->i_ino);
+#endif
+
 	if (best->i_lock) {
 		wait_on_inode(best);
 		goto repeat;
@@ -522,7 +558,7 @@
 	}
 	if (best->i_count)
 		goto repeat;
-found_good:
+	
 	clear_inode(best);
 	best->i_count = 1;
 	best->i_nlink = 1;


-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/      PP-ASEL      N1NWH
       warlord@MIT.EDU                        PGP key available
