[868] in SIPB_Linux_Development


Re: ext2 files getting unlinked under 1.1.7[23] and other things

daemon@ATHENA.MIT.EDU (Erik Nygren)
Tue Dec 20 21:41:42 1994

To: "Theodore Ts'o" <tytso@MIT.EDU>
Cc: nygren@MIT.EDU, jered@MIT.EDU, linux-afs-bugs@MIT.EDU, linux-dev@MIT.EDU,
        warlord@MIT.EDU, yonah@MIT.EDU
In-Reply-To: Your message of "Tue, 20 Dec 1994 20:43:55 +0500."
             <9412210143.AA25982@dcl.MIT.EDU> 
Date: Tue, 20 Dec 1994 21:41:33 -0500
From: Erik Nygren <nygren@MIT.EDU>


>    Perhaps this is somehow linked to the fact that on some systems, afsd
>    will not shutdown?
> 
> I don't think so....  Perhaps this is linked to the fact that you're not
> supposed to even try to shut afsd down?  This is listed as a KNOWN BUG
> in the Linux AFS readme file....
> 
> * afsd -shutdown doesn't work -- in fact, it will probably crash
> your machine.

Just on an oddly related note: when I had afsd running off my root
partition, I never had a problem with unclean unmounts.
This is most likely because the root partition doesn't get
unmounted by umount -a anyway.  After the first problem,
I moved /usr/vice to its own 16 MB partition, and now
that partition doesn't unmount cleanly.  When it comes
back up, it generally has fsck errors.

> Well, let's try to correlate other patterns.  One of the things which I
> do is that I've got my cache set up as its own separate 30 meg
> partition.  Does anyone else do this? 

After my first disk problem, I did exactly this.  I removed
/usr/vice from my root partition and gave it its own 16 MB
partition (which doesn't get anywhere close to filling since
my cache is still set to 10 MB).  My more recent problems have
happened since then. 

Regarding the low memory conditions, the only time I've
seen anything is with kmalloc returning NULL at boot,
either while scanning the cache or while doing the insmod.
BTW, I noticed there's a new version of modutils for 1.1.67.
Might this be relevant?  Anyway, is kmalloc the only
place where memory allocation really takes place?  I checked
through the ext2 sources and found that its only
use of kmalloc dealt pretty well with it returning NULL.
However, in fs/locks.c, I noticed:

    /* Okay, let's make a new file_lock structure... */
    tmp = (struct file_lock *)kmalloc(sizeof(struct file_lock), GFP_KERNEL);
    tmp -> fl_owner = NULL;
    tmp -> fl_next = file_lock_free_list;
    tmp -> fl_nextlink = file_lock_table;
    file_lock_table = tmp;

Here, tmp is the file_lock_free_list that's getting created.
I don't know if this is the problem, and I kind of doubt it,
but regardless, it's really ugly that this dereferences a pointer
that's not guaranteed to be non-NULL.
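
For illustration, here is a minimal user-space sketch of the kind of
check I would expect there.  It is not the actual fs/locks.c code: the
struct is a cut-down stand-in, and fake_kmalloc, simulate_alloc_failure,
and alloc_file_lock are hypothetical names made up for this example.
The only point is that the new structure is linked into the table
after the NULL test, so a failed allocation leaves the lists untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down stand-in for the kernel's struct file_lock (illustrative only). */
struct file_lock {
    struct file_lock *fl_next;
    struct file_lock *fl_nextlink;
    void *fl_owner;
};

struct file_lock *file_lock_table = NULL;
struct file_lock *file_lock_free_list = NULL;

/* Hypothetical switch so the failure path can be exercised in user space. */
int simulate_alloc_failure = 0;

/* User-space stand-in for kmalloc(size, GFP_KERNEL); may return NULL. */
void *fake_kmalloc(size_t size)
{
    assert(size > 0);
    return simulate_alloc_failure ? NULL : malloc(size);
}

/*
 * Allocate a new file_lock and link it into file_lock_table.
 * Returns NULL, leaving both lists unchanged, if the allocation fails.
 */
struct file_lock *alloc_file_lock(void)
{
    struct file_lock *tmp = fake_kmalloc(sizeof(*tmp));

    if (tmp == NULL)
        return NULL;    /* caller has to cope, e.g. by failing the lock request */

    tmp->fl_owner = NULL;
    tmp->fl_next = file_lock_free_list;
    tmp->fl_nextlink = file_lock_table;
    file_lock_table = tmp;
    return tmp;
}
```

What the caller should do on a NULL return is a separate question, which
this sketch deliberately leaves open.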

In fs/exec.c, a new vm_area_struct gets created in create_tables
with a kmalloc.  It checks whether the result is NULL, but if it is,
it just seems to skip dereferencing it.  I don't know if this is
really the best behavior.

A more likely place for problems may be get_free_page and
__get_free_page, which the filesystem code uses explicitly.
The fact that we're getting "attempt to free free page" errors
may imply that someone's not dealing properly with this.
Is there a way to find out where a function in the kernel was
called from?  This would be useful in the "free free page"
error and might give us a direction to look in.
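
One possibility, which I have only tried in user space: GCC has a
__builtin_return_address(0) builtin that returns the address the
current function will return to.  The sketch below uses a toy page
table (toy_alloc_page, toy_free_page, page_in_use, and last_bad_caller
are all made-up names, not kernel symbols) to show how a double free
could report its caller.  Whether this can be dropped into the real
free_page path as-is is an assumption I haven't checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NPAGES 8

int page_in_use[NPAGES];        /* toy allocation map, 1 = allocated */
void *last_bad_caller = NULL;   /* return address of the last bad free */

/* Hand out the first free toy page, or -1 if none are left. */
int toy_alloc_page(void)
{
    int i;

    for (i = 0; i < NPAGES; i++) {
        if (!page_in_use[i]) {
            page_in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/*
 * Free a toy page.  On a double free, record and print the address
 * this call will return to, which identifies the call site.
 * Returns 0 on success, -1 on a double free.
 */
int toy_free_page(int page)
{
    assert(page >= 0 && page < NPAGES);

    if (!page_in_use[page]) {
        last_bad_caller = __builtin_return_address(0);
        fprintf(stderr, "attempt to free free page %d, called from %p\n",
                page, last_bad_caller);
        return -1;
    }
    page_in_use[page] = 0;
    return 0;
}
```

The printed address would still have to be matched against the symbol
table of the running image by hand to find the calling function.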

I wish I had a Linux machine to sacrifice and debug this
on.  Unfortunately, I want this machine stable while I'm home
over break so I'm dropping back to 1.1.68.  I was still
getting messages about "attempt to free free page" back then
but never noticed any fs corruption.  Either no one noticed
it or something between patch68 and patch72 triggered a
dormant problem.

> Also, so far it seems to be Linux AFS specific.  If this is the
> case, perhaps we should take it off the
> linux-kernel@vger.rutgers.edu list.....

I've taken linux-kernel@vger.rutgers.edu off the Cc: list, at least
until I hear of this happening on non-linux-afs systems.
We can report back there after we figure out what's going on.

Disclaimer: I'm starting to look at the kernel source for the first
time, so I'm not clear on how everything works.  Much of what I say
is guesswork based on how I think things might be designed to work.

	--- Erik

___________________________________________________________________________
Erik Nygren        \ \ \  Massachusetts Institute of Technology
450 Memorial Drive  \ \ \  Email: nygren@mit.edu  Voice: 617/225-9297
Cambridge, MA 02139  \ \ \  http://www.mit.edu:8001/people/nygren/home.html
