[26904] in Athena Bugs

Re: 9.5.27 linux: firefox lockfile problems

daemon@ATHENA.MIT.EDU (Robert Basch)
Wed Jul 26 18:14:32 2006

In-Reply-To: <20060726192719.GK661@multics.mit.edu>
Mime-Version: 1.0 (Apple Message framework v752.2)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <BE6EB788-CD83-4C60-94C8-305A03F59527@mit.edu>
Content-Transfer-Encoding: 7bit
From: Robert Basch <rbasch@MIT.EDU>
Date: Wed, 26 Jul 2006 18:14:16 -0400
To: John Hawkinson <jhawk@MIT.EDU>
X-Spam-Score: 3.548
X-Spam-Level: *** (3.548)
X-Spam-Flag: NO
Cc: bugs@MIT.EDU, samira@MIT.EDU
Errors-To: bugs-bounces@MIT.EDU

On Jul 26, 2006, at 3:27 PM, John Hawkinson wrote:

>     User samira came by SIPB reporting problems with Firefox and
> lockfiles. She seems to get the firefox lockfile dialog at startup
> (refusing to continue and only allowing her to exit), requiring
> removing the lockfile by hand. Furthermore, these problems seem to
> happen repeatedly.

Starting with version 1.5.0, firefox uses fcntl()-style locking on
the profile directory, and presents this error dialog when the
lock cannot be acquired but no running instance can be found.
Our wrapper tries to deal with cases such as an instance
running on another machine, stale locks, etc.
>
>     I removed a lockfile (found from "find .mozilla -name lock"),
> firefox still started fine

.parentlock is the file that firefox now locks, via fcntl(), though it
still maintains the old "lock" symlink too.  The wrapper tries to work
around an AFS bug -- in which the fcntl() lock is not released
upon process termination -- by just deleting .parentlock when
the lock symlink does not exist.
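
As an illustration of the workaround rule described above (remove
.parentlock when the old "lock" symlink is absent), here is a minimal
sketch in C.  It is not the wrapper's actual code; the function name
remove_stale_parentlock and its return convention are assumptions made
for this example, and the real wrapper may apply further checks.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the firefox wrapper.
 * Returns 1 if .parentlock was removed, 0 if nothing was done,
 * and -1 on an unexpected error (errno is set by the failing call). */
int remove_stale_parentlock(const char *profile_dir)
{
    char lock_path[4096];
    char parentlock_path[4096];
    struct stat st;
    int n;

    n = snprintf(lock_path, sizeof lock_path, "%s/lock", profile_dir);
    if (n < 0 || (size_t)n >= sizeof lock_path)
        return -1;
    n = snprintf(parentlock_path, sizeof parentlock_path,
                 "%s/.parentlock", profile_dir);
    if (n < 0 || (size_t)n >= sizeof parentlock_path)
        return -1;

    /* lstat() rather than stat(): "lock" is a symlink whose target
     * need not exist as a file, so the link itself is examined. */
    if (lstat(lock_path, &st) == 0)
        return 0;               /* lock symlink present: leave .parentlock */
    if (errno != ENOENT)
        return -1;

    /* No lock symlink: treat .parentlock as stale and remove it. */
    if (unlink(parentlock_path) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}
```

A caller would pass the profile directory, e.g. the directory that
contains .parentlock, and treat a return value of -1 as a reason to
leave everything untouched.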

>     The output of "sh -x `which firefox`" follows... I didn't feel
> inclined to delve into whether/where/how testlock is failing, but
> looks like it's looking for a .parentlock when it should be looking
> for "lock" (?).

No, I believe testlock is working correctly here; .parentlock is
the file that firefox actually tries to lock with fcntl() (see above).
>
>     Also, there doesn't seem to be a manpage for testlock.

True.  I considered its only purpose to be as described, i.e. a
utility for the wrapper to invoke, not something users would
be invoking explicitly.  And there is no firefox man page to
which it could refer....

> I realize
> it's a 51-line C program, but it's frustrating to have to go to the
> source to figure out what it is doing. Similarly, the comment in
> firefox.sh isn't very insightful ("testlock is used to test whether
> the profile directory's lock file is actually locked.") -- something
> more helpful, like "testlock opens the specified file and checks for
> an fcntl lock with fcntl(, F_GETLK,)" would have saved me some time.

I do appreciate your diagnostic efforts here, but maybe it would
be more useful for me to write up a separate document with
debugging tips, if lock file problems persist.

> Also, is there a good reason why the file is opened
> O_RDWR instead of O_RDONLY?

No, that was left over from an early version, in which I expected
it to acquire the lock; I later decided to have it only test
the lock.  I will submit a patch to correct this.
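
For readers who do not want to go to the source, the kind of check
discussed here (open the file read-only and query it with F_GETLK)
might look roughly like the following.  This is a sketch under stated
assumptions, not the testlock source: the name lock_holder_pid and its
return values are invented for the example, and testlock's real output
and exit codes are not reproduced.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the actual testlock implementation.
 * Returns the PID reported as holding a conflicting fcntl() lock on
 * path, 0 if no conflicting lock is reported, or -1 on error. */
pid_t lock_holder_pid(const char *path)
{
    struct flock fl;
    int fd, saved_errno;

    /* O_RDONLY suffices: F_GETLK only queries, it never acquires. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;        /* would a write lock conflict? */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    if (fcntl(fd, F_GETLK, &fl) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    close(fd);

    /* The kernel rewrites l_type to F_UNLCK when nothing conflicts;
     * otherwise l_pid names the process holding the lock. */
    return (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
}
```

Note that F_GETLK never reports locks held by the calling process
itself, and on a network filesystem the reported PID may not refer to
a process on the local machine, which is presumably why the wrapper
follows up with "kill -0".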

> + /usr/lib/firefox/mozilla-xremote-client -u samira -a firefox 'ping()'
> + case $? in
> + found_running=error

This led me to one problem, though unfortunately not the more
serious underlying problem.  The wrapper does not set up the
environment properly for mozilla-xremote-client, so the client will
not find a running instance.  (In this case, the wrapper falls through
to invoking firefox, which is supposed to find the running instance.)
I will submit a patch to fix this.

> ++ /usr/athena/bin/testlock /afs/athena.mit.edu/user/s/a/samira/.mozilla/firefox/3nob9mk6.default/.parentlock
> + lock_pid=15304
> + '[' 2 -eq 2 ']'
> + '[' 15304 -eq 0 ']'
> + kill -0 15304
> + :
> + exec /usr/lib/firefox/firefox

This suggests that there really is a firefox process already running
on the local machine, but that it cannot be contacted for some
reason.  Could there have been a previously running instance
that hung?  (I am also investigating problems that cause the Linux
version to hang.)
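
The "kill -0 15304" step in the trace above checks whether the PID
reported for the lock belongs to a live local process.  A rough C
equivalent is sketched below; the function name process_is_alive is
hypothetical, and treating EPERM as "alive" is a choice made for this
example rather than something the wrapper is known to do.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper mirroring the shell's "kill -0 PID" test.
 * Returns 1 if the process exists, 0 if it does not, -1 on any
 * other error. */
int process_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;               /* 0 and negatives address process groups */

    /* Signal 0 performs the existence and permission checks
     * without delivering a signal. */
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;               /* exists, but owned by another user */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}
```

One difference from the shell: "kill -0" exits non-zero when the
process exists but belongs to another user, whereas this sketch
reports such a process as alive.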

Thanks for your help in debugging this.

Bob
