[5792] in testers


frequent Mozilla hangs at startup on 9.3 Solaris

daemon@ATHENA.MIT.EDU (Garry Zacheiss)
Tue Apr 20 03:40:49 2004

Message-Id: <200404200740.i3K7ekm7006220@brad-majors.mit.edu>
To: testers@MIT.EDU
Date: Tue, 20 Apr 2004 03:40:46 -0400
From: Garry Zacheiss <zacheiss@MIT.EDU>

Since updating my Sun to 9.3, I've started seeing frequent Mozilla hangs
at startup.  On the occasions where I don't get a hang at startup, I
find that visiting any https site that pops up the dialog box asking
for your certificate password frequently causes one.

Poking around with truss, I see:

5823/1:         read(3, "\0\0\007\0\b\00E\0\0\0\0".., 3060)     = 3060
5823/1:             Incurred fault #6, FLTBOUNDS  %pc = 0xFF2C6DCC
5823/1:               siginfo: SIGSEGV SEGV_MAPERR addr=0x011F2820
5823/1:             Received signal #11, SIGSEGV [caught]
5823/1:               siginfo: SIGSEGV SEGV_MAPERR addr=0x011F2820
5823/1:         sigprocmask(SIG_SETMASK, 0xFFBFD834, 0x00000000) = 0

so we're getting a SIGSEGV from somewhere.

Following up with dbx, I see:

(dbx) where
current thread: t@1
  [1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff3658f4
  [2] mutex_lock_queue(0xff378b44, 0x0, 0xff3405a0, 0xff378000, 0x0, 0x0), at 0xff36166c
  [3] slow_lock(0xff3405a0, 0xff250000, 0x0, 0xff33c000, 0x0, 0x0), at 0xff36206c
  [4] free(0x2195e0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff2c7a24
  [5] nsProfileLock::Unlock(0x13b270, 0x0, 0x0, 0x0, 0x0, 0x1), at 0xfc14c174
  [6] nsProfileLock::RemovePidLockFiles(0x13b270, 0x1b8, 0x0, 0xfc16b0b8, 0x1fa9c, 0x0), at 0xfc14b648
  [7] nsProfileLock::FatalSignalHandler(0xb, 0x1fa08, 0xa, 0x0, 0x0, 0xfc16c76c), at 0xfc14b6cc
  [8] __sighndlr(0xb, 0x0, 0xffbfd888, 0xfc14b6a4, 0x0, 0x0), at 0xff365b0c
  [9] call_user_handler(0xb, 0x0, 0xffbfd888, 0x0, 0x0, 0x0), at 0xff35f804
  [10] sigacthandler(0xb, 0x0, 0xffbfd888, 0x85b9f8, 0x12aa28, 0x210c13), at 0xff35f9b4
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [11] realfree(0x12a9a0, 0x744f8, 0x400, 0xff33c000, 0x0, 0x881168), at 0xff2c7328
  [12] cleanfree(0x0, 0x2, 0xff34283c, 0x88, 0x12a9a0, 0xfbb6bc9c), at 0xff2c7b58
  [13] _malloc_unlocked(0x10, 0x0, 0xcf, 0xff33c000, 0xfef6c60c, 0xfef26750), at 0xff2c6c94
  [14] malloc(0x10, 0x683e8, 0x0, 0xfec1a000, 0x666f6375, 0x584e5661), at 0xff2c6b74
  [15] _XIMVaToNestedList(0xffbfddd0, 0x1, 0xffbfdd80, 0xff3f83bc, 0x2d4f1e8, 0x0), at 0xfebb1c54
  [16] XSetICValues(0x8ba638, 0xfe8dc3a0, 0x880025, 0x0, 0x0, 0x1c8390), at 0xfebdddd0
  [17] gdk_ic_real_set_attr(0x85b8b0, 0x878ef0, 0x4, 0xfe893ae0, 0x0, 0x8000000), at 0xfe8be348
=>[18] gdk_ic_set_attr(0x85b8b0, 0xffbfdf40, 0x4, 0x881140, 0xfbb59820, 0x210c13), at 0xfe8bf8e4
  [19] gdk_im_begin(0x85b8b0, 0x1c8390, 0x400, 0x23c, 0x0, 0x881168), at 0xfe8bc6a0

etc.  Checking for a deadlock, I see:

(dbx) thread -blockedby t@1
Thread t@1 is blocked by:
__malloc_lock (0xff3405a0): thread  mutex(locked)
Lock owned by t@1

and from the stack trace it seems pretty clear what is happening: some
malloc() called from the main thread is dying and invoking a signal
handler, which tries to free something.  That free() blocks forever
trying to acquire __malloc_lock, the mutex the interrupted malloc() is
already holding.  The handler in question is FatalSignalHandler in:

third/mozilla/profile/dirserviceprovider/src/nsProfileLock.cpp

which calls Unlock() in the same source file, which is the function that
calls free().

Fixing all that isn't very interesting, though, since presumably all it
would accomplish is having the browser successfully crash, which, while
less confusing, still isn't very useful.

The function that's calling the malloc that crashes is
_XIMVaToNestedList, which is a libX11 function, but I really doubt
that's where the problem lies.

At this point, I threw my hands up and went to a different machine to
use a web browser.  Let me know if you need more information about any
of this.

Garry
