[59803] in SAPr3-news


JCO Solaris driver crashes with segmentation fault

daemon@ATHENA.MIT.EDU (Sven Vermeulen)
Thu Aug 2 10:57:50 2007

To: sapr3-news@mit.edu
Date: Thu, 02 Aug 2007 14:57:41 -0000
From: Sven Vermeulen <sven.j.vermeulen@gmail.com>
Message-ID: <1186066661.023246.7120@i38g2000prf.googlegroups.com>

Hi all

Our Java application connects to SAP through the JCO drivers on Sun
Solaris 10. For this, we have the libsapjcorfc.so and librfcccm.so
available in the LD_LIBRARY_PATH (of course). Our connection works (we
see some traffic from/to the SAP server when we take a network trace)
but suddenly fails due to a JVM crash.

The JVM crash information tells us that the segmentation fault (the
reason for the crash) is in the native library called by the JVM:

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xfed0452c, pid=16861, tid=10
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_06-b05 mixed mode, sharing)
# Problematic frame:
# V  [libjvm.so+0x10452c]
#


The stack information yields:

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x10452c]
C  [libsapjcorfc.so+0x8210]

When I look at the .so files themselves, the positions at which the
crash occurs are:
for libjvm.so: jni_GetStringLength
for libsapjcorfc.so: getStringChars
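
As a small worked example of how the frame offset relates to the
reported program counter: subtracting the libjvm.so offset from the pc
gives the address at which the library was mapped in the crashed
process. This is only a sketch in Python; the helper name load_base is
made up for illustration, and the result holds only for that one
process (pid 16861), since another process may map the library
elsewhere.

```python
def load_base(pc: int, offset: int) -> int:
    """Return the mapping address implied by an absolute pc and a
    library-relative offset (hypothetical helper, for illustration)."""
    return pc - offset


# Values taken from the crash report above.
pc = 0xFED0452C          # "SIGSEGV (0xb) at pc=0xfed0452c"
jvm_offset = 0x10452C    # "[libjvm.so+0x10452c]"

print(hex(load_base(pc, jvm_offset)))  # prints 0xfec00000
```

The offset itself (0x10452c) is what has to be looked up in the
library's symbol table to arrive at a function name.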

The strange thing is that we have the same application running on
different servers that connect to different SAP servers, and not all
of them crash. The one that doesn't crash (yet) is production (all the
others are non-production). We didn't find any substantial difference
between the crashing servers and the one that doesn't crash: we
checksummed all .so files (no difference), checked the LD_LIBRARY_PATH
on all systems (no difference), and checksummed the entire JVM tree (no
difference).
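
For anyone who wants to repeat the checksum comparison, here is a
minimal sketch in Python of one way to do it. It is not the exact
procedure we used; the function names, the choice of SHA-256 and the
"*.so" pattern are assumptions made for the example. The idea is to
build a manifest (relative path to digest) on each server and then
compare the manifests.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, pattern="*.so"):
    """Map each matching file under root (relative path) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def diff_manifests(a, b):
    """Return the sorted names that are missing on one side or differ."""
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

An empty list from diff_manifests means the two trees contain the same
files with the same contents, which is the "no difference" result
described above.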

I think that it is related to string conversion (non-Unicode /
Unicode?), but our SAP people couldn't confirm that this is a
difference between the SAP servers.

We tried this with a 1.4.2 and a 1.5 JVM and with the latest
JCO .so/.jar files but without any success.

When I truss the connection, I can't find any real reason for this
crash:

17195/109:      8393825.3913    lwp_create()    (returning as new lwp ...)      = 0
17195/109:      8393825.3913    context(3, 0xFE2C7C88)
17195/109:      8393825.3914    lwp_self()                                      = 109
17195/109:      8393825.3914    schedctl()                                      = 0xFEBD62B0
17195/109:      8393825.3914    priocntlsys(1, 0xAF87FE7C, 3, 0xAF87FF14, 0)    = 17195
17195/109:              op=POP_AND  ltyp=P_LWPID lid=109  rtyp=P_ALL rid=0
17195/109:      8393825.3915    lwp_sigmask(3, 0x00000004, 0x00000000)          = 0xFFBFFEFF [0x0000FFFF]
17195/109:      8393825.3915    lwp_sigmask(3, 0x00000004, 0x00000000)          = 0xFFBFFEFF [0x0000FFFF]
17195/109:      8393825.3916    mprotect(0xAF800000, 24576, 0x0000)             = 0
17195/109:      8393825.3918    times(0xAF87F008)                               = 838928360
17195/109:              utim=2740   stim=205    cutim=0 cstim=0      (HZ=100)
17195/109:      8393825.3921    times(0xAF87F008)                               = 838928360
17195/109:              utim=2740   stim=205    cutim=0 cstim=0      (HZ=100)
17195/109:      8393825.3922    times(0xAF87F008)                               = 838928360
17195/109:              utim=2740   stim=205    cutim=0 cstim=0      (HZ=100)
17195/109:      8393825.3922        Incurred fault #6, FLTBOUNDS  %pc = 0xFECE1EA0
17195/109:            siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
17195/109:      8393825.3923        Received signal #11, SIGSEGV [caught]
17195/109:            siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000


The program counter points to the jni_GetStringLength method inside
libjvm.so, btw.
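
The siginfo lines in the truss output carry the faulting address as
well as the signal. A minimal sketch in Python of pulling those fields
out of such a line follows; the regular expression and the function
name parse_siginfo are assumptions written for this one trace, not a
general truss parser.

```python
import re

# Matches e.g. "siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000"
SIGINFO_RE = re.compile(r"siginfo:\s+(\w+)\s+(\w+)\s+addr=(0x[0-9A-Fa-f]+)")


def parse_siginfo(line):
    """Return (signal, code, address) from a truss siginfo line,
    or None if the line does not match."""
    match = SIGINFO_RE.search(line)
    if match is None:
        return None
    signal, code, addr = match.groups()
    return signal, code, int(addr, 16)


line = "17195/109:            siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000"
print(parse_siginfo(line))  # prints ('SIGSEGV', 'SEGV_MAPERR', 0)
```

In this trace the address is 0, so the fault is an access to unmapped
memory at address zero. That would be consistent with a null reference
being dereferenced somewhere under jni_GetStringLength, although the
trace alone does not prove where the null value came from.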

I'm hoping someone here sees this as "oh yes, we had that too, but ...
solved it" because we are almost out of options. The one remaining
thing we're going to do is to run a small test application (have it
already) that crashes on the SAP servers (except the production one)
against the production server - but that's of course something our
process manager isn't happy to allow, even though it shouldn't hurt.

Wkr,
  Sven Vermeulen

