[59803] in SAPr3-news
JCO Solaris driver crashes with segmentation fault
daemon@ATHENA.MIT.EDU (Sven Vermeulen)
Thu Aug 2 10:57:50 2007
To: sapr3-news@mit.edu
Date: Thu, 02 Aug 2007 14:57:41 -0000
From: Sven Vermeulen <sven.j.vermeulen@gmail.com>
Message-ID: <1186066661.023246.7120@i38g2000prf.googlegroups.com>
Hi all
Our Java application connects to SAP through the JCO drivers on Sun
Solaris 10. For this, we have the libsapjcorfc.so and librfcccm.so
available in the LD_LIBRARY_PATH (of course). Our connection works (we
see some traffic from/to the SAP server when we take a network trace)
but suddenly fails due to a JVM crash.
The JVM crash information tells us that the segmentation fault (the
reason for the crash) is in the native library called by the JVM:
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0xfed0452c, pid=16861, tid=10
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_06-b05 mixed mode, sharing)
# Problematic frame:
# V [libjvm.so+0x10452c]
#
The stack information yields:
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x10452c]
C [libsapjcorfc.so+0x8210]
When I look at the .so files themselves, the positions at which the
crash occurs are:
for libjvm.so: jni_GetStringLength
for libsapjcorfc.so: getStringChars
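The lookup above rests on simple address arithmetic: the crash log gives an absolute pc (0xfed0452c) and the same location as an offset into the library (libjvm.so+0x10452c), so the load base is the difference between the two. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration, and only the two hex values come from the crash log.

```java
public class FrameOffset {
    // Load base of a library = absolute program counter minus the
    // offset reported in the "lib+0x..." frame notation.
    static long loadBase(long pc, long offsetInLib) {
        return pc - offsetInLib;
    }

    // Inverse direction: offset of an absolute address inside a library
    // whose load base is already known.
    static long offsetInLib(long pc, long base) {
        return pc - base;
    }

    public static void main(String[] args) {
        long pc = 0xfed0452cL;      // from "SIGSEGV (0xb) at pc=0xfed0452c"
        long offset = 0x10452cL;    // from "V [libjvm.so+0x10452c]"
        long base = loadBase(pc, offset);
        System.out.println("libjvm.so load base = 0x" + Long.toHexString(base));
    }
}
```

The offset is then what gets matched against the library's symbol table (for example the output of nm) to arrive at a name such as jni_GetStringLength. The load base can differ from one process to the next, so an absolute pc from another run cannot be compared directly.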
The strange thing is, we have the same application running on
different servers which connect to different SAP servers, and not all
applications crash. The one that doesn't crash (yet) is production
(all the rest is non-production). We didn't find any substantial
difference between those servers and the one that doesn't crash: we
checksummed all .so files (no difference), checked the LD_LIBRARY_PATH
on all systems (no difference), checksummed the entire JVM tree (no
difference).
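For anyone wanting to repeat the checksum comparison, here is a minimal sketch of one way to do it in Java. It assumes a current JDK on whatever machine holds copies of the files (it will not compile on the 1.4.2 or 1.5 JVMs mentioned in this post), and the class name LibChecksum is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LibChecksum {
    // Returns the SHA-256 digest of a file as a lowercase hex string.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print one "digest  path" line per argument so that the output
        // from two servers can be compared with diff.
        for (String arg : args) {
            System.out.println(sha256(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

Running it on each server against libsapjcorfc.so and librfcccm.so and diffing the output gives the same kind of comparison as described above.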
I think that it is string conversion related (non-unicode / unicode?)
but our SAP people couldn't confirm that this is a difference between
the SAP servers.
We tried this with a 1.4.2 and a 1.5 JVM and with the latest
JCO .so/.jar files but without any success.
When I truss the connection, I can't find any real reason for this
crash:
17195/109: 8393825.3913 lwp_create() (returning as new lwp ...) = 0
17195/109: 8393825.3913 context(3, 0xFE2C7C88)
17195/109: 8393825.3914 lwp_self() = 109
17195/109: 8393825.3914 schedctl() = 0xFEBD62B0
17195/109: 8393825.3914 priocntlsys(1, 0xAF87FE7C, 3, 0xAF87FF14, 0) = 17195
17195/109: op=POP_AND ltyp=P_LWPID lid=109 rtyp=P_ALL rid=0
17195/109: 8393825.3915 lwp_sigmask(3, 0x00000004, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
17195/109: 8393825.3915 lwp_sigmask(3, 0x00000004, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
17195/109: 8393825.3916 mprotect(0xAF800000, 24576, 0x0000) = 0
17195/109: 8393825.3918 times(0xAF87F008) = 838928360
17195/109: utim=2740 stim=205 cutim=0 cstim=0 (HZ=100)
17195/109: 8393825.3921 times(0xAF87F008) = 838928360
17195/109: utim=2740 stim=205 cutim=0 cstim=0 (HZ=100)
17195/109: 8393825.3922 times(0xAF87F008) = 838928360
17195/109: utim=2740 stim=205 cutim=0 cstim=0 (HZ=100)
17195/109: 8393825.3922 Incurred fault #6, FLTBOUNDS %pc = 0xFECE1EA0
17195/109: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
17195/109: 8393825.3923 Received signal #11, SIGSEGV [caught]
17195/109: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
The program counter points to the jni_GetStringLength method inside
libjvm.so, btw.
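One detail in the trace that is easy to overlook is the siginfo line: SEGV_MAPERR with addr=0x00000000 says the faulting access went to address 0, which is not mapped. That would be consistent with a null reference being dereferenced inside jni_GetStringLength, though the trace alone does not prove where such a null would come from. A small sketch for pulling that address out of truss output follows; the class name and the regular expression are illustrative assumptions, written only against the line format shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrussFault {
    // Matches e.g. "siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000".
    // Pattern is an assumption based on the truss lines quoted in this post.
    private static final Pattern SIGINFO = Pattern.compile(
            "siginfo:\\s+(\\S+)\\s+(\\S+)\\s+addr=0x([0-9A-Fa-f]+)");

    // Returns the faulting address from a siginfo line, or -1 if the
    // line does not look like one.
    static long faultAddress(String line) {
        Matcher m = SIGINFO.matcher(line);
        if (!m.find()) {
            return -1L;
        }
        return Long.parseLong(m.group(3), 16);
    }

    public static void main(String[] args) {
        String line = "17195/109: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000";
        long addr = faultAddress(line);
        System.out.println("fault address = 0x" + Long.toHexString(addr)
                + (addr == 0 ? " (address zero, not mapped)" : ""));
    }
}
```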
I'm hoping someone here sees this as "oh yes, we had that too, but ...
solved it" because we are almost out of options. The one remaining
thing we're going to do is to run a small test application (have it
already) that crashes on the SAP servers (except the production one)
against the production server - but that's of course something our
process manager isn't happy to allow, even though it shouldn't hurt.
Wkr,
Sven Vermeulen