[315] in Pthreads mailing list archive


Sockets and PThreads

daemon@ATHENA.MIT.EDU (Alex Tang)
Sun May 12 20:12:14 1996

To: pthreads@MIT.EDU
Date: Sun, 12 May 1996 19:49:51 -0400 (EDT)
Cc: altitude@cic.net
From: Alex Tang <altitude@cic.net>

Hi folks.

I've got an app that has two threads. One of them (the child, which I'll
call "Thread 2") sits in a socket recv() loop, reading data from a socket.
The other ("Thread 1") mostly performs tasks unrelated to the socket.

Whoops, I should mention that I'm using two different machines and seeing
almost exactly the same problem on both:

Machine 1: Sparc5 running SunOS 4.1.4, PThreads-1-60_beta5, gcc/g++-2.7.2
Machine 2: UltraSparc running Solaris 2.5, Solaris bundled PThreads (not
           Solaris threads), gcc/g++-2.7.2
This is a C++ application.

The problem occurs when I try to close the socket from within Thread 1:
when I call close() on the socket, my program hangs.

Here's more detailed info:

Thread 2 has a start function called "RecvProc()" which looks like this:

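void *RecvProc(void *arg)      // pthread start routine for Thread 2
{
        CClientUI *pcui = (CClientUI *) arg;   // assumed: object that owns m_socket
        int cbPacket;

        // recv() returns 0 on orderly shutdown and -1 (SOCKET_ERROR) on error;
        // either value ends the loop. The flags argument is an int: pass 0, not NULL.
        while ((cbPacket = recv(pcui->m_socket, Buffer,
                                sizeof(Buffer), 0)) > 0) {
            // processing...
        }
        // Cleanup
        return NULL;
}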

Most of the time is spent in the inner "receive" loop. My understanding is
that when close() is called on the socket, the socket should become invalid
and recv() should return either -1 or 0. Either return value should make
the receive loop exit, and the thread should then continue to the cleanup
code at the end of RecvProc().

Most of the time that Thread 2 is executing, Thread 1 is blocked.
Thread 1 becomes unblocked just as Thread 2's receive loop finishes getting
all of its data.

When Thread 1 calls close(), Thread 2 seems to disappear: it never reaches
the cleanup code at the end of RecvProc(). It is at this point that the
program hangs.

If I step through in the debugger (gdb), when I get to the last recv() call
and hit "n", the program continues without stopping at the next call.

On Machine 1, it hangs with the following stack trace:

(gdb) where
#0  0x169dc in machdep_sys_select ()
#1  0xfbc0 in fd_kern_wait ()
    at /usr/local/src/pthreads-1_60_beta5/pthreads/fd_kern.c:269
#2  0x13a80 in context_switch ()
    at /usr/local/src/pthreads-1_60_beta5/pthreads/signal.c:153
#3  0x13d5c in sig_handler (sig=26)
    at /usr/local/src/pthreads-1_60_beta5/pthreads/signal.c:264
#4  0x14174 in pthread_resched_resume (state=PS_FDLR_WAIT)
    at /usr/local/src/pthreads-1_60_beta5/pthreads/signal.c:511
#5  0xe50c in fd_basic_lock (fd=8, lock_type=3, mutex=0x2f180, timeout=0x0)
    at /usr/local/src/pthreads-1_60_beta5/pthreads/fd.c:348
#6  0xed24 in close (fd=8)
    at /usr/local/src/pthreads-1_60_beta5/pthreads/fd.c:643
#7  0x1a948 in CClientUI::Disconnect (this=0xeffff820)
    at /home/altitude/condor/condor-v2/common/clientui.cpp:347
#8  0x3da8 in main (argc=2, argv=0xeffff904)
    at /home/altitude/condor/condor-v2/cgi/main.cpp:200

Frame #7 is where the close() call is made.


On Machine 2, it hangs completely.  I have to do a SIGKILL to get my
terminal back.

I tried setting SO_LINGER on the socket so that it would abort as soon as
the socket was closed, but that didn't work.  
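For reference, a minimal sketch of the SO_LINGER setting described above, assuming a POSIX sockets API: l_onoff = 1 with l_linger = 0 requests an abortive close, in which close() discards unsent data and, for TCP, resets the connection. The helper names set_abortive_close, is_abortive_close and linger_demo are hypothetical. The option only governs what close() does with the connection; it is not a mechanism for waking another thread that is blocked in recv(), which may be why it made no difference here.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on lingering with a zero timeout: close() aborts instead of draining. */
static int set_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;    /* linger on close() ... */
    lg.l_linger = 0;   /* ... for zero seconds, i.e. discard unsent data */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Read the option back: 1 if lingering is on with a zero timeout, else 0. */
static int is_abortive_close(int fd)
{
    struct linger lg;
    socklen_t len = sizeof lg;

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) != 0)
        return -1;
    return lg.l_onoff != 0 && lg.l_linger == 0;
}

/* Usage: open a TCP socket, request an abortive close, and confirm it. */
static int linger_demo(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int ok;

    if (fd < 0)
        return -1;
    ok = set_abortive_close(fd) == 0 && is_abortive_close(fd) == 1;
    close(fd);
    return ok;
}
```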

Here's some (perhaps) relevant info from truss on Machine 2:

	// This is the info about the socket, as far as I can tell.
open("/dev/tcp", O_RDWR)            = 6
ioctl(6, I_PUSH, "sockmod")         = 0
ioctl(6, I_STR, 0xEFFFF320)         = 0
ioctl(6, I_SETCLTIME, 0xEFFFF3D4)       = 0
ioctl(6, I_SWROPT, 0x00000002)          = 0
ioctl(6, I_STR, 0xEFFFF238)         = 0
fcntl(6, F_GETFL, 0x00000000)           = 2

...

sigaction(SIGWAITING, 0xEF79585C, 0x00000000)   = 0
signotifywait()                 = 32
sigtimedwait(0xEF40FD98, 0x00000000, 0x00000000) = SIGWAITING
lwp_create(0xEF40FB18, 0x00C0, 0xEF109E30)  = 3
lwp_continue(3)                 = 0
lwp_create(0x00000000, 0, 0x00000000)       = 0
write(3, " T R A C E :   C C l i e".., 31)  = 31

	// This is where close() is getting called.  It gets wedged right
	// after the close() call.
getmsg(6, 0xEF20B858, 0xEF20B864, 0xEF20B844) (sleeping...)

	// Right now, it's wedged.  I have to SIGKILL it to get my shell 
	// back.
    *** process killed ***  
    			   
I must admit my knowledge of sockets isn't the greatest, but I thought that
what I'm trying to do would be legal in a single-threaded implementation.
Is there something I'm missing because it's multithreaded?

Thanks very much.  

...alex...


