[540] in Pthreads mailing list archive


Signals in PThreads

daemon@ATHENA.MIT.EDU (Chris Colohan)
Mon Dec 23 15:36:42 1996

From: Chris Colohan <colohan@eecg.toronto.edu>
To: pthreads@MIT.EDU
Date: 	Mon, 23 Dec 1996 15:01:14 -0500

I am working on a particularly ambitious project using PThreads, and
have run into a number of problems with both the implementation of
signals in MIT PThreads and the standard itself.  All of the problems I have
encountered involve signal handling of synchronous signals.  In this
article I am going to outline those problems, and list the solutions I
have devised.  At the end, I am going to propose a set of changes to
MIT PThreads, and I hope to get some feedback on this, because unless
I hear some better ideas for solving my problems I will be
implementing them.  Hopefully my proposed solutions are compatible
extensions and changes to the library, and will be accepted for future
revisions to MIT PThreads.

First, a little background.  Handling of synchronous signals [1] in
most programs is simple.  If a program receives a synchronous signal,
it is broken, so the handling that is required is to either gracefully
quit the program or restart it.  Under PThreads this is supported.  A
thread can install a signal handler for itself which can perform
cleanup and gracefully terminate the program.  What is not supported
very well is programs that take action to correct the problem and
recover from the condition that caused the signal, or are just using
the signal as a mechanism for gathering information about a program.

To tell the truth, I am not that interested in the threading library
innards itself.  I am trying to port a distributed shared-memory
system to run under Linux.  Each node in a network of Linux PCs will
be running a compute process, and when it receives a request to do
some computation it will spawn off a thread to do it.  Each process on
each computer will have the ability to run multiple computation
threads at once, and hopefully will keep the processor as busy as
possible doing computation.

The tricky part comes in creating the illusion of shared memory.  This
illusion is created by mmap()ing memory in each computation process
and mprotect()ing the memory so that reads and writes can be detected.
If a process tries to access memory that is not actually on the local
machine, but rather on some other processor, then the operating system
generates the signal SIGSEGV.  This is delivered to the threading
subsystem, which then has to decide what to do with it.  Under the
current system, the signal is simply delivered to the signal handler
of the thread that caused the exception.  If there is no signal
handler, and the signal is blocked, then the results are not defined.
This is what I would like to happen:

1.  Receive the signal.

2.  Decide which thread caused the synchronous signal, what memory
    location was being accessed at the time, and what was being done to it
    (read or write).

3.  Stop the execution of the thread which caused the signal until the
    page is available for its use.

4.  Lock the page table entry for the page in question, so that other
    threads will not simultaneously try to fetch the page.

5.  Send a request to a remote machine to fetch the page, and let
    other computation threads proceed in the meantime.

6.  When the page is retrieved, allow the computation thread to retry
    the offending instruction, unlock the page table, and continue where
    we left off.

Now, this looks like I am asking a lot.  Under MIT PThreads, the only step
I don't have any problems doing is step 1.  In a non-threaded program,
the entire process can be done without too much hassle.

I have boiled the problems I have been finding down to the following:

1.  Synchronous signals cannot be trapped and corrected.  Since a
synchronous signal is directed towards the thread which generated it,
there is no way of doing a sigwait() for the signal: the signal
will never be generated inside the sigwait() call, and execution
cannot continue within a thread to the next sigwait() call once the
signal is raised.  Since signal handlers cannot use pthread_*()
calls, and therefore cannot interact with shared data, they are not
very useful.  What is needed is a mechanism for using sigwait() to
wait for synchronous signals in a separate thread.

2.  A signal handler/sigwait thread does not receive any information
about the context in which a synchronous signal was raised.  Although
there is no portable way of getting a signal's context, a number of
operating systems pass a sigcontext_struct [2] on the stack to the
signal handler that contains information such as the stack pointer,
instruction pointer, and status registers at the time of the signal.
This extended information needs to be made available to threads
receiving signals as well.

=== * ===
[1] Under Linux, the synchronous signals are:
SIGILL    Illegal Instruction
SIGTRAP   Trace/breakpoint trap
SIGFPE    Floating point exception
SIGSEGV   Segmentation violation
SIGSTKFLT Stack fault on coprocessor
SIGPIPE   Write to pipe with no readers

[2] On various platforms, this has different names.  Here are some of
the variants taken from the code I am working on:
void 
#if defined(BSD_HP300) || defined(BSD_HP800) || defined(HPUX)
segv_handler(int sig, int code, struct sigcontext *scp)
#endif 
#if defined(SUNOS) 
segv_handler(int sig, int code, struct sigcontext *scp, char *a)
#endif 
#ifdef IRIX
segv_handler(int sig, siginfo_t *sip, ucontext_t *uc)  
#endif
#ifdef LINUX486
segv_handler(int sig, struct sigcontext_struct sc)
#endif

-----------------------------------------------------------------------------


Problem 1:  Handling Synchronous Signals
========================================

I have devised two solutions to problem 1.  Firstly, pthread_*()
calls could be allowed within signal handlers.  Unfortunately, this
would make the design of the threading system unnecessarily complex,
so I have rejected this possibility.  Instead, I propose the following
scheme, which is already partially functional in MIT PThreads 1.60b6.
It does not violate the standard (as far as I can see), since the
standard does not currently define what action to take if a
synchronous signal is blocked.  MIT PThreads currently passes the
signal on to sigwait(), but it has two shortcomings.  It does not stop
the offending thread from being re-run, which generates more spurious
signals.  It also does not allow multiple simultaneous sigwait()
calls, so signals can be lost if several occur while the first signal
is being processed.

- If a synchronous signal is generated, first try and deliver it to
the current thread.  If the signal is not blocked, then we are done.
Either the signal handler registered for this thread is called, or the
default action takes place.
- Next, check and see if any threads are sigwait()ing on the signal.
If so, suspend the execution of the thread that generated the signal
and deliver the signal to the sigwait()ing thread.  The signal
handling thread will have to explicitly wake up the suspended thread
once it is OK for it to run again, otherwise spurious signals could be
generated if the offending thread is run before the problem which
stopped it is corrected.  Just to be generic, the functions
pthread_freeze() and pthread_thaw() can be added to allow threads to
be suspended and re-started.
- If no thread is sigwait()ing on the signal either, it is
questionable what to do.  We could either:
     (a) Enqueue the signal and wait for some thread to do a sigwait()
         on it or unblock it.  This could result in a deadlock
         situation if no thread ever handles the signal.
     (b) Give up.  Since no one is handling the signal, and
         execution cannot continue without handling it, perform the
         default action for the signal, which for all of the
         synchronous signals under Linux involves terminating the
         program and possibly dumping core.

Since I prefer a program to never hang and always be predictable in
its behaviour, I prefer choice (b).  If a program never wants to be
killed, it can create a signal handler thread for every computation
thread, so that there is always at least one signal handler thread in
the sigwait() function waiting to service the request of a signal
raised in a computation thread.  For this to work the signal handler
threads would have to be scheduled at a higher priority than the
computation threads, otherwise there could be a brief danger region
between when a stopped thread is thawed and its signal handler thread
re-enters sigwait().


-----------------------------------------------------------------------------


Problem 2:  Retrieving Signal Context Information
=================================================

To deal with synchronous signals, the signal handler or signal handler
thread needs two pieces of information:

1.  The operating system dependent signal context structure.
2.  The thread that caused the signal to be raised.

This information can be gleaned fairly easily.  The signal context
structure is delivered to the threading system when the signal is
received, and just needs to be saved so it can be forwarded on to the
user's program.  The thread that caused the signal is simply the
active thread at the time that the signal is received, and can be
stored in the same data structure where the signal context is stored.

Passing this information on to the program requires some non-standard
extensions to the PThreads standard.  Just as on many systems signal()
accepts as a second parameter a pointer to a function of the form
"void segv_handler(int sig, struct sigcontext_struct sc)",
pthread_signal() can support signal handler functions of the same
form.  It can just pass the signal context information along to the
signal handler in the same form in which it received it.  The function
sigwait() can be extended in a similar way.  For example, on Linux
the extended sigwait() could be prototyped:

int sysdep_sigwait(const sigset_t *set, 
                   int *sig, 
                   struct sigcontext_struct *sc);

This way programs requiring the extra information have access to it,
while portable programs can use the old form.


-----------------------------------------------------------------------------

A Summary of the Changes:

- Change handling of synchronous signals to be logically consistent,
prevent loss of signals, and be useful.
- Add the functions pthread_freeze() and pthread_thaw() to allow a
signal handler thread to stop another thread from executing until the
problem that caused the signal is resolved.
- Save signal context information when signals are delivered to the
threading system, and pass this information on to signal handlers.
- Add the function sysdep_sigwait(), which also allows sigwait to get
signal context information.


-----------------------------------------------------------------------------


References used:
- "PThreads Programming" by Bradford Nichols, Dick Buttlar and
Jacqueline Proulx Farrell (O'Reilly & Associates Inc).  
- "Signals in Multithreaded Programs" by Chary G. Tamirisa (AIXpert, Aug. '95,
http://www.developer.ibm.com/library/aixpert/aug95/aixpert_aug95_signal.html).
- "Linuxthreads -- POSIX 1003.1c kernel threads for Linux" by Xavier
Leroy (http://pauillac.inria.fr/~xleroy/linuxthreads/README).
- Note that I do *not* have the POSIX standard (I don't know where to
get it, and I don't have much money being a poor student...), but it
appears that the references I do have offer pretty complete coverage
of what the standard says about signals.

===

Chris Colohan
4th Year Computer Engineering
University of Toronto


