Another race in signal handling

Chris Shoemaker c.shoemaker at cox.net
Wed Jan 16 15:15:50 CET 2008


Hi,

  I was occasionally seeing signals caught by libev fail to trigger any
watchers.  This turns out to be another race condition in libev, but
I'm not sure whether the simple fix is correct.

The race involves the static sig_atomic_t volatile gotsig.

Here is the sequence of events that causes the signal to be lost:

Assume there are two signal watchers started, and gotsig is 0.

The first signal is received, triggering sighandler().

  if (!gotsig) {
      int old_errno = errno;
      gotsig = 1;
      write (sigpipe [1], &signum, 1);
      errno = old_errno;
  }

The condition is true, and gotsig becomes 1.  The signal handler returns.

Normally, the loop eventually detects the write to sigpipe and wakes the
sigev watcher, calling sigcb(), which clears gotsig after reading from the pipe:

static void sigcb (EV_P_ ev_io *iow, int revents) {
...
  read (sigpipe [0], &revents, 1);
  gotsig = 0;
...

However, as soon as sighandler() returns, the full signal mask installed
for the handler is lifted, so a new signal may arrive at any time.  If a
signal arrives before sigcb() clears gotsig, sighandler() will not write
to sigpipe for it, because (!gotsig) is still false.

This stupid patch closes the race, and improves the reliability of
signal delivery in my tests.

@@ -792,7 +792,7 @@ sighandler (int signum)
 
   signals [signum - 1].gotsig = 1;
 
-  if (!gotsig)
+  if (1)
     {
       int old_errno = errno;
       gotsig = 1;


But I don't understand the motive for the flag in the first place, so
this may break something else that I don't appreciate.  Is there any
problem with removing the variable altogether?

-chris



More information about the libev mailing list