Bug with signal delivery after fork.

Chris Shoemaker c.shoemaker at cox.net
Tue Jan 15 18:54:59 CET 2008


On Tue, Jan 15, 2008 at 10:50:34AM -0500, Chris Shoemaker wrote:
> On Tue, Jan 15, 2008 at 05:06:27AM +0100, Marc Lehmann wrote:
> > On Mon, Jan 14, 2008 at 10:56:54AM -0500, Chris Shoemaker <c.shoemaker at cox.net> wrote:
> > >     I believe the attached program demonstrates some bug related to signal
> > > delivery after a fork.
> > 
> > Oh, what you see is that ev_default_fork only sets a flag for the next run
> > of ev_loop. You will have to run ev_loop to reinitialise the kernel state
> > after a fork (e.g. ev_loop (EVLOOP_NONBLOCK) will do).
> > 
> > The documentation will point this out in the next release (and it will
> > contain other things required by kqueue, which makes it less flexible).
> 
> Adding:
> 
> ev_loop(loop, EVLOOP_NONBLOCK);
> 
> immediately after the call to ev_default_fork() did not noticeably
> change the behavior of the program.  It still hangs about 10% of
> the time.


I've narrowed this down considerably by tracing both good and bad
executions and comparing.  I'll comment the differences in the code:

******

  if ((pid = fork()) == 0) {
    close(pdes[0]);

    ev_default_fork();
    ev_loop(loop, EVLOOP_NONBLOCK);
    ev_signal_init (&signal_watcher, signal_cb, SIGHUP);
    ev_signal_start (loop, &signal_watcher);
    write(pdes[1], "baz", 4);
    close(pdes[1]);
    /* In a good run, the child will enter the ev_loop and block before
       receiving the SIGHUP.  Then it notifies the watcher correctly.
       In a bad run, the child receives the SIGHUP before this ev_loop
       blocks, notices the SIGHUP, but never wakes the watcher.  */
    ev_loop(loop, 0);
    exit(99);

  } else {
    close(pdes[1]);
    read(pdes[0], buf, 4);  /* After this point, the ev_signal has started */
    close(pdes[0]);

    kill(pid, SIGHUP);
    res = waitpid(pid, &status, 0);
    printf("Result = %d, ExitStatus = %d\n", res, WEXITSTATUS(status));
  }
********

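Looking at ev.c, it seems there is a race between the first active
watcher and the delayed fork-fixup function.  Ah, yes, there it is:
ev_loop tests activecnt and breaks before it tests postfork, so the
ev_loop(loop, EVLOOP_NONBLOCK) call above, made before any watcher is
started, returns without calling loop_fork.  The fixup then runs only
in the final ev_loop, after the signal watcher is already active.
The following patch fixes it, and my test program no longer hangs.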

-chris

----

diff --git a/shotgun/external_libs/libev/ev.c b/shotgun/external_libs/libev/ev.c
index b21b1a4..f8357b8 100644
--- a/shotgun/external_libs/libev/ev.c
+++ b/shotgun/external_libs/libev/ev.c
@@ -1534,13 +1534,13 @@ ev_loop (EV_P_ int flags)
           call_pending (EV_A);
         }
 
-      if (expect_false (!activecnt))
-        break;
-
       /* we might have forked, so reify kernel state if necessary */
       if (expect_false (postfork))
         loop_fork (EV_A);
 
+      if (expect_false (!activecnt))
+        break;
+
       /* update fd-related kernel structures */
       fd_reify (EV_A);
 


