Bug with signal delivery after fork.

Chris Shoemaker c.shoemaker at cox.net
Mon Jan 14 16:56:54 CET 2008


Hi,
    I believe the attached program demonstrates a bug related to signal
delivery after a fork.

I expect the child to receive the HUP signal, run the signal_cb, and exit.
Notice the locking that ensures that the ev_signal is started before
any signal is sent.

This program behaves as I expect about 9 times out of 10.  But when I run it
repeatedly, it is quite easy to hit a case where the ev_signal callback
is never triggered, so ev_unloop is never called, the child's loop never
exits, and the parent hangs waiting for the child.

***************************************
#include <ev.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

struct ev_signal signal_watcher;

static void signal_cb (EV_P_ struct ev_signal *w, int revents) {
  printf ("signal\n");
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

int main (int args, char *argv[]) {
  struct ev_loop *loop = ev_default_loop(EV_FORK_ENABLE);
  pid_t pid;
  int pdes[2], status, res;
  char buf[4];

  ev_loop(loop, EVLOOP_ONESHOT | EVLOOP_NONBLOCK);

  pipe(pdes);
  if ((pid = fork()) == 0) {
    close(pdes[0]);

    ev_default_fork();
    ev_signal_init (&signal_watcher, signal_cb, SIGHUP);
    ev_signal_start (loop, &signal_watcher);
    write(pdes[1], "baz", 4);
    close(pdes[1]);

    ev_loop(loop, 0);
    exit(99);

  } else {
    close(pdes[1]);
    read(pdes[0], buf, 4);  /* After this point, the ev_signal has started */
    close(pdes[0]);

    kill(pid, SIGHUP);
    res = waitpid(pid, &status, 0);
    printf("Result = %d, ExitStatus = %d\n", res, WEXITSTATUS(status));
  }

  printf("done\n");
  return 0;
}
************

The relevant portion of the strace -f:

pipe([5, 6])                            = 0
clone(Process 25732 attached (waiting for parent)
Process 25732 resumed (parent 25731 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaaac7770) = 25732
[pid 25731] close(6 <unfinished ...>
[pid 25732] close(5 <unfinished ...>
[pid 25731] <... close resumed> )       = 0
[pid 25732] <... close resumed> )       = 0
[pid 25731] read(5,  <unfinished ...>
[pid 25732] rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
[pid 25732] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 25732] rt_sigaction(SIGHUP, {0x4015d0, ~[RTMIN RT_1], SA_RESTORER|SA_RESTART, 0x366a430f30}, NULL, 8) = 0
[pid 25732] write(6, "baz\0", 4)        = 4
[pid 25731] <... read resumed> "baz\0", 4) = 4
[pid 25732] close(6 <unfinished ...>
[pid 25731] close(5 <unfinished ...>
[pid 25732] <... close resumed> )       = 0
[pid 25731] <... close resumed> )       = 0
[pid 25732] close(3 <unfinished ...>
[pid 25731] kill(25732, SIGHUP <unfinished ...>
[pid 25732] <... close resumed> )       = 0
[pid 25731] <... kill resumed> )        = 0
[pid 25732] --- SIGHUP (Hangup) @ 0 (0) ---
[pid 25731] wait4(25732, Process 25731 suspended
 <unfinished ...>
[pid 25732] write(4, "\1", 1)           = 1
[pid 25732] rt_sigreturn(0x4)           = 0
[pid 25732] close(4)                    = 0
[pid 25732] pipe([3, 4])                = 0
[pid 25732] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 25732] fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
[pid 25732] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 25732] fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
[pid 25732] select(64, [3], [], NULL, {59, 743000} <unfinished ...>
Process 25731 resumed
Process 25732 detached 


Notice that the SIGHUP is received by the child, but the ev_signal
callback is never triggered.  Incidentally, this is easy to reproduce
when the program runs untraced and harder to reproduce when only the
parent is straced, but it seems to fail 100% of the time when traced
with "strace -f".
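
One possible reading of the trace, offered as an assumption rather than a
confirmed diagnosis: the handler's write(4, "\1", 1) goes into a pipe that
the child then closes and recreates (close(4), pipe([3, 4])) before it
reaches select(), so the wakeup byte never shows up on the new read end.
The sketch below reproduces only that sequence with plain pipes. It does
not use libev, and the names readable_now and wakeup_pending are invented
for this illustration.

```c
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error. */
static int readable_now(int fd) {
  fd_set rfds;
  struct timeval tv = {0, 0};   /* zero timeout: poll, do not block */

  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Writes one wakeup byte into a pipe, optionally closes and recreates
   the pipe, then reports whether the read end has data pending. */
int wakeup_pending(int recreate_pipe) {
  int p[2], r;

  if (pipe(p) < 0) return -1;

  /* Stand-in for what the signal handler does in the trace. */
  if (write(p[1], "\1", 1) != 1) return -1;

  if (recreate_pipe) {
    /* Stand-in for close(3), close(4), pipe([3, 4]) in the trace. */
    close(p[0]);
    close(p[1]);
    if (pipe(p) < 0) return -1;
  }

  r = readable_now(p[0]);
  close(p[0]);
  close(p[1]);
  return r;
}
```

Here wakeup_pending(0) reports a pending byte and wakeup_pending(1) does
not, which would match a select() that sleeps for its full timeout even
though the signal was delivered.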

Please let me know if there's any more information I can provide to
help.

-chris 


