Disabling the SIGCHLD handler

Chris Shoemaker c.shoemaker at cox.net
Mon Jan 14 16:38:51 CET 2008


I think you might be taking this personally.  Please don't.  I'd
really like a productive dialog here.  Perhaps we're talking past each
other.  Perhaps we're not looking at the same version of libev.  (I'm
looking at 2.01.)  Perhaps I'm an idiot.  However, even if I'm an
idiot, I'm not attacking you.  A little politeness doesn't cost you
much.

On Mon, Jan 14, 2008 at 03:06:41AM +0100, Marc Lehmann wrote:
> On Sun, Jan 13, 2008 at 06:05:18PM -0500, Chris Shoemaker <c.shoemaker at cox.net> wrote:
> > > As I explained, if for whatever reason it is inconvinient to use it,
> > > don't. Libev doesn't force the use of its child watcher.
> > 
> > Well, it doesn't force the use of ev_child, but it does force an
> > application to choose between either:
> > 
> >   a) not having access to the exit status of children that exited before a
> > ev_child was started, OR
> 
> As I said earlier, this is untrue. Why do you kepe repeating it? libev
> *explicitly* allows registering an ev_child handler after the child has
> exited already, anythign else would be pointless indeed.

I will try to be more explicit about why I believe that this doesn't work.

> This is actually used in practise, so if it doesn't work, I'd like to know
> why.

I will also try to explain why it doesn't work.

> >   b) not being able to use ev_signal
> 
> That is also untrue. Nothing precludes libev users form using ev_signal
> but install their own sigchld handler, as was explained before.

I guess I've failed to effectively communicate the problem, so I will
try some different approaches.

> > I'd like to use ev_signal (for unrelated purposes, mostly) AND have access
> > to exit status of children that exited before I knew I would want their
> > exit status, (for which I know I need to use my old SIGCHLD handler, that
> > I used with libevent).
> 
> I will never understand how people can "know" factually wrong things.
> 
> You made this claim repeatedly, please back it up, or take it back. Thanks.

Okay.  I guess the simplest proof is in the code:

ev.c has a childcb SIGCHLD handler containing:
  if (0 >= (pid = waitpid (-1, &status, WNOHANG | WUNTRACED | WCONTINUED)))
    if (!WCONTINUED
        || errno != EINVAL
        || 0 >= (pid = waitpid (-1, &status, WNOHANG | WUNTRACED)))
      return;
...
  child_reap (EV_A_ sw, pid, pid, status);

Notice that nothing prevents the waitpid from reaping any child at all.

child_reap() contains:
  for (w = (ev_child *)childs [chain & (EV_PID_HASHSIZE - 1)]; w; w = (ev_child *)((WL)w)->next)
    if (w->pid == pid || !w->pid)
      {
 ... [ feed the event ]
      }

Notice that if the conditional (w->pid == pid || !w->pid) is false for all
ev_childs, then
  a) we have just reaped a child, and have its status.
  b) we will not be generating any event, since no ev_child matches.
  c) we will never again see the status for this child

All that is then required is for someone to start a ev_child for the
child we just reaped.  That event will never trigger.


If you'd prefer a more empirical approach, you can run this program:
*************
#include <ev.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

struct ev_child child_watcher;

struct ev_signal signal_watcher;

static void signal_cb (EV_P_ struct ev_signal *w, int revents) {
  printf ("signal\n");
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

static void cb (EV_P_ struct ev_child *w, int revents) {
  printf ("w->pid = %d; w->rstatus = %d\n", w->rpid,  WEXITSTATUS(w->rstatus));
  ev_unloop (EV_A_ EVUNLOOP_ONE); /* leave one loop call */
}

int main (int args, char *argv[]) {
  struct ev_loop *loop = ev_default_loop(0/*EV_FORK_ENABLE*/);
  pid_t pid;

  ev_signal_init (&signal_watcher, signal_cb, SIGHUP);
  ev_signal_start (loop, &signal_watcher);

  if ((pid = fork()) == 0) {
    exit(99);
  }
  printf("spawned %d\n", pid);
  if (fork() == 0) {
    sleep(5);
    kill(pid, SIGHUP);
    exit(98);
  }

  ev_loop(loop, EVLOOP_NONBLOCK);

  sleep(1);
  printf("STARTING LOOP %d\n", pid);
  ev_child_init (&child_watcher, cb, pid);
  ev_child_start (loop, &child_watcher);

  ev_loop(loop, 0);

  printf("done\n");
  return 0;
}
************

If I understand what you're claiming, then that program should print the
exit status 99, and then terminate.  In fact, it does neither.

An strace of the process shows why, and it agrees with the source code:

...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaaac7770) = 21046
--- SIGCHLD (Child exited) @ 0 (0) ---
write(4, "\21", 1)                      = 1
write(1, "spawned 21046\n", 14)         = 14
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaaac7770) = 21047
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=3, u64=3}}}, 64, 0) = 1
read(3, "\21", 1)                       = 1
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 99}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 21046
wait4(-1, 0x7fffd639771c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = 0
write(1, "STARTING LOOP 21046\n", 20)   = 20
epoll_wait(5, 609010, 64, 59743)        = -1 EINTR (Interrupted system call)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(4, "\21", 1)                      = 1
rt_sigreturn(0x4)                       = -1 EINTR (Interrupted system call)
epoll_wait(5, {{EPOLLIN, {u32=3, u64=3}}}, 64, 59743) = 1
read(3, "\21", 1)                       = 1
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 98}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 21047
wait4(-1, 0x7fffd639771c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
epoll_wait(5,  <unfinished ...>
*****

Notice that the child (21046) has been reaped, with exit status 99,
_before_ the ev_child has been started.  And as a matter of fact, this
is the same reason why it would prevent my own waitpid from finding
the already-reaped child.

> 
> I certainly know a number of ways to implement that, and for the most
> common case (creating a process, and then waiting for it to exit) its
> trivial (as you can validly register the child handler after creating the
> process).

I don't follow, really, but here's basically how I would do it:
[COMPLETELY untested]

   for (w = (ev_child *)childs[i]; w; w = (ev_child *)((WL)w)->next) {
      do {
        pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED | WCONTINUED);
        /* some systems define WCONTINUED but don't support it (linux 2.4) */
        if (WCONTINUED && pid < 0 && errno == EINVAL) {
          pid = waitpid (w->pid, &status, WNOHANG | WUNTRACED);
        }
      } while (pid < 0 && (pid != -1 || errno == EINTR));

      /* no event when pid == 0 */
      if (pid) {
        ev_set_priority (w, ev_priority (sw)); /* need to do it *now* */
        w->rpid    = pid;           /* pid == -1 means ECHILD (or EINVAL) */
        w->rstatus = status;
        ev_feed_event (EV_A_ (W)w, EV_CHILD);
      }
    }

AFAICT, that removes the need for several other peices of code:
  - you don't have to special case pid == 0
  - it will also work for watching process groups (pid < -1)
    (That's a hard requirement for me, btw.)

Perhaps you wanted to avoid handling the ECHILD, but I would need it,
and using rpid == -1 seems rather like waitpid returning -1.

Now, I realize that this might not offer the behavior you desire in terms
of multiple ev_childs registering for the same pid.  But this is ok for
me, since I'm fine with the POSIX waitpid semantics of each child only
unblocking one waitpid, and other waitpids getting ECHILD.  I don't want
it to appear like the child died more than once, even if there are multiple
ev_child watcher.

> > > But ev_signal has the same problem: As soon as some part of a program
> > > registers a watcher, signal handlers instaleld via sigaction stop working.
> > 
> > I don't follow.  I don't intend to use sigaction.  I want to use ev_signal
> > for all signal watching.
> 
> I was told that ruby gives access to functions such as waitpid or
> sigaction, for which the problem exists. If that is untrue, then I was
> misinformed, sorry.

Well, kind of.  The language does support those APIs, but the Rubinius
VM offers them in an insulated way.  So, I don't have to worry about
anyone else using the real waitpid or sigaction - everything goes
through the VM.

> > > > How about treating it exactly as if the loop didn't support ev_child,
> > > > just like every non-default loop?
> > > 
> > > That would mean a crash. I don't think that this is acceptable, as one of
> > > the basic premises of the default loop is to support all those features.
> > 
> > Well, I would think that aborting if the application attempted to use
> > a feature that it explicitly disabled would be acceptable.
> 
> I don't, because "the applciation" is just one of the libev users, other
> modules might use it as well. By disabling standard fucntionality this would
> make it hard to integrate third-party software. That means one best avoids
> crippling functionality but instead should use and improve it.
> 
> And even if, you cna always provid your own child reaper. I just do not
> see your problem.

I hope I've been clearer?

> > another option could be to offer another loop type: Something just
> > like the default loop but without the SIGCHLD handler.
> 
> No, signals are an unsharable resource just like sigchld. It just cannot
> be done with posix, sorry.

I do realize it couldn't be shared, I meant to offer another loop type
to be used _instead_ of the default loop.  In any case, it's easy enough
to disable in the code.

> > > Other factors would, too, as signal handlers are unsharable, just like
> > > waiting for child exist statuses is unsharable. You'd only postpone the
> > > problem.
> > 
> > I don't follow.  What problem would I postpone?
> 
> As soon as somebody registers a child handler via e.g. sigaction - same
> problem. I was told this were possible, but since it doesn't seem to be,
> there wouldn't be a problem.

Right, that's not a problem.

> > > How does it do so? Looping over all pid's on all calls to sigchld? Thats
> > > extremely inefficient...
> > 
> > Instead, I loop over only list of outstanding calls to waitpid,
> 
> I assume you do this on every call to waitpid, too...

No, on every SIGCHLD.

> > As for efficiency, this makes the SIGCHLD handler O(N) in the number
> > outstanding waitpid calls.  I don't expect that to be a problem,
> > but I'm open to suggestions for improvement.
> 
> The improvement would be to use an event-based model, not a polling model.
> 
> > [ After writing this, I took a look at how ev_childs are handled, and
> > realized that it would be quite easy to modify libev to provide the
> > behavior I want.  I would just remove the waitpid(-1) call, and put a
> > waitpid(pid) call inside the loop over childs[].  As an added benefit,
> 
> That would break it, however.

I guess that depends on your definition of "break".  It would,
however, function exactly the same as my "legacy" sigchld handler, which
is good for me at least. :)

> > semantics of waitpid().  I will very likely do exactly that, and I'll
> 
> You are free to do that, but I would still be happy to understand why you
> feel the urge to patch libev when you can just provide your own sigchld
> handler instead, as you originally wanted.

Good question.  I guess it feels a bit cleaner to use the ev_child
interface.

> For some reason, you keep ignoring the obvious solution that ha sbeen
> mentioned a few times already.

Well, since I've completely ignored the obvious solution that has been
mentioned a few times already, it's increasingly likely that... what?.
Why do you think that might be?

> > send a patch if you're interested. ]
> 
> No, I am not interested in introducing bugs.

:( That's really quite impolite.  What have I done to earn such a
response?  It's just code, remember.


-chris



More information about the libev mailing list