Bug Report: SIGPIPE during call to ev_async_send

Benjamin Mahler benjamin.mahler at gmail.com
Mon Oct 12 19:48:32 CEST 2015


Ah, sorry for not testing it earlier. Applying this diff didn't fix the
issue in our test suite. Looking at the diff more closely, it seems to
address only one of the postfork cases in epoll_poll, whereas we seem to be
tripping the other case, here:

http://cvs.schmorp.de/libev/ev_epoll.c?revision=1.69&view=markup#l206

I injected print statements to validate that we are indeed hitting that
line in our tests.

On Mon, Oct 12, 2015 at 6:00 AM, Marc Lehmann <schmorp at schmorp.de> wrote:

> On Sun, Oct 11, 2015 at 04:42:03PM -0700, Benjamin Mahler
> <benjamin.mahler at gmail.com> wrote:
> > Interesting, from what I can tell we don't trip the postfork=1 code in
> > the link below within our hot paths,
>
> That's good from a performance standpoint. Since you say it's a rare event
> in production code, it makes sense that it isn't triggered in the hot path.
>
> > We had tried to avoid ignoring SIGPIPE because we use libev in an
> > asynchronous / actor library that we maintain independently from Mesos
>
> Well, unless you fork and try to use the loop in the child, that is fine.
> The rationale behind making this a requirement on fork, rather than some
> other alternative, is that fork has a lot of other process-wide
> requirements as well.
>
> > So in theory shell filters could be built atop libprocess, but it seems
> > reasonable to ask users of the library to handle EPIPE in 2015.
>
> Shell filters can still be built - as long as you don't fork and do event
> processing with the same loop in the child, that should be fine. It's not
> really different from using threads: threads put a lot more limitations on
> fork, and trying to do multiprocessing in forked processes is generally
> asking for trouble, so this is not seen as a particularly bad extra burden.
>
> Again, you don't have to worry about it if other parts of the program fork,
> as long as they don't use your library in the child.
>
> Alternatively, somebody might come up with a solution, but the problem is
> that we can't use locks (it has to work from signal handlers), and the
> other thread or signal handler can be delayed without any upper bound.
>
> So while we can replace the write fd atomically (from a userspace
> perspective), the Linux kernel will still suffer from a race, where one CPU
> replaces the fd with another file description, but the other CPU has
> already fetched it and locked it, and will then trigger the SIGPIPE.
> (Basically, when you call write in one thread and close on the same fd in
> another, the kernel might suffer from bad effects; for example, the write
> might block forever.)
>
> The only solution we found is to not let this happen in general, but avoid
> this case entirely in important situations, such as when epoll recreates
> the poll set - we don't have to rebuild the event pipe in this case, so we
> don't.
>
> We do have to rebuild the event pipe, though, in a child process, before
> using the associated loop (and only iff using the loop), and then I don't
> see a way to avoid a race between a signal handler and the pipe rebuild
> code (or a newly started thread).
>
> > I'm curious, do you know when this would make its way into a release? No
> > worries if not.
>
> Hopefully "soon" there will be a release, and it will have that code. I am
> mainly waiting for you to test whether the change in libev seems to fix
> your problem or not.
>
> --
>                 The choice of a       Deliantra, the free code+content MORPG
>       -----==-     _GNU_              http://www.deliantra.net
>       ----==-- _       generation
>       ---==---(_)__  __ ____  __      Marc Lehmann
>       --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
>       -=====/_/_//_/\_,_/ /_/\_\
>
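
For reference, the "ignore SIGPIPE and handle EPIPE" approach discussed in
the quoted message could look roughly like the sketch below. This is only an
illustration using plain POSIX calls, not code from libev or libprocess; the
helper names ignore_sigpipe and wakeup_write are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide, so that a write to a
 * pipe without a reader fails with EPIPE instead of killing the process.
 * Returns 0 on success, -1 on failure (as sigaction does). */
static int ignore_sigpipe(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: write a single wake-up byte to fd.
 * Returns 1 if the byte was written, 0 if the read end is gone (EPIPE),
 * and -1 on any other error. Retries when interrupted by a signal. */
static int wakeup_write(int fd)
{
    char byte = 0;
    ssize_t n;

    do {
        n = write(fd, &byte, 1);
    } while (n < 0 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n < 0 && errno == EPIPE)
        return 0;
    return -1;
}
```

With SIGPIPE ignored, a caller can treat a return value of 0 from
wakeup_write as "the pipe was torn down" and carry on, rather than having
the process terminated by the signal.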