why parents and child has to recreate the epoll set after fork

Marc Lehmann schmorp at schmorp.de
Sun Feb 28 23:08:03 CET 2016

On Sun, Feb 28, 2016 at 02:42:54PM +0800, adream <adream307 at gmail.com> wrote:
> The EVBACKEND_EPOLL part of the libev manual said:
>            The biggest issue is fork races, however - if
>             a program forks then *both* parent and child process have to
>             recreate the epoll set, which can take considerable time (one
>             syscall per file descriptor) and is of course hard to detect.
> I want to know why?

Because epoll's "design" was just a quick and dirty hack, and it shows badly.

> After some googling, this site said that if the child closes fds, it will
> lead to the fds being cleared from the epoll set in the parent.
> https://lkml.org/lkml/2007/10/27/25

The guy who wrote that mail and the guy who wrote the libev manual are
the same person (me), and the mail referenced by that url points out that
the fds will _not_ be cleared from the epoll set. Thus, they are not in
contradiction to each other.

> But my example code shows that closing fds in the child doesn't clear the
> parent's epoll set.

Closing the fd wouldn't clear the epoll set, and the fd would not be
removed from the epoll set. Thus it is possible to close an fd, and
afterwards forever receive events for it, and you can't do anything about
that.
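
As a rough illustration of that behaviour (a Linux-only sketch, not code
from libev or from the original poster; the function name
stale_event_after_close is made up for this example): the parent registers
the read end of a pipe, forks, closes its own copy, and still gets an event
for the closed fd number because the child's copy keeps the underlying file
open. Trying to delete the registration afterwards fails, since the fd no
longer exists in the parent.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns 1 if the parent received an event for
 * an fd it had already closed, 0 if not, -1 on setup failure. The errno of
 * the attempted EPOLL_CTL_DEL (0 on success) is stored in *del_errno. */
int stale_event_after_close(int *del_errno)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pause();            /* child only holds its inherited fd copies */
        _exit(0);
    }

    int closed_fd = p[0];
    close(p[0]);            /* parent closes; the child's copy stays open */

    if (write(p[1], "x", 1) != 1)   /* make the pipe readable */
        return -1;

    struct epoll_event out = {0};
    int n = epoll_wait(ep, &out, 1, 1000);
    int got_stale = (n == 1 && out.data.fd == closed_fd);

    /* The registration cannot be removed: the fd is gone in this process. */
    int rc = epoll_ctl(ep, EPOLL_CTL_DEL, closed_fd, NULL);
    *del_errno = rc < 0 ? errno : 0;

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    close(p[1]);
    close(ep);
    return got_stale;
}
```

On Linux this is expected to return 1 and report EBADF for the delete
attempt.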

> So I want to know why the parent and child have to recreate the
> epoll set after fork.

Because, if an fd that is not open in the process receives an event,
there is no way to remove it - the epoll API keys watchers by fd, but
internally does not use fds, so it's possible to get events for something
that isn't an fd in your process and consequently cannot be controlled.
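
Since a stale registration cannot be removed, the remaining option is to
throw the set away and build a new one, at the cost of one epoll_ctl call
per fd. A minimal sketch of that, assuming the caller keeps its own list of
watched fds and only cares about EPOLLIN (rebuild_epoll_set is a
hypothetical helper, not a libev function):

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: replace *epfd with a fresh epoll set that watches
 * fds[0..n-1] for readability. Returns 0 on success, -1 on failure (in
 * which case *epfd is left untouched). */
int rebuild_epoll_set(int *epfd, const int *fds, int n)
{
    int fresh = epoll_create1(EPOLL_CLOEXEC);
    if (fresh < 0)
        return -1;

    /* One syscall per file descriptor, as the manual warns. */
    for (int i = 0; i < n; i++) {
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        if (epoll_ctl(fresh, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(fresh);
            return -1;
        }
    }

    /* Drop this process's reference to the old, unrepairable set. */
    close(*epfd);
    *epfd = fresh;
    return 0;
}
```

Each process that wants to keep polling after a fork would call something
like this on its own, so that the sets are no longer shared.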

Worse, it's possible to receive events for foreign fds, which
could potentially be confused with an fd local to the process. And
implementations using pointers as epoll data might receive pointers to
no-longer valid memory ranges, or memory only valid in another process,
with no way to detect this.

To make things less buggy, libev uses a 32-bit generation counter per fd
- if libev receives an event that it should not have received, it will
create a new epoll set, as the old epoll set cannot be repaired.
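
The generation counter idea can be sketched roughly as follows. This is an
illustration only, not libev's actual source: the table size and the names
fd_gen, tag_for and tag_is_current are invented here. The assumption is
that the 64-bit epoll user data carries the fd in the low 32 bits and the
generation in the high 32 bits, so an event whose generation does not match
the current one must come from an outdated registration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 1024                 /* hypothetical table size */

static uint32_t fd_gen[MAX_FDS];     /* current generation for each fd */

/* Bump the generation of fd (0 <= fd < MAX_FDS) and return the 64-bit tag
 * to store in epoll_event.data.u64 when (re)registering it. */
uint64_t tag_for(int fd)
{
    uint32_t gen = ++fd_gen[fd];
    return (uint64_t)(uint32_t)fd | ((uint64_t)gen << 32);
}

/* Return 1 if the tag from a received event matches the current
 * registration of its fd, 0 if it is stale or out of range. */
int tag_is_current(uint64_t tag)
{
    uint32_t fd  = (uint32_t)tag;
    uint32_t gen = (uint32_t)(tag >> 32);
    return fd < MAX_FDS && fd_gen[fd] == gen;
}
```

An event loop using this would check tag_is_current on every event it
receives and, on a mismatch, schedule a rebuild of the whole epoll set
rather than trying to fix the single entry.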

                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\
