After fixing several issues on my side, I can now essentially reproduce
your findings. This is basically my version of your diagram:

This is when I use add instead of mod (the major change is in overall
time, as libev does its syscalls at a different time than libevent):
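
To make the add/mod distinction concrete, here is a minimal sketch,
assuming the epoll backend on Linux, where "add" and "mod" correspond to
EPOLL_CTL_ADD and EPOLL_CTL_MOD. The helper names watch_fd_add_first and
watch_fd_mod_first are made up for illustration; this is not the actual
libev or libevent code, only the general shape of the two strategies:

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: try EPOLL_CTL_ADD first, and fall back to
 * EPOLL_CTL_MOD when the kernel reports that fd is already in the
 * epoll set (EEXIST). Returns 0 on success, -1 on error. */
static int
watch_fd_add_first (int epfd, int fd, unsigned int events)
{
  struct epoll_event ev = { 0 };

  ev.events  = events;
  ev.data.fd = fd;

  if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
    return 0;

  if (errno == EEXIST)
    return epoll_ctl (epfd, EPOLL_CTL_MOD, fd, &ev);

  return -1;
}

/* Hypothetical helper: the opposite order. Try EPOLL_CTL_MOD first,
 * and fall back to EPOLL_CTL_ADD when the kernel does not know the
 * fd yet (ENOENT). Returns 0 on success, -1 on error. */
static int
watch_fd_mod_first (int epfd, int fd, unsigned int events)
{
  struct epoll_event ev = { 0 };

  ev.events  = events;
  ev.data.fd = fd;

  if (epoll_ctl (epfd, EPOLL_CTL_MOD, fd, &ev) == 0)
    return 0;

  if (errno == ENOENT)
    return epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev);

  return -1;
}
```

Both helpers leave the fd registered with the requested events; they
differ only in which call is tried first, and so in how many syscalls
are spent when the guess about the kernel state is wrong.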

And just for fun, this is using select (note: double-logarithmic). It
shows that the management code internal to libev is faster, which is
good for my sense of reality, as it is much simpler than the one in
libevent2:

Effectively, there is now little (practical) performance difference
between libev and libevent, unless one has a really high number of fds to
attend to.

Thanks for prompting me to run the benchmark with libevent-2; this has
been very interesting. I will think about whether the added protection
against fork is worth it (we still have the generation counter in libev
for this case), and maybe make a change similar to the one explained
before for libev-4.0. I am 70% convinced it can be done without
sacrificing fork protection completely.

