EV (epoll) and Mojo::IOLoop::Client bug?

Marc Lehmann schmorp at schmorp.de
Fri Sep 11 17:15:04 CEST 2015


On Fri, Sep 11, 2015 at 10:05:24AM +0500, Andrey Khozov <avkhozov at googlemail.com> wrote:
> When using EV via select or poll all is ok. But when using EV via epoll I
> get the error "connect timeout". That makes me doubt that this is a bug in
> Mojo.

First of all, it would have helped readability immensely if you hadn't
reformatted the strace output.

> connect(3, {sa_family=AF_INET6, sin6_port=htons(27017), inet_pton(AF_INET6,
> "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS
> (Operation now in progress)
> fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
> epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLOUT, {u32=3, u64=4294967299}}) = 0
> epoll_wait(5, {{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=3, u64=4294967299}}}, 64,
> 9996) = 1
> epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLOUT, {u32=3, u64=4294967299}}) = 0
> getsockopt(3, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
> ...

Assuming the intervening strace output doesn't have anything interesting
to add, at this point, the watcher for fd 3 is still registered.

> dup2(4, 3)                              = 3

This closes fd 3, which is however sitll registered, and likely can't be
unregistered anymore (=> app bug).

> connect(3, {sa_family=AF_INET, sin_port=htons(27017),
> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in
> progress)
> epoll_wait(5, {}, 64, 9995)             = 0

The old file description for fd 3 is still registered, the new hasn't been
registered.

> I see that the second time epoll_wait is not monitoring the descriptor(3)
> and therefore a connect event is missing.

If you see that, then not from the strace, because the original fd 3 is
still being monitored according to that strace - there hasn't been an
epoll_ctl call.

In any case, this is very unlikely to be a bug in EV, but almost certainly
a bug in the calling code (which might or might not be Mojo), which
doesn't stop the watcher before closing the socket. This goes undetected
with select and poll, as these effectively re-register every watcher for
every select/poll call (which is the major reason they are slow, but also
hides this kind of bug).

To fix it, find what causes the dup2 call, and stop the I/O watcher before
closing the file handle that is being dup'ed over. And, presumably, start
it again afterwards on the new file description.

If you are good with a debugger (e.g. gdb), an easy way is to set a
breaskpoint on the dup2 call, and when it triggers (and you amde sure it is
that call), you can get a perl backtrace via e.g. Perl_eval ("use
Carp;Carp::confess",1)

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



More information about the anyevent mailing list