Denis F. Latypoff
Tue Dec 29 16:18:31 CET 2009



Tuesday, December 29, 2009, 8:53:59 PM, you wrote:

> Hi Marc,

> Thanks for your fast answer. If I correctly understand what you're
> saying, the same ev_async watcher that was ev_async_init()-ed and
> ev_async_start()-ed in each worker thread has to be passed to the
> ev_async_send() call? It means it cannot be a local variable anymore
> and has to be accessible from both the worker threads and the main
> (accept) thread?

> I changed my code a little bit so that the ev_async watchers are
> now "global" and persistent, and the whole thing seems to work (but keep reading):

> https://pastee.org/up2hp

> I also commented the console trace out so that I may now bench the
> code using a simple Apache bench command like this one:

> $ ab -c 100 -n 10000 http://localhost:6666/

> (using ab was the whole purpose of simulating a simple HTTP server,
> as I didn't want to write a bench client as well ^_^).

> Most of the time, this will work, but sometimes it'll get stuck
> before the 10k requests are actually done. My guess is I have an
> overwrite condition while passing the accepted file descriptors from
> the main thread to the worker thread, using the "handle" member of the
> global CONTEXT structure (one per worker thread). If the server
> receives a large load of connections, the main thread will probably
> overwrite the "handle" member of the global structure too often, or at
> least before the corresponding worker thread has had time to "read" it
> (in the async callback) and start the appropriate ev_io watcher in its
> own event loop. In other words, some connections are never answered
> because the associated fd is never "seen" by a worker thread (such fds
> accumulate, accepted in the server, never to be released).

> So I guess I'm back to my piping queue mechanism in this case,
> because a simple eventfd counter is not enough to hold a high-rate fd
> flow from the main thread. As you pointed out in your documentation
> (http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#Queueing),
> "ev_async does not support queuing of data in any way". Or I could use
> multiple ev_async structures per worker thread, but it's just an ugly
> patch and will also eventually fail as the connection rate increases.
> Locking to make sure the previous handle was read before sending the
> next one is not really an option as it would kill the accept
> performance. I could also combine the pipe (for queuing accepted
> handles) and the eventfd signaling mechanism (indicating that the pipe
> needs to be read for new fds), but it's probably enough to just add
> the reading side of each pipe to an ev_io watcher in each worker
> thread. I know this is less optimal than an eventfd, but it still
> seems to deliver great performance: with 4 worker threads on a dual
> dual-core server (4-way), I get about 16,000 req/s, which is not bad
> after all.
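
On the overwrite problem described above: one way to avoid both the lost
handles and a lock in the accept path might be a small single-producer,
single-consumer ring per worker thread. The following is only a minimal
sketch under that assumption (exactly one thread pushes, exactly one
thread pops). The names fd_queue, fdq_init, fdq_push, fdq_pop and the
capacity are made up for illustration and are not part of libev. The
main thread would push the fd and then call ev_async_send(); the
worker's async callback would pop until the queue is empty, since
several sends may be merged into one callback invocation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FDQ_CAPACITY 1024 /* arbitrary; must be a power of two */

/* One queue per worker: the main thread pushes, the worker pops. */
struct fd_queue {
    int           slots[FDQ_CAPACITY];
    atomic_size_t head; /* next slot to read, advanced only by the consumer */
    atomic_size_t tail; /* next slot to write, advanced only by the producer */
};

static void fdq_init(struct fd_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side (accept thread). Returns false when the queue is full. */
static bool fdq_push(struct fd_queue *q, int fd)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == FDQ_CAPACITY)
        return false;

    q->slots[tail & (FDQ_CAPACITY - 1)] = fd;
    /* Publish the slot only after it has been written. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (worker thread). Returns false when the queue is empty. */
static bool fdq_pop(struct fd_queue *q, int *fd)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail)
        return false;

    *fd = q->slots[head & (FDQ_CAPACITY - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

When fdq_push() returns false the accept thread still has to decide what
to do with the connection (retry later, pick another worker, or close
it); the sketch leaves that policy open.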

Your experience is interesting, but I have a question: why do you need
threads in an event-driven program? I just tested my app based on libev;
it works as follows:


for (i = 0; i < 4; ++i) {
    switch (fork()) {
    case -1: /* epic fail */ break;
    case  0: return run_loop();
    default: /* manage worker (ev_child_init, ev_child_start etc) */ break;
    }
}

/*
 * Watch workers
 */

int run_loop(void)
{
    /*
     * Ignore all signals as master process will manage them itself.
     * Do accept() which is managed by kernel instead of master
     * process.
     */
}

With this model I get about 17k req/s (including HTTP/1.0 protocol
parsing) on a 4-CPU server.
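
To make the fork loop above a bit more concrete, here is a minimal
sketch that uses only POSIX calls. It is not my actual code: the helper
names spawn_workers, reap_workers and demo_worker are made up for the
example, and the master here simply blocks in waitpid(), whereas a
libev-based master would register an ev_child watcher per worker
instead.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder worker body (hypothetical). A real worker would run its
 * own event loop and accept() on the shared listening socket. */
static int demo_worker(void)
{
    return 0;
}

/* Fork up to n workers, storing their pids. Returns how many started. */
static int spawn_workers(int n, int (*run)(void), pid_t *pids)
{
    int started = 0;

    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();

        if (pid == -1)
            break;          /* fork failed: stop spawning */
        if (pid == 0)
            _exit(run());   /* child: run the worker, never return */
        pids[started++] = pid;
    }
    return started;
}

/* Wait for each worker. Returns how many exited with status 0. */
static int reap_workers(const pid_t *pids, int n)
{
    int clean = 0;

    for (int i = 0; i < n; ++i) {
        int status;

        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++clean;
    }
    return clean;
}
```

A master would call spawn_workers(4, run_loop, pids) after creating the
listening socket, so that every child inherits it.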

> In simple words, each communication pipe is acting as a buffering
> queue between the main thread and the worker threads. In the main thread, I've got
> something like this (in the network accept callback) :

> {
>     char handle[10];
>     int  client;

>     client = accept(...);
>     fcntl(client, F_SETFL, fcntl(client, F_GETFL) | O_NONBLOCK);
>     sprintf(handle, "%d\n", client);
>     write(loops[worker_index].cpipe[1], handle, strlen(handle));
> }
>
> and in each worker thread, I've added the reading side of the
> communication pipe (i.e. loops[worker_index].cpipe[0]) to the worker thread event
> loop, with a read callback parsing the handles from the
> communication pipe and adding them to the worker thread event loop.
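
The read side described above has one subtlety: a single read() on the
pipe may return several handles at once, or stop in the middle of one.
A minimal sketch of a parser that copes with both, assuming the
"%d\n" format from the snippet above (the names fd_parser and
fd_parser_feed are made up for the example):

```c
#include <stddef.h>
#include <stdlib.h>

/* Keeps the bytes of a handle whose newline has not arrived yet. */
struct fd_parser {
    char   buf[32];
    size_t len;
};

/* Feed n bytes just read from the pipe. Every complete "<digits>\n"
 * line is converted and stored in out[], which must have room for n
 * entries (a safe upper bound). Returns the number of handles stored. */
static size_t fd_parser_feed(struct fd_parser *p, const char *data,
                             size_t n, int *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; ++i) {
        if (data[i] == '\n') {
            if (p->len > 0) {
                p->buf[p->len] = '\0';
                out[count++] = (int)strtol(p->buf, NULL, 10);
            }
            p->len = 0;     /* start collecting the next handle */
        } else if (p->len < sizeof p->buf - 1) {
            p->buf[p->len++] = data[i];
        }
    }
    return count;
}
```

The worker's read callback would zero-initialize one fd_parser per
pipe, call fd_parser_feed() on whatever read() returned, and start an
ev_io watcher for each handle in out[].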

> Is there a chance to see a similar generic API directly in libev
> sometime soon? It would avoid duplicating this code all over the
> place, as I think this is a common pattern in high-performance network
> servers where the load has to be spread among multiple threads to
> benefit from multi-core servers. Or maybe you know of a different
> approach?

>>> I'm quite new to libev and I'm having a hard time figuring out why a call to ev_async_send() will not trigger the corresponding async handler
>>> in the specified target loop.
>> Any watcher, including an async watcher, must be started first. It will
>> likely work much better if you add an appropriate:
>>   ev_async_start (loop, &w);
>>> As you can see, the round-robin distribution to worker threads seems to be fine, but the async_cb is never called. Despite the fact that
>>> I need to add a timer watcher to each worker thread loop (to keep it alive in the absence of any other), I've been digging a little bit into the
>> Adding a timer is a possibility, another is to call ev_ref (loop).

> Absolutely, I forgot this one. Thx.

>>> libev code and it seems the internal evpipe_init() function (responsible for creating and initializing a communication eventfd/pipe
>>> watcher) is actually never called. ev_async_start() on the other hand will call evpipe_init(), but my understanding is that it's not thread-safe
>> Calling evpipe_init is indeed the job of ev_async_start.
>>> (because it's not using the communication pipe and is directly changing the loop's internal async table from another thread).
>> You have to start the watcher in the thread that waits for it.
>>> Am I missing something here? Am I using the right workflow?
>> Probably not - start the ev_async watcher in the thread that runs the
>> loop, then use ev_async_send from other threads.
>> Whether you then use one async watcher per thread or just one globally (or
>> something else) is then a matter of design.
>>> For now, I'm back at using my own pipes between the main and the worker threads, adding an ev_io watcher to the reading side of each
>>> pipe, and writing accepted network handles to the pipes in a round-robin way (like the above). I actually mimic the ev_async_send()
>>> behavior, and it works quite fine. But I'd really like to use the provided libev async facility if possible, instead of re-inventing the wheel
>>> in my code.
>> Indeed, it will also likely be faster.





 Denis

