ev_async_send() not triggering corresponding async handler in target loop?

Pierre-Yves Kerembellec py.kerembellec at gmail.com
Thu Dec 31 10:35:38 CET 2009

> On Wed, Dec 30, 2009 at 03:26:03PM -0600, Brandon Black <blblack at gmail.com> wrote:
>> What he's referring to is that in the general case, a multi-process
>> program will scale better over increasingly large core counts than a
>> multi-threaded program, because the shared address space of the
>> threads tends to lead to memory/cache performance issues even on
>> non-NUMA machines, and obviously on NUMA machines the problems get
>> even worse when the threads that share active chunks of memory might
>> want to be scheduled on several different NUMA nodes.
> Thanks for explaining this in more detail, I always think people should
> know the trade-offs of threads and their purpose, but some marketing
> forces must have contorted the "threads" concept into something completely
> different over the past decades.
> That's not the only issue - the only purpose threads were invented for was
> keeping values in many cpu registers constant (e.g. mmu table pointers)
> across "processes".
> what is a feature on a single core is a burden on multiple cores as now
> memory allocations, mapping changes etc. will have to be communicated between
> all cpu cores.
> in addition, you do need more locking for almost everything, such as
> memory allocation (which can be mitigated by pools, but not avoided
> completely).
>> However, a *carefully designed* multi-threaded program can avoid these
>> sorts of memory issues in many cases, but then that gets back into
>> another component of why "multithreading is (imho) extremely
>> complicated".
> True, but not in all. I cannot imagine that a shared-everything approach
> can be faster than a shared-selectively approach in general. Maybe if you
> mainly share file descriptors, but certainly not for address space.
> That's the theory. In practice, operating systems gained support for
> threads, which makes things slow, but *easy*.
> libeio for example uses threads because there is no good and portable
> communication path between processes. of course, libeio actually suffers
> a lot from the "threads on multicores" issue - libeio for example gains
> considerable speed when you force the process to a single core on linux,
> because all of eventfd, cache and the scheduler work against it in that
> case.

Thanks for the thorough explanation. I'm actually developing a fast socket
server that will use a storage backend library to store billions of
"small objects" (basically less than 1K each) in a large optimized
hash table. This storage layer maintains a large object cache in memory
(so as not to hit the disk again on the next read of the same object), and
this memory cache cannot be shared across processes.

That's basically why I need threads: to share the object cache across
workers while spreading the load over multiple cores. I totally understand
the explanation above, and I guess the object cache could be split across
multiple processes, with some kind of "mapping" to bind a particular hash
key to the same process every time: clients wouldn't be balanced at
connection time but at application-request time (when they specify the hash
key they want to access). But there are also "bulk" commands that read and
write multiple hash keys in the course of the same network session, so this
wouldn't really work as expected (and would complicate the server code anyway).

FYI I followed your previous advice and already migrated from pipes to a
lightweight in-memory queue + simple pthread mutex + ev_async_send(). The
performance gain is substantial (I went from 15kreq/s to 19-20kreq/s, i.e.
more than 20% improvement), and the whole server seems to use less CPU
overall (fewer system calls and kernel/user-space switches, I guess). I've
tried playing with thread/core affinity, but no conclusive results so far.

Thanks again,

More information about the libev mailing list