Optimal multithread model
blblack at gmail.com
Tue Mar 16 00:52:40 CET 2010
On Mon, Mar 15, 2010 at 5:07 PM, James Mansion
<james at mansionfamily.plus.com> wrote:
> Marc Lehmann wrote:
>> Keep in mind that the primary use for threads is to improve context
>> switching times in single-processor situations - event loops usually have
>> far faster context switches.
> No, I don't think so (re: improving context switching times in
> single-processor situations). Yes, a context switch in a state machine is
> fastest, followed by threads. But you're limited to one core.
>> Now that multi cores become increasingly more common and scalability to
>> multiple cpus and even hosts is becoming more important, threads should be
>> avoided as they are not efficiently using those configs (again, they are a
>> single-cpu thing).
> You keep saying this, but that doesn't make it true.
His basic point here, which I agree with, is sound. If you're trying
to scale up a meta-task (network server) that does many interleaved
tasks (talking to many clients) on one processor, event loops are
going to beat threads, assuming you can make everything the event loop
does nonblocking (or fast enough for blocking to not matter much).
That's all talking about a single CPU core though.
However, the thread model as typically used scales poorly across
multiple CPUs as compared to distinct processes, especially as one
scales up from simple SMP to the ccNUMA style we're seeing with
large-core-count Opteron- and Xeon-based machines these days. This is
mostly because of memory access and data caching issues, not because
of context switching. The threads thrash on cache lines they're all
writing to (and/or contend on locks, which is related), and some of the
threads end up running on a different NUMA node than the one holding
the data (in some cases this is very pathological, especially if you
haven't had each thread allocate its own memory with a smart allocator).
>> While there are exceptions (as always), in the majority of cases you will
>> not be able to beat event loops, especially when using multiple processes,
>> as they use the given resources most efficiently.
> Only if you don't block, which is frequently hard to ensure if you are
> using third-party libraries for database access (or heavy crypto, or
> calc-intensive code that is painful to step explicitly). In fact, all
> those nasty business-related functions that cause us to build systems
> in the first place.
In that case you can either (a) just use threads instead of event
loops, but still run one process per core with several threads inside
each, or (b) use an event loop, but also spawn separate threads for
slow-running tasks (crypto, database, whatever) and feed the results of
those threads back into the event loop for non-blocking access to them.
I think some of the issue in this argument is a matter of semantics.
You can make threads scale up well anyways by simply designing your
multi-threaded software to not contend on pthread mutexes and not
having multiple threads writing to the same shared blocks of memory,
but then you're effectively describing the behavior of processes, and
you've implemented a multi-process model by using threads but not
using most of the defining features of threads. You may as well save
yourself some sanity and use processes at that point, and have any
shared read-only data either in memory pre-fork (copy-on-write, and
write never happens to these blocks), or via mmap(MAP_SHARED), or some
other data-sharing mechanism. So if you've got software that's
scaling well by adding threads as you add CPU cores, you've probably
got software that could have just as efficiently been written as
processes instead of threads, and been less error-prone to boot.