Optimal multithread model

Gabriel Kerneis kerneis at pps.jussieu.fr
Thu Mar 25 10:18:39 CET 2010


Marc,

On Wed, Mar 17, 2010 at 01:37:07PM +0100, Marc Lehmann wrote:
> On Wed, Mar 17, 2010 at 12:19:23PM +0100, Gabriel Kerneis <kerneis at pps.jussieu.fr> wrote:
> > You can find the results here:
> > http://hal.archives-ouvertes.fr/hal-00434374/fr/
> 
> That's quite fascinating - I have yet to understand what an additional
> userspace accept queue (thttpd) has as an effect (i.e. how to decide when
> to not dequeue).

I've put the original graphs online (with somaxconn = 1024).
http://www.pps.jussieu.fr/~kerneis/software/files/benchmark/graphs/

They are probably a bit obscure, but here are some explanations:
*-strip-*.png: strip the first and last 1,000 requests (needed to
  "ramp up") and plot the same graphs as described below.
*-concurrency.png: mean number of connections for a given load.
  Linear, except for apache, because of its listen(fd, 512).
*-errors.png: mean (in red) and median (in blue) time per request for a
  given load, with error bars displayed (standard deviation).
*-outliers.png: the same, with outliers plotted in black.
*-throughput.png: the mean number of hits per ms for a given load.
*-time-x.png: temporal graphs for a given load x.  Plots the number
  of concurrent requests (in red) and the time taken by each request
  (each blue dot corresponds to a request) against the time of the
  experiment.  The black bars below are the scatter graph of the
  requests.

Temporal graphs were indeed fascinating for thttpd. Compare for
instance:
http://www.pps.jussieu.fr/%7Ekerneis/software/files/benchmark/graphs/thttpd-time-1000.png
http://www.pps.jussieu.fr/%7Ekerneis/software/files/benchmark/graphs/st-time-1000.png

> But I am also shocked: "a cursory examination of ST's sources indicates
> that it uses some rather clever data structures (e.g. heaps for storing
> timeouts)".

This is what we thought at the time.  It turned out that ST's data
structures are not that efficient (for instance, its heap
implementation uses pointers between threads, which is more elegant but
less efficient than a heap based on a separate array).  To be honest, I
have no idea what made ST so efficient (although small benchmarks run
later seemed to show that CPC is more efficient on 64-bit
architectures).  And that benchmark didn't use timeouts anyway, so this
sentence was just a wrong way of acknowledging our lack of
understanding ;-)

> I am looking for alternatives and improvements (and will implement some
> in one of the next releases) to (binary) heaps in libev, and it is a bit
> shocking to read that this is a clever data structure - I always thought
> of it as being the lower acceptable minimum...

Well, if every timeout has the same value (which is not uncommon in
server applications), it is more efficient to use an ordered list with
direct pointers to the head and the tail.  The problem is that it is
very efficient as long as it's usable, and suddenly freezes your app if
you end up with too many elements using several different timeout
values.  So this is not acceptable for a general-purpose library.

> (Of course, as outlined in the libev manual, for any important special
> cases, simpler and faster data structures exist).

Indeed.

> As a side note, however:
> 
>    Thus, event-driven programming avoids the localised cost associated
>    to thread contexts, but pays a price that is spread throughout the
>    program and hence difficult to quantify.
> 
> That is true for threads, too - C doesn't have registers, so it has to
> store verything "in memory" as well (modulo compiler optimisations).  In
> fact, you will not find a threaded webserver that doesn't have a "struct
> connection" type data structure with metadata, simply because local
> variables and registers don't get passed to functions.
> 
> So as soon as you use functions, you have the same issue with threads.
> 
> Of course, the relative costs are likely smaller with threads, but it is
> strictly wrong to claim that it only exists for event-based programs (in
> fact, the costs might even be negative).

I am not sure I understand what you mean here (admittedly because I do
not know much about performance issues with threads - I didn't write
that part of the paper).  I'll think about it, but if you could rephrase
it, maybe it would enlighten me.

> > (Disclaimer: if I had to redo this study, I would probably do it
> > differently.  But I moved to something else and do not have time anymore
> > to spend on this.)
> 
> It's still giving me great input, and made me plan to look at those clever
> optimisations inside ST - there is almost always something to learn in
> good software.

If you find out what makes ST really fast, please let me know, I'm still
in the dark here.

> Thanks for making the effort and sharing it!

I'll try to set up a dedicated web page providing the scripts and
method I used (someone requested them off-list), but I won't have time
to do so for another two weeks.

Regards,
-- 
Gabriel Kerneis
