Optimal multithread model

Graham Leggett minfrin at sharp.fm
Tue Mar 16 16:21:04 CET 2010


On 16 Mar 2010, at 4:40 PM, Christophe Meessen wrote:

> Regarding the threads vs. processes discussion, I see a use case
> which hasn't been discussed yet.
>
> Consider the C10K application with many cold links and a context
> (i.e. authentication, data structures, ...) associated with each
> connection.
>
> With threads we can easily set up a pool of worker threads that
> efficiently pick up the context associated with the connection
> becoming active. I don't see how an equivalent model can be
> efficiently implemented with processes.
>
> I would prefer it were possible to do it with processes, because
> they have the benefit of a separate memory space, which is much
> better for security and robustness. But I couldn't find a way to do
> it as easily and efficiently as with threads.

Use copy-on-write for this.

The Apache httpd server goes through a configuration phase, which
loads the config and sets up what is ultimately a big read-only data
structure that the server then uses by reference for its lifetime.
Only once this is completely set up does httpd start forking
processes (in the prefork model).

Each forked process shares the same memory space as the parent,
regardless of the number of forks created, right up until the first
attempt is made to write to that shared space. At that point the
operating system automatically copies the affected memory into the
writing process's own space (hence "copy on write"), and the write
then completes. Obviously, if you make sure you never write to this
shared space, you'll never pay for the copy, and all your processes
can share common data initialised before the fork.
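
To make the pattern concrete, here is a minimal sketch in C. It is
not httpd's actual code; the struct, the sizes and the worker count
are invented for illustration. It builds a read-only structure in the
parent, then forks workers that only ever read it, so the kernel
never needs to copy those pages:

/*
 * Sketch: share read-only data with forked workers via copy-on-write.
 * The "config" struct below is a stand-in for the big read-only
 * structure described above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct config {
    size_t nhosts;
    char   hosts[1024][64];
};

int main(void)
{
    /* 1. Configuration phase: build the structure once, in the parent. */
    struct config *cfg = malloc(sizeof(*cfg));
    if (cfg == NULL)
        return 1;
    cfg->nhosts = 1024;
    for (size_t i = 0; i < cfg->nhosts; i++)
        snprintf(cfg->hosts[i], sizeof(cfg->hosts[i]), "vhost-%zu", i);

    /* 2. Only now fork the workers (prefork style). */
    for (int w = 0; w < 4; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /*
             * Child: the pages holding *cfg are still shared with the
             * parent.  As long as the child only reads them they are
             * never duplicated; the first write to any such page would
             * trigger a private copy of that page (copy-on-write).
             */
            const struct config *shared = cfg;   /* read-only by convention */
            printf("worker %d sees %zu hosts, first is %s\n",
                   (int)getpid(), shared->nhosts, shared->hosts[0]);
            _exit(0);
        }
    }

    /* Parent: reap the workers, then clean up. */
    while (wait(NULL) > 0)
        ;
    free(cfg);
    return 0;
}

Whether the copy is actually avoided in a real server also depends on
data layout: anything mutable interleaved with the read-only data
(reference counts, allocator metadata, lazily-filled caches) will
dirty the shared pages, so it pays to keep genuinely read-only data
in its own allocations.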

(Aside: I am very interested in the content of this thread with
regard to the original prefork model of httpd and performance. A lot
of focus has been placed on the event-driven model in the HTTP server
space, which comes with the upside of throughput, but with the (big)
downside of a lack of reliability - one request going bananas takes
out other requests on its way down. With the prefork model becoming
more efficient on modern multi-core machines, you get both
performance *and* reliability.)

Regards,
Graham
--