alternate approach to timer inaccuracy due to cached times

Shaun Lindsay srlindsay at gmail.com
Fri Oct 14 07:48:38 CEST 2011


>
>
> libev's way of doing things (using loop start time) is just more
> efficient for most normal cases, because normally you don't have large
> delays/blocks in your callbacks in an event-driven program.  In the
> big picture, it sounds like the primary issue in your original code is
> that you're doing blocking database calls in the midst of an event
> callback.  That's going to screw up a lot of assumptions right there.
> Usually the way to handle this (assuming the database driver can't be
> hooked into the loop the way things should be) is to spawn a
> thread/process to handle SQL stuff asynchronously and talk to it over
> a local socketpair.
>

In the original case, the delay was due to very inefficient deserialization
of the objects returned from Cassandra.  The retrieval itself was properly
asynchronous, but turning the result set into the corresponding objects
took around 50ms, and that is what led to the issues.  Irrespective of the
source of the delay, it would be better if a slow callback didn't make
timeouts unreliable.  As the repro code illustrates, you can get the same
problem from a large number of very small tasks, which makes this a
potential issue under ordinary, but heavy, load.

Anyway, the point of the alternate solution is that it adds only one
gettime() call per loop cycle, no matter how many timers there are, plus
the cost of traversing a linked list or array of any timers queued during
that loop cycle.  In exchange for that modest CPU cost, you get timers that
stay reliable under heavy load and with slow handler callbacks.


>
> If you're really stuck with this though, you could also switch to
> ev.pod's 4th strategy for timers, where all of your socket error
> timeouts are collapsed into one timeout watcher and go from there on
> the loop time issues.
>

The problem isn't the volume of timer events.  Condensing all the timeouts
into one actual timer event won't help since, as shown in the example, a
single timeout is vulnerable to early expiration.  Basing all your timeouts
on that one would just cause everything to expire early.  (That said, using
one event and a backing list seems like a really cool way of squeezing a
huge number of timeouts into the system.)


More information about the libev mailing list