alternate approach to timer inaccuracy due to cached times
Shaun Lindsay
srlindsay at gmail.com
Fri Oct 14 04:08:08 CEST 2011
I think my description of the issue was lacking. This is not specific to
gevent. To illustrate the problem, I've included a C test program that
reproduces the timeout problems. I'll be referring back to that test code
as I reply inline to your comments. To simulate a real workload, the test
code creates 500 socketpairs, creates io watchers for them, and then sends
something to the opposite end of each socketpair, ensuring that our first call
to ev_run will have to deal with all 500 callbacks. The callback itself just
spins for one millisecond, then calls ev_io_stop() on the watcher. On the
last callback, it starts a 500 millisecond timer and records the time the
timer was started. When the timer triggers, it prints out the actual amount
of time waited between the ev_timer_start call and the firing of the timer
callback. The expected wait is somewhere around 0.5s, but the timeout fires
after 0.00003s, effectively instantly.
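For reference, here's a rough sketch of what that test does. This is a
reconstruction from the description above, not the attached
multicb_timeout_test.c itself, so names and minor details will differ:

#include <ev.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define NPAIRS 500

static ev_io watchers[NPAIRS];
static ev_timer timeout_watcher;
static int cbs_seen;
static ev_tstamp timer_started_at;   /* real time when the timer was armed */

/* Busy-spin for roughly one millisecond to simulate callback work. */
static void
spin_1ms (void)
{
  ev_tstamp start = ev_time ();
  while (ev_time () - start < 0.001)
    ;
}

static void
timeout_cb (EV_P_ ev_timer *w, int revents)
{
  printf ("in timeout_cb\n");
  printf ("Expected to wait 0.5s, actually waited %gs\n",
          ev_time () - timer_started_at);
  ev_break (EV_A_ EVBREAK_ALL);
}

static void
io_cb (EV_P_ ev_io *w, int revents)
{
  char buf[16];

  read (w->fd, buf, sizeof buf);   /* drain the byte */
  spin_1ms ();
  ev_io_stop (EV_A_ w);

  if (++cbs_seen == NPAIRS)
    {
      /* Last callback: arm a 500ms timer. The relative timeout is
         computed against the loop's cached time, which by now is
         roughly 0.5s stale. */
      printf ("hit last cb, setting up timer\n");
      ev_timer_init (&timeout_watcher, timeout_cb, 0.5, 0.);
      timer_started_at = ev_time ();
      ev_timer_start (EV_A_ &timeout_watcher);
    }
}

int
main (void)
{
  struct ev_loop *loop = EV_DEFAULT;
  int i;

  for (i = 0; i < NPAIRS; i++)
    {
      int sv[2];

      socketpair (AF_UNIX, SOCK_STREAM, 0, sv);
      ev_io_init (&watchers[i], io_cb, sv[0], EV_READ);
      ev_io_start (loop, &watchers[i]);
      write (sv[1], "x", 1);   /* make every watcher fire on the first ev_run */
    }

  ev_run (loop, 0);
  return 0;
}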
> > As mentioned in the above link, you can force an update of the cached time
> > via ev_now_update(), but this doesn't practically address the issue.
>
> Well, it should address the issue, that's what it is made for - why do you
> think it doesn't address the issue?
It addresses the issue, but not in a fashion that is generally usable
without incurring a severe performance penalty. In the repro code, I could
avoid the problem by calling ev_now_update before starting the timer, but
what if I were starting a timer in each of the callbacks, rather than just
the last? The callbacks are each burning a millisecond's worth of time; is
that a sufficient threshold to call ev_now_update? One of the dangerous
things here is that the real value of the timeout is being skewed by the
earlier callbacks' execution time, so the only way to be safe is to call
ev_now_update before every timer start. If you have a lot of timers (for
instance, one on every connection with a large number of active
connections), the performance penalty of doing that will be high; however,
as indicated in the test, that's really the only sane option.
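To be explicit, the pattern needed before every timer start is something
like this (a sketch; start_timer_accurately is just an illustrative wrapper,
not libev API):

#include <ev.h>

/* Hypothetical wrapper: the per-start workaround described above. */
static void
start_timer_accurately (struct ev_loop *loop, ev_timer *w,
                        void (*cb)(struct ev_loop *, ev_timer *, int),
                        ev_tstamp after)
{
  ev_now_update (loop);            /* one clock syscall per timer start */
  ev_timer_init (w, cb, after, 0.);
  ev_timer_start (loop, w);
}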
> > case, the triggering of any of the timeouts would constitute an error
> > case, so I'd need to call ev_now_update() before every timer start.
>
> In most cases, you would have events that invalidate the error case -
> having these at higher priority would solve that problem in a better way.
The only way to invalidate the error case is to actually check the time in
the callback for any timer event, which, again, puts us back in the
one-system-call-per-timer case.
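Concretely, the check would have to look something like this in every timer
callback (a sketch; conn_timeout_t and its fields are hypothetical, not from
the attached code):

#include <ev.h>

typedef struct
{
  ev_timer timer;       /* must be first so we can cast from ev_timer * */
  ev_tstamp armed_at;   /* ev_time () when the timer was armed */
  ev_tstamp interval;   /* requested timeout in seconds */
} conn_timeout_t;

static void
error_timeout_cb (EV_P_ ev_timer *w, int revents)
{
  conn_timeout_t *c = (conn_timeout_t *)w;
  ev_tstamp elapsed = ev_time () - c->armed_at;  /* one syscall per firing */

  if (elapsed < c->interval)
    {
      /* Fired early off the stale cached clock: re-arm for the rest. */
      ev_timer_set (w, c->interval - elapsed, 0.);
      ev_timer_start (EV_A_ w);
      return;
    }

  /* Genuine timeout: handle the error case here. */
}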
> > timeouts where immediate, nondeterministic expiration is an acceptable
> > condition,
>
> You are not talking about libev, are you? Even without calling
> ev_update_now there is nothing nondeterministic, and timers don't timeout
> immediately in any case.
Yes, I am talking about libev.
As illustrated in the example, the amount of work done before starting a
timer negatively affects the real time on that timer. In a real-world
scenario, the amount of work done during event dispatch will be dependent on
whatever network traffic you're serving, so the actual values of your
timeouts are then subject to that workload and thus nondeterministic. The
repro case is an example of the timer expiring immediately.
> > Deferring calls to ev_timer_start to the end of the dispatch cycle, then
> > starting them in a batch would fix this issue.
>
> Well, it doesn't make anything more deterministic, or avoid timers timing
> out immediately, you are just trading one behaviour for another.
Not really. The problem is that the amount of time taken to handle the
event callbacks is unknown. The time taken to add timers to the heap,
however, is relatively bounded and, even for large numbers of timers, still
on the order of microseconds. If we finish the callbacks, refresh the time,
then add the timers, the disparity is then limited to the time taken to
start the timers alone, decoupling it from the time taken to handle all the
event callbacks.
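At the application level, the batching amounts to something like this (a
sketch of the idea only; in the patch the equivalent happens inside ev_run,
and defer_timer_start, flush_deferred_timers, and MAX_DEFERRED are
illustrative names, not libev API):

#include <ev.h>

#define MAX_DEFERRED 1024

static ev_timer *pending[MAX_DEFERRED];
static int npending;

/* Called from event callbacks instead of ev_timer_start. */
static void
defer_timer_start (ev_timer *w)
{
  if (npending < MAX_DEFERRED)
    pending[npending++] = w;
}

/* Called once after all callbacks of the current dispatch cycle ran. */
static void
flush_deferred_timers (struct ev_loop *loop)
{
  int i;

  ev_now_update (loop);            /* one clock syscall for the whole batch */

  for (i = 0; i < npending; i++)
    ev_timer_start (loop, pending[i]);

  npending = 0;
}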
> > One nice advantage of this approach is that under heavy load, the timeouts
> > will occur later than the requested time, rather than before the requested
>
> Libev guarantees that timeouts never trigger before the requested time
> already.
Already addressed.
> > time, which seems saner from an application perspective -- I'd rather have
> > a 100ms timeout fire at 200ms than fire in 0ms.
>
> The latter can only happen if you use two different clocks to compare
> times. This is rather meaningless - libev will never time out a timer
> requesting a 100ms timeout before or even at 100ms after the requested
> time.
Not meaningless at all. If I call ev_timer_init with a requested time of
0.1, I expect it to fire 100ms after that function call or later. As shown
in the repro code, that is not the actual behavior.
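The two clocks in question can be compared directly (a sketch):

#include <ev.h>
#include <stdio.h>

/* Show how far the loop's cached clock lags the real clock. */
static void
print_clock_skew (struct ev_loop *loop)
{
  ev_tstamp cached = ev_now (loop);  /* as of the last loop update */
  ev_tstamp real   = ev_time ();     /* current wall-clock time */

  /* After ~0.5s of callback work this prints roughly 0.5: a relative
     timer started now is measured against 'cached', not 'real'. */
  printf ("cached clock lags real clock by %gs\n", real - cached);
}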
> > probably need modification to be consistent, like adding a struct ev_timer
> > *next to all ev_timers for use in a linked list, for instance. I'm also not
>
> Unrelated to the problem: the libev way of doing things would be to have
> an array for those timers, which would likely be faster and save some
> memory.
Yeah, an array is definitely a better solution.
> > Now that I think about it, timers started by callbacks from other timers
> > might not get picked up until the next event cycle, which might be bad,
> > I'm not sure. Somebody more familiar with the code can probably point me
> > in the right direction on that.
>
> I think if you are hitting such a problem in gevent, this is a design issue
> in gevent - any event system should be able to deal with delays in event
> processing in a sane way.
I wasn't actually seeing that issue; I was just positing that my patch to
libev could cause that behavior, which would be entirely unrelated to
gevent.
Anyway, here's the output from the test code:
Unpatched libev:
------------------------
hit last cb, setting up timer
in timeout_cb
Expected to wait 0.5s, actually waited 3.09944e-05s

Patched libev:
------------------------
hit last cb, setting up timer
in timeout_cb
Expected to wait 0.5s, actually waited 0.50028s
If you're at all interested in the patch after reading this, I've made some
changes regarding where the deferred timers are actually started in ev_run.
I can send out the updated version, assuming you agree that this is a real
problem worth fixing.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multicb_timeout_test.c
Type: text/x-csrc
Size: 1978 bytes
Desc: not available
URL: <http://lists.schmorp.de/pipermail/libev/attachments/20111013/5584bc3b/attachment.c>