libev-4.31 has just been released

Olivier Langlois olivier at trillion01.com
Wed May 5 15:54:38 CEST 2021


On Tue, 2021-05-04 at 12:57 +0200, Marc Lehmann wrote:
> Thanks for trying out the iouring backend.
> 
> On Wed, Apr 28, 2021 at 11:24:49AM -0400, Olivier Langlois
> <olivier at trillion01.com> wrote:
> > I believe that in order to achieve the performance gain that io_uring
> > can deliver, you would need to service I/O through io_uring as well
> > to
> > save on the associated system call cost instead of just using
> > io_uring
> > for polling.
> 
> iouring being quite a bit slower than epoll was my own experience. That
> and it being too buggy for general use (and obviously no movement of the
> maintainer to fix things) made me kind of give up on this as a libev
> backend. It can only work in some cases, or when you embed it into
> another event loop. And then you can just embed iouring into an epoll
> loop to get the best of both worlds.

I tend to disagree on the future of this new API. It seems to have a
lot of potential and a lot of visibility; Linus himself seems to be
heavily involved in it. From my point of view, they put a lot of effort
and resources into its development. I skim through the kernel release
notes at each release, and as of right now io_uring is one of the most
actively developed kernel components. With that kind of effort, I have
a hard time imagining anything other than a bright future for it.

I am about to upgrade my system to 5.12.1 with a fresh compile
including the nohz_full option (but the Mellanox mlx5 driver code is
broken in this release!). According to Jens, io_uring is 10% faster
than in the previous iteration...

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.12-Faster-IO_uring

I wish he would care to explain which changes are responsible for this
boost. I did try to find out, and I couldn't see what could explain the
improvement.

As far as I can tell, it is only code cleanup and optimizations, but
without more details I find the claim hard to believe...

and 5.13 appears to be a game changer for libev users... It will have
multishot poll support in it:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=625434dafdd97372d15de21972be4b682709e854
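
If I read the commit right, a single poll SQE can then stay armed and keep
producing completions instead of having to be re-armed after every event.
A minimal sketch of how that could look (assuming 5.13 uapi headers; the
io_uring_prep_poll_multishot() helper only exists in newer liburing
versions, so I set the flag by hand):

#include <liburing.h>
#include <poll.h>

static void
arm_multishot_poll (struct io_uring *ring, int fd, unsigned poll_mask)
{
  struct io_uring_sqe *sqe = io_uring_get_sqe (ring);

  io_uring_prep_poll_add (sqe, fd, poll_mask);
  sqe->len = IORING_POLL_ADD_MULTI;   /* 5.13+: keep the poll armed */
  io_uring_sqe_set_data (sqe, NULL);  /* watcher pointer would go here */
}

/* Completions then keep arriving for the same sqe; as long as
   IORING_CQE_F_MORE is set in cqe->flags the poll is still armed and
   does not need to be resubmitted. */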

> 
> > but if adding a function specific to the libev io_uring backend that
> > would let the watcher code perform their i/o requests through io_uring
> > was thinkable, that could be the performance holy grail for libev users
> > by only making 1 system call to service the 64 i/o operations.
> 
> The obvious way to do this would be to expose request submission. There
> are two problems to solve though: a) how to identify/distinguish those
> requests from libev's own ones and b) libev can tear down and open a new
> iouring at any time, and this is hard to synchronise with external users.
> 
> It might be possible to do some kind of iouring watcher.
> 
> However, in the end, you can also use your own iouring and embed it into
> libev. This will also take care of extra system calls, and is rather
> clean and works with e.g. epoll as well, and might even be faster
> (certainly more correct, as iouring is too buggy as a generic event
> backend).

I have a working proof-of-concept modified libev that supports async
reading.

The project that I did use as a source of inspiration is:
https://github.com/frevib/io_uring-echo-server/

Here are the broad strokes of how it works (a rough sketch follows the
list):
1. I create a new eventmask bit.
2. Reuse the ev_io watcher type, which I extend through its data pointer
with an agreed-upon content (struct ev_io_uring_buffer_read_params) to
support the new event type.
3. Refine the sqe user_data usage by encoding an event type in it.
4. Add an io_uring-loop-specific function to return the buffers to
io_uring when the user code is done with them.
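
To make that a bit more concrete, here is a very rough sketch of the idea;
the real names and fields live in the attached patches, so everything below
is illustrative rather than exact:

#include <stdint.h>
#include "ev.h"

#define EV_URING_READ 0x2000   /* hypothetical new eventmask bit */

/* Agreed-upon content hung off the ev_io data pointer for the new event
   type; filled in by the loop before the callback runs. */
struct ev_io_uring_buffer_read_params
{
  int   buf_group;   /* provided-buffer group id (illustrative) */
  void *buf;         /* buffer selected by io_uring for this read */
  int   len;         /* bytes read, or -errno */
};

/* sqe->user_data carries both the fd and a small event-type tag so the
   completion handler can tell poll completions from read completions. */
enum { EV_URING_EV_POLL = 0, EV_URING_EV_READ = 1 };

static inline uint64_t
ev_uring_user_data (int fd, int type)
{
  return (uint64_t)fd | ((uint64_t)type << 32);
}

/* Loop-specific call the user invokes to hand a provided buffer back to
   io_uring once it is done with the data (wraps IORING_OP_PROVIDE_BUFFERS). */
void ev_io_uring_return_buffer (struct ev_loop *loop, int buf_group, int buf_id);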

I have tested it and it works fine. Here are some random comments about
the experience:

1. Async reading only works on blocking fds (it returns EAGAIN
otherwise). This makes the transition of existing multiplexing code
harder (I wish there was a flag telling io_uring to disregard the
O_NONBLOCK flag in this case).
2. I have started using liburing for the io_uring loop implementation.
I understand that this is something you cannot afford to do, since it
would force a new dependency on your users, but I do not have this
constraint, and when I saw how passing a timeout to the io_uring_enter
syscall was done, I gave up on the idea of doing my code the "right"
way (see the sketch below). My goal is to reach a working
proof-of-concept prototype ASAP, not to fix bugs in complex boilerplate
code. I guess the resulting code can still have some learning value
despite not being pullable as-is.
3. The ev_io_uring code is currently littered with printf calls. I am
currently trying to fix an odd behavior observed from io_uring:
https://lore.kernel.org/io-uring/8992f5f989808798ad2666b0a3ef8ae8d777b7de.camel@trillion01.com/T/#u

imho, this should be of interest to you because AFAIK, the original
libev 4.33 must be plagued by the same issue...
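
For what it's worth, this is roughly what the timeout handling looks like
with liburing, and why I gave up on doing it by hand:
io_uring_wait_cqe_timeout() hides whether the kernel supports the newer
io_uring_enter() extended argument or whether a timeout sqe has to be
queued internally (as I understand it). Sketch only, error handling
trimmed:

#include <liburing.h>

static int
wait_for_events (struct io_uring *ring, long timeout_ns)
{
  struct io_uring_cqe *cqe;
  struct __kernel_timespec ts = {
    .tv_sec  = timeout_ns / 1000000000L,
    .tv_nsec = timeout_ns % 1000000000L,
  };

  io_uring_submit (ring);   /* flush any pending sqes to the kernel */

  /* Waits for at least one completion or until the timeout expires. */
  int res = io_uring_wait_cqe_timeout (ring, &cqe, &ts);
  if (res < 0)
    return res;             /* -ETIME on timeout, other -errno on error */

  /* ... hand the completion to the watcher ... */
  io_uring_cqe_seen (ring, cqe);
  return 0;
}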

I will attach the mods to this email.

My app is mostly one-sided on the receive side, so async reading is the
low-hanging performance fruit. My end goal is an async io_uring OpenSSL
BIO module 100% fed by libev.
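
The BIO side could look roughly like the following with the OpenSSL 1.1.1
BIO_METHOD API. This is just how I picture the glue, not working code: the
ev_uring_* names are hypothetical, and the read callback simply drains
whatever buffer libev/io_uring already delivered, asking OpenSSL to retry
when nothing has arrived yet.

#include <openssl/bio.h>
#include <string.h>

struct ev_uring_bio_ctx { const char *buf; size_t len, off; };

static int
ev_uring_bio_read (BIO *b, char *out, size_t outl, size_t *readbytes)
{
  struct ev_uring_bio_ctx *ctx = BIO_get_data (b);
  size_t avail = ctx->len - ctx->off;

  if (avail == 0)
    {
      BIO_set_retry_read (b);   /* no data from io_uring yet, retry later */
      return 0;
    }

  *readbytes = outl < avail ? outl : avail;
  memcpy (out, ctx->buf + ctx->off, *readbytes);
  ctx->off += *readbytes;
  return 1;
}

static BIO_METHOD *
ev_uring_bio_method (void)
{
  BIO_METHOD *m = BIO_meth_new (BIO_get_new_index () | BIO_TYPE_SOURCE_SINK,
                                "ev_io_uring");
  BIO_meth_set_read_ex (m, ev_uring_bio_read);
  /* write_ex/ctrl/create/destroy callbacks would follow for a real module */
  return m;
}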

I guess that once I have the reading part done, doing the same for
output should be easy.

> 
> > In the meantime, I took care of one of the TODO items. That is using
> > a single mmap() when possible. It is essentially code from liburing
> > adapted to libev coding style...
> 
> Thanks, when I come around to implementing this I will certainly take
> advantage of your work, although this is currently on the back burner
> due to the issues with iouring.
> 
> Would it be possible to re-send the patch properly though? The version
> you sent is completely garbled because there are extra spurious newlines
> all over the place.

Yes. I am going to do that. Check this email's attachments.
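
For reference, the gist of the single-mmap change (it mirrors what liburing
does; the real libev-style code is in the attached 1mmap.patch) is to check
IORING_FEAT_SINGLE_MMAP and, when the kernel sets it, map the SQ and CQ
rings with one mmap() sized to the larger of the two. Roughly:

#include <linux/io_uring.h>
#include <sys/mman.h>

static int
map_rings (int fd, struct io_uring_params *p,
           void **sq_ring, size_t *sq_sz, void **cq_ring, size_t *cq_sz)
{
  *sq_sz = p->sq_off.array + p->sq_entries * sizeof (unsigned);
  *cq_sz = p->cq_off.cqes  + p->cq_entries * sizeof (struct io_uring_cqe);

  if (p->features & IORING_FEAT_SINGLE_MMAP)
    {
      /* Kernel 5.4+: both rings share one mapping, sized to the larger. */
      if (*cq_sz > *sq_sz)
        *sq_sz = *cq_sz;
      *cq_sz = *sq_sz;
    }

  *sq_ring = mmap (0, *sq_sz, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
  if (*sq_ring == MAP_FAILED)
    return -1;

  if (p->features & IORING_FEAT_SINGLE_MMAP)
    *cq_ring = *sq_ring;   /* the second mmap() is saved */
  else
    {
      *cq_ring = mmap (0, *cq_sz, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
      if (*cq_ring == MAP_FAILED)
        return -1;
    }

  return 0;
}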
> 
> > By switching from the epoll backend to the io_uring one, my process CPU
> > usage did drop from 20-30% to below 5%. It seems too good to be true!
> > What I suspect is happening is that my socket option SO_BUSY_POLL
> > setting might not be honored by io_uring.
> 
> That indeed sounds too good to be true. In my tests iouring is
> consistently slower, although I can imagine that in workloads which are
> very heavy on syscalls (e.g. epoll_ctl) this might change.
> 
> On the other hand, epoll now has a mode where it can also queue things
> with few syscalls, and as much as I hate epoll, since iouring is going
> down the same road as linux aio (buggy, never getting fixed), it is
> probably the way to go for the future.

I was a big epoll user 10 years ago and was well served by it. I stopped
following its development when I delegated the implementation details
to good libraries like libev. I wasn't aware at all that epoll now
allows its users to queue things.

> 
> On Wed, Apr 28, 2021 at 11:31:47AM -0400, Olivier Langlois
> <olivier at trillion01.com> wrote:
> > Here is a last quick sidenote concerning my CPU usage observation.
> > 
> > CPU usage reported by top is now below 5% when using the io_uring
> > backend, but it seems like the CPU is spent by something else inside
> > the kernel, as my average load went from 2.5 to ~3.1...
> 
> I would expect that, all other things being similar, things take more
> cpu, as iouring seems to be vastly less efficient than epoll (e.g. its
> use of hash tables instead of a simple array lookup for everything is
> bound to slow things down).
> 
> It might be possible that this is improved in future versions of the
> kernel, but I am doubtful that it can ever reach epoll speeds,
> especially if a queueing system is used for epoll as well (which libev
> does not yet implement).
> 
> All of this points to the right solution being to use iouring for the
> things only it can do (I/O) and to use epoll with a submission queue
> for the rest.
> 
This is a good point. I guess that the only way to find out is to try
out those ideas. My bet isn't only on the queuing mechanics but also on
the elimination of the system calls for performing the I/O that
io_uring can offer.

I'll go through the experiment. We will see if it is beneficial... and
adjust until the best combination is found...

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1mmap.patch
Type: text/x-patch
Size: 3678 bytes
Desc: not available
URL: <http://lists.schmorp.de/pipermail/libev/attachments/20210505/fe83e8a8/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buffer_select.patch
Type: text/x-patch
Size: 30812 bytes
Desc: not available
URL: <http://lists.schmorp.de/pipermail/libev/attachments/20210505/fe83e8a8/attachment-0003.bin>

