libev-4.31 has just been released

Olivier Langlois olivier at trillion01.com
Thu May 6 18:32:13 CEST 2021


I wasn't aware of libeio. I will take a look into it...

Just to clarify my intent: the second patch wasn't meant for
submission. Of course, I know it cannot be accepted as-is. I shared it
only to show what I was doing in my experimentation with your lib.

On Thu, 2021-05-06 at 14:05 +0200, Marc Lehmann wrote:
> > 3. ev_io_uring code is currently littered with printf. I am
> > currently
> > trying to fix an odd behavior observed from io_uring:
> 
> Is that your code or are you talking about libev? If you want to
> seriously
> work on io_uring in libev, you should use the CVS version.
> 
> I, again, strongly think libeio is the correct place for I/O related
> requests, as it is not io_uring dependent.
> 
> https://lore.kernel.org/io-uring/8992f5f989808798ad2666b0a3ef8ae8d777b7de.camel at trillion01.com/T/#u
> 
> I am not sure what the issue exactly is. As for #2, libev should
> already
> handle this when receiving the event - it will not queue an event for
> a
> watcher that does not want it.

I am starting to understand what you are saying.

The problem started when I replaced your io_uring boilerplate code
with liburing. The motivation behind this move was to eliminate the
risk of being bitten by boilerplate bugs while I was developing my
prototype...

It seems my problem is that the sqe user_data field is not properly
protected by memory barriers: the value the kernel reads is not the
one my code set.

I got the opposite of what I expected from using the reference lib. I
wasted 2 days tracking down this problem, which kind of supports your
point about new APIs and bugs.

Someone has to respect your wisdom, Marc, or else we are doomed to
repeat the same mistakes!
> 
> As for #3, I don't understand your description.

Pavel, one of the io_uring maintainers, explained to me that the code
is in place in liburing, but the kernel support will only become
available with 5.13.

This is basically a modification of the POLL_REMOVE operation that
allows updating a poll entry with a single sqe.

IOW, instead of removing and re-adding a poll entry with 2 sqes, as is
currently done, you could do it with a single operation. But this will
only be possible with 5.13.
> 
> As for OOB data, TLS does not use it for sure. The only protocols
> that
> ever used oob data are telnet and ftp, although these only use the
> out-of-band API, not out-of-band data. One reason for this is that
> TCP
> does not support out of band data, only urgent data, which has
> different
> semantics (urgent data is in-band, for one thing).
> 
> As for bugs, the libev backend is probably super buggy, as it hasn't
> been
> tested seriously, as most of my programs crash instantly due to
> kernel bugs.
> In this state, I consider it pointless to even try to work on this
> backend.

io_uring seems to proxy the polling request and ask for more than the
user requested, by adding extra bits to the poll event mask. From
fs/io_uring.c, io_arm_poll_handler():

	mask = 0;
	if (def->pollin)
		mask |= POLLIN | POLLRDNORM;
	if (def->pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
	if ((req->opcode == IORING_OP_RECVMSG) &&
	    (req->sr_msg.msg_flags & MSG_ERRQUEUE))
		mask &= ~POLLIN;

	mask |= POLLERR | POLLPRI;

or, from the same file, io_uring_poll():
static __poll_t io_uring_poll(struct file *file, poll_table *wait)
{
	struct io_ring_ctx *ctx = file->private_data;
	__poll_t mask = 0;

	poll_wait(file, &ctx->cq_wait, wait);
	/*
	 * synchronizes with barrier from wq_has_sleeper call in
	 * io_commit_cqring
	 */
	smp_rmb();
	if (!io_sqring_full(ctx))
		mask |= EPOLLOUT | EPOLLWRNORM;

	/* prevent SQPOLL from submitting new requests */
	if (ctx->sq_data) {
		io_sq_thread_park(ctx->sq_data);
		list_del_init(&ctx->sqd_list);
		io_sqd_update_thread_idle(ctx->sq_data);
		io_sq_thread_unpark(ctx->sq_data);
	}

	/*
	 * Don't flush cqring overflow list here, just do a simple check.
	 * Otherwise there could possible be ABBA deadlock:
	 *      CPU0                    CPU1
	 *      ----                    ----
	 * lock(&ctx->uring_lock);
	 *                              lock(&ep->mtx);
	 *                              lock(&ctx->uring_lock);
	 * lock(&ep->mtx);
	 *
	 * Users may get EPOLLIN meanwhile seeing nothing in cqring, this
	 * pushs them to do the flush.
	 */
	if (io_cqring_events(ctx) || test_bit(0, &ctx->cq_check_overflow))
		mask |= EPOLLIN | EPOLLRDNORM;

	return mask;
}

I am not 100% sure why they are doing this, but it might explain why I
receive the extra unrequested notifications. AFAIK, this is
harmless...
