Patch: fix libeio cancellation

Hongli Lai hongli at phusion.nl
Tue Aug 19 11:03:42 CEST 2014


On Tue, Aug 19, 2014 at 1:06 AM, Marc Lehmann <schmorp at schmorp.de> wrote:
> On Mon, Aug 18, 2014 at 01:28:55PM +0200, Hongli Lai <hongli at phusion.nl> wrote:
>> 1. If I cancel eio_open(), and the system call has already finished,
>> then if the callback isn't called, the file descriptor becomes lost
>> and there would be no way to close it.
>
> I am not sure why you think that, does IO::AIO lose fd's on cancelled
> requests? If yes, I'd like to understand how it happens.

I don't know about IO::AIO. I am only talking about the libeio C library.

Here's how libeio loses the fd:
1. I call eio_open().
2. libeio puts the request in a queue.
3. A libeio thread (function etp_proc) calls eio_execute(), which
performs the open() system call and stores the result in the request
object.
4. Right after step 3, my code cancels the eio_open() call, so the
cancel flag is now 1.
5. etp_proc() pushes the request on the response queue and calls want_poll_cb().
6. At some point, etp_poll() is invoked. etp_poll() pops the
eio_open() response from the queue and calls ETP_FINISH.
7. ETP_FINISH sees that the cancel flag is 1, and returns. The
callback is never called, and libeio never attempts to close the file
descriptor returned by open(). The fd is now lost.


>> 2. The user data that I pass to eio functions is allocated on the
>> heap. Currently, I free the user data in the callback. If the callback
>> isn't called, then I would have to introduce additional book keeping
>> data structures and timers just to clean up the user data correctly.
>> This would introduce all sorts of new code paths, significantly
>> increasing the complexity of my codebase.
>
> Same thing, I don't see a lot of this in IO::AIO, in both cases, the
> resources will be freed when the request is destroyed.

I don't think we're talking about the same thing. It isn't a problem
in Perl because everything's garbage collected, but I'm not talking
about the Perl IO::AIO module, I'm talking about the libeio C library.
Libeio does not free the user data. I have to do it myself.

Consider this code:

    struct Foo {
        eio_req *request;
    };

    struct UserData {
        Foo *foo;
        char *str;
    };

    static int nop_done(eio_req *req) {
        UserData *data = req->data;
        Foo *foo = data->foo;

        // 'foo' might be a dangling pointer if we're canceled
        if (!EIO_CANCELLED(req)) {
            foo->request = NULL;
        }

        free(data->str);
        free(data);
        return 0;
    }

    Foo *foo_create() {
        Foo *foo = malloc(sizeof(Foo));
        foo->request = NULL;
        return foo;
    }

    void foo_perform_work(Foo *foo) {
        UserData *data = malloc(sizeof(UserData));
        data->foo = foo;
        data->str = strdup("hello");
        foo->request = eio_nop(0, nop_done, data);
    }

    void foo_destroy(Foo *foo) {
        if (foo->request != NULL) {
            eio_cancel(foo->request);
        }
        free(foo);
    }

It begins like this:

    Foo *foo = foo_create();
    foo_perform_work(foo);

Then, when some event occurs (e.g. user clicks on Cancel or something):

    foo_destroy(foo);

If I don't call the free()s inside nop_done, then the malloc()'ed
UserData and the strdupped "hello" string are lost, and we have a
memory leak.


> I would say doing
> it in the completion callback involves more bookkeeping.

Let me show you an example in which this is not the case, and that
doing things in the completion callback involves less bookkeeping.

In the previous code snippet, I assumed that the callback is always
called (which is what my patch does), even after cancellation. So the
only place where I need to free the UserData, is inside nop_done().
But consider the current situation, in which libeio might or might not
call the callback after cancellation. My code would have to become
this:

    struct Foo {
        eio_req *request;
    };

    struct UserData {
        Foo *foo;
        char *str;
    };

    static int nop_done(eio_req *req) {
        UserData *data = req->data;

        // ADDED: because foo_destroy() might already have cleaned
        // things up.
        if (data == NULL) {
            return 0;
        }

        Foo *foo = data->foo;

        // 'foo' might be a dangling pointer if we're canceled
        if (!EIO_CANCELLED(req)) {
            foo->request = NULL;
        }

        free(data->str);
        free(data);
        return 0;
    }

    Foo *foo_create() {
        Foo *foo = malloc(sizeof(Foo));
        foo->request = NULL;
        return foo;
    }

    void foo_perform_work(Foo *foo) {
        UserData *data = malloc(sizeof(UserData));
        data->foo = foo;
        data->str = strdup("hello");
        foo->request = eio_nop(0, nop_done, data);
    }

    void foo_destroy(Foo *foo) {
        if (foo->request != NULL) {
            eio_cancel(foo->request);

            // ADDED: because libeio might not call nop_done()
            UserData *data = foo->request->data;
            free(data->str);
            free(data);
            foo->request->data = NULL;
        }
        free(foo);
    }

Notice the ADDED lines. Now, I suddenly have to add free() calls to
foo_destroy(). I can no longer assume that the nop_done() is the only
place where I need to free things. Also, in nop_done(), I need to
check whether foo_destroy() hasn't already cleaned things up.

This is a simple example so the added code isn't too much, but in more
complex real-world code, having to free things from two places quickly
becomes a mess.

-- 
Phusion | Web Application deployment, scaling, and monitoring solutions

Web: http://www.phusion.nl/
E-mail: info at phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)



More information about the libev mailing list