ev_stat inotify implementation might miss events

Yoann Vandoorselaere yoann.v at prelude-ids.com
Tue Jan 26 19:53:18 CET 2010

Le mardi 26 janvier 2010 à 18:41 +0100, Marc Lehmann a écrit :
> On Tue, Jan 26, 2010 at 05:05:41PM +0100, Yoann Vandoorselaere <yoann.v at prelude-ids.com> wrote:
> > > That is impossible unless you make everything synchronous, so inotify
> > > cannot guarantee it.
> > 
> > Again, as I understand it, Inotify events are generated at syscall
> > levels, so they are synchronous to a syscall.
> Again, your understanding is flawed - that would be a security bug :)

After looking at Linux kernel 2.6.32 kernel code, I can confirm that my
understanding is not flawed. 

I guess you then need to explain why this is a security bug ;)

static inline void fsnotify_modify(struct dentry *dentry)
        struct inode *inode = dentry->d_inode;
        __u32 mask = FS_MODIFY;

        if (S_ISDIR(inode->i_mode))
                mask |= FS_IN_ISDIR;

        inotify_inode_queue_event(inode, mask, 0, NULL, NULL);

        fsnotify_parent(dentry, mask);
        fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
        ssize_t ret;

        if (!(file->f_mode & FMODE_WRITE))
                return -EBADF;
        if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
                return -EINVAL;
        if (unlikely(!access_ok(VERIFY_READ, buf, count)))
                return -EFAULT;

        ret = rw_verify_area(WRITE, file, pos, count);
        if (ret >= 0) {
                count = ret;
                if (file->f_op->write)
                        ret = file->f_op->write(file, buf, count, pos);
                        ret = do_sync_write(file, buf, count, pos);
                if (ret > 0) {
                        add_wchar(current, ret);

        return ret;

Google Source search is your friend


> > If inotify already reported the event, I don't think it will update the
> > WD associated with it. Theses event are already stored in the userspace
> > buffer declared by libev, and you cannot expect the Kernel to update the
> > WD associated with event that you have already read().
> That would be a kernel bug - when I ask the kernel to give me events for a
> specific path, and it doesn't give me them, that's a bug in the kernel.

It does. The problem is that libev read a pile of Inotify events, but
every events in the pile, except the first one will get ignored.

> > infy_cb() loop through the events, and call infy_wd(), which will
> > sucessfuly lookup the first event. However, since libev registered WD is
> > changed at this point, the second iteration of infy_wd() won't find the
> > correct ev_stat handler.
> That's fine, because libev is interested only in events that arrive after
> the ev_stat_stat call. If your kernel doesn't deliver them, it is buggy.
> > It would be interesting for libev to provide the ability to keep
> > monitoring the file until no application reference it anymore
> > (st_nlink).
> I don't have any interest in making libev a linux-only library, sorry.
> If you want that, you cna use inotify, it's not that hard to use.
> Libev needs to stay portable.

The behavior described in my previous mail can be emulated with stat(),
and is fully portable, as per POSIX specification:

"Unless otherwise specified, the structure members st_mode, st_ino,
st_dev, st_uid, st_gid, st_atime, st_ctime, and st_mtime shall have
meaningful values for all file types defined in this volume of IEEE Std
1003.1-2001. The value of the member st_nlink shall be set to the number
of links to the file."

For the record, Prelude-LML used to do it before using libev. And it has
been working on much of the architecture available out there.

> > If the file has been deleted but is still referenced by others
> > applications, then continue monitoring it, and open a new ev_stat()
> > handler in case any application re-create the filesystem file.
> That behaviour would be a bug in libev.
> If you want inotify behaviour, you have to use inotify. ev_stat watchers
> do what they do because that behaviour is portable.
> What you describe will not help ev_stat in any way - was that what you had in
> mind (make libev linux-only) or do you have a real idea on how to improve it?
> (Obvious improvements would be to avoid rearming if the file wasn't
> renamed, but that's just an optimisation).

The improvement I am talking about consist of moving the decision into
the application hands. There is no loss of functionality. I could also
argue that automatically rearming a file whose st_nlink is higher than 0
can be considered as a bug.

1) "/file" deleted, st_nlink == 0
   -> libev automatically rearm the event.

2) "/file" deleted, st_nlink > 0
   -> libev continue monitoring "/file"
   -> application can open a new ev_stat watcher for an eventual,
upcoming filesystem "/file"

Yoann Vandoorselaere | Directeur Technique/CTO | PreludeIDS Technologies
Tel: +33 (0)1 40 24 65 10                      Fax: +33 (0)1 40 24 65 28

More information about the libev mailing list