Strict aliasing rule strikes back

Marc Lehmann schmorp at schmorp.de
Fri May 22 09:09:57 CEST 2015


On Thu, May 21, 2015 at 09:43:24PM -0700, Andrey Pokrovskiy <wonder.mice at gmail.com> wrote:
> But I'm not saying that there is something wrong with libev and that
> those warnings should be fixed asap (surprise-surprise).

Indeed :)

In fact, those warnings are slowly being fixed, as far as I can tell, at
the place where they needed fixing, namely inside the compilers: AFAICT,
none of these were ever correct, or if they were, they were misleading.

For example, the gcc warning text is:

   dereferencing type-punned pointer will break strict-aliasing rules

and is wisely not emitted by default nor with -W. This text is unfortunate
due to multiple reasons, one being that "type-punned pointer" is not an
ISO C concept, so it's unclear what it refers to (it has no standard
definition). Furthermore, if you actually read_ the text closely, it says
"if you do that, then you break". It does _not_ say "you break, because
you did that", a subtle distinction (I have no evidence that the author of
this message was aware of this distinction though).

As for libev, the pointer isn't "type-punned" (again, for lack of struct
definition of the term, it's hard to tell, I am using the wikipedia
definition of the day), so no type-punned pointer will be dereferenced and
aliasing rules are not being broken - the warning is actually correct,
because it doesn't claim to apply to the code it warns about, which makes
it a spurious warning.

In your second example (ev_cb_set), we did not really dereference a
type-punned pointer either, but we did break the aliasing rules: EITHER
the code in ev.c which casts the ev_io * to an ev-watcher * and uses it
to access the callback does, OR ev_cb_set does, but it's not at all clear
which one is the type-punned access, and which is the "real" access.

Which is why I think that the "type-punned" concept isn't that useful, and
this might also be the reason why the C standard doesn't talk about that,
but about aliasing.

This is not altogether different than for many other warnings, which
really warn about potential problems (for example, -Wreorder, ). I think
the problem is that a lot fewer people understand the warning (and
conservatively treat it as an error), and that it is not at all trivial to
avoid (so we can't just do some little magic and silence it).

For example, gcc's -Wparentheses warns about this potentially (we don't
know) correct code:

   if (a = b) ...;

But not abut this one:

   if ((a = b)) ...;

So this warning is easy to avoid. I don't know of such an easy workaround
for the type-punned warnings.

> #define ev_init(ev,cb_) do {                   \
> -  ((ev_watcher *)(void *)(ev))->active  =      \
> -  ((ev_watcher *)(void *)(ev))->pending = 0;   \
> +  (ev)->active  = \
> +  (ev)->pending = 0;   \

Because C will get angry: while this doesn't trigger strict aliasing
warnings, it actually does break the rules (and does cause code to be
miscompiled).

> Since ev must be a watcher, it will have "active" and "pending"
> members and two-steps cast is not really necessary.

The problem is that some compilers interpret the C rules in the way that
this active member is not the same one as accessed in the libev code
later, so all accesss have to be against the ev_watcher struct, everywhere
(I think this is a valid but not conclusive interpretation of the rules,
but anyways, its what is done).

> -# define ev_set_cb(ev,cb_)                   (ev_cb_ (ev) = (cb_), memmove (&
>                           ((ev_watcher *)(ev))->cb, &ev_cb_ (ev),
> sizeof (ev_cb_ (ev))))
> +# define ev_set_cb(ev,cb_)                   (ev)->cb = (cb_)
> 
> Why to use memmove when "ev->cb" and "cb_" must have the same type and
> are simply assignable?

Note that the memmove is new, and that this was a part of libev where we
considered the effects carefully, as it was the only (known) place where
struct aliasing rules were actually broken, but we were very confident that a
compiler wouldn't break, and that a memmove might be counterproductive.

Nowadays, we think that memmove is a no-op, apart from telling the
compiler that both locations alias.

Note that, while the type of the variables is the same, the aliasing rule is
about the types used to access them, and ((ev_watcher *)(ev))->cb has a
different type then &ev_cb_ (ev), so these variables (although they reside at
the same location) do not alias.

The memmove tells the compiler that they actually do alias, and is
otherwise compiled away (at least with reasonably current gcc and clang
compilers).

> Curiously enough, I noticed that those memmove's were added exactly to
> "fix a potential aliasing issue".

Indeed, and as you might want to note, no compiler has ever warned about
that, although we (the libev authors) were aware of it...

> I'm sure all those things are done for a reason, and I'm wondering
> what those reasons are. Maybe somebody could write a few lines
> explaining why libev needs those casts and memmove's?

Basiclaly, without the memmove, the following order of instructions
(potentially resulting from inlining for example):

   ev_set_cb (ev, xcb);
   ev_watcher *wev = (void *)ev;
   wev->cb (...);

could be legally optimised to:

   ev_watcher *wev = (void *)ev;
   wev->cb (...);

Because wev->cb is never written and ev->cb is only written but never
read.  The memmove aliases both, so the compiler knows that ev->cb is
written and later read, and likewise, wev->cb is written before.

It is very unlikely that the compiler will ever be able to make this
optimisation in practise, certainly when libev is used as a library.

Likwise, you could probably remove the casts from macros like ev_init, and
as long as you use it as a library, it would work. However, when embedded,
gcc is able to "miscompile" the intent of the code then, due to inlining.

(And gcc already is able to inline libraries by doing whole program
optimisation. "Nobody" uses this at the moment, because it is hard to do
so on a typical multi-user system, but it's better to be safe than sorry,
especially if the overhead is nil).

All this could also have been avoided by embedding the ev_watcher struct
inside the ev_io etc. structs for example, but that would have caused
structure sizes to grow due to padding on some platforms, and we really
wanted to squeeze out those extra 4 bytes from every watcher.

I am not sure I hit the target with my answer, but since this is a
complicated and subtle question, if anything is unclear, feel free to
shoot more questions, or ask for clarifications.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\



More information about the libev mailing list