MSVC memory barrier issue.

Marc Lehmann schmorp at schmorp.de
Sun Jan 5 03:42:30 CET 2014


On Sat, Jan 04, 2014 at 11:02:00AM -0800, Nicolas Noble <nicolas at grumpycoder.net> wrote:
> In a nutshell, what I want to say is that to POSIX - and Visual C's default
> is that mode 

Do you have a reference for that (for both posix and visual studio)? I
didn't even know that POSIX had anything to say on volatile at all. Are you
confusing this with the iso mode for volatiles (which isn't the default)?

In any case, no, I do not believe that visual C has anything like a posix
mode; that would be great if it were true (finally, snprintf :).

> volatile is merely an un-optimization request from the
> programmer to the compiler. It in no means affects the internal state of
> the CPU. And that was the crux of the problem here: internal CPU caches.

Did you read the msdn reference I gave you? I am pretty sure you
didn't. Nor, apparently, my reply, which explicitly says that visual
studio gives volatile acquire/release semantics, or at least is clearly
documented to do so.

> will re-iter saying this several time over the course of that e-mail,
> because it's really important in my opinion. Compiler-barrier and
> CPU-barriers, while sounding very similar, aren't being issued the same
> way. C's volatiles, or Visual C's _ReadWriteBarrier() are only
> compiler-barriers. C11's _Atomics, or Visual C's MemoryBarrier() are
> CPU-barriers.

As is volatile on visual C, according to the official
documentation. Again, please read it.

> http://msdn.microsoft.com/en-us/library/12a04hfd%28v=vs.90%29.aspx says
> that volatiles work as memory barriers,

No, please read it. It explicitly says it has acquire/release semantics
and "This allows volatile objects to be used for memory locks and releases
in multithreaded applications."

It doesn't say you need to disable posix mode for this, and it clearly
says it is a microsoft-specific extension (volatile already has compiler
memory barrier semantics in standard C).

> which is true, but they work as compiler-memory barriers, not as
> CPU-memory barriers.

The page explicitly disagrees with you on that. Where does this
conflicting information come from?

> compiler to avoid optimizing around and to not re-order or cache reads and
> writes to that variable, it won't affect the CPU's internal caches, and
> reading or writing a volatile variable will not issue CPU synchronization
> instructions.

Again, microsoft explicitly disagrees with you, as a simple google search
shows:

http://blogs.msdn.com/b/oldnewthing/archive/2011/04/19/10155452.aspx

 If you are using Visual Studio 2005 or later, then you don't need the weird
 Interlocked­Read­Acquire function because Visual Studio 2005 and later
 automatically impose acquire semantics on reads from volatile locations

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

 When the /volatile:ms compiler option is used—by default when architectures
 other than ARM are targeted—the compiler generates extra code to maintain
 ordering among references to volatile objects in addition to maintaining
 ordering to references to other global objects. In particular:

However, I found that this isn't true on arm for visual studio 2013:

http://msdn.microsoft.com/en-us/library/jj204392.aspx

    /volatile:ms

     Selects Microsoft extended volatile semantics, which add memory ordering
     guarantees beyond the ISO-standard C++ language. Acquire/release
     semantics are guaranteed on volatile accesses. However, this option also
     forces the compiler to generate hardware memory barriers, which might
     add significant overhead on ARM and other weak memory-ordering
     architectures. If the compiler targets any platform except ARM, this is
     the default interpretation of volatile.

> In the example, they implement a spin lock semaphore using a volatile
> boolean. Thread1 is busy-waiting on the boolean variable to become true,
> and Thread2 will eventually set it to true. The trick however is that it's
> busy looping, meaning that it'll eventually get around the CPU caches in an
> SMP environment.

libev only relies on atomicity, it doesn't implement locks.

> What I am saying is that volatiles are never enough to provide SMP
> synchronization.

In standard C, yes, but microsoft visual studio explicitly implements
exactly that synchronization for volatile.

I didn't know that later versions of visual studio make this configurable
(if you look at earlier documentation you can see the volatile switch
didn't exist back then and the behaviour defaulted to microsoft
behaviour).

> In the case of libev, here is what was happening.

Thank you for all your explanations, but I do know what atomic memory
accesses are. The problem is not that we didn't know what we were doing, the
problem is that microsoft explicitly documents that volatile accesses are
atomic in multi-cpu environments, and we relied on that.

Apparently, visual studio 2013 makes this a configurable option (while
still defaulting to acquire/release semantics).

> about the fact ECB_MEMORY_FENCE* were compiler-barriers only when using
> Visual C's _ReadWriteBarrier().

Again, the problem here is not that libev used the wrong memory fence, the
problem here is that microsoft changed semantics in later compiler versions
and broke it.

In the visual c version I developed with at the time, _ReadWriteBarrier
was an actual #define in the header file and compiled to a lock xchg
instruction or equivalent.

It is good to know that microsoft broke this in later releases, but,
again, that just means that one should avoid microsoft at all costs,
because they actively keep breaking things that were documented to work in
earlier releases.

> The end result for the actual race condition relies on the CPU caches

Which is a compiler bug that we can work around at some extra cost. No
problem.

> It's pretty visible here that the ECB_MEMORY_FENCE have emitted absolutely
> no instruction whatsoever,

The problem is that volatile is documented to do what is necessary on your
platform, and ecb_memory_fence requires volatile.

> and the writes and reads on pipe_write_skipped
> and pipe_write_wanted are simple movs and cmps, without any memory bus lock.

We don't actually need any bus lock in libev to make it work
correctly. All that is needed is that memory accesses to volatile have
acquire/release semantics.

> have the same caches - since Intel CPUs are pretty scorched-earth-method
> when it comes to cache-lines purging, a simple lock anywhere will do.

And I still wonder why we deserve this lecture - a simple look at ecb.h
at the place where you patched shows that we are aware of that, and that
something else must be going on that you didn't get. In this case, for
example, we rely on the documented guarantee that volatile is atomic
w.r.t. multi-core cpus.

> So, again, I want to emphasis that volatiles aren't atomics.

But they officially are. The fact that they aren't is a bug in your
compiler, as they are explicitly documented to be atomic.

> They don't guarantee CPUs caches synchronization,

They do. The fact that microsoft fails to deliver on this guarantee is
telling, but they still guarantee that the cpu caches synchronise.

> optimizations. Which is why C11 and C++11 introduced <stdatomic.h> and
> <atomic> respectively.

No, you get it all backwards. The volatile semantics were introduced
*because* C11 and C++11 didn't exist yet - microsoft's volatile extensions
came first, it's not the other way round.

> And Visual C's _ReadWriteBarrier() is the same as a volatile compiler
> hint. Since the 2008 release, it no longer issues CPU cache flushes,

See, that's the crux. Microsoft broke another guarantee.

> it simply is a broader volatile.

If it were a broader volatile, it would work, as volatile on visual
C is documented to result in atomic read and write accesses (but not
atomic read-modify-write).

> As I said in the preamble, sorry if I sound condescending, but I rather
> wanted to make sure we were on the same page than leaving any doubt around.

Yes, you sound very condescending, especially as a quick look just a few
lines above or below your patch would have shown you that we are quite
aware of memory locks. All your explanations about memory races are
therefore just condescending, as even you should have been able to read
the microsoft documentation and google a bit.

Worse, you are obviously aware that microsoft broke barrier semantics, so why
lecture us about race conditions that were introduced by microsoft into our
code, which didn't have race conditions before?

Shouldn't you explain all this to your compiler vendor?

Now to something constructive - does visual studio 2008, the version you
refer to, already have the /volatile:iso option, does it enable it by
default (contrary to documentation), and/or are you running on arm, where
the option apparently is off by default?

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\


