syzygy wrote: I had a second look. The idea seems to be as follows.
Code:
r1 = x.load( memory_order_relaxed );
if ( r1 == 42 ) y.store( r1, memory_order_relaxed );
can be rewritten as
Code:
r1 = x.load( memory_order_relaxed );
if ( r1 == 42 ) y.store( 42, memory_order_relaxed );
Now the store to y can be performed speculatively before the load and the comparison take place. If the comparison turns out false, the store has to be rolled back without other threads ever having seen it.
The same applies to thread 2.
I suppose that, for this to make sense (and to allow the nonsensical result), you would need hardware with some sort of double-threaded speculative execution, which at the moment probably does not exist.
Similar examples are discussed here:
http://lwn.net/Articles/586838/
I don't believe any hardware ever has, nor ever will, speculatively perform a write to memory. Lots of reasons.
(1) Who would want to speculatively mark a cache block as dirty? It would be quite difficult to "undirty" it later, since other valid writes could also have set the block to dirty.
(2) Who would want to deal with I/O where a speculative write changes a buffer that is then written out? Now you not only have to undo the write to cache/memory, you also have to undo the file-system write, which is way beyond the CPU's ability to do internally.
Hennessy/Patterson has a good discussion on speculation and where it should be avoided and why. Memory writes are at the top of the list. Right above memory reads that would produce a TLB miss or, even worse, a "page invalid" page fault.
I think your first guess was correct, that r1=r2=42 is legal, even if not possible, given existing (or future) hardware.
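For reference, here is a rough sketch of the complete two-thread test under discussion. Thread 2 is my reading of "similar for thread 2" (the same code with x and y swapped), and the initial values of 0 and the name run_litmus are assumptions for illustration, not something taken from the posts above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Assumption: x and y both start at 0, as in the usual form of this test.
// run_litmus is a hypothetical helper name used only for this sketch.
std::pair<int, int> run_litmus() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = 0;
    int r2 = 0;

    // Thread 1: the rewritten form, storing the constant 42.
    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        if (r1 == 42) y.store(42, std::memory_order_relaxed);
    });

    // Thread 2 (assumed): the mirror image, with x and y swapped.
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        if (r2 == 42) x.store(42, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return {r1, r2};  // safe to read after both joins
}
```

The only way either thread reads 42 is if the other thread has already stored it, so the two results always agree: both 0, or the "nonsensical" both 42.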
I suppose a compiler could change this (I am using normal memory reads to avoid excessive typing with the relaxed order stuff):
r1 = x
temp = y
if (r1 == 42)
    y = 42
else
    y = temp
Now, good branch prediction might correctly guess that most of the time x == 42 and do the write and the read, speculating on the write before the read completes and the conditional is actually evaluated.
I can't imagine a compiler doing that, but I suppose one might go that far for some obscure reason, and the standard says "this is ok", even if it is slower, even if it might break a program that does asynchronous I/O, etc. In other words, the new standard is just like the old standard: it leaves a lot of stupid holes.
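To make the hypothetical rewrite concrete, here is a small single-threaded sketch with plain ints, as in the pseudocode above. The function names original, rewritten and same_result are made up for illustration, and this only checks what one thread sees; it says nothing about what any real compiler does.

```cpp
#include <cassert>

// Original form: y is written only when x holds 42.
void original(const int& x, int& y) {
    int r1 = x;
    if (r1 == 42) y = 42;
}

// Hypothetical rewrite from the post: y is written on both paths.
void rewritten(const int& x, int& y) {
    int r1 = x;
    int temp = y;
    if (r1 == 42)
        y = 42;
    else
        y = temp;  // writes back the value that was already there
}

// Runs both forms from the same starting state and compares the final y.
bool same_result(int x, int y0) {
    int ya = y0;
    int yb = y0;
    original(x, ya);
    rewritten(x, yb);
    return ya == yb;
}
```

Within a single thread the two forms end up with the same y for any input. The difference is the extra store on the else path, which is the part a second thread could notice.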