c++11 std::atomic and memory_order_relaxed

bob · Post by **bob** » Sat Apr 05, 2014 11:37 pm

hgm wrote:
syzygy wrote:You have no clue.

http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
These guys are real computer scientists.
Btw, your link explicitly states other processes cannot see transactional stores before they are committed. So they are not really speculative stores at all. It is just postponing the actual write (=global visibility) to the time where it would no longer be speculative (like all out-of-order CPUs do). In other words, it states that Bob is right, and you are wrong.

This is a memory-based approach to solve the age-old transaction processing problem where you have to update a lot of different things at once, and you can't stand to deal with interleaved updates. So you post all sorts of stuff, get it ALL done, then tell the TPS to "commit", which can fail if something was changed since you started, as it should. This transactional memory is just a hardware assist to make the same sort of thing work in memory. It STILL requires software to make it work, as the software does all the speculative changes, and then says "commit this."

This problem has only been around for 50+ years now. Airlines deal with it. Hotels/motels deal with it. Can't possibly allow you to "lock" a motel's booking database while you update it, too many others would pile up waiting. Ditto for a particular airline flight's seat availability data.
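To make the "do the work, then commit, and the commit can fail" pattern concrete, here is a minimal sketch using a compare-and-swap on a seat counter. The `Flight` struct and the function names are made up for illustration and do not come from any real reservation system or from the linked paper. It is a software analogue of the idea, not hardware transactional memory.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical seat counter for one flight (illustrative only).
struct Flight {
    std::atomic<std::uint32_t> seats_left{0};
};

// One optimistic attempt: the caller took `snapshot` earlier and computed
// the new state privately. The commit succeeds only if the counter still
// holds the snapshot value; otherwise nothing is written.
bool try_book_once(Flight& f, std::uint32_t snapshot) {
    if (snapshot == 0) return false;  // nothing left to sell
    std::uint32_t desired = snapshot - 1;
    return f.seats_left.compare_exchange_strong(snapshot, desired);
}

// Retry loop around the attempt. No lock is ever held, so other
// bookers are never blocked while this one works.
bool book_seat(Flight& f) {
    std::uint32_t seen = f.seats_left.load();
    while (seen != 0) {
        // On failure, compare_exchange_weak reloads `seen` with the
        // current value, so the next iteration works on fresh data.
        if (f.seats_left.compare_exchange_weak(seen, seen - 1)) return true;
    }
    return false;  // sold out
}
```

A failed commit costs only a retry, which is why this style scales better than locking the whole record while one client decides.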

Don't know why we go down this path so often...

bob · Post by **bob** » Sat Apr 05, 2014 11:38 pm

syzygy wrote:
hgm wrote:
syzygy wrote:
bob wrote:
syzygy wrote:The hardware logic for implementing this speculative execution is now being reused to implement support for transactional memory in Haswell processors. This allows the implementation of explicit speculative stores (i.e. by the programmer).
Can we stop the nonsense? "by the programmer" means it is NOT a hardware speculative store.
You have no clue.

http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
These guys are real computer scientists.
Btw, your link explicitly states other processes cannot see transactional stores before they are committed. So they are not really speculative stores at all.
Hah. Definitions are to be bended, right?

I reinserted the precise context. Bob denied exactly what is disclosed in the linked paper. Instructions to be used by the programmer that make use of the speculative execution logic of out-of-order processors.

It is just postponing the actual write (=global visibility) to the time where it would no longer be speculative (like all out-of-order CPUs do). In other words, it states that Bob is right, and you are wrong.
Sure.
syzygy wrote:I suppose you would need hardware with some sort of double-threaded speculative execution that at the moment probably does not exist for this to make sense (and to allow the nonsensical result..).

syzygy wrote:Processors have been performing speculative writes since many years, but you probably mean writes that actually land in DRAM. But that wouldn't be necessary for the above scenario to occur. It is sufficient that two threads see each other's speculative writes, for example in a shared cache or store buffer.

It also does not have to be the processor logic that decides by itself to perform such writes speculatively. It could be the compiler explicitly making use of a speculative store instruction of the processor.

Again, as far as I know the scenario is not possible on hardware that exists today. It would also not seem to make much sense to involve two hardware threads in one speculative execution that must be rolled back for both threads if anything goes wrong. But maybe I am wrong and someone can come up with a valid reason to do this.
A transactional memory (hardware) design that allows other processors to see speculative writes:
http://www.cs.utexas.edu/~rossbach/pubs/tx-micro08.pdf
I think it prohibits the causality violation by requiring that cyclic dependencies among transactions are broken by restarting one or more transactions. It would obviously be feasible to not detect such problems, but it would lead to difficult to understand multithreaded memory ordering with causality violations.

Btw, I'm sure you continue to be right about this as well:
hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.
Never mind any degree of intellectual honesty. As long as you can make yourself ignore the cognitive dissonance everything is just fine, right.

The instructions are NOT dealing with out of order speculating within a CPU. That is inter-CPU. And there is no inter-CPU speculation, only what the SOFTWARE does by using the transactional memory model.

Read first...

bob · Post by **bob** » Sat Apr 05, 2014 11:42 pm

syzygy wrote:
hgm wrote:
syzygy wrote:Hah. Definitions are to be bended, right?
If you want to bend them, fine. I will stick to the one and only valid definition, though:

A memory store is executed when loads by other memory users return its value.
I am talking about speculative stores. In your dictionary they have apparently been defined away. I suppose speculative execution does not exist, either.

In reality there are hundreds of technical papers discussing speculative stores. I happen to use the term as it is used in the art.

This whole nonsense about transactional memory is just obfuscation, and has absolutely nothing to do with the issue. Nothing is speculative there, and writes are not visible to any other core until after they are committed.
Depends on the implementation. In some implementations writes are invisible to other threads until the transaction is committed (lazy versioning). In other implementations writes are visible before the transaction is committed (eager versioning). In that case the threads seeing those writes will have to be aborted if the writing transaction does not get committed.
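The difference between the two versioning schemes can be sketched in software. This is a single-threaded toy model under loud assumptions: `Memory`, `LazyTx` and `EagerTx` are names invented here, conflict detection is left out entirely, and nothing below describes how any actual processor implements it.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// address -> value; stands in for shared memory (absent addresses read as 0).
using Memory = std::unordered_map<int, int>;

// Lazy versioning: stores go to a private buffer. Shared memory is
// untouched until commit, so an abort just discards the buffer.
struct LazyTx {
    Memory& mem;
    std::unordered_map<int, int> buffer;
    explicit LazyTx(Memory& m) : mem(m) {}
    void store(int addr, int value) { buffer[addr] = value; }
    void commit() {
        for (const auto& kv : buffer) mem[kv.first] = kv.second;
        buffer.clear();
    }
    void abort() { buffer.clear(); }
};

// Eager versioning: stores update shared memory in place and log the
// previous value. An abort replays the undo log in reverse order.
struct EagerTx {
    Memory& mem;
    std::vector<std::pair<int, int>> undo;  // (address, previous value)
    explicit EagerTx(Memory& m) : mem(m) {}
    void store(int addr, int value) {
        undo.emplace_back(addr, mem[addr]);
        mem[addr] = value;  // visible in shared memory before commit
    }
    void commit() { undo.clear(); }
    void abort() {
        for (auto it = undo.rbegin(); it != undo.rend(); ++it)
            mem[it->first] = it->second;
        undo.clear();
    }
};
```

In the eager variant the uncommitted value sits in `mem` between `store` and `abort`, which is exactly the window in which another thread could read it and would then have to be rolled back too.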

Also in the 'dependency-aware' scheme of the second link the data forwarded to the other CPU will only be used after it is certain (and thus not speculative) that the data will be committed. When there is a dependency cycle, it will abort.
Nope. If T1 writes to X and T2 reads from X, then T2 is dependent on T1 but not the other way around. No cyclic dependency. T2 can use the result, obviously, or it should not have seen it. T2 must be rolled back if T1 is.

In particular, it does not just assume an arbitrary link in the circular dependency chain is correct, to propagate it around in the hope this will confirm itself. Like you want to do.

syzygy wrote:I think it prohibits the causality violation by requiring that cyclic dependencies among transactions are broken by restarting one or more transactions. It would obviously be feasible to not detect such problems, but it would lead to difficult to understand multithreaded memory ordering with causality violations.

syzygy wrote:I am not "wanting" this. I am simply explaining how the strange result could happen. You insisted:
hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.
so I explain that there is no law of physics that would prevent it.

The term speculative store is NOT used as you are using it. The typical use is what is done in the Intel CPU. Speculative execution fetches a write to memory, all the work is done (for example, an inc instruction where we fetch the value, add 1 to it, and then let the instruction sit until it is retired, at which time the store is executed and L1 gets the new value). It never gets out of the CPU until the speculative condition has been resolved.

Speculative stores applied to software is a bit different. But YOU get to do all the work, NOT the hardware. You write something that actually gets to L1, then YOU are stuck with fixing it when you decide you should not have done the write. Hardware does nothing to help there.

So hardware speculative stores or software speculative stores? Pick 1. They are NOT interchangeable.

bob · Post by **bob** » Sat Apr 05, 2014 11:50 pm

syzygy wrote:
hgm wrote:
syzygy wrote:Call it what you want. I have explained how the result from the opening post can arise without breaking any laws of nature. Obviously it is fair to call this result "broken" and therefore to call any compiler/hardware combination that allows it "broken". But laws of nature aren't violated (and neither is the C++11 standard, as it seems).
It is broken, because it delivers results that are non-compliant with the standard. If that does not worry you, OK.
Can you give an example where it would deliver a result non-compliant with the standard? The example of the OP is compliant as far as I understand. At least, that was the assumption.

You are so fixated on arguing that your compiler/hardware scheme would work to produce the non-sensical 42 out of thin air, that you are completely blind to the fact that this scheme will produce outright invalid results in the vast majority of other cases.
I don't think so. I am assuming a compiler that knows what it is doing and is not e.g. speculatively executing divisions by zero leading to exceptions etc.

Obviously it is possible to abuse speculative modes to get wrong results, like it is possible to use regular locks to deadlock a program.

Feel free to give an example.

Just to be clear "my" system commits both threads because:
- both conditions evaluate to true (yes, based on forwarded speculative values);
- assuming the speculative stores are valid, everything else is valid as well.

If you can get away with

if (x == 42) y = 42

and know that initially x=0, and yet Y can STILL be 42, that is broken.

Just as if (x==42) y=42 in one thread

and

if (y==42) x=42 in the other thread

can NEVER produce x=42 AND y=42 if x and y both start off at zero. Only way is for the compiler to do something stupid like:

t = x;
x = 42;
if (y != 42) x = t

With threads (the subject of this thread, C++11 threads) the above is broken. NOT by the hardware which will NEVER do that write to X unless the value in y == 42. But by the compiler, which thought the above transformation is OK. Might be OK for a single thread, certainly not ok for more than one. And since we are talking about atomic stuff in C++11, the compiler would be broken. That the standard says this is OK is nonsensical enough, but that the compiler would produce code so that x = y == 42 is ridiculous in terms of optimization.
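A small sketch of the two forms, written against `std::atomic<int>` so it compiles as real C++. The function names are invented for illustration, and nothing here claims that any actual compiler performs this rewrite on atomics. It only shows that the two versions agree when run single-threaded, while the second one briefly publishes a 42 that another thread could read.

```cpp
#include <atomic>

// What the programmer wrote: store 42 to x only if y is already 42.
inline void guarded_store(std::atomic<int>& x, const std::atomic<int>& y) {
    if (y.load(std::memory_order_relaxed) == 42)
        x.store(42, std::memory_order_relaxed);
}

// The hypothetical transformation from the post: store first, undo later.
inline void speculative_store(std::atomic<int>& x, const std::atomic<int>& y) {
    int t = x.load(std::memory_order_relaxed);
    x.store(42, std::memory_order_relaxed);  // other threads can see this 42
    if (y.load(std::memory_order_relaxed) != 42)
        x.store(t, std::memory_order_relaxed);  // roll back
}

// Single-threaded check: run one variant from the given initial values
// and report the final value of x.
int final_x(bool speculative, int x0, int y0) {
    std::atomic<int> x{x0}, y{y0};
    if (speculative) speculative_store(x, y);
    else guarded_store(x, y);
    return x.load(std::memory_order_relaxed);
}
```

With one thread, `final_x` returns the same value for both variants from any starting state. With a second thread running the mirrored test on y, the transient 42 in x is what lets both conditions come out true.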

bob · Post by **bob** » Sun Apr 06, 2014 12:06 am

Here is the thing that you originally quoted:

[Note: The requirements do allow r1 == r2 == 42 in (x, y initially zero):

Thread 1:
r1 = x.load( memory_order_relaxed );
if ( r1 == 42 ) y.store( r1, memory_order_relaxed );
Thread 2:
r2 = y.load( memory_order_relaxed );
if ( r2 == 42 ) x.store( 42, memory_order_relaxed );
Implementations are discouraged from allowing such behavior. —end note]

Note the bolded/italicized part of the quote (the final sentence, "Implementations are discouraged from allowing such behavior."). The person writing that CLEARLY understood that (a) this will never be done in hardware and (b) that a compiler could take optimization too far and cause this (implementations refers to the compiler, obviously.)

So, HE knows the hardware will NEVER speculatively store like this because it would break everything. He clearly called out the compiler writers to not take a speculative store optimization this far so that it will produce such a nonsensical result. Why are we STILL arguing about the hardware at all? This doesn't apply to any discussion involving transactional memory, since the PROGRAM chooses when to commit things done in a transaction group, the hardware has no say-so at all, it just has to be able to hide the changes until they are committed.

This has gone FAR afield from reality.
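For anyone who wants to run the quoted example, here is a minimal complete version. The wrapper `run_once` and the use of `std::thread` are additions for illustration; the two thread bodies follow the note as quoted above.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs the two threads from the note once and returns (r1, r2).
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        if (r1 == 42) y.store(r1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        if (r2 == 42) x.store(42, std::memory_order_relaxed);
    });

    t1.join();  // join makes the writes to r1 and r2 visible here
    t2.join();
    return {r1, r2};
}
```

Since x and y only ever hold 0 or 42, and each store of 42 requires the other thread to have read 42 first, the only outcomes consistent with the code are r1 == r2 == 0 and r1 == r2 == 42. The argument in this thread is about whether the second one can ever actually be observed.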

syzygy · Post by **syzygy** » Sun Apr 06, 2014 12:16 am

bob wrote:Do you REALLY know anything about transactional memory?

As a matter of fact, I do. Certainly a lot more than do you.

Speculative CPUs as you describe can not work, period. A broken compiler can certainly "excite" the 2x42 bug, but NOT speculative hardware.

It is obvious that what I have written has not penetrated your skull, at all.

Nowhere did I write that hardware would do this on its own. I have consistently explained that it would have to be a compiler making use of particular hardware. You are just hopeless.

bob · Post by **bob** » Sun Apr 06, 2014 1:44 am

syzygy wrote:
bob wrote:Do you REALLY know anything about transactional memory?
As a matter of fact, I do. Certainly a lot more than do you.

Speculative CPUs as you describe can not work, period. A broken compiler can certainly "excite" the 2x42 bug, but NOT speculative hardware.
It is obvious that what I have written has not penetrated your skull, at all.

Nowhere did I write that hardware would do this on its own. I have consistently explained that it would have to be a compiler making use of particular hardware. You are just hopeless.

You started the hardware idea. You talked about one core speculatively doing a write. It will NEVER speculatively write something since once that hits L1, it is over. Transactional memory is software-controlled. Totally. Your example even included a "commit" in the source code. That is NOT "hardware speculation". You have gone on and on about how this speculation could work, stupid ideas like one core causing another core to go into "speculative mode" as well. Two of us pointed out why this is a potential deadlock problem, but somehow you seem to think there is some magic bullet that kills that problem. There isn't.

So you might THINK you know something about this stuff, but thinking is it. Hardware does NOT do anything speculatively that can be seen outside that specific CPU, for obvious reasons. There is no reason to think that it will at some point in time in the future either, for the same reasons it is not done today.

Even the guy that gave the example you quoted realized that, and specifically pointed out that the implementation (the compiler) should ALSO not do such "speculative stores" as part of their optimization, even though the rather silly C++11 spec allows it.

syzygy · Post by **syzygy** » Sun Apr 06, 2014 2:10 am

bob wrote:You started the hardware idea.

Just read
http://talkchess.com/forum/viewtopic.ph ... 733#564733
http://talkchess.com/forum/viewtopic.ph ... 844#564844
Read both posts. Very slowly, very carefully. Then read them again.

hgm · Post by **hgm** » Sun Apr 06, 2014 8:56 am

syzygy wrote:What "my" system requires is that speculative dependencies between transactions are detected (such as reads of speculative values), and that cyclic dependencies between threads are permitted but must result in these threads being committed all together or not at all.

The problem is that it cannot know that. Your 42 example hinges on the fact that when there is a test on speculatively transferred data that happens to evaluate to true, you know that committing the data will clear all dependencies in the cycle, and eventually propagate back to you to confirm the original speculation. But you cannot know that. Part of that dependency cycle is in the other CPU. And that other CPU could have been speculating on something that is completely independent of whether you will commit the store or not. The only thing you know is that you obtained a value (42) which at any time in the future can be withdrawn. Committing your own results and sending the missiles flying, just based on the blind gamble that doing so will prevent the source of the speculative data from retracting it, will in general not perform according to the standard.

It seems that you assume the hardware will at run-time be able to analyze complete sections of code from all involved CPUs, and detect deadlocks that could be solved by 'self-confirmation'. For which iron-clad algorithms do not even exist (even if we want to ignore the ridiculous performance hit you would take for applying them when they did).

hgm · Post by **hgm** » Sun Apr 06, 2014 9:18 am

bob wrote:If you can get away with

if (x == 42) y = 42

and know that initially x=0, and yet Y can STILL be 42, that is broken.

On the other hand, for

do { y = 42; } if(y == 42);

this would be a normal result.
