c++11 std::atomic and memory_order_relaxed

bob · Post by **bob** » Sat Apr 05, 2014 2:50 am

syzygy wrote:
bob wrote:2. Potential for indefinite postponement if one core is waiting on the other to commit stuff, but the other is working on another process entirely;
This is a complete non-issue.

In multithreaded programming potential for deadlock is everywhere. One thread takes a lock and never releases it, another thread will hang.

My chess program will NEVER deadlock. Your hardware will deadlock frequently. And there is no good deadlock avoidance methodology available. Check any operating system book. I don't mind deadlocks if I cause them due to bugs. I do NOT want deadlocks if they can happen spontaneously without my having made ANY programming mistakes. This is apples and oranges.

If you have transactions that must synchronise when committing, one thread will also hang if the other thread never attempts to commit. Same story.

The programmer must write a program that works. The compiler must generate code that conforms to the program. If the compiler sees a way of speeding up the program using speculative execution that requires some synchronisation at commit time, that is just fine. (But preferably the compiler will avoid generating code that appears to break causality.)

Rules for committing data

With thread-level speculative execution, tasks are committed according to the following rules:

- Before a task is committed, the data is in a speculative state.
- Tasks are committed in program order.
- Therefore, a later task in program order can only be committed when all the earlier tasks have been committed. If a thread running a task encounters a conflict, all the threads running later tasks must roll back and retry. Eventually, all tasks are committed.
Oh gosh, speculatively executing threads on Blue Gene/Q may have to wait for other threads to commit first. That might lead to deadlock, yikes!!
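
The commit rules quoted above can be modelled in a few lines. This is only a toy sketch of the ordering rule, not how Blue Gene/Q implements it; `InOrderCommitter` and its members are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model: tasks are numbered in program order and start out speculative.
class InOrderCommitter {
public:
    explicit InOrderCommitter(std::size_t n)
        : committed_(n, false), retries_(n, 0), next_(0) {}

    // A task may commit only when every earlier task has already committed.
    bool try_commit(std::size_t task) {
        if (task != next_) return false;  // an earlier task is still speculative
        committed_[task] = true;
        ++next_;
        return true;
    }

    // A conflict in `task` forces all later tasks to roll back and retry.
    void conflict(std::size_t task) {
        for (std::size_t i = task + 1; i < retries_.size(); ++i) ++retries_[i];
    }

    int retries(std::size_t task) const { return retries_[task]; }
    bool all_committed() const { return next_ == committed_.size(); }

private:
    std::vector<bool> committed_;
    std::vector<int> retries_;
    std::size_t next_;  // index of the oldest task that has not committed yet
};
```

In this model a later task that calls `try_commit` early is simply refused and has to wait; whether that waiting can ever become indefinite is exactly what is being argued about here.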

On Blue Gene it is ALL software issues. Have you looked at this architecture in any detail? Supporting transactional memory is one thing. Supporting the kind of speculation YOU are describing is NOT done in Blue Gene or any other machine on the planet, it is a pure software issue. But who really cares? Anyone can write code that will break. But if the program is correct, I expect the hardware to execute it without extraneous deadlocks. Blue Gene will certainly do that. As will any current Intel, AMD, etc. system available.

bob · Post by **bob** » Sat Apr 05, 2014 2:53 am

syzygy wrote:
hgm wrote:
syzygy wrote:You have never heard about transactional memory?
No. Is that another physics-violating theoretical construct?
Ok, so not much reason to continue this discussion.
From the beginning of this thread:
syzygy wrote:The hardware logic for implementing this speculative execution is now being reused to implement support for transactional memory in Haswell processors. This allows the implementation of explicit speculative stores (i.e. by the programmer).

Can we stop the nonsense? "by the programmer" means it is NOT a hardware speculative store. As a programmer, I can do speculative reads to memory, speculative writes to memory, speculative reads from disk and speculative writes to disk. But it becomes MY problem to undo whatever was done should the speculation be proven wrong. That is NOT the same thing as doing this in hardware, and Haswell certainly does NOT do that at all. YOU have to initiate it.

See e.g. http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
Haswell implements it. Not in a way that would be needed for the example to "work", but I already said this:
syzygy wrote:I suppose you would need hardware with some sort of double-threaded speculative execution that at the moment probably does not exist for this to make sense (and to allow the nonsensical result..).

For reasons that are still obscure (as nothing can ever be gained by it, and it always thoroughly wrecks things, most of all performance) you want speculatively executed stores to be seen by other cores.
I am not "wanting" this. I am simply explaining how the strange result could happen. You insisted:
hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.
so I explain that there is no law of physics that would prevent it.

1. Those other cores treat loading that data also as speculation, and must thus wait until the core that stored it commits it before it can commit any of the stuff that depended on it.
2. The other cores treat data from a speculative write as real.
The idea is simply that the two transactions get committed if and only if both conditions are true. Both inevitably evaluate the condition. If one or both of them are false, abort and re-execute whatever is needed, for example non-speculatively.

syzygy · Post by **syzygy** » Sat Apr 05, 2014 3:27 am

bob wrote:
syzygy wrote:
bob wrote:Your example will NEVER happen unless a compiler is broken.
You seem to have forgotten what this thread was all about.

The OP mentions some strange behaviour that is not explicitly forbidden by the standard (or at least not by a draft of the standard).
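
For readers joining here, the kind of behaviour in question can be sketched as follows. The OP's example is not reproduced in this part of the thread, so this assumes it is the well-known circular conditional-store pattern with relaxed atomics; `Result` and `run_once` are names made up for this sketch.

```cpp
#include <atomic>
#include <thread>

// Shared atomics, both zero before the threads start.
std::atomic<int> x{0};
std::atomic<int> y{0};

struct Result { int r1; int r2; };

// Runs the two threads once and returns what each of them loaded.
Result run_once() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        if (r1 == 42) y.store(42, std::memory_order_relaxed);  // only if x was already 42
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        if (r2 == 42) x.store(42, std::memory_order_relaxed);  // only if y was already 42
    });

    t1.join();
    t2.join();
    return {r1, r2};
}
```

Each store of 42 is justified only by the other thread's store of 42, so the only outcomes are r1 == r2 == 0 or the self-confirming r1 == r2 == 42. As far as I know, no existing compiler or processor produces the second one; the disagreement is about whether hardware that does could be built at all.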

Some people argue that it would be physically impossible to happen. Either because of laws of physics imposing causality, or because the hardware could not be built.

I simply explain that such hardware can very well be built. Hardware on which a compiler could make the example happen. It would be a simple variation of existing hardware.

That's it. Of course the usual tactic applies: argue that something is false that I have simply never said. Nowhere did I write that hardware would do this all by itself. Nowhere did I write that a compiler will or should do it.
Such hardware can not, will not be built. Two of us have explained the problems to you repeatedly, but you seem to ignore what is written.

Yes, I am talking to two guys that have no clue about modern hardware techniques. Just forget it... It is more interesting to talk to a stone wall.

syzygy · Post by **syzygy** » Sat Apr 05, 2014 3:31 am

bob wrote:
syzygy wrote:The hardware logic for implementing this speculative execution is now being reused to implement support for transactional memory in Haswell processors. This allows the implementation of explicit speculative stores (i.e. by the programmer).
Can we stop the nonsense? "by the programmer" means it is NOT a hardware speculative store.

You have no clue.

http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
These guys are real computer scientists.

hgm · Post by **hgm** » Sat Apr 05, 2014 8:42 am

syzygy wrote:I simply explain that such hardware can very well be built. Hardware on which a compiler could make the example happen. It would be a simple variation of existing hardware.

And you were proven wrong. The compiler you propose was broken.

You seem to have a funny notion of what 'optimization' means. A change in a program that causes its results to differ from what they should be is not an optimization, but broken.

What you propose simply does not work. It cannot work. Ever.

hgm · Post by **hgm** » Sat Apr 05, 2014 9:00 am

syzygy wrote:You have no clue.

http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
These guys are real computer scientists.

Btw, your link explicitly states other processes cannot see transactional stores before they are committed. So they are not really speculative stores at all. It is just postponing the actual write (=global visibility) to the time where it would no longer be speculative (like all out-of-order CPUs do). In other words, it states that Bob is right, and you are wrong.

syzygy · Post by **syzygy** » Sat Apr 05, 2014 11:15 am

hgm wrote:
syzygy wrote:
bob wrote:
syzygy wrote:The hardware logic for implementing this speculative execution is now being reused to implement support for transactional memory in Haswell processors. This allows the implementation of explicit speculative stores (i.e. by the programmer).
Can we stop the nonsense? "by the programmer" means it is NOT a hardware speculative store.
You have no clue.

http://cs.brown.edu/~mph/HerlihyM93/her ... tional.pdf
These guys are real computer scientists.
Btw, your link explicitly states other processes cannot see transactional stores before they are committed. So they are not really speculative stores at all.

Hah. Definitions are to be bent, right?

I reinserted the precise context. Bob denied exactly what is disclosed in the linked paper. Instructions to be used by the programmer that make use of the speculative execution logic of out-of-order processors.

It is just postponing the actual write (=global visibility) to the time where it would no longer be speculative (like all out-of-order CPUs do). In other words, it states that Bob is right, and you are wrong.

Sure.

syzygy wrote:I suppose you would need hardware with some sort of double-threaded speculative execution that at the moment probably does not exist for this to make sense (and to allow the nonsensical result..).

syzygy wrote:Processors have been performing speculative writes for many years, but you probably mean writes that actually land in DRAM. But that wouldn't be necessary for the above scenario to occur. It is sufficient that two threads see each other's speculative writes, for example in a shared cache or store buffer.

It also does not have to be the processor logic that decides by itself to perform such writes speculatively. It could be the compiler explicitly making use of a speculative store instruction of the processor.

Again, as far as I know the scenario is not possible on hardware that exists today. It would also not seem to make much sense to involve two hardware threads in one speculative execution that must be rolled back for both threads if anything goes wrong. But maybe I am wrong and someone can come up with a valid reason to do this.

A transactional memory (hardware) design that allows other processors to see speculative writes:
http://www.cs.utexas.edu/~rossbach/pubs/tx-micro08.pdf
I think it prohibits the causality violation by requiring that cyclic dependencies among transactions are broken by restarting one or more transactions. It would obviously be feasible not to detect such problems, but that would lead to a difficult-to-understand multithreaded memory ordering with causality violations.

Btw, I'm sure you continue to be right about this as well:

hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.

Never mind any degree of intellectual honesty. As long as you can make yourself ignore the cognitive dissonance everything is just fine, right.

hgm · Post by **hgm** » Sat Apr 05, 2014 2:56 pm

syzygy wrote:Hah. Definitions are to be bent, right?

If you want to bend them, fine. I will stick to the one and only valid definition, though:

A memory store is executed when loads by other memory users return its value.

This whole nonsense about transactional memory is just obfuscation, and has absolutely nothing to do with the issue. Nothing is speculative there, and writes are not visible to any other core until after they are committed. Also in the 'dependency-aware' scheme of the second link the data forwarded to the other CPU will only be used after it is certain (and thus not speculative) that the data will be committed. When there is a dependency cycle, it will abort.

In particular, it does not just assume an arbitrary link in the circular dependency chain is correct, to propagate it around in the hope this will confirm itself. Like you want to do.

syzygy · Post by **syzygy** » Sat Apr 05, 2014 3:13 pm

hgm wrote:
syzygy wrote:Hah. Definitions are to be bent, right?
If you want to bend them, fine. I will stick to the one and only valid definition, though:

A memory store is executed when loads by other memory users return its value.

I am talking about speculative stores. In your dictionary they have apparently been defined away. I suppose speculative execution does not exist, either.

In reality there are hundreds of technical papers discussing speculative stores. I happen to use the term as it is used in the art.

This whole nonsense about transactional memory is just obfuscation, and has absolutely nothing to do with the issue. Nothing is speculative there, and writes are not visible to any other core until after they are committed.

Depends on the implementation. In some implementations writes are invisible to other threads until the transaction is committed (lazy versioning). In other implementations writes are visible before the transaction is committed (eager versioning). In that case the threads seeing those writes will have to be aborted if the writing transaction does not get committed.
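
The difference between the two versioning schemes can be shown with a toy, single-threaded model. This is only a sketch of the visibility rule and has nothing to do with how any real hardware TM is built; `Memory`, `LazyTx` and `EagerTx` are hypothetical names.

```cpp
#include <map>
#include <string>

// "Shared memory" as seen by every thread; missing addresses read as 0.
using Memory = std::map<std::string, int>;

// Lazy versioning: writes stay in a private buffer until commit.
class LazyTx {
public:
    explicit LazyTx(Memory& mem) : mem_(mem) {}
    void write(const std::string& addr, int v) { buffer_[addr] = v; }
    void commit() {
        for (const auto& kv : buffer_) mem_[kv.first] = kv.second;  // now visible
        buffer_.clear();
    }
    void abort() { buffer_.clear(); }  // nothing ever reached shared memory
private:
    Memory& mem_;
    Memory buffer_;
};

// Eager versioning: writes hit shared memory at once, old values go to an undo log.
class EagerTx {
public:
    explicit EagerTx(Memory& mem) : mem_(mem) {}
    void write(const std::string& addr, int v) {
        if (undo_.count(addr) == 0) undo_[addr] = mem_[addr];  // save the first old value
        mem_[addr] = v;                                        // visible before commit
    }
    void commit() { undo_.clear(); }
    void abort() {
        for (const auto& kv : undo_) mem_[kv.first] = kv.second;  // restore old values
        undo_.clear();
    }
private:
    Memory& mem_;
    Memory undo_;
};
```

With `EagerTx`, anything that read the new value between `write` and `abort` has seen data that officially never existed, which is why such readers have to be aborted as well.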

Also in the 'dependency-aware' scheme of the second link the data forwarded to the other CPU will only be used after it is certain (and thus not speculative) that the data will be committed. When there is a dependency cycle, it will abort.

Nope. If T1 writes to X and T2 reads from X, then T2 is dependent on T1 but not the other way around. No cyclic dependency. T2 can use the result, obviously, or it should not have seen it. T2 must be rolled back if T1 is.
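
The bookkeeping for this can be sketched as a small dependency graph: a reader depends on the writer whose uncommitted data it consumed, an abort cascades to all dependents, and a cycle is something the system has to detect. This is a toy illustration under my own assumptions, not the mechanism of the linked paper; `DependencyGraph` and its methods are hypothetical names.

```cpp
#include <map>
#include <set>
#include <vector>

class DependencyGraph {
public:
    // Record that `reader` consumed an uncommitted write made by `writer`.
    void add_dependency(int reader, int writer) { depends_on_[reader].insert(writer); }

    // True if `tx` transitively depends on itself, i.e. lies on a dependency cycle.
    bool on_cycle(int tx) const {
        std::set<int> seen;
        std::vector<int> stack(1, tx);
        while (!stack.empty()) {
            int cur = stack.back();
            stack.pop_back();
            auto it = depends_on_.find(cur);
            if (it == depends_on_.end()) continue;
            for (int next : it->second) {
                if (next == tx) return true;
                if (seen.insert(next).second) stack.push_back(next);
            }
        }
        return false;
    }

    // Every transaction that must roll back if `tx` aborts (including `tx` itself).
    std::set<int> cascade_abort(int tx) const {
        std::set<int> doomed;
        doomed.insert(tx);
        bool changed = true;
        while (changed) {  // repeat until no further dependents are found
            changed = false;
            for (const auto& entry : depends_on_) {
                if (doomed.count(entry.first)) continue;
                for (int writer : entry.second) {
                    if (doomed.count(writer)) {
                        doomed.insert(entry.first);
                        changed = true;
                        break;
                    }
                }
            }
        }
        return doomed;
    }

private:
    std::map<int, std::set<int>> depends_on_;  // reader -> writers it depends on
};
```

With only the edge T2 -> T1 there is no cycle, and aborting T1 takes T2 down with it. Adding T1 -> T2 as well gives the circular case, where neither transaction can be confirmed by anything other than the other one.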

In particular, it does not just assume an arbitrary link in the circular dependency chain is correct, to propagate it around in the hope this will confirm itself. Like you want to do.

syzygy wrote:I think it prohibits the causality violation by requiring that cyclic dependencies among transactions are broken by restarting one or more transactions. It would obviously be feasible not to detect such problems, but that would lead to a difficult-to-understand multithreaded memory ordering with causality violations.

syzygy wrote:I am not "wanting" this. I am simply explaining how the strange result could happen. You insisted:
hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.
so I explain that there is no law of physics that would prevent it.

hgm · Post by **hgm** » Sat Apr 05, 2014 3:50 pm

syzygy wrote:I am talking about speculative stores. In your dictionary they have apparently been defined away. I suppose speculative execution does not exist, either.

Speculative stores are stores. I don't see your point.

Depends on the implementation. In some implementations writes are invisible to other threads until the transaction is committed (lazy versioning). In other implementations writes are visible before the transaction is committed (eager versioning). In that case the threads seeing those writes will have to be aborted if the writing transaction does not get committed.

Indeed. If a speculation cannot be corroborated by non-speculative data it will have to be aborted together with all its consequences. The papers do that. Your examples don't.

Nope. If T1 writes to X and T2 reads from X, then T2 is dependent on T1 but not the other way around. No cyclic dependency. T2 can use the result, obviously, or it should not have seen it. T2 must be rolled back if T1 is.

Well, that is exactly what I said, is it not? T2 cannot commit unless T1 commits. It must be sure that T1 commits before the other thread can use it (i.e. T2 can commit). You cannot use the data as long as it is speculative, and might have to be rolled back.

syzygy wrote:I am not "wanting" this. I am simply explaining how the strange result could happen. You insisted:
hgm wrote:But in real life of course there are more compelling laws than the C++11 standard, and physics could not care less about what the C standard allows. Don't expect causality to be violated just because some standard allows it.
so I explain that there is no law of physics that would prevent it.

And that explanation was shown to be wrong. Because you used speculative data to decide other speculative data could be committed. That is broken. Because there is (by definition) no way to unroll committed data, if it later proves that the speculative data that 'cleared' it was false.
