c++11 std::atomic and memory_order_relaxed

hgm · Post by **hgm** » Wed Apr 02, 2014 10:50 pm

Not unless they are broken. Executing instructions that cannot be undone is not 'speculative execution'. It is just a broken architecture that doesn't work according to the specs of the machine language.

This story about speculative stores visible to other agents is absolute gobbledygook. If any architecture did that, it would be totally useless for SMP. Because it is just coincidental that in the example the speculative execution was dependent on a branch and test on something read from memory first. No CPU could be smart enough to know that. So if it did this kind of speculative store, it would be constantly flooding the caches of all other CPUs with totally invalid data that never should have been calculated, from all stores that never should have been executed, but happened to be on the path of a branch misprediction concerning data that was totally local to that core.

syzygy · Post by **syzygy** » Wed Apr 02, 2014 10:55 pm

hgm wrote:Not unless they are broken. Executing instructions that cannot be undone is not 'speculative execution'. It is just a broken architecture that doesn't work according to the specs of the machine language.

Speculative implies that they can be undone.

They would need to be undone if the condition turns out to be false. If the condition turns out to be true, there is no need to undo them.

The complication is that if thread 2 is allowed to see the speculative store of thread 1 before it is finally committed, then also thread 2 must be speculatively executing. But why not, if both happen to get into speculative mode at the same time?

Everything would work according to the specs.

I don't know what you mean by (the specs of) "the machine language".

hgm · Post by **hgm** » Wed Apr 02, 2014 11:16 pm

Effects cannot be undone when other cores can have already seen, and acted on them.

Specs of the machine language define the effect of instructions. Like for instance that there is a flow of control that decides which instructions are executed and which not, and that branch instructions steer that flow of control one way, and not both ways.

An architecture where one core could cause other cores to see effects of store instructions that are not supposed to be executed would be totally useless. Even if you were running two independent pieces of code on the CPUs they would be constantly corrupting each other's memory accesses with invalid data written to unintended memory addresses.

bob · Post by **bob** » Wed Apr 02, 2014 11:30 pm

hgm wrote:Effects cannot be undone when other cores can have already seen, and acted on them.

Specs of the machine language define the effect of instructions. Like for instance that there is a flow of control that decides which instructions are executed and which not, and that branch instructions steer that flow of control one way, and not both ways.

An architecture where one core could cause other cores to see effects of store instructions that are not supposed to be executed would be totally useless. Even if you were running two independent pieces of code on the CPUs they would be constantly corrupting each other's memory accesses with invalid data written to unintended memory addresses.

I agree. This has gotten badly mangled within the Intel context of speculative memory writes. These are instructions like movl $1, x, where the instruction is executed but not retired, which means the actual write out to L1 is NOT done until the instruction is retired, if it ever is. The instruction is retired, and the write to L1 done, only once the speculation has proven to be correct.

I don't call that "speculative write" at all since the write is not done so long as it remains speculative as to whether or not it should be done. If it makes it to L1, the world ends as we know it in computer architecture and programming, because once it gets there, there is no "changing your mind and undoing the write."
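The distinction drawn above, between a store that has executed and a store that has retired, can be sketched with a toy software model. This is only an illustration under simplifying assumptions: the class name `SpeculativeCore`, its methods, and the use of a map as "shared memory" are all hypothetical and do not describe any real CPU's store buffer.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model only. "memory" stands in for the L1 cache / coherent memory that
// other cores can observe; "pending_" stands in for stores that have been
// executed but not yet retired.
class SpeculativeCore {
public:
    explicit SpeculativeCore(std::unordered_map<int, int>& memory)
        : memory_(memory) {}

    // Execute a store speculatively: it is recorded locally, not written out.
    void store(int address, int value) { pending_.emplace_back(address, value); }

    // A load by this core sees its own newest pending store first.
    int load(int address) const {
        for (auto it = pending_.rbegin(); it != pending_.rend(); ++it)
            if (it->first == address) return it->second;
        auto found = memory_.find(address);
        return found == memory_.end() ? 0 : found->second;
    }

    // Speculation confirmed: retire the stores, in order, into shared memory.
    void retire() {
        for (const auto& s : pending_) memory_[s.first] = s.second;
        pending_.clear();
    }

    // Misprediction: discard the stores; shared memory was never touched.
    void squash() { pending_.clear(); }

private:
    std::unordered_map<int, int>& memory_;
    std::vector<std::pair<int, int>> pending_;
};
```

In this model a `store` followed by `squash()` leaves the shared map untouched, which is the property that makes the speculation undoable. After `retire()` the value is visible to everyone and there is no way to take it back.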

syzygy · Post by **syzygy** » Wed Apr 02, 2014 11:31 pm

hgm wrote:Effects cannot be undone when other cores can have already seen, and acted on them.

As I said, the other core would have to be in speculative execution mode as well and the results of it would have to be rolled back as well. The two cores would be in some kind of "joint speculative mode".

As I said, I don't know any hardware that supports this, and I don't see any clear benefit of hardware that would support it, but that does not mean there is no benefit (that I just can't think of right now) and it certainly does not mean there will never be such hardware. It is certainly quite feasible to construct such hardware.

hgm wrote:Specs of the machine language define the effect of instructions. Like for instance that there is a flow of control that decides which instructions are executed and which not, and that branch instructions steer that flow of control one way, and not both ways.

I know, but what machine language are you talking about that this architecture would violate the specs of....

hgm wrote:An architecture where one core could cause other cores to see effects of store instructions that are not supposed to be executed would be totally useless.

See above and see my earlier posts.

If the compiler somehow has predicted that the condition of an if-statement will almost always be true, then it may make sense to speculatively execute a store that is conditional on this condition being true before the condition is evaluated. If another core at the same time accesses the memory location being written to, it is not completely unreasonable to switch that core to speculative execution as well. After all, we are pretty sure that the condition will turn out to be true, so most likely everything will just turn out fine. Should the condition unexpectedly evaluate to false, then obviously these two cores/threads have both to be rolled back.
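A rough software analogue of the speculate-then-roll-back scheme described above, assuming a plain `int` target. The function names `store_if` and `store_speculatively` are hypothetical. Both functions leave the same final value when run single-threaded; the only difference is the window in which the speculative value is visible.

```cpp
#include <cassert>

// What the source program says: store only when the condition holds.
void store_if(int& target, bool condition, int value) {
    if (condition) target = value;
}

// Speculative variant: store first, undo if the guess was wrong.
void store_speculatively(int& target, bool condition, int value) {
    int saved = target;               // remember the old value for rollback
    target = value;                   // speculative store, visible from here...
    if (!condition) target = saved;   // ...until this rollback on misprediction
}
```

The window between the store and the rollback is what the disagreement in this thread is about: any other observer reading `target` in that window sees a value the program never meant to write. For that reason a C++11 compiler is, as far as I know, generally not allowed to introduce such a store to a potentially shared variable, since it could create a data race that the source program does not contain.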

hgm · Post by **hgm** » Thu Apr 03, 2014 8:48 am

syzygy wrote:As I said, the other core would have to be in speculative execution mode as well and the results of it would have to be rolled back as well. The two cores would be in some kind of "joint speculative mode".

If it would be rolled back in both of them, none of the memory variables could end up at 42.

syzygy wrote:I know, but what machine language are you talking about that this architecture would violate the specs of....

Any machine language that would specify programs written in it had a defined effect, rather than always resulting in completely undefined behavior...

syzygy wrote:If the compiler somehow has predicted that the condition of an if-statement will almost always be true, then it may make sense to speculatively execute a store that is conditional on this condition being true before the condition is evaluated. If another core at the same time accesses the memory location being written to, it is not completely unreasonable to switch that core to speculative execution as well. After all, we are pretty sure that the condition will turn out to be true, so most likely everything will just turn out fine. Should the condition unexpectedly evaluate to false, then obviously these two cores/threads have both to be rolled back.

If reading a speculatively written value would bring a core in a speculative state, swapping the order of reads and speculative writes in the example (what you would need to get 42) would lead to a deadlock. The cores would never get out of the speculative state. Only a read of a non-speculative value could resolve the branch, and none would be scheduled anymore.

It just doesn't work.

syzygy · Post by **syzygy** » Thu Apr 03, 2014 9:08 am

hgm wrote:
syzygy wrote:As I said, the other core would have to be in speculative execution mode as well and the results of it would have to be rolled back as well. The two cores would be in some kind of "joint speculative mode".
If it would be rolled back in both of them, none of the memory variables could end up at 42.

No, both conditions end up true so no need to roll back anything.

The speculation was conditional on (r1 == 42) and (r2 == 42) evaluating to true.

The loads and stores get reordered in a ridiculous way, but that's what memory_order_relaxed allows.
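For reference, the example being argued over looks roughly like the following. The code itself is not quoted in this part of the thread, so this is a reconstruction from the names r1, r2 and the value 42 used in the posts; as far as I can tell it corresponds to the out-of-thin-air example given in the atomics section of the C++11 standard. The wrapper function `run_once` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Assumed reconstruction: x, y, r1 and r2 follow the posts, the rest is
// illustrative scaffolding to make the fragment compile and run.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        if (r1 == 42) y.store(42, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        if (r2 == 42) x.store(42, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return {r1, r2};  // safe to read: both threads have been joined
}
```

On the compilers and CPUs in common use, `run_once()` should return r1 == 0 and r2 == 0, because nothing ever stores 42 unless it has first read 42. The question in this thread is whether the formal rules for memory_order_relaxed also permit r1 == r2 == 42.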

syzygy wrote:I know, but what machine language are you talking about that this architecture would violate the specs of....
hgm wrote:Any machine language that would specify programs written in it had a defined effect, rather than always resulting in completely undefined behavior...

Single-threaded execution is completely predictable. Useful multi-threaded execution is never completely predictable. Here the results are rather surprising, because they appear to violate causality.

hgm wrote:If reading a speculatively written value would bring a core in a speculative state, swapping the order of reads and speculative writes in the example (what you would need to get 42) would lead to a deadlock.

Speculation ends once the condition has been evaluated. If it is true, the store is committed. If it is false, the store is rolled back.

It works fine.

hgm · Post by **hgm** » Thu Apr 03, 2014 10:13 am

syzygy wrote:Speculation ends once the condition has been evaluated. If it is true, the store is committed. If it is false, the store is rolled back.

It works fine.

The condition cannot be evaluated if the operands are not yet known because they are based on a speculatively executed load.

You cannot have it both ways. Either reading a speculatively written value from another core puts you in the speculative state (and then that state would only be resolved when that write was confirmed or retracted in that other core, and not because you used it in some other instruction), or you continue 'business as usual' and treat the compromised value you read as if it were real. The latter leads to undefined behavior for every multithreaded program, which doesn't seem a very good idea. If one thread executes

Code:

#include <stdlib.h>  /* random() (POSIX) */

int a[1<<24];

void f() {
  unsigned int i, j = 0;  /* j holds the previous random value */
  while(1) {
    i = random();
    /* the store is architecturally executed only for in-range i */
    if(i < (1<<24)) a[i] = j;
    j = i;
  }
}

any memory read by any other thread, including its code fetches, could deliver a totally random value, as mispredictions of the if-statement will cause writes of any conceivable value in any conceivable memory location. And the other thread would treat them all as if they were real data.

bob · Post by **bob** » Thu Apr 03, 2014 5:55 pm

syzygy wrote:
hgm wrote:
syzygy wrote:As I said, the other core would have to be in speculative execution mode as well and the results of it would have to be rolled back as well. The two cores would be in some kind of "joint speculative mode".
If it would be rolled back in both of them, none of the memory variables could end up at 42.
No, both conditions end up true so no need to roll back anything.

The speculation was conditional on (r1 == 42) and (r2 == 42) evaluating to true.

The loads and stores get reordered in a ridiculous way, but that's what memory_order_relaxed allows.

I know, but what machine language are you talking about that this architecture would violate the specs of....
Any machine language that would specify programs written in it had a defined effect, rather than always resulting in completely undefined behavior...
Single-threaded execution is completely predictable. Useful multi-threaded execution is never completely predictable. Here the results are rather surprising, because they appear to violate causality.

If reading a speculatively written value would bring a core in a speculative state, swapping the order of reads and speculative writes in the example (what you would need to get 42) would lead to a deadlock.
Speculation ends once the condition has been evaluated. If it is true, the store is committed. If it is false, the store is rolled back.

It works fine.

There's one thing wrong with this. "memory order relaxed" does NOT cover what is known as a "control dependency". if (c) a is a classic control dependency, where a is not executed unless c is true. The Alpha is a classic "relaxed memory order" architecture, but it does NOT include violating control dependencies. To do so violates the basic premise of programming, in fact: that the program ALWAYS produces the same results as when it is executed one instruction at a time, in the order the instructions are written. It would seem this might even produce a WAR hazard, where the write gets speculatively done before a previous read, which would kill most any program on the planet.

This really is NOT going to happen architecturally.

bob · Post by **bob** » Thu Apr 03, 2014 5:58 pm

hgm wrote:
syzygy wrote:Speculation ends once the condition has been evaluated. If it is true, the store is committed. If it is false, the store is rolled back.

It works fine.
The condition cannot be evaluated if the operands are not yet known because they are based on a speculatively executed load.

You cannot have it both ways. Either reading a speculatively written value from another core puts you in the speculative state (and then that state would only be resolved when that write was confirmed or retracted in that other core, and not because you used it in some other instruction), or you continue 'business as usual' and treat the compromised value you read as if it were real. The latter leads to undefined behavior for every multithreaded program, which doesn't seem a very good idea. If one thread executes
Code:

#include <stdlib.h>  /* random() (POSIX) */

int a[1<<24];

void f() {
  unsigned int i, j = 0;  /* j holds the previous random value */
  while(1) {
    i = random();
    /* the store is architecturally executed only for in-range i */
    if(i < (1<<24)) a[i] = j;
    j = i;
  }
}
any memory read by any other thread, including its code fetches, could deliver a totally random value, as mispredictions of the if-statement will cause writes of any conceivable value in any conceivable memory location. And the other thread would treat them all as if they were real data.

I don't see how it could work either. A and B are both in speculative mode. How can BOTH get out? One has to get out first. In any case, discussing the idea of a complete core being in "speculative mode" is a pointless exercise since it will never happen.

I can see this kind of stuff happening if the compilers continue to forge ahead into optimizations that are not safe. I don't see the hardware EVER getting into this, there is nothing to gain and a zillion transistors of complexity to be avoided.
