volatile?

Rein Halbersma · Post by **Rein Halbersma** » Thu Mar 20, 2014 6:19 pm

mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
A variable accessed under lock protection does not require to be defined 'volatile'.

Instead if accessed by many threads outside lock protection is better to define volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of a SMP chess engine, for instance TT table is intrinsically racy for performance reasons, because to protect with lock it would be very slow...this is not a problem per-se, as long as you know it and you deal with it.

You are not really dealing with it, but you are taking a calculated risk. When Stockfish is run on a machine with dozens of threads, the likelihood of concurrent reads/writes of the same hash entry (and the ensueing Undefined Behavior) will become increasingly likely.

syzygy · Post by **syzygy** » Thu Mar 20, 2014 6:30 pm

Rein Halbersma wrote:You are not really dealing with it, but you are taking a calculated risk. When Stockfish is run on a machine with dozens of threads, the likelihood of concurrent reads/writes of the same hash entry (and the ensueing Undefined Behavior) will become increasingly likely.

UB can be an issue in that the compiler may produce something that you did not expect, but provided that the compiler maps the code to the machine in "Bob's Right Way(TM)", there should not be any UB problems with increasing numbers of threads.

Of course if the compiler produces code with nasty (machine level) race conditions, then increasing the number of threads is more likely to let the demons start flying.

Rein Halbersma · Post by **Rein Halbersma** » Thu Mar 20, 2014 6:49 pm

syzygy wrote:
Rein Halbersma wrote:You are not really dealing with it, but you are taking a calculated risk. When Stockfish is run on a machine with dozens of threads, the likelihood of concurrent reads/writes of the same hash entry (and the ensueing Undefined Behavior) will become increasingly likely.
UB can be an issue in that the compiler may produce something that you did not expect, but provided that the compiler maps the code to the machine in "Bob's Right Way(TM)", there should not be any UB problems with increasing numbers of threads.

Of course if the compiler produces code with nasty (machine level) race conditions, then increasing the number of threads is more likely to let the demons start flying.

OTOH, even parallelism guru Charles Leiserson knowingly took this risk with CilkChess, so one would be in good company in C++ purgatory.

http://citeseerx.ist.psu.edu/viewdoc/do ... 1&type=pdf

syzygy · Post by **syzygy** » Thu Mar 20, 2014 6:58 pm

syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile. It might give the compiler a little bit more freedom when compiling this line:
Code: Select all
          moveCount = ++splitPoint->moveCount;
I guess with volatile the compiled code must first read splitPoint->moveCount and increase the value read, and must then write the resulting value to splitPoint->moveCount. It would be illegal to increment splitPoint->moveCount in memory using inc and then read it from memory.

Or do I have it wrong here? Maybe the compiled code must in fact do the following?
1. read splitPoint->moveCount
2. increment the retrieved value
3. write this value to splitPoint->moveCount
4. read this value from splitPoint->moveCount and assign it to moveCount

Does anyone know which of the two (if any of the two) is mandated by the C/C++ abstract machine?

According to gcc the first option is correct:
1. read splitPoint->moveCount
2. increment the retrieved value
3. write this value to splitPoint->moveCount while keeping a local copy
4. assign the value in the local copy to moveCount

Now let's see if C99 agrees. According to 6.5.3.1.2, ++E is equivalent to (E+=1).
Paragraph 6.5.16.3:

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue. The type of an assignment expression is the type of the left operand unless the left operand has qualified type, in which case it is the unqualified version of the type of the left operand. The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.

Not particularly clear, but I suppose this means that the expression E+=1 is evaluated once and that, as a side effect, its value is stored in E. This is what gcc does, anyway...

syzygy · Post by **syzygy** » Thu Mar 20, 2014 7:16 pm

But isn't this interesting:

I've got a question about volatiles in assignment expressions.
I found the following code snippet in an embedded development
forum:
----
volatile char a;
...
if (++a == 1)
{
...
----
This code snippet was then followed by the output from two
compilers, one which was gcc. Gcc's code looked like something
like this (in RTL like form where Rn denotes a register):

R1 <- *a
R1 <- R1 + 1
*a <- R1
R2 <- *a
IF R2 == 1

Now to my question! The code above contains two read accesses
and one write access to a. I would have assumed that only one
read access should be performed. Does the standard allow the
second read operation?

Current gcc only reads variable "a" once.
What follows is a thorough exegesis of various passages of the C99 standard by different people, each coming to a different conclusion.

bob · Post by **bob** » Thu Mar 20, 2014 8:10 pm

lucasart wrote:looking at senpai's code, i came across something odd:
* variables subject to concurrent access are declared as volatile. why?
* they are always modified under lock, which is correct. but could it be faster in some cases to not use locking, and use atomic variables? eg. you just need to increment the node cnt by 1 seems wasteful to use a lock.

Volatile simply says "this variable may change values spontaneously" (by another thread, actually). It tells the compiler to not optimize access to the variable. Each time you reference it in a program, it must go back to memory to fetch it again. If you don't declare locks and shared stuff like that you can fall into all sorts of traps...

bob · Post by **bob** » Thu Mar 20, 2014 8:19 pm

syzygy wrote:
lucasart wrote:The reason I ask, is because I was wondering what this obscure "volatile" keyword really means. I read this short article:
https://www.kernel.org/doc/Documentatio ... armful.txt
Essentially they make the point that
In properly-written code, volatile can only serve to slow things down
They say that volatile has nothing to do with concurrency, it's almost never correct to use it, and the lock is enough.
Volatile means the compiler must generate code that reads from and writes to it in the order specified by the C or C++ abstract machine.

Volatile is just what you want if you're accessing memory mapped hardware I/O registers. You don't normally do this in application programs.

For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.

If you don't want to be restricted by a thread library, but (in addition) want to rely on how the compiler will map your code to the machine, then volatile is useful. Your program will be full of undefined behavior, but if that is a conscious choice that is not necessarily bad. It will just not be "standard C/C++ plus pthreads" or "standard C11/C++11".

On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
I'll have a look at this.

For senpai it would be interesting to add a line
Code: Select all
#define volatile
and see how much that speeds up (and whether it crashes or results in funny things).

Sorry, but the above is wrong. Here is a simple example.

int variable; // shared variable with another thread

int funct() {

i = variable;
..
.. do some stuff;
..

lock(smp_lock);

j = variable;

do some more stuff;

unlock(smp_lock);

The above will fail because variable was not changed in this code between the first reference and the second. Another thread COULD have acquired the lock, modified variable, and then released the lock. When you access variable a second time, the compiler is free to use the original value (now stored in i) rather than re-loading it again, even though the other thread might have modified it inside the locks.

Volatile is intended to tell the compiler "you can't cache this value in register safely, it might be changed by code you can't see. Therefore whenever I read or write it, you have to do the read or write exactly where I say.

Even the locks have to be declared volatile or the compiler can optimize you right into a deadlock.

I would not declare ALL shared variables volatile, that is pretty pointless. The only ones that matter are the ones that can be read AND written in different threads. Those are the volatile candidates, and you might not even care about all of those. But the critical cases are where you test a value that another thread sets. If that is not volatile, it is not going to work correctly.

bob · Post by **bob** » Thu Mar 20, 2014 8:25 pm

syzygy wrote:
lucasart wrote:On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
Many fields of the SplitPoint struct are volatile, but for a reason:
Code: Select all
struct SplitPoint &#123;

...

  // Shared data
  Mutex mutex;
  volatile uint64_t slavesMask;
  volatile uint64_t nodes;
  volatile Value alpha;
  volatile Value bestValue;
  volatile Move bestMove;
  volatile int moveCount;
  volatile bool cutoff;
&#125;;
The reason is that these fields are not always accessed under lock protection. For example, in search() you can find several occurrences of:
Code: Select all
          if &#40;SpNode&#41;
              alpha = splitPoint->alpha;
Without the "volatile" keyword, the compiler might in certain cases be allowed to assume (according to the C++ standard) that splitPoint->alpha cannot change. It could then load this value once from memory and keep it in a register or on the stack. Or the compiler could be allowed to deduce that splitPoint->alpha cannot be different from alpha, so that these lines can be optimised away.

It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile. It might give the compiler a little bit more freedom when compiling this line:
Code: Select all
          moveCount = ++splitPoint->moveCount;
I guess with volatile the compiled code must first read splitPoint->moveCount and increase the value read, and must then write the resulting value to splitPoint->moveCount. It would be illegal to increment splitPoint->moveCount in memory using inc and then read it from memory.

Or do I have it wrong here? Maybe the compiled code must in fact do the following?
1. read splitPoint->moveCount
2. increment the retrieved value
3. write this value to splitPoint->moveCount
4. read this value from splitPoint->moveCount and assign it to moveCount

Does anyone know which of the two (if any of the two) is mandated by the C/C++ abstract machine?

In any case, without volatile anything is allowed as long as the specified result is achieved, and the compiler may assume that no other code or hardware is messing with splitPoint->moveCount (which is in fact a valid assumption due to the locks around this code).

All this means that SF relies on UB. It is not impossible that future compiler optimisations break it in subtle ways. But personally I would advise against fixing this.

The code looks ugly to me and is probably not safe.

It does two operations. Inc and then fetch.

Nothing says thread 1 can't do the inc, then thread 2 does another inc, then thread 1 re-acquires the value back into cache and gets the +2 incremented value, as does thread 2.

Two different operations on the same shared value absolutely requires a lock. Clearly an inc and then a fetch fits this.

bob · Post by **bob** » Thu Mar 20, 2014 8:30 pm

syzygy wrote:
syzygy wrote:It seems splitPoint->moveCount is always accessed under lock protection (except for an assert), so it seems it could be made non-volatile. It might give the compiler a little bit more freedom when compiling this line:
Code: Select all
          moveCount = ++splitPoint->moveCount;
I guess with volatile the compiled code must first read splitPoint->moveCount and increase the value read, and must then write the resulting value to splitPoint->moveCount. It would be illegal to increment splitPoint->moveCount in memory using inc and then read it from memory.

Or do I have it wrong here? Maybe the compiled code must in fact do the following?
1. read splitPoint->moveCount
2. increment the retrieved value
3. write this value to splitPoint->moveCount
4. read this value from splitPoint->moveCount and assign it to moveCount

Does anyone know which of the two (if any of the two) is mandated by the C/C++ abstract machine?
According to gcc the first option is correct:
1. read splitPoint->moveCount
2. increment the retrieved value
3. write this value to splitPoint->moveCount while keeping a local copy
4. assign the value in the local copy to moveCount

Now let's see if C99 agrees. According to 6.5.3.1.2, ++E is equivalent to (E+=1).
Paragraph 6.5.16.3:
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue. The type of an assignment expression is the type of the left operand unless the left operand has qualified type, in which case it is the unqualified version of the type of the left operand. The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.
Not particularly clear, but I suppose this means that the expression E+=1 is evaluated once and that, as a side effect, its value is stored in E. This is what gcc does, anyway...

I just tested gcc here. Problem is it still fails.

a=0
thread 1 thread 2
fetch a fetch a
inc a inc a
write a write a

a != 2. Both get the same value. It needs to be volatile to avoid optimization problems, it needs to be locked to avoid the above interleaved updating.

bob · Post by **bob** » Thu Mar 20, 2014 8:34 pm

mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff
A variable accessed under lock protection does not require to be defined 'volatile'.

Instead if accessed by many threads outside lock protection is better to define volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of a SMP chess engine, for instance TT table is intrinsically racy for performance reasons, because to protect with lock it would be very slow...this is not a problem per-se, as long as you know it and you deal with it.

That is simply wrong. The compiler doesn't know diddly squat about locks. I gave an example previously where you fetch a outside a lock, then you fetch it again inside the lock. The compiler is NOT required to re-load a inside the lock, it is perfectly free to re-use the original value it loaded. Even though another thread might have acquired the lock first, then modified a, and released the lock.

You need volatile there or it won't work correctly. It will, given enough registers, only load a one time in a procedure and then hang on to it in that register for the life of the procedure...

volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?

Re: volatile?