volatile?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: volatile?

Post by kbhearn »

Thanks for this link. It's one of the better writeups I've seen on how irritating compilers can be if they want to :) (I am curious just how often a compiler would really use the 'stack frame size shrinking' optimisation that seems to be the catch-all for seemingly benign data races breaking code entirely.)
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:It is NOT ok to retrieve old values. The caches on Intel SPECIFICALLY prevent this by their snooping and inter-cache forwarding. Where is this stuff coming from? On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
I was wrong to say that Intel guarantees the illusion that a read will always give the last value written by any other CPU. Not even this illusion holds true in its full generality.

Suppose memory locations x and y are initialised to 0.

Now CPU1 performs a write and a read:
mov $1, [x]
mov [y], %eax

At roughly the same time, CPU2 also performs a write and a read:
mov $1, [y]
mov [x], %eax

Now %eax for both CPU1 and CPU2 may be 0.

How can that be? If CPU1 reads 0 from [y], it must have executed the read before CPU2 executed the write, right? So CPU1 must have executed the write even earlier, and CPU2 must have executed the read even later. That means that CPU2 can only have read 1. But in reality, it may read a 0.
That is a trivial example that is well known. It has absolutely nothing to do with the current discussion, which would only be about ONE variable. You will NEVER get an old value with Intel.
This is what you wrote:
bob wrote:On Intel, the value you read will be the LAST value written by any other CPU. That's guaranteed.
It is wrong.
Sorry, it is absolutely correct. Just look up their MESIF cache coherency protocol; it will explain EXACTLY why it is guaranteed to be true. That is the very definition of "cache-coherent NUMA".
Look. I gave an example, which you acknowledge, where the value read is NOT the LAST value written by any other CPU.

The value read is 0. At the time the value is being read, the value 1 had already been written. Capisce?
Last edited by syzygy on Sat Mar 22, 2014 9:20 pm, edited 1 time in total.
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

bob wrote:
syzygy wrote:Look at my two (identical) statements.
syzygy wrote:For multithreaded programs using volatile is not necessary if you are properly using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads.
syzygy wrote:If "lock" is based on pthreads primitives or a similar library, which is the case I am talking about, then "lock" acts as a memory barrier. The compiler will reload "variable". No need for making it volatile.
They are correct.

Agreed?
No. If lock is based on ANY external procedure call, the compiler will reload any global variables referenced prior to the call. If no lock is called, the compiler will reload ANY global variable referenced prior to the call, UNLESS the compiler has access to the source code for the procedures being called so that it can directly determine which global variables (if any) might be modified.
Do you understand "using the synchronisation primitives of a thread library, e.g. pthreads or C++11 threads"?

POSIX guarantees that it works. Just treat it as a black box. Use the pthread primitives the way you are supposed to use them. Then it works. No volatile needed. Just check the POSIX standard.

And SURE the compiler will reload any global variable referenced prior to external calls, but what does that have to do with anything? Do you mean that not needing "volatile" is just a matter of LUCK because pthreads JUST SO HAPPENED to implement pthread_mutex_lock() as a function call?

Maybe that is not a matter of luck? Maybe the implementor was implementing a standard? And he chose this way to implement a (compiler) memory barrier?

Read those two sentences again. They are based directly on the POSIX standard. Is the POSIX standard lying?
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

kbhearn wrote:Thanks for this link. It's one of the better writeups I've seen on how irritating compilers can be if they want to :) (I am curious just how often a compiler would really use the 'stack frame size shrinking' optimisation that seems to be the catch-all for seemingly benign data races breaking code entirely.)
I have been reading a bit about C11/C++11 atomics. Very interesting. Very complicated as well, but it seems to allow the programmer to specify exactly what is needed in terms of memory ordering (both at the compiler and at the processor level).
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: volatile?

Post by zamar »

mcostalba wrote:
lucasart wrote:
mcostalba wrote:
lucasart wrote: On the other hand, Stockfish also declares all shared variables as volatile. And I know that Marco is much more knowledgeable than I am in C++, especially when it comes to multi-threading. So I can't help wondering if there isn't indeed a good reason for all this volatile stuff :?
A variable accessed under lock protection does not need to be declared 'volatile'.

If instead it is accessed by many threads outside lock protection, it is better to declare it volatile, although of course this doesn't give you any protection against races and you really need to know what you are doing.

But races are an intrinsic part of an SMP chess engine. For instance, the TT table is intrinsically racy for performance reasons, because protecting it with a lock would be very slow... this is not a problem per se, as long as you know it and you deal with it.
I see. It's a calculated risk.

Senpai follows the conservative approach of using locks more systematically. As Fabien said, he wanted to get it right first, before optimizing (and risking spending time debugging tricky SMP races).

When you say accesses "under lock protection", do you mean both writes and reads? For example, if I have a function that reads from a shared variable at the beginning and at the end, I would have to lock before the first read and unlock after the last read, so the whole function would be under lock protection? Otherwise the compiler (in the absence of volatile) may assume the variable hasn't changed and not actually go and read it from memory. In that context, volatile seems to be a good compromise for writing racy code that is at least safe from dangerous compiler optimizations (although not safe from races, which are dealt with, and assumed to be rare enough that we don't care).

Also, what about atomicity? If one thread modifies a shared variable while another one modifies it at the same time, is it possible that the bytes in memory composing that variable end up all scrambled?
No, the risk is zero, because anything coming out of the TT is validated before being used. In particular, the TT move is tested for legality.
He's probably talking about highly theoretical cases.

What we currently do in SF:

1) fetch ttEntry from RAM into cache
2) validate the ttEntry
3) use the entry

However, theoretically speaking, the compiler would be allowed to, e.g.:

1) fetch ttEntry from RAM into cache
2) validate ttEntry
3) re-fetch ttEntry from RAM into cache
4) use the entry

Of course this doesn't make any sense. But the standard allows a C++ compiler to do almost anything as long as the result in single-threaded mode is correct.

So:
- Mathematically speaking, there exists a standard-compliant compiler that would miscompile SF's TTEntry handling.
- Practically speaking, I wouldn't worry about this.
Joona Kiiski
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: volatile?

Post by AlvaroBegue »

[Disclaimer: I haven't implemented a multi-threaded chess engine, so perhaps I don't know anything.]
zamar wrote:Of course this doesn't make any sense. But the standard allows a C++ compiler to do almost anything as long as the result in single-threaded mode is correct.

So:
- Mathematically speaking, there exists a standard-compliant compiler that would miscompile SF's TTEntry handling.
- Practically speaking, I wouldn't worry about this.
The standard allows for worse things than that:
Some part of the C++11 standard wrote:The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
So at this point you are in undefined behavior land and anything can happen, whether it would be correct in single-threaded code or not.

It's hard to imagine a practical situation where a compiler will screw you over this, but I have an uneasy feeling when I rely on my lack of imagination for safety.

So how bad is it to actually lock accesses to the hash table? I can see two possible problems:
(1) Too much contention for mutexes.
(2) Locking and unlocking operations are simply too slow, regardless of contention.

(1) can be ameliorated by dividing the transposition table in chunks and having a separate mutex for each chunk. So perhaps (2) is really the problem. Does anyone have some measure of how bad it is?
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: volatile?

Post by mcostalba »

AlvaroBegue wrote: I can see two possible problems:
I can see another one: blindly use locks because you just don't understand the issue. This is the worst one.

Don't worry SF is safe as is ;-)
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

mcostalba wrote:
AlvaroBegue wrote: I can see two possible problems:
I can see another one: blindly use locks because you just don't understand the issue. This is the worst one.

Don't worry SF is safe as is ;-)
SF is _at most_ safe in the sense that, had the Linux kernel been written in C++, Linus Torvalds would personally strangle the gcc developers one by one if they managed to introduce optimisations that break the assumptions SF silently relies on.

From the point of view of the C++ standard combined with POSIX threads, SF is not safe. (But probably no efficient multithreaded engine is safe in that sense.)

I am wondering if it might be possible to write an efficient engine in C++11 or C11 by carefully using C++11/C11 atomics. I don't understand it well enough, but it seems to allow a much more low level approach to multithreading than pthreads.
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

AlvaroBegue wrote:The standard allows for worse things than that:
Some part of the C++11 standard wrote:The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
So at this point you are in undefined behavior land and anything can happen, whether it would be correct in single-threaded code or not.

It's hard to imagine a practical situation where a compiler will screw you over this, but I have an uneasy feeling when I rely on my lack of imagination for safety.
I think the standard already gives the solution: mark all fields of the TT table atomic.

This does not give you atomic access to TT entries, just to the individual fields of an entry (which you anyway already had on x86 hardware), but that's fine. The engine is already programmed to cope with inconsistent entries.

Simply using atomic will have the problem that it prevents reordering and thereby optimisations. It might also, on some architectures, insert memory fence instructions. But C++11 allows you to disable this by specifying the memory_order_relaxed memory order.

I would think this would change exactly nothing about how the engine is compiled by current compilers, BUT it brings you under the standard and gets rid of the UB.
syzygy
Posts: 5555
Joined: Tue Feb 28, 2012 11:56 pm

Re: volatile?

Post by syzygy »

AlvaroBegue wrote:It's hard to imagine a practical situation where a compiler will screw you over this, but I have an uneasy feeling when I rely on my lack of imagination for safety.
I assume you have been reading this blog? :-)
Bartosz Milewski wrote:You just cannot reason about something that is specifically defined to be “undefined.” So, obviously, you cannot prove the correctness of a program that has data races. However, very few people engage in proofs of correctness. In most cases the argument goes, “I can’t see how this can fail, therefore it must be correct” (maybe not in so many words). I call this “proof by lack of imagination.” If you want to become a concurrency expert, you have to constantly stretch your imagination. So let’s do a few stretches.