atomic TT

syzygy · Post by **syzygy** » Mon Sep 14, 2015 11:45 pm

bob wrote:How is a race condition not "undefined behavior"?

See above. If the only races concern accesses to atomic variables, there is no undefined behavior in the C++11 sense.

One might not be able to predict the outcome, but that is not undefined behavior.

syzygy · Post by **syzygy** » Mon Sep 14, 2015 11:47 pm

bob wrote:
syzygy wrote:
Joost Buijs wrote:
lucasart wrote:Seems there are a few ways of managing TT races:
* Crafty style: lockless hashing trick
* Stockfish style: allow races and dont care, just check for legality of the TT move
* Cilk Chess style: atomic lock for each TT entry

But, at least on x86-64, is it not possible to atomically load or store an entire 128bit TT entry ? Would a simple `std::atomic<TTEntry>` not do the job ? Or maybe through compiler intrinsics ? Anyone tried that ?
The Stockfish style looks very dirty and prone to errors.
If you check the validity of the move there is a chance the entry is ok, but how do you check this in case of an upperbound when there is no move available?
Unless something has changed recently, SF has tens of thousands of hash collisions per second. The very few extra caused by races really won't matter.
Surely not using 64 bit Zobrist...

Using 64 bit Zobrist, but only storing 16-18 bits or so in TT entries.

cdani · Post by **cdani** » Mon Sep 14, 2015 11:52 pm

hgm wrote:
cdani wrote:Now storing and retrieving the whole uint64_t, I can be sure that at least this uint64_t is coherent. In fact now I don't find moves not related to the position, previously happened various times in a single game. That was my simple objective.
Ah OK, now I see. At least it guarantees you that the move belongs to the position. You probably care less about wrong scores or bounds, as the cannot crash your engine so easily.

Yes!

syzygy · Post by **syzygy** » Tue Sep 15, 2015 12:01 am

syzygy wrote:
bob wrote:How is a race condition not "undefined behavior"?
See above. If the only races concern accesses to atomic variables, there is no undefined behavior in the C++11 sense.

One might not be able to predict the outcome, but that is not undefined behavior.

http://concurrencyfreaks.blogspot.de/20 ... -bugs.html

the point here is that data races are undefined behavior (nothing new about it), but if there are simultaneous read/write by two threads on the same variable and it is an atomic variable, then this is no longer called a data race, and it becomes well defined behavior in the C11 and C++11 memory model.

if you define a data race as being a race on a variable, regardless of whether or not that variable is of type atomic, then there are benign data races, but this is not how the C11/C++11 defines a data race, so it depends on how your phrase it.
This means that you can have races on a variable and still have well defined behavior, with relaxed atomics being the best example.

Basically you can get the compiler to do what you, with the machine model in mind, always expected it to do. But this time you have the C11 / C++11 standard backing your expectations.

(I haven't investigated though whether relaxed atomics are sufficient to avoid the potential miscompilation of Crafty's xor-trick that was discussed in an old thread. Some things will need acquire-release semantics.)

ymatioun · Post by **ymatioun** » Tue Sep 15, 2015 12:17 am

what i do in Fizbo is i keep TT entry within one 8 byte block, which can be accessed atomically. Here is my declaration of main TT entry:

// main transposition table data structure
typedef struct{
unsigned short int lock2; // 2 bytes of lock = 16 bits
unsigned char lock1; // 1 more byte of lock =24 bits
unsigned char lock1a:2, // 2 more bits of lock, for total lock size of 3*8+2=26 bits. = 26 bits
depth:6; // 6 bits of depth: 0 to 63 = 32 bits
short int score; // 16 bits, score, +-10K. Need 15 bits to cover +-16K. = 48 bits
unsigned char from:6, // "from". Need 6 bits. = 54 bits
type:2; // score type: 0/1/2=exact,lower,upper. Need 2 bits. = 56 bits
unsigned char to:6, // "to". Need 6 bits. = 62 bits
age:2; // age: 0 to 3. Need 2 bits. = 64 bits total
}

lucasart · Post by **lucasart** » Tue Sep 15, 2015 2:14 am

I don't understand how the C++ standard allows the compiler to break lock-less hashing xor trick. Anyone can explain (in lay man words) ?

kbhearn · Post by **kbhearn** » Tue Sep 15, 2015 2:27 am

lucasart wrote:I don't understand how the C++ standard allows the compiler to break lock-less hashing xor trick. Anyone can explain (in lay man words) ?

just because it's a race, and c++ will not guarantee anything in racy code. Demons may fly out your nose and whatnot.

std::atomic moves it out of undefined behavior but may still lead to unexpected/unpredictable behavior if not sufficiently well thought out.

mar · Post by **mar** » Tue Sep 15, 2015 3:25 am

kbhearn wrote:just because it's a race, and c++ will not guarantee anything in racy code. Demons may fly out your nose and whatnot.

I think it was more about potential reordering than races. I think the conclusion was that this can't happen in practice unless the compiler is broken.

There's no reliable/easy way to detect races at compile time, it could be detected at runtime but that would be a huge performance hit so very unlikely
(you could as well use some interpreted language instead and get the same performance).
Hardware could detect this (in fact CPU already has to synchronize caches somehow to maintain consistency if writes happen to cache lines holding
the same piece of physical memory - certainly in the case of atomics),
but IMHO this would be only practically useful for debugging.

So the only UB I see here is on hw level because of the race itself, I don't see what C++ has to do with it.

That being said I would prefer to finally see alignment attributes being standardized.

bob · Post by **bob** » Tue Sep 15, 2015 4:00 am

kbhearn wrote:
lucasart wrote:I don't understand how the C++ standard allows the compiler to break lock-less hashing xor trick. Anyone can explain (in lay man words) ?
just because it's a race, and c++ will not guarantee anything in racy code. Demons may fly out your nose and whatnot.

std::atomic moves it out of undefined behavior but may still lead to unexpected/unpredictable behavior if not sufficiently well thought out.

Sorry, but this is absolutely the most stupid concept I have ever seen, but then the C++ standards group has NEVER had an elevator that goes all the way to the roof. They are basically allowing you to declares something as "undefined behavior" and then saying "since it is declared as undefined behavior, it is no longer undefined behavior." Such a complete crock...

Undefined == unpredictable.

bob · Post by **bob** » Tue Sep 15, 2015 4:04 am

syzygy wrote:
bob wrote:
syzygy wrote:
Joost Buijs wrote:
lucasart wrote:Seems there are a few ways of managing TT races:
* Crafty style: lockless hashing trick
* Stockfish style: allow races and dont care, just check for legality of the TT move
* Cilk Chess style: atomic lock for each TT entry

But, at least on x86-64, is it not possible to atomically load or store an entire 128bit TT entry ? Would a simple `std::atomic<TTEntry>` not do the job ? Or maybe through compiler intrinsics ? Anyone tried that ?
The Stockfish style looks very dirty and prone to errors.
If you check the validity of the move there is a chance the entry is ok, but how do you check this in case of an upperbound when there is no move available?
Unless something has changed recently, SF has tens of thousands of hash collisions per second. The very few extra caused by races really won't matter.
Surely not using 64 bit Zobrist...
Using 64 bit Zobrist, but only storing 16-18 bits or so in TT entries.

I had forgotten about the 64 bit entry size. I did that a while back, but I REALLY didn't like the collision rate, regardless of the paper I wrote with Cozzie on the subject.

atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT

Re: atomic TT