Parallel Search with Transposition Table

daylen · Post by **daylen** » Thu Mar 27, 2014 5:57 am

Hi all, I'm writing an engine. It's not a chess engine, but it's for a similar enough game that the same ideas for search apply (i.e. negamax, alpha-beta, transposition table, parallel search, iterative deepening). It's also in Java. I've implemented a negamax alpha-beta search with a transposition table inside of an iterative-deepening loop (thanks to the well-documented Stockfish source code for illuminating some concepts).

The transposition table is a Java HashMap<Long, TableEntry> (dictionary of Zobrist hashes to table entries). I then decided to add parallel search. Since parallelizing alpha-beta looks very difficult and way above my skill level, I decided to take the "easy" route and just have a thread pool handle the search for all the moves at the root. For example, if this was a chess engine searching from the starting position, I would queue up a search on e4, d4, c4, etc. and the thread pool would dequeue these moves and search each of these moves in parallel.

Now obviously this would lead to issues with concurrent access to the transposition table (HashMap). Right now it works perfectly if you use either multiple threads OR the transposition table, but not both. I tried swapping out the HashMap for a ConcurrentHashMap, which allows for concurrent reading and writing, but that didn't really solve anything. Using a completely synchronized map didn't help either.

So my question is: how should I access the transposition table in such a way so that multiple threads can be doing searches, or is this even possible? I see that Stockfish's TranspositionTable class makes no mention of thread safety. Additionally inside Stockfish's search method, calls to TT.store aren't wrapped in a lock. So I'm a bit confused as to how Stockfish ensures integrity of the transposition table.

Edmund · Post by **Edmund** » Thu Mar 27, 2014 6:22 am

http://www.talkchess.com/forum/viewtopi ... =0&t=41917

Basically there exist 2 'solutions' out there avoiding locks. Firstly (like done in Stockfish), validate the information presented by the transposition table, not to crash on entry corruption. Secondly (eg. like done in Crafty) you can xor the data of the entry back into the key in order to invalidate the key of any corrupted entry (see http://chessprogramming.wikispaces.com/ ... e#Lockless).

daylen · Post by **daylen** » Thu Mar 27, 2014 7:01 am

How would one validate the information from the TT? In the linked thread Marco says that the move is validated before it is used. I don't see this in search.cpp (lines 519-544, also labeled "Step 4"). I do see a comment called "Only in case of TT access race" and it checks that ttValue != VALUE_NONE. But I don't know why VALUE_NONE would be in the TT in the first place. Additionally with a 64-bit key, would validation be necessary?

Sven · Post by **Sven** » Thu Mar 27, 2014 8:23 am

daylen wrote:How would one validate the information from the TT? In the linked thread Marco says that the move is validated before it is used. I don't see this in search.cpp (lines 519-544, also labeled "Step 4"). I do see a comment called "Only in case of TT access race" and it checks that ttValue != VALUE_NONE. But I don't know why VALUE_NONE would be in the TT in the first place. Additionally with a 64-bit key, would validation be necessary?

One common opinion is that the TT move should be checked for pseudo-legality. I don't know whether this is actually done in Stockfish, neither where you could find it. Maybe in the MovePicker?

syzygy · Post by **syzygy** » Thu Mar 27, 2014 8:58 am

Sven Schüle wrote:
daylen wrote:How would one validate the information from the TT? In the linked thread Marco says that the move is validated before it is used. I don't see this in search.cpp (lines 519-544, also labeled "Step 4"). I do see a comment called "Only in case of TT access race" and it checks that ttValue != VALUE_NONE. But I don't know why VALUE_NONE would be in the TT in the first place. Additionally with a 64-bit key, would validation be necessary?
One common opinion is that the TT move should be checked for pseudo-legality. I don't know whether this is actually done in Stockfish, neither where you could find it. Maybe in the MovePicker?

In MovePicker indeed. pos.pseudo_legal(ttm)

daylen · Post by **daylen** » Thu Mar 27, 2014 9:54 am

Even after adding this check in, I'm still getting erratic results when using multiple threads and the TT.

Without multiple threads:

Code: Select all

info depth 1 score 6 time 0 multipv 1 pv &#91;add to 45&#93; 6
info depth 2 score 9997 time 1 multipv 1 pv &#91;add to 45&#93; 9997
info depth 3 score 9997 time 1 multipv 1 pv &#91;add to 45&#93; 9997
info depth 4 score 9997 time 2 multipv 1 pv &#91;add to 45&#93; 9997
info depth 5 score 9997 time 3 multipv 1 pv &#91;add to 45&#93; 9997
info depth 6 score 9997 time 3 multipv 1 pv &#91;add to 45&#93; 9997
info depth 7 score 9997 time 311 multipv 1 pv &#91;add to 45&#93; 9997
info depth 8 score 9997 time 1554 multipv 1 pv &#91;add to 45&#93; 9997
info depth 9 score 9997 time 4987 multipv 1 pv &#91;add to 45&#93; 9997
info time 4987
bestmove &#91;add to 45&#93; 9997

The right move and score is loaded in from the TT.

With multiple threads:

Code: Select all

info depth 1 score 4 time 1 multipv 1 pv &#91;add to 42&#93; 4
info depth 2 score 9993 time 1 multipv 1 pv &#91;add to 72&#93; 9993
info depth 3 score 9993 time 2 multipv 1 pv &#91;add to 72&#93; 9993
info depth 4 score 9993 time 10 multipv 1 pv &#91;add to 72&#93; 9993
info depth 5 score 9995 time 81 multipv 1 pv &#91;add to 63&#93; 9995
info depth 6 score 9995 time 466 multipv 1 pv &#91;add to 63&#93; 9995
info depth 7 score 9995 time 1667 multipv 1 pv &#91;add to 63&#93; 9995
info depth 8 score 9997 time 4994 multipv 1 pv &#91;add to 45&#93; 9997
info time 4994
bestmove &#91;add to 45&#93; 9997

As we can see here, the multi-threaded version *barely* finds the right move in time. Looks like the single-threaded version actually does better.

Maybe this is due to the fact that the way I'm implementing multiple threads is not very efficient. I've heard about YBWC but I haven't found any clear pseudocode demonstrating exactly how to go about implementing it. Stockfish's source regarding this is quite complicated.

AlvaroBegue · Post by **AlvaroBegue** » Thu Mar 27, 2014 10:32 am

So you are implementing this in Java, presumably because you don't care about performance. But you want to parallelize it, because you do care about performance.

Make up your mind. If you want performance, you probably want to go with C or C++.

daylen · Post by **daylen** » Thu Mar 27, 2014 10:53 am

Well Java has a really nice threading/concurrency framework (ExecutorService - http://docs.oracle.com/javase/7/docs/ap ... rvice.html ) which makes it really easy to create and use a thread pool. With C/C++ I'd have to do this myself.

Bloodbane · Post by **Bloodbane** » Thu Mar 27, 2014 2:05 pm

C++11 added threading libraries so you don't have to do everything yourself anymore. Also, you can easily double the speed of your engine if you switch without doing the multihtreaded thing. I can't see any reason not to switch languages.

Also, the fact that your multithreading doesn't work properly simply means that there is a bug somewhere, not that you have to try YBW.

rtitle · Post by **rtitle** » Thu Mar 27, 2014 2:43 pm

The main performance problem with Java is garbage collection. If your chess program creates an object for each node of the search (e.g. a Move object created by move generation) then all those objects need to be garbage collected. The TT table can also be a problem, since each TT entry in the HashMap is an object, and they need to be garbage-collected after replacement. I initially wrote my chess program in Java, and profiling showed over 90% of its time spent garbage collecting. You can work around this by pre-allocating pools of objects that you hand out yourself, but if you do that you are effectively writing your own "new" and "delete". In that case why not use C++ in the first place - it provides a good-performing "new" and "delete" for you. That was my conclusion. I then translated my Java program to C++ and saw a roughly 10x speedup.

For threading, C++ 11 has threading support in its libraries. To do parallel search, check out the PVS (principal variation splitting) algorithm. It'll do better than the simple divvy-up-the-top-level-nodes algorithm you describe.

Thread safety of the TT table is an issue. As a professional software engineer who works on storage systems in real life, I shudder at the solutions adopted by typical chess programs. If you want to be rigorous about it, you can either (1) Keep your entries to 64 bits and read/write them using atomic reads and writes, or (2) Use larger entries and synchronize access with read/write locks, or (3) Use a lock-free hash algorithm - see for instance http://www.research.ibm.com/people/m/mi ... -2002.pdf - however sound lock-free hashes are complex to implement. There's also a paper by Bob Hyatt that describes the lock-free hash he uses in Crafty. It's simpler than the paper I just pointed you to, but it makes restrictive assumptions about what your hash entries look like. And it's not quite sound since it assumes atomic read/write of 64-bit values by default, which C++ does not guarantee. But that last point is maybe a quibble since most modern computer architectures will do 64-bit atomically even if you don't use C++ std::atomic.

The alternative is to ignore thread safety except for some sanity checking that you don't get back garbage due to corrupted entries. This obviously is not foolproof - you might get back something non-garbage but wrong - but it might be good enough for chess search where an erroneous result from the hashtable during search isn't fatal. (By contrast, you wouldn't want your bank trashing your bank balance and saying "oh, that's just corruption because we used a non-threadsafe database for speed").

Anyway, good luck!

Rich

Parallel Search with Transposition Table

Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table