Parallel Search with Transposition Table

jdart · Post by **jdart** » Thu Mar 27, 2014 2:45 pm

Since parallelizing alpha-beta looks very difficult and way above my skill level, I decided to take the "easy" route and just have a thread pool handle the search for all the moves at the root.

This is problematic, because the usual approach (PVS search) is to do the supposed best move first, obtain an accurate score, and then narrow the search window so that subsequent moves are only tested to see if they improve the current best score or not. That narrow-window search is typically much faster than a search with wider bounds.

Re concurrency: all you have to do to make it work is to make your hash table access method "synchronized", although as others have noted performance may be an issue because you will acquire/release locks frequently.

--Jon

Bloodbane · Post by **Bloodbane** » Thu Mar 27, 2014 3:13 pm

I though GC was pretty fast if it is done right, i.e. using some sort of a system based on when an object was last accessed, can't remember the name. Is Java not doing GC that way or is it just an feature of hashMap?

DrRibosome · Post by **DrRibosome** » Thu Mar 27, 2014 7:57 pm

Hmm, I wrote an engine in java as well and it seems to have comparable performance to a lot of the cpp engines. Everything is pre-allocated, etc on startup so there's no allocations at all in the search, and the jvm optimizes all the numeric operations really well

daylen · Post by **daylen** » Thu Mar 27, 2014 8:18 pm

Thanks for your detailed post. It's true that Java is doing a lot of object creation/deletion in, as you said, the move generator. What profiler did you use that showed garbage collection? jvisualvm, the profiler that ships with Java, doesn't show it.

I'll try to read more about PVS. Right now I know what it is on a very high-level but not really how to go about implementing it. The idea is to explore the left-most branch and then do the rest of the tree in parallel...

By using a ConcurrentHashMap I'm basically using method (2)—synchronizing access with read/write locks. I'm *also* doing sanity checking of data I get out of the ConcurrentHashMap. However there are still inconsistency issues, so the bug must be elsewhere.

daylen · Post by **daylen** » Thu Mar 27, 2014 8:21 pm

It's called generational garbage collection. From Wikipedia:

"It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set."

syzygy · Post by **syzygy** » Thu Mar 27, 2014 8:57 pm

rtitle wrote:The alternative is to ignore thread safety except for some sanity checking that you don't get back garbage due to corrupted entries. This obviously is not foolproof - you might get back something non-garbage but wrong - but it might be good enough for chess search where an erroneous result from the hashtable during search isn't fatal. (By contrast, you wouldn't want your bank trashing your bank balance and saying "oh, that's just corruption because we used a non-threadsafe database for speed").

You also don't want your bank to calculate your balance based on wild heuristics that nobody really understands but just happen to work. A chess engine is just not comparable to a bank.

But with C++11/C11 atomics you can at least get rid of UB.

syzygy · Post by **syzygy** » Thu Mar 27, 2014 9:04 pm

daylen wrote:Thanks for your detailed post. It's true that Java is doing a lot of object creation/deletion in, as you said, the move generator.

If already generating moves involves object creation/deletion, then, with all due respect, it is time to start over.

Use integers to represent moves, not objects.

daylen · Post by **daylen** » Thu Mar 27, 2014 9:30 pm

Doing so would require almost a complete rewrite of the Position class. A Move object encapsulates 4 integers: the x, y for the source and destination squares. Would the speed up from using one integer be significant?

Rein Halbersma · Post by **Rein Halbersma** » Thu Mar 27, 2014 10:24 pm

daylen wrote:Doing so would require almost a complete rewrite of the Position class. A Move object encapsulates 4 integers: the x, y for the source and destination squares. Would the speed up from using one integer be significant?

Ronald presumably meant: if you dynamically have to create/delete objects (i.e. using malloc/free, or C++ new/delete), rather than on the stack.

emadsen · Post by **emadsen** » Fri Mar 28, 2014 3:54 am

So you are implementing this in Java, presumably because you don't care about performance. But you want to parallelize it, because you do care about performance. Make up your mind. If you want performance, you probably want to go with C or C++.

I agree, depending on your goals. This forum can be a bit performance or ELO-obsessed, as Fernando reminds us from time to time. If your goal is to produce a high-performance, strong engine, then yes, these technology choices are contradictory. However, if the goal of the project is to produce a sparring partner, or gain some intellectual satisfaction by correctly implementing difficult algorithms, then these arguments over languages and performance are irrelevant. I've been spending a lot of time lately testing various algorithms to implement UCI_LimitStrength and UCI_Elo, which adds value to a chess program in a different manner than NPS.

Besides, my slow .NET engine is nearly 2500 ELO against human opposition on ICC. It's ridiculous how strong these chess engines are when the distorting effect of computer-computer testing is removed.

So, the original poster has to decide what are his goals.

profiling showed over 90% of its time spent garbage collecting

Seems very high. My engine spends about 12% in garbage collection.

each TT entry in the HashMap is an object, and they need to be garbage-collected after replacement.

No need for garbage collection. Just pre-allocate the TT entries and reuse.

Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table

Re: Parallel Search with Transposition Table