"Simplified ABDADA" updated

TomKerrigan · Post by **TomKerrigan** » Tue Aug 29, 2017 7:11 am

Hi all, thanks to feedback from this forum, I've updated my "Simplified ABDADA" algorithm:

http://www.tckerrigan.com/Chess/Paralle ... ed_ABDADA/

It turns out that hash table collisions were an issue for currently_searching (the hash table that stores hashes of moves that are currently being searched).

I switched it to a 4-way hash table which solved the problem very neatly. The code is still very simple, doesn't require any locks, there's basically no impact on performance (NPS), and at no point in my testing was there a need for more than 4-way associativity.

I also redid all my speedup testing and updated the numbers on this page:

http://www.tckerrigan.com/Chess/Parallel_Search/

The speedups are all basically the same. Actually I think they are probably a tiny bit slower on average, by maybe 1-2%. I guess having multiple threads randomly search the same move sometimes provided a little benefit, but I'd rather have the peace of mind of knowing the algorithm is working as intended!

Michel · Post by **Michel** » Tue Aug 29, 2017 8:23 am

I don't want to be nitpicking but to better evaluate how much the ABDADA synchronization mechanism is worth (in the TtD metric you use) you should also make the corresponding measurements without synchronization (i.e. the simple SHT approach).

TomKerrigan · Post by **TomKerrigan** » Tue Aug 29, 2017 8:34 am

Michel wrote:I don't want to be nitpicking but to better evaluate how much the ABDADA synchronization mechanism is worth (in the TtD metric you use) you should also make the corresponding measurements without synchronization (i.e. the simple SHT approach).

Not at all! I'm happy to discuss any of this.

The speedup numbers on that top page are for a combination of all my stuff--Simplified ABDADA, Cutoff Checks, and the synchronization I do at the root.

I list approximate speedup numbers for the different "features" here:

http://www.tckerrigan.com/Chess/Parallel_Search/How_To/

Briefly, with nothing other than a shared hash table, I see speedups of ~6x with 16 cores. The speedups vary pretty wildly. Some are pretty good, around 8x. Others are pretty bad, closer to 3x. With ABDADA, etc. the speedups are much more consistent.

With nothing other than a shared hash table, my NPS takes a dive, from a pretty consistent 15.5x down to around 13-14x. Makes sense because more threads are going to be searching identical parts of the tree.

nionita · Post by **nionita** » Fri Nov 24, 2017 3:16 pm

Hi Tom,

Many thanks for the write up!

I followed your ideas and implemented simplified ABDADA in Barbarossa (up to step 6, but without cutoff checks yet). I skipped some of the preparation steps which were meant basically to make sure the implementation is correct so far.

In my own tests I got ~80 Elo for 2 cores, which is in accordance with the theory. Not sure how it scales further, I will need more tests.

The multi threading Barbarossa will be used in next HGM tournament.

There were some technical complications with:
- LMR: when I defer a move, it will be played at the end with the "later" reduction (not sure if this is correct)
- re-search (because I use PVS): I think you already stated that in this case the "currently searching" of that move should be deleted, so that other threads eventually help with the re-search: I implemented this idea, but I'm not sure if it is quite correct or if it has any effect without synchronization between threads

TomKerrigan · Post by **TomKerrigan** » Sat Dec 02, 2017 7:30 pm

nionita wrote:Hi Tom,

Many thanks for the write up!

I followed your ideas and implemented simplified ABDADA in Barbarossa (up to step 6, but without cutoff checks yet). I skipped some of the preparation steps which were meant basically to make sure the implementation is correct so far.

In my own tests I got ~80 Elo for 2 cores, which is in accordance with the theory. Not sure how it scales further, I will need more tests.

The multi threading Barbarossa will be used in next HGM tournament.

That's great! I'm glad it's working well and that you (presumably) found it useful!

There were some technical complications with:
- LMR: when I defer a move, it will be played at the end with the "later" reduction (not sure if this is correct)

I haven't tried LMR (I know, I know) but I suspect this isn't correct. What you could do is, as you're deferring the move, record the depth that you WOULD have searched the move at.

- re-search (because I use PVS): I think you already stated that in this case the "currently searching" of that move should be deleted, so that other threads eventually help with the re-search: I implemented this idea, but I'm not sure if it is quite correct or if it has any effect without synchronization between threads

Here's the rationale: when you mark a move as currently being searched, the theoretical idea is that you're dividing up work between threads, but the PRACTICAL effect of doing that is to make other threads search that move later.

When you're re-searching a move because of a null-window failure, that means the move is very likely to fail high or at least change alpha, so that's important search information that every thread should ideally know before searching any subsequent move. So the last thing you want your threads to do is skip over the move that's failing. So that's why it shouldn't be marked as currently being searched.

So that's the theoretical explanation. In practice, the benefit is small but measurable with my chess engine. Hope this helps!

nionita · Post by **nionita** » Tue Dec 05, 2017 10:40 am

TomKerrigan wrote:I haven't tried LMR (I know, I know) but I suspect this isn't correct. What you could do is, as you're deferring the move, record the depth that you WOULD have searched the move at.

Ok, I will have to try this. I was thinking that, when you defer a move, chances are high that when you come to search it, there is a TT entry for it, and that should have the correct depth anyway, as it was searched in the same position, but not deferred.

But of course if the search in the first thread was not finished at that time then this is not the case. It is hard to understand all implications of this concurrent search... I guess best approach is to try both variants and take the best.

jcw66 · Post by **jcw66** » Wed Dec 13, 2017 8:47 pm

Thanks for enhancing this old parallel search! I hope you find it usefull.

TomKerrigan · Post by **TomKerrigan** » Thu Dec 14, 2017 3:17 am

jcw66 wrote:Thanks for enhancing this old parallel search! I hope you find it usefull.

Great to hear from you and thank you so much for developing the original algorithm!

I think almost anything that's self-synchronizing is terrific. Self-synchronizing Young Brothers Wait is a stroke of genius.

"Simplified ABDADA" updated

"Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated

Re: "Simplified ABDADA" updated