Hey guys, I did a lot of work on my parallel search earlier this year and came up with a pretty good way of doing it (IMO). I figured I'd make some web pages about it, and here they are:
http://www.tckerrigan.com/Chess/Parallel_Search/
Basically it's a very simple, easy, straightforward "method" of implementing parallel search, and if you follow the steps I think you will probably get speedups that are similar to DTS's. (You can see my own speedup numbers if you follow the link.)
The main algorithm involved is ABDADA, but I've simplified the implementation and added some of my own (simple) optimizations that improve its performance by 5-10%.
I look forward to any feedback people might have about this method and my web pages, and I definitely look forward to other people trying the method and hearing about how well it works for them!
"How To" guide to parallel-izing an engine, easy a
Moderators: hgm, Rebel, chrisw
- Posts: 64
- Joined: Fri Jul 03, 2015 9:22 pm
- Posts: 6808
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: o.t.
Hi Tom,
20 years later ...
Very happy to read your name in TalkChess. It takes me straight back to the good old Winboard times and our first Winboard tourneys with Stober, Zarkov, and the first programmers creating engines for Winboard. You gave us the strong support we needed to publish the tourneys and to encourage others.
And of course I remember your nice mails from that time. My English hasn't gotten better over all these years, but again ... I am very happy to read your name here.
A good day ...
Tom wrote!
Friendly
Frank
PS: I have no idea about programming ... but I had to write my short message!
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: "How To" guide to parallel-izing an engine, ea
Nice guide, and for me very timely! I've been mulling over (re)writing my YBW inspired parallel algorithm, and ABDADA looks appealing (apart from the name).
Very useful!
- Posts: 1221
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: "How To" guide to parallel-izing an engine, ea
Hi Tom,
Many thanks. It's timely for me as I'm thinking about giving Maverick a parallel search.
Quick question - have you tried Lazy SMP? It seems to be simple to implement and all the rage with top engines (e.g. Stockfish and Komodo).
- Steve
http://www.chessprogramming.net - Maverick Chess Engine
- Posts: 64
- Joined: Fri Jul 03, 2015 9:22 pm
Re: "How To" guide to parallel-izing an engine, ea
Haha, ABDADA is a mouthful. I had a hard time remembering the sequence of letters for a while until I started pronouncing it: ab-da-da. I thought about calling my implementation something else but then I thought it would be disrespectful to Jean-Christophe Weill. The algorithm really is a stroke of genius IMO--self-synchronizing YBW! Who thinks of this stuff?
Really excited to see if you have any feedback or thoughts about the stuff I posted. Feel free to post here or email me privately.
- Posts: 64
- Joined: Fri Jul 03, 2015 9:22 pm
Re: "How To" guide to parallel-izing an engine, ea
I haven't tried Lazy SMP but I don't see the point. Lazy SMP seems like what you would get if you were trying to implement ABDADA and then gave up halfway through--instead of systematically ordering moves to divide up the work between threads (as ABDADA calls for), you just do things kinda randomly and hope for the best.
Steve Maughan wrote:Hi Tom,
Many thanks. It's timely for me as I'm thinking about giving Maverick a parallel search.
Quick question - have you tried Lazy SMP? It seems to be simple to implement and all the rage with top engines (e.g. Stockfish and Komodo).
...
I can understand people's reluctance to implement ABDADA in the past, because the implementation in the original paper is a bit tricky.
With my "Simplified ABDADA" implementation, I think there's no reason not to use ABDADA. It's literally only a few lines of code, with absolutely minimal changes to a single-threaded search.
http://www.tckerrigan.com/Chess/Paralle ... ed_ABDADA/
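As a rough illustration of the idea (not the actual code from the page above -- the table size, the hash mix, and the function names here are my own assumptions), the "currently searching" bookkeeping can be sketched as a small lock-free table of (position, move) hashes:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Small shared table recording which (position, move) pairs are
   currently being searched by some thread.  Sizes and the hash
   multiplier are illustrative guesses, not values from the page. */
#define DEFER_SLOTS 32768

static _Atomic uint64_t searching[DEFER_SLOTS];

static uint64_t pos_move_hash(uint64_t pos_key, int move) {
    return pos_key ^ ((uint64_t)(move + 1) * 0x9E3779B97F4A7C15ULL);
}

/* Should this move be deferred because another thread is on it? */
static int defer_move(uint64_t pos_key, int move) {
    uint64_t h = pos_move_hash(pos_key, move);
    return atomic_load(&searching[h % DEFER_SLOTS]) == h;
}

/* Called just before a thread starts searching the move... */
static void starting_search(uint64_t pos_key, int move) {
    uint64_t h = pos_move_hash(pos_key, move);
    atomic_store(&searching[h % DEFER_SLOTS], h);
}

/* ...and just after it finishes (clear only our own entry). */
static void finished_search(uint64_t pos_key, int move) {
    uint64_t h = pos_move_hash(pos_key, move);
    if (atomic_load(&searching[h % DEFER_SLOTS]) == h)
        atomic_store(&searching[h % DEFER_SLOTS], 0);
}
```

In the move loop, moves after the first that test positive in `defer_move()` get pushed onto a deferred list and searched in a second pass. Races in this table only cost a little duplicated work, never correctness, which is why no locking is needed.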
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: "How To" guide to parallel-izing an engine, ea
Interesting. I'll give this a try.
But it will need some adaptation to work. The main problem is that, with shared hashtable (aka lazy), you certainly don't want to have all threads searching the same depth, or running their own iterative deepening loop in a desynchronized manner.
There is a TON of elo in shared hashtable, by simply:
1/ having half the threads at depth=d, and the other half at depth=d+1. SF extends this to more complicated patterns, but this works well up to 4 cores (and my machine only has 4).
2/ when a thread completes a depth, signal all threads working on this depth (or lower) to stop *immediately* (e.g. a global atomic signal; threads check the signal and throw an exception to return to the ID loop). They go back to the ID loop, where useful work is given to them (heeding rule #1).
I would be very surprised if this gains any elo on top of what I already have (which scales quasi perfectly elo wise up to 4 cores).
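The two rules above can be sketched as follows (a minimal sketch; the function names are mine, and Stockfish's real depth-assignment scheme is more elaborate than this even/odd split):

```c
#include <assert.h>
#include <stdatomic.h>

/* Rule 1: odd-numbered threads search one ply deeper than even ones,
   so half the pool is at depth d and the other half at d+1. */
static int helper_depth(int base_depth, int thread_id) {
    return base_depth + (thread_id & 1);
}

/* Rule 2: a global atomic records the deepest completed iteration.
   Workers poll it and abandon any search at or below that depth,
   returning to the iterative-deepening loop for a new assignment. */
static _Atomic int completed_depth;

static void report_depth_done(int depth) {
    int cur = atomic_load(&completed_depth);
    /* Only move the mark forward; a stale (lower) report is a no-op. */
    while (depth > cur &&
           !atomic_compare_exchange_weak(&completed_depth, &cur, depth))
        ;  /* retry; cur is reloaded by the failed CAS */
}

static int should_abort(int my_depth) {
    return atomic_load(&completed_depth) >= my_depth;
}
```

A worker would call `should_abort()` periodically inside its search (e.g. at every node poll) and unwind back to the ID loop when it fires.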
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: "How To" guide to parallel-izing an engine, ea
Another thing about your simplified ABDADA: it only applies at root. Shouldn't the logic be applied at every node (perhaps with a min depth)? This requires some data in TT entries, like how many threads are searching the hash move in this position. Or perhaps I'm reinventing ABDADA…
I don't see how any root-only rule can be scalable at long TC.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: "How To" guide to parallel-izing an engine, ea
Where do you see it only applies at root?
lucasart wrote:Another thing about your simplified ABDADA: it only applies at root.
As I understand it Tom uses a separate small hash table (pos AND move hashed together) which he uses to postpone (defer) searching moves already searched by other threads (except for the first move).
This is only applied when depth >= limit (in this case 3).
It's a nice idea and should be trivial to implement and try.
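Under that reading, the move loop becomes a two-pass scheme, something like the sketch below. The depth limit of 3 is from the description above; everything else (names, the `demo_busy` stub standing in for the "another thread is already on this move" probe) is my own illustration:

```c
#include <assert.h>

#define MAX_MOVES 256
#define DEFER_DEPTH_LIMIT 3   /* only defer when depth >= 3 */

/* First pass: search the first move and any non-busy move right away,
   postponing moves another thread is already searching.  Second pass:
   search the deferred moves anyway (postpone, never skip). */
static int order_with_deferral(const int *moves, int n, int depth,
                               int (*is_busy)(int move), int *out) {
    int deferred[MAX_MOVES], nd = 0, k = 0;
    for (int i = 0; i < n; i++) {
        /* The first move is never deferred, nor anything at shallow depth. */
        if (i > 0 && depth >= DEFER_DEPTH_LIMIT && is_busy(moves[i]))
            deferred[nd++] = moves[i];
        else
            out[k++] = moves[i];
    }
    for (int i = 0; i < nd; i++)
        out[k++] = deferred[i];
    return k;
}

/* Stub probe for demonstration: pretends move 20 is taken. */
static int demo_busy(int move) { return move == 20; }
```

In a real engine `is_busy` would be the probe into the shared (pos, move) table, and the deferred moves would usually have been resolved by another thread by the time the second pass reaches them, so they come cheap from the transposition table.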