"SuperLazy SMP" by using move list chunks: would this work?
Posted: Wed Sep 23, 2020 3:06 pm
Hi
Most people here know how the perft function works: you create a move list, iterate over it, make a move, and then call perft on the new board with depth - 1. As soon as depth == 0, you "return 1" to increase the leaf count. Because threading is so easy in Rust, I decided to try and add it to perft without looking how other engines did it.
I divided the initial move list into chunks, and then attached a thread to each chunk.
When I have (let's say) four threads, I first call a function create_chunks(). It generates a move list, and then creates 4 chunks, one for each thread, and distributes the moves from that list over the chunks. Thus I end up with 4 chunks, each having a partial move lists. Instead of just calling perft (which would need a complete move list to work), I call perft_on_chunk(), which launches perft for each chunk (in its own thread, attaching the handle to the chunk). After each of the chunk threads ends, I can add the resulting leaf counts from all threads together, and perft is completed.
It works: on four real cores, the speedup is about 3.5x.
The code is here:
https://github.com/mvanthoor/rustic/blo ... c/perft.rs
With regard to LazySMP, I've been reading that the different threads must differentiate from one another so they do not search the same moves. At some point they won't, because of the hash table and timing differences (and thus speed differences) between the threads, but it seems this can take some time.
Would it work to do exactly the same thing as I did in perft? Create chunks, and then call "search_on_chunk()" before actually calling the "real" search function? Then each thread would start out at a completely different point, and they will each find their own best_move. After the search time is up, the controlling thread can pick the best best_move. Obviously adding a TT will be necessary to avoid searching/evaluating a position multiple times.
(The other way around... if this sort of SMP works, then adding a TT to the currently working multi-threaded perft would also work.)
I don't immediately see a reason why this wouldn't work, besides the fact that I never implemented a multi-threaded alpha/beta search, and that this way would just search more useless stuff faster (i.e.: it could "speed up" the search with regard to node counts, but the searched moves are useless and the result is the same as it would have been with one thread).
Anybody has any thoughts about this?
(At this point I'm writing a normal a/b search. Even though it already has a controller thread, it'll only have one worker for now, actually searching single-threaded until all this stuff works and actually plays chess.)
Most people here know how the perft function works: you create a move list, iterate over it, make a move, and then call perft on the new board with depth - 1. As soon as depth == 0, you "return 1" to increase the leaf count. Because threading is so easy in Rust, I decided to try and add it to perft without looking how other engines did it.
I divided the initial move list into chunks, and then attached a thread to each chunk.
When I have (let's say) four threads, I first call a function create_chunks(). It generates a move list, and then creates 4 chunks, one for each thread, and distributes the moves from that list over the chunks. Thus I end up with 4 chunks, each having a partial move lists. Instead of just calling perft (which would need a complete move list to work), I call perft_on_chunk(), which launches perft for each chunk (in its own thread, attaching the handle to the chunk). After each of the chunk threads ends, I can add the resulting leaf counts from all threads together, and perft is completed.
It works: on four real cores, the speedup is about 3.5x.
The code is here:
https://github.com/mvanthoor/rustic/blo ... c/perft.rs
With regard to LazySMP, I've been reading that the different threads must differentiate from one another so they do not search the same moves. At some point they won't, because of the hash table and timing differences (and thus speed differences) between the threads, but it seems this can take some time.
Would it work to do exactly the same thing as I did in perft? Create chunks, and then call "search_on_chunk()" before actually calling the "real" search function? Then each thread would start out at a completely different point, and they will each find their own best_move. After the search time is up, the controlling thread can pick the best best_move. Obviously adding a TT will be necessary to avoid searching/evaluating a position multiple times.
(The other way around... if this sort of SMP works, then adding a TT to the currently working multi-threaded perft would also work.)
I don't immediately see a reason why this wouldn't work, besides the fact that I never implemented a multi-threaded alpha/beta search, and that this way would just search more useless stuff faster (i.e.: it could "speed up" the search with regard to node counts, but the searched moves are useless and the result is the same as it would have been with one thread).
Anybody has any thoughts about this?
(At this point I'm writing a normal a/b search. Even though it already has a controller thread, it'll only have one worker for now, actually searching single-threaded until all this stuff works and actually plays chess.)