RMO - Randomized Move Order - yet another Lazy SMP derivate

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Michel »

Daniel Shawul wrote: Mon Jan 06, 2020 8:32 pm
Alayan wrote: Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Vondele has the very sensible theory which (translated into my own words) says that the individual threads search a _smaller_ tree, compared to sequential search (due to more cutoffs), but they effectively search a bigger _virtual tree_ which includes the grafted trees, searched by other threads, whose depth/eval information has been communicated via the hash table.

One could think for example of moves that would get different reductions in two different threads (due to move ordering desynchronizing). Via the HT interaction it could now be that they are _virtually_ searched to the same depth in both threads (Peter was also mentioning the effect of LMR).

So the widening is virtual.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul »

For those who want to follow vondele's experiments at stockfish forum
https://groups.google.com/forum/?fromgr ... X3lDH83tlk
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by lucasart »

Daniel Shawul wrote: Mon Jan 06, 2020 8:32 pm
Alayan wrote: Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Pio »

lucasart wrote: Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: Mon Jan 06, 2020 8:32 pm
Alayan wrote: Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.
I think the thread voting is a very smart Idea 💡and I think smatovic (I hope I did not misspell it) randomised experiment is also very smart.

I would love to see a search that used both ideas together. It would be really interesting to put in some randomisation in the move ordering (probably more randomisation the closer to the leafs you are (less depth left)). I would then do a generalisation of the thread voting idea by using it in the transposition table as well. That together with maybe doing the voting as a function of depth where the averaging should be bigger the closer to the leafs you get will turn your alpha beta search into a
MCTS like algorithm.

/Pio
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by smatovic »

Pio wrote: Tue Jan 07, 2020 11:08 am
lucasart wrote: Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: Mon Jan 06, 2020 8:32 pm
Alayan wrote: Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.
I think the thread voting is a very smart Idea 💡and I think smatovic (I hope I did not misspell it) randomised experiment is also very smart.

I would love to see a search that used both ideas together. It would be really interesting to put in some randomisation in the move ordering (probably more randomisation the closer to the leafs you are (less depth left)). I would then do a generalisation of the thread voting idea by using it in the transposition table as well. That together with maybe doing the voting as a function of depth where the averaging should be bigger the closer to the leafs you get will turn your alpha beta search into a
MCTS like algorithm.

/Pio
I did experiment with different randomization schemes, tried to randomize more
in the middle of the game tree than at root and leafs, but could not outsmart
the generalized scheme on quick n dirty benchmarks.

--
Srdja
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Daniel Shawul »

lucasart wrote: Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: Mon Jan 06, 2020 8:32 pm
Alayan wrote: Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.
It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by JVMerlino »

Daniel Shawul wrote: Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused.... :?

jm
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Michel »

JVMerlino wrote: Tue Jan 07, 2020 11:10 pm
Daniel Shawul wrote: Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused.... :?

jm
Doing cut offs at PV nodes leads to short PV's (except in Crafty which stores a PV for exact PV nodes in a separate data structure).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by lucasart »

JVMerlino wrote: Tue Jan 07, 2020 11:10 pm
Daniel Shawul wrote: Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused.... :?

jm
That's expected. First you search with lots of reductions. Then you research but with HT grafting partially cancelling the reductions (bcos in the child node you HT prune with deeper entries). Which explains the richer tree effect.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Post by Dann Corbit »

Doesn't this indicate that the reductions are flawed? Otherwise, where does the Elo increase come from?

If a fatter tree is better then maybe the reductions are incorrect or too extreme.

On the other hand, it seems possible that the small extra randomness introduced by the thread local storage might indicate a.phenomenon where k threads in n time are actually superior to one thread in k"n time, which is really interesting. But it's clearly not enough to compensate for SMP loss. I guess the next question is, can this effect be enhanced?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.