RMO - Randomized Move Order - yet another Lazy SMP derivate

Michel · Post by **Michel** » Mon Jan 06, 2020 8:45 pm

Daniel Shawul wrote: ↑Mon Jan 06, 2020 8:32 pm
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.

Vondele has the very sensible theory which (translated into my own words) says that the individual threads search a _smaller_ tree, compared to sequential search (due to more cutoffs), but they effectively search a bigger _virtual tree_ which includes the grafted trees, searched by other threads, whose depth/eval information has been communicated via the hash table.

One could think for example of moves that would get different reductions in two different threads (due to move ordering desynchronizing). Via the HT interaction it could now be that they are _virtually_ searched to the same depth in both threads (Peter was also mentioning the effect of LMR).

So the widening is virtual.

Daniel Shawul · Post by **Daniel Shawul** » Mon Jan 06, 2020 9:39 pm

For those who want to follow vondele's experiments at stockfish forum
https://groups.google.com/forum/?fromgr ... X3lDH83tlk

lucasart · Post by **lucasart** » Tue Jan 07, 2020 5:39 am

Daniel Shawul wrote: ↑Mon Jan 06, 2020 8:32 pm
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.

Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.

Pio · Post by **Pio** » Tue Jan 07, 2020 11:08 am

lucasart wrote: ↑Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: ↑Mon Jan 06, 2020 8:32 pm
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.

I think the thread voting is a very smart Idea

and I think smatovic (I hope I did not misspell it) randomised experiment is also very smart.

I would love to see a search that used both ideas together. It would be really interesting to put in some randomisation in the move ordering (probably more randomisation the closer to the leafs you are (less depth left)). I would then do a generalisation of the thread voting idea by using it in the transposition table as well. That together with maybe doing the voting as a function of depth where the averaging should be bigger the closer to the leafs you get will turn your alpha beta search into a
MCTS like algorithm.

/Pio

smatovic · Post by **smatovic** » Tue Jan 07, 2020 12:39 pm

Pio wrote: ↑Tue Jan 07, 2020 11:08 am
lucasart wrote: ↑Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: ↑Mon Jan 06, 2020 8:32 pm
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.
I think the thread voting is a very smart Idea and I think smatovic (I hope I did not misspell it) randomised experiment is also very smart.

I would love to see a search that used both ideas together. It would be really interesting to put in some randomisation in the move ordering (probably more randomisation the closer to the leafs you are (less depth left)). I would then do a generalisation of the thread voting idea by using it in the transposition table as well. That together with maybe doing the voting as a function of depth where the averaging should be bigger the closer to the leafs you get will turn your alpha beta search into a
MCTS like algorithm.

/Pio

I did experiment with different randomization schemes, tried to randomize more
in the middle of the game tree than at root and leafs, but could not outsmart
the generalized scheme on quick n dirty benchmarks.

--
Srdja

Daniel Shawul · Post by **Daniel Shawul** » Tue Jan 07, 2020 3:54 pm

lucasart wrote: ↑Tue Jan 07, 2020 5:39 am
Daniel Shawul wrote: ↑Mon Jan 06, 2020 8:32 pm
Alayan wrote: ↑Mon Jan 06, 2020 6:05 pm Vondele did an interesting experiment :
I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/co ... hreadDepth)

v1: Master version: all threads search irrespective of Limits.depth, mainThread stops search when it reaches limits depth.
v2: First thread to reach Limits.depth stops search.
v3: all threads at most Limits.depth, mainThread stops search reaching limit (simple '&& mainThread' removal)
v4: all threads search to exactly limits depth
v5: all threads search to at least limits depth (last one to reach stops search)

and had them play a depth=13 match, with 8 threads. Also in the pool: sequential runs with depth 13 ... 16. Interesting, even v2 is significantly stronger than sequential, and v1 ~ v4:
Code: Select all
Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%
Thanks, I think v4 ( all threads searching to same depth) is what we want.
So if I read this right, stockfish widening gives it at least +2 more plies of elo i.e. v4 at depth=13 is lightly better than seq at depth=15.
Demolito does v2. This is the correct behavior IMO: as soon as any worker completes the required depth, stop immediately and play the move from that worker (not some arbitrary worker running a lower depth or some weird depth/score mix thread voting).

Of course, using fixed depth testing, v2 loses to v4 easily. But that's not a fair comparison. Elo testing using time based limit is metric that's relevant.

It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...

JVMerlino · Post by **JVMerlino** » Tue Jan 07, 2020 11:10 pm

Daniel Shawul wrote: ↑Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...

Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused....

jm

Michel · Post by **Michel** » Wed Jan 08, 2020 12:23 am

JVMerlino wrote: ↑Tue Jan 07, 2020 11:10 pm
Daniel Shawul wrote: ↑Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused....

jm

Doing cut offs at PV nodes leads to short PV's (except in Crafty which stores a PV for exact PV nodes in a separate data structure).

lucasart · Post by **lucasart** » Wed Jan 08, 2020 5:20 am

JVMerlino wrote: ↑Tue Jan 07, 2020 11:10 pm
Daniel Shawul wrote: ↑Tue Jan 07, 2020 3:54 pm It is only the fixed depth tests that I prefer v4, but there is no reason not to stop the search if one thread reaches the depth limit.
It turns out Stockfish can be made stronger by simply repeating a single-threaded search at a given depth a number of times.
In fact, it seems a single-threaded search repeated 8 times seem to give the same elo improvement as an 8-threads search to the same depth.
You would think hashtable cutoffs make the repeated searches useless but Stockfish does not do hashtable cutoffs at PV nodes. Any other
engine who does will not see any improvement from repeated searches. According to vondele's tests the effect wears of after 55 iterations for a fixed-depth test of 13 ...
Bringing up the question, why doesn't SF (or any engine, for that matter) do cutoffs at PV nodes? The answer, of course, is "because it is worth X Elo". So, MY question is, how much is "X" for SF. And wouldn't allowing it to do those cutoffs give more of a benefit than just repeating the same search over and over?

Color me confused....

jm

That's expected. First you search with lots of reductions. Then you research but with HT grafting partially cancelling the reductions (bcos in the child node you HT prune with deeper entries). Which explains the richer tree effect.

Dann Corbit · Post by **Dann Corbit** » Wed Jan 08, 2020 6:29 pm

Doesn't this indicate that the reductions are flawed? Otherwise, where does the Elo increase come from?

If a fatter tree is better then maybe the reductions are incorrect or too extreme.

On the other hand, it seems possible that the small extra randomness introduced by the thread local storage might indicate a.phenomenon where k threads in n time are actually superior to one thread in k"n time, which is really interesting. But it's clearly not enough to compensate for SMP loss. I guess the next question is, can this effect be enhanced?

RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate

Re: RMO - Randomized Move Order - yet another Lazy SMP derivate