Time to depth concerns

Edsel Apostol · Post by **Edsel Apostol** » Wed Aug 17, 2016 7:34 am

petero2 wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.

Ah, yes, I totally missed that one. Have you managed to implement the "Possible Improvements" in the latest version? Do the threads in the SplitPoint search unique moves from the movelist int he SplitPoint?

Edsel Apostol · Post by **Edsel Apostol** » Wed Aug 17, 2016 7:50 am

Werewolf wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.

Time to depth isn't a good indicator of it's speed. There really is no way to check the efficiency of that implementation except by playing games.

As for Komodo, I remembered Don being very interested when a new form of Lazy SMP was being discussed. I also remembered Bob saying Komodo is a normal YBW (as per his talk with Don) but has not fixed/synced the move count bug that makes LMR and LMP affected, leading it to the observation that Komodo SMP is widening. Komodo team wouldn't admit to anything as they've said in the TCEC chat that it is a trade secret. So I'm not sure really.

petero2 · Post by **petero2** » Wed Aug 17, 2016 8:15 pm

Edsel Apostol wrote:
petero2 wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
Ah, yes, I totally missed that one. Have you managed to implement the "Possible Improvements" in the latest version?

Some of them have been implemented.

petero2 wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the helper thread fails high, the master thread still finishes searching move one before it finds the fail high result for move two in the transposition table and propagates the fail high result up in the search tree. In this case there is an opportunity for the parallel search to search fewer nodes than the serial search would search, by immediately aborting the master thread. I don't know if the DTS algorithm handles this case but I would guess that it does.

This has been fixed.

petero2 wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the master thread fails low on move one, it will start searching move two. When this happens the helper thread is "kicked out" of that branch and has to start searching somewhere else. The idea is that the master thread should quickly get to the same point where the helper thread was interrupted because of transposition table hits. However, there is still some overhead involved. This seems hard to avoid with the current design, but the hope is that the overhead is small.

This is built into the design and I don't think it can be fixed without drastically changing the design.

petero2 wrote:If the transposition table is too small it is possible that search results computed by helper threads are overwritten before they are needed by the master thread. It would be straightforward to store the search results in the SplitPointMove objects, but I have not implemented this because I don't think it is a problem in practice.

This has been fixed.

petero2 wrote:The algorithm could try harder to search at moves with larger remaining search depth. Currently the move with the highest estimated usefulness probability is searched, and in case of a tie the move with the largest depth is chosen. However, it may be better to search a move with a larger depth but with a slightly lower probability, because the overhead associated with starting and stopping the search, including the priority queue manipulation and copying of history and killer tables, is reduced. If statistics were collected regarding overhead as a function of search depth, this could be weighted into the probability calculations.

This was fixed before the 1.03 release.

petero2 wrote:Another idea is to perform a "wider" search in the helper threads if the search depth is low, so that the overhead is high. The widening could be implemented by ignoring the LMR reduction in this case, and the effect would be that the helper thread would spend more time examining search tree nodes and less time on overhead. I have no idea if this would improve the program elo though.

I have not tried this. It is questionable if it would actually help. Don Dailey once said something like that worked in an early version of Komodo, but more recent Komodo versions may do things differently.

Edsel Apostol wrote:Do the threads in the SplitPoint search unique moves from the movelist in the SplitPoint?

Yes, except for the brief periods of time when the SplitPoint owner thread starts searching a move that is currently being search by a helper thread, and the helper thread has not yet realized that is has been "kicked out" and should go away and search somewhere else instead.

petero2 · Post by **petero2** » Wed Aug 17, 2016 9:19 pm

bob wrote:
petero2 wrote:
bob wrote:
petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.
You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodes

It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.

Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:

1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.

This is quite different from lazy SMP, which instead has the following properties:

1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.

bob · Post by **bob** » Wed Aug 17, 2016 11:38 pm

petero2 wrote:
bob wrote:
petero2 wrote:
bob wrote:
petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.
You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodes
It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.

Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:

1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.

This is quite different from lazy SMP, which instead has the following properties:

1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.

For DTS, the idea was YBW except when no nodes satisfy that condition. In that case, examine the state of the tree to pick a likely node that WILL satisfy YBW once the first branch is searched (as opposed to one that will be a CUT node).

I am most definitely "anti-widening". That's counter-productive, otherwise you should do the same thing in the serial search to gain the same advantage. Hence me efforts at trying to improve the basic parallel search efficiency without resorting to compromises...

I consider DTS to be a YBW algorithm for the above reasons, even if the YBW criterion is not always used / satisfied. If the choice is "sit" or "do speculative work" I opt for the latter, since sitting never pays off.

petero2 · Post by **petero2** » Wed Mar 01, 2017 11:19 pm

bob wrote:
petero2 wrote:
bob wrote:
petero2 wrote:
bob wrote:
petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.
I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.
You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodes
It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.

Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:

1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.

This is quite different from lazy SMP, which instead has the following properties:

1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.
For DTS, the idea was YBW except when no nodes satisfy that condition.

The texel algorithm is different since it does not use the YBW concept at all. It just estimates probabilities that searching a move would be useful and picks a move that has the highest probability of being useful. Nevertheless the algorithm will often pick a move that happens to satisfy the YBW condition, since the YBW condition is often strongly correlated with a high probability of the move being useful.

To see how often the YBW condition is actually satisfied, I added some code to texel to gather statistics about the move number of moves selected to be searched by helper threads. I searched the WAC 230 position to depth 30 using 4 threads:
[D]2b5/1r6/2kBp1p1/p2pP1P1/2pP4/1pP3K1/1R3P2/8 b - - 0 1
The result for depth 15, 20, 25 and 30 was the following:

Code: Select all

  moveNo     d15     d20     d25     d30
       0       0       0       0       0
       1     120     660    3375   20154
       2      89     657    3844   35609
       3     103     778    5738   67439
       4      91     758    6124   70848
       5      82     776    6448   73768
       6      91     792    6706   74164
       7      91     790    6660   74646
       8      97     803    6801   75719
       9      88     763    6823   73790
      10      92     767    6785   74357
      11      90     770    6765   74511
      12      89     778    6779   73521
      13      89     782    6675   72264
      14      82     770    6365   70110
      15      76     750    6181   67069
      16      81     722    5936   62209
      17      71     686    5531   58091
      18      61     588    5098   54188
      19      60     523    4824   49834
      20      48     444    4273   45202
      21      45     432    3917   40794
      22      36     331    3256   34974
      23      34     278    2607   28719
      24      25     176    1949   22261
      25       3      62    1257   16948
      26       3      32     902   13239
      27       0      10     673   10270
      28       0       6     444    7064
      29       0       0     243    4496
      30       0       0      91    2672
      31       0       0      59    1822
      32       0       0      37    1305
      33       0       0      25     883
      34       0       0      11     604
      35       0       0      10     412
      36       0       0       5     285
      37       0       0       5     203
      38       0       0       3     127
      39       0       0       3      80
      40       0       0       1      46
      41       0       0       0      31
      42       0       0       0      18
      43       0       0       0      11
      44       0       0       0       6
      45       0       0       0       4
      46       0       0       0       5
      47       0       0       0       2
      48       0       0       0       1
      49       0       0       0       0

moveNo 0 is never selected by a helper thread because that move is immediately taken by the thread that created the split point. moveNo 1 corresponds to moves where the YBW condition is not satisfied.

As can be seen, at low depths violating the YBW condition is often the best choice (least bad) in terms of estimated probability of being useful. When the search depth increases more potential good split points are available and a YBW violating move is less often selected. It can also be seen that at high depths the algorithm also avoids selecting a move with moveNo 2, i.e. it prefers waiting until at least two moves have failed to cause a cut-off.

It would be interesting to see similar data for a YBW type program, but I suspect such data is not easy to come by since many programs use lazy SMP these days, thereby nicely sidestepping the whole problem of selecting a good sub-tree to search in helper threads.

mjlef · Post by **mjlef** » Thu Mar 02, 2017 2:56 pm

APassionForCriminalJustic wrote:
Werewolf wrote:
Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.

In which case, how do we measure speed with Lazy SMP?

What about Komodo, is that lazy SMP or YBWC?

Thanks.
Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.

Larry and I know!

Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns

Re: Time to depth concerns