Ah, yes, I totally missed that one. Have you managed to implement the "Possible Improvements" in the latest version? Do the threads in the SplitPoint search unique moves from the movelist int he SplitPoint?petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
Time to depth concerns
Moderators: hgm, Rebel, chrisw
-
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol
Re: Time to depth concerns
-
- Posts: 803
- Joined: Mon Jul 17, 2006 5:53 am
- Full name: Edsel Apostol
Re: Time to depth concerns
Time to depth isn't a good indicator of it's speed. There really is no way to check the efficiency of that implementation except by playing games.Werewolf wrote:So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
In which case, how do we measure speed with Lazy SMP?
What about Komodo, is that lazy SMP or YBWC?
Thanks.
As for Komodo, I remembered Don being very interested when a new form of Lazy SMP was being discussed. I also remembered Bob saying Komodo is a normal YBW (as per his talk with Don) but has not fixed/synced the move count bug that makes LMR and LMP affected, leading it to the observation that Komodo SMP is widening. Komodo team wouldn't admit to anything as they've said in the TCEC chat that it is a trade secret. So I'm not sure really.
-
- Posts: 688
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Time to depth concerns
Some of them have been implemented.Edsel Apostol wrote:Ah, yes, I totally missed that one. Have you managed to implement the "Possible Improvements" in the latest version?petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP. You can try other engines with Lazy SMP like Andscacs/Texel to see if it has the same behavior. You can also compare it to engines with YBWC like Hannibal or Crafty.
This has been fixed.petero2 wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the helper thread fails high, the master thread still finishes searching move one before it finds the fail high result for move two in the transposition table and propagates the fail high result up in the search tree. In this case there is an opportunity for the parallel search to search fewer nodes than the serial search would search, by immediately aborting the master thread. I don't know if the DTS algorithm handles this case but I would guess that it does.
This is built into the design and I don't think it can be fixed without drastically changing the design.petero2 wrote:If the master thread is searching move one at a SplitPoint, and a helper thread is searching move two, and the master thread fails low on move one, it will start searching move two. When this happens the helper thread is "kicked out" of that branch and has to start searching somewhere else. The idea is that the master thread should quickly get to the same point where the helper thread was interrupted because of transposition table hits. However, there is still some overhead involved. This seems hard to avoid with the current design, but the hope is that the overhead is small.
This has been fixed.petero2 wrote:If the transposition table is too small it is possible that search results computed by helper threads are overwritten before they are needed by the master thread. It would be straightforward to store the search results in the SplitPointMove objects, but I have not implemented this because I don't think it is a problem in practice.
This was fixed before the 1.03 release.petero2 wrote:The algorithm could try harder to search at moves with larger remaining search depth. Currently the move with the highest estimated usefulness probability is searched, and in case of a tie the move with the largest depth is chosen. However, it may be better to search a move with a larger depth but with a slightly lower probability, because the overhead associated with starting and stopping the search, including the priority queue manipulation and copying of history and killer tables, is reduced. If statistics were collected regarding overhead as a function of search depth, this could be weighted into the probability calculations.
I have not tried this. It is questionable if it would actually help. Don Dailey once said something like that worked in an early version of Komodo, but more recent Komodo versions may do things differently.petero2 wrote:Another idea is to perform a "wider" search in the helper threads if the search depth is low, so that the overhead is high. The widening could be implemented by ignoring the LMR reduction in this case, and the effect would be that the helper thread would spend more time examining search tree nodes and less time on overhead. I have no idea if this would improve the program elo though.
Yes, except for the brief periods of time when the SplitPoint owner thread starts searching a move that is currently being search by a helper thread, and the helper thread has not yet realized that is has been "kicked out" and should go away and search somewhere else instead.Edsel Apostol wrote:Do the threads in the SplitPoint search unique moves from the movelist in the SplitPoint?
-
- Posts: 688
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Time to depth concerns
It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.bob wrote:You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodespetero2 wrote:I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.bob wrote:The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:
1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.
This is quite different from lazy SMP, which instead has the following properties:
1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Time to depth concerns
For DTS, the idea was YBW except when no nodes satisfy that condition. In that case, examine the state of the tree to pick a likely node that WILL satisfy YBW once the first branch is searched (as opposed to one that will be a CUT node).petero2 wrote:It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.bob wrote:You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodespetero2 wrote:I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.bob wrote:The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:
1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.
This is quite different from lazy SMP, which instead has the following properties:
1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.
I am most definitely "anti-widening". That's counter-productive, otherwise you should do the same thing in the serial search to gain the same advantage. Hence me efforts at trying to improve the basic parallel search efficiency without resorting to compromises...
I consider DTS to be a YBW algorithm for the above reasons, even if the YBW criterion is not always used / satisfied. If the choice is "sit" or "do speculative work" I opt for the latter, since sitting never pays off.
-
- Posts: 688
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Time to depth concerns
The texel algorithm is different since it does not use the YBW concept at all. It just estimates probabilities that searching a move would be useful and picks a move that has the highest probability of being useful. Nevertheless the algorithm will often pick a move that happens to satisfy the YBW condition, since the YBW condition is often strongly correlated with a high probability of the move being useful.bob wrote:For DTS, the idea was YBW except when no nodes satisfy that condition.petero2 wrote:It depends on how you want to define YBW. In your previous post you said the loose definition was to not split until at least one move has been searched at a node. If that still holds, neither DTS nor the texel algorithm qualifies as being YBW, since both will happily split at an expected ALL node before the first move has been searched.bob wrote:You HAVE to use some sort of YBW-derivative or the search will simply have a very poor speedup, because you can't afford to split at CUT nodespetero2 wrote:I know and texel does split before at least one move has been searched at a node. Therefore the algorithm is not of YBW type.bob wrote:The loose definition of YBW is that you don't split until at least one move has been searched at a node. Hence the "young brothers wait" terminology.petero2 wrote:Texel does not use lazy SMP and it does not use YBWC either. It uses this algorithm.
Anyway I agree YBWC, DTS and texel's algorithm are quite similar in nature, in that they all try to partition the search tree in a way that:
1. Avoids searching the same subtrees more than once.
2. Avoids searching parts of the tree that a serial search would not have searched.
3. Tries to keep all threads busy at all times.
This is quite different from lazy SMP, which instead has the following properties:
1. Relies on the transposition table to (somewhat?) avoid searching the same subtrees more than once.
2. Happily searches nodes that a serial search would not have searched, relying instead on the widening effect.
3. Keeps all threads busy by having almost no synchronization points.
To see how often the YBW condition is actually satisfied, I added some code to texel to gather statistics about the move number of moves selected to be searched by helper threads. I searched the WAC 230 position to depth 30 using 4 threads:
[D]2b5/1r6/2kBp1p1/p2pP1P1/2pP4/1pP3K1/1R3P2/8 b - - 0 1
The result for depth 15, 20, 25 and 30 was the following:
Code: Select all
moveNo d15 d20 d25 d30
0 0 0 0 0
1 120 660 3375 20154
2 89 657 3844 35609
3 103 778 5738 67439
4 91 758 6124 70848
5 82 776 6448 73768
6 91 792 6706 74164
7 91 790 6660 74646
8 97 803 6801 75719
9 88 763 6823 73790
10 92 767 6785 74357
11 90 770 6765 74511
12 89 778 6779 73521
13 89 782 6675 72264
14 82 770 6365 70110
15 76 750 6181 67069
16 81 722 5936 62209
17 71 686 5531 58091
18 61 588 5098 54188
19 60 523 4824 49834
20 48 444 4273 45202
21 45 432 3917 40794
22 36 331 3256 34974
23 34 278 2607 28719
24 25 176 1949 22261
25 3 62 1257 16948
26 3 32 902 13239
27 0 10 673 10270
28 0 6 444 7064
29 0 0 243 4496
30 0 0 91 2672
31 0 0 59 1822
32 0 0 37 1305
33 0 0 25 883
34 0 0 11 604
35 0 0 10 412
36 0 0 5 285
37 0 0 5 203
38 0 0 3 127
39 0 0 3 80
40 0 0 1 46
41 0 0 0 31
42 0 0 0 18
43 0 0 0 11
44 0 0 0 6
45 0 0 0 4
46 0 0 0 5
47 0 0 0 2
48 0 0 0 1
49 0 0 0 0
moveNo 0 is never selected by a helper thread because that move is immediately taken by the thread that created the split point. moveNo 1 corresponds to moves where the YBW condition is not satisfied.
As can be seen, at low depths violating the YBW condition is often the best choice (least bad) in terms of estimated probability of being useful. When the search depth increases more potential good split points are available and a YBW violating move is less often selected. It can also be seen that at high depths the algorithm also avoids selecting a move with moveNo 2, i.e. it prefers waiting until at least two moves have failed to cause a cut-off.
It would be interesting to see similar data for a YBW type program, but I suspect such data is not easy to come by since many programs use lazy SMP these days, thereby nicely sidestepping the whole problem of selecting a good sub-tree to search in helper threads.
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Time to depth concerns
Larry and I know!APassionForCriminalJustic wrote:Unless I am mistaken LAZY-SMP is about the nodes per second, mostly. Nobody knows what implementation Komodo uses as far as I know.Werewolf wrote:So are you saying Lazy SMP doesn't scale well with more threads, or that time to depth isn't a good indicator of its speed? I'm guessing the latter.Edsel Apostol wrote:Lazy SMP algorithm in SF is not conducive for time to depth tests. I think that is normal for Lazy SMP.
In which case, how do we measure speed with Lazy SMP?
What about Komodo, is that lazy SMP or YBWC?
Thanks.