Time assignment to children

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Time assignment to children

Post by matthewlai »

This is a very high level philosophical idea about shaping the search tree.

For the purpose of this discussion, I am going to ignore various extensions and prunings for now, as well as things like LMR. This discussion is focused on doing searches fundamentally constrained by depth (which is what pretty much everyone is doing right now).

Given a root position, there's a theoretical game tree that includes all possible variations starting at this root position, going all the way to the end of the game. There are also one or more PVs in this theoretical game tree. When we do a search, we are essentially trying to see as much of this theoretical PV as possible.

We do that by constructing a search tree within this huge theoretical game tree, and hope that it includes as much of the theoretical PV as possible.

But which part of the giant theoretical game tree do we search? Well, it makes sense to search nodes that have the highest probability of being in the PV.

The root node has the highest probability (of 1), then the nodes 1 ply from the root node, etc. With no further information about the nodes, we assume that each child of a parent has the same probability of being in the PV, given that the parent is in the PV.

Since the tree obviously has a branching factor of greater than 1, in general, nodes closer to the root will have a higher probability of being in the theoretical PV.

That's why in depth-constrained searches, we look at all nodes within a certain number of plies of the root position (for each iteration of ID).

That seems to make sense, but consider the situation where there is one node with 2 children. Child A is a much simpler position, with only 3 possible moves. Child B is a more complex position, with 9 possible moves.

With a depth-constrained search, we would be spending much more time on B than A. But does that really make sense?

Does it really make sense to search each of B's 9 children to the same depth as A's 3 children?

If we assume the probability of A and B being in the theoretical PV to be equal, the probability of each of B's children being in the PV is much lower than A's children, just because B has many more children. Wouldn't it make sense to search A's children deeper?

This is unless we want to assume that, in general, B has a higher probability of being in the PV because it has more children. I don't see how that can be argued.

Instead of depth-constrained searches, how about we do node-count constrained searches instead?

Code: Select all

search(position, nodeCount):
    if (nodeCount < 1)
        return qsearch(position)

    --nodeCount

    ...

    movelist = position.generateMoves()

    for each move in movelist:
        apply move
        score = search(move, nodeCount / movelist.size)
        ...
    ...
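For concreteness, here is a runnable toy version of the pseudocode above. The tree representation, the leaf `eval` values, and the even budget split are all illustrative assumptions, not a real engine:

```python
# A toy illustration of node-count-constrained search (not real chess):
# each node splits its remaining node budget evenly among its children,
# so bushy subtrees are searched shallower than narrow ones.

def search(node, budget):
    """Negamax over a toy tree; 'budget' caps the subtree node count."""
    children = node.get("children", [])
    if budget < 1 or not children:
        return node["eval"]  # stand-in for qsearch
    budget -= 1
    per_child = budget / len(children)  # equal split, as in the pseudocode
    return max(-search(c, per_child) for c in children)

# Tiny hand-built tree; each eval is from that node's side-to-move perspective.
leaf = lambda v: {"eval": v}
root = {
    "eval": 0,
    "children": [
        {"eval": 0, "children": [leaf(-1), leaf(2), leaf(0)]},    # child A: 3 moves
        {"eval": 0, "children": [leaf(v) for v in range(-4, 5)]}, # child B: 9 moves
    ],
}
print(search(root, 100))  # → -1
```

Note that child A's 3 children each receive a budget three times larger than child B's 9 children, which is exactly the reshaping the post argues for.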
Now obviously, we don't actually do uniform-depth searches anymore. We do singular extensions, check extensions, recapture extension, one-reply extension, etc, and LMR. All of those techniques have been successfully applied in some program at some point.

What do they all have in common? They all force the search to spend more time on positions with fewer legal moves (or fewer reasonable moves).

In fact, I believe this is the most important effect of LMR. LMR makes the search spend much less time on positions with a lot of moves than it otherwise would, relative to positions with fewer reasonable moves.

This is one of the ideas I'm exploring with my Master's thesis, and I was just wondering what you guys think?

Note that I described the algorithm in terms of nodes above. I did that because I feel it's easier to understand that way. However, it can also be applied in a depth-based framework. The only thing that needs to be changed is the default depth reduction. Instead of reducing depth by 1 on every ply by default, it should be reduced by an amount based on how many legal moves there are for that position. But then "depth" is pretty much meaningless at this point, and it would be easier to just throw it out.
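In that depth-based formulation, the per-ply reduction could be made to grow with the log of the legal move count, so that subtrees of roughly equal node count end up with roughly equal remaining "depth". A sketch of that idea; the nominal branching factor of 6 is an invented tuning constant, not something from the post:

```python
import math

# Sketch of the depth-based formulation: instead of "depth - 1" per ply,
# reduce by an amount that grows with the number of legal moves.
# NOMINAL_BRANCHING is an assumed tuning constant: a position with exactly
# that many moves costs one ply, bushier positions cost more, narrower less.

NOMINAL_BRANCHING = 6.0

def next_depth(depth, num_legal_moves):
    if num_legal_moves <= 0:
        return depth - 1.0
    cost = math.log(num_legal_moves, NOMINAL_BRANCHING)
    return depth - cost

print(next_depth(10.0, 6))   # → 9.0: nominal position costs one ply
print(next_depth(10.0, 36))  # ≈ 8.0: very bushy position costs two plies
print(next_depth(10.0, 1))   # → 10.0: a forced move is "free" (an extension)
```

With fractional depths like these, "depth" is really just the log of a node budget, which is the equivalence the paragraph above points at.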
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Time assignment to children

Post by Aleks Peshkov »

The problem is that a depth metric needs only 8 bits or less in a TT slot. It is difficult to compact a node-count metric that much.

For example, if we store just the log2 of the subtree node count, it is difficult to get decent precision for intermediate nodes.
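A tiny illustration of this precision issue: rounding log2 of the node count to an integer, as a compact TT field would force, collapses quite different subtree sizes into the same stored value. The `store`/`restore` helpers here are hypothetical:

```python
import math

# Quantizing log2(subtree node count) to a small integer (as an 8-bit
# TT field would) merges very different subtree sizes into one value.

def store(node_count):
    return round(math.log2(node_count))  # what a compact log field could keep

def restore(stored):
    return 2 ** stored                   # best reconstruction available

for n in (1000, 1500, 2000):
    print(n, "->", restore(store(n)))
# 1500 and 2000 both come back as 2048: a 33% difference is lost.
```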
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

Aleks Peshkov wrote:The problem is that a depth metric needs only 8 bits or less in a TT slot. It is difficult to compact a node-count metric that much.

For example, if we store just the log2 of the subtree node count, it is difficult to get decent precision for intermediate nodes.
That's true. It would take more bits.

I believe this issue is bigger than that, though.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Time assignment to children

Post by Aleks Peshkov »

There is nothing magical about the depth metric except its simplicity.

For example, for the perft function, node count is significantly better than depth in a replacement scheme.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

matthewlai wrote:This is a very high level philosophical idea about shaping the search tree.

For the purpose of this discussion, I am going to ignore various extensions and prunings for now, as well as things like LMR. This discussion is focused on doing searches fundamentally constrained by depth (which is what pretty much everyone is doing right now).

Given a root position, there's a theoretical game tree that includes all possible variations starting at this root position, going all the way to end of games. There's also 1 or more PVs in this theoretical game tree. When we do a search, we are essentially trying to see as much of this theoretical PV as possible.

We do that by constructing a search tree within this huge theoretical game tree, and hope that it includes as much of the theoretical PV as possible.

But which part of the giant theoretical game tree do we search? Well, it makes sense to search nodes that have the highest probability of being in the PV.

The root node has the highest probability (of 1), then the nodes 1 ply from the root node, etc. With no further information about the nodes, we assume that each child of a parent has the same probability of being in the PV, given that the parent is in the PV.

Since the tree obviously has a branching factor of greater than 1, in general, nodes closer to the root will have a higher probability of being in the theoretical PV.

That's why in depth-constrained searches, we look at all nodes less than a certain number of moves from the root position (for each iteration of ID).

That seems to make sense, but consider the situation where there is one node with 2 children. Child A is a much simpler position, with only 3 possible moves. Child B is a more complex position, with 9 possible moves.

With a depth-constrained search, we would be spending much more time on B than A. But does that really make sense?

Does it really make sense to search each of B's 9 children to the same depth as A's 3 children?

If we assume the probability of A and B being in the theoretical PV to be equal, the probability of each of B's children being in the PV is much lower than A's children, just because B has many more children. Wouldn't it make sense to search A's children deeper?

This is unless we want to assume that, in general, B has a higher probability of being in the PV because it has more children. I don't see how that can be argued.

Instead of depth-constrained searches, how about we do node-count constrained searches instead?

Code: Select all

search(position, nodeCount):
    if (nodeCount < 1)
        return qsearch(position)

    --nodeCount

    ...

    movelist = position.generateMoves()

    for each move in movelist:
        apply move
        score = search(move, nodeCount / movelist.size)
        ...
    ...
Now obviously, we don't actually do uniform-depth searches anymore. We do singular extensions, check extensions, recapture extension, one-reply extension, etc, and LMR. All of those techniques have been successfully applied in some program at some point.

What do they all have in common? They all force the search to spend more time on positions with fewer legal moves (or fewer reasonable moves).

In fact, I believe this is the most important effect of LMR. LMR makes the search spend much less time on positions with a lot of moves than it otherwise would, relative to positions with fewer reasonable moves.

This is one of the ideas I'm exploring with my Master's thesis, and I was just wondering what you guys think?

Note that I described the algorithm in terms of nodes above. I did that because I feel it's easier to understand that way. However, it can also be applied in a depth-based framework. The only thing that needs to be changed is the default depth reduction. Instead of reducing depth by 1 on every ply by default, it should be reduced by an amount based on how many legal moves there are for that position. But then "depth" is pretty much meaningless at this point, and it would be easier to just throw it out.
I think that a simple example from the past shows why this doesn't work very well. At one point Harry Nelson noticed that in Cray Blitz, the node count for each root node (total sub-tree, not just number of siblings) contained useful information. For example, you have a best move M, but once you reach the final iteration move P will replace it as the best move. Harry noticed that for each successive iteration, the node count for the sub-tree below move P increased more than the node count for other moves (> than the expected value based on branching factor). So I wrote code to sort the move list after each iteration based on this. And it was an improvement.

But then along came all the forward-pruning and reduction stuff (such as LMR, LMP, futility pruning, razoring, you-name-it) and those node counts suddenly didn't mean much any more. Sorting based on node counts was measurably worse than just leaving the move list alone so that each new best move would rise to the top, and all the previous best moves would be at the front of the list for each successive iteration, with everything else left alone.

Just because a move has fewer siblings doesn't say a thing about how important each of those siblings might be. I can't see any reasonable justification for searching each of 3 siblings more deeply than you do when a position has 9 siblings, other than that the position is somehow more forced. Singular extensions is an idea that has merit, but it doesn't extend the search on a move just because there is one move; it extends when there is one good move and the rest are bad, regardless of whether there are 2 others or 20 others to choose from. The positions with the fewest moves are already generally searched more deeply, because such positions have the side on move in check. Positions with few GOOD moves are much more difficult to recognize, however. And when you throw in pseudo-legal moves, this becomes even more problematic.

This is not saying the idea won't work. But evidence suggests that the only way to recognize things that need to be extended is by searching them, not by some static criterion such as "< N siblings vs > N siblings".
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote: I think that a simple example from the past shows why this doesn't work very well. At one point Harry Nelson noticed that in Cray Blitz, the node count for each root node (total sub-tree, not just number of siblings) contained useful information. For example, you have a best move M, but once you reach the final iteration move P will replace it as the best move. Harry noticed that for each successive iteration, the node count for the sub-tree below move P increased more than the node count for other moves (> than the expected value based on branching factor). So I wrote code to sort the move list after each iteration based on this. And it was an improvement.
That sounds like a useful heuristic. I'm not surprised it works. When a child has a larger sub-tree, that usually means something interesting is happening there (PV changed), so ordering moves based on that makes sense.

I don't really see how that's related to the discussion here, though.
Just because a move has fewer siblings doesn't say a thing about how important each of those siblings might be. I can't see any reasonable justification for searching each of 3 siblings more deeply than you do when a position has 9 siblings, other than that the position is somehow more forced. Singular extensions is an idea that has merit, but it doesn't extend the search on a move just because there is one move; it extends when there is one good move and the rest are bad, regardless of whether there are 2 others or 20 others to choose from. The positions with the fewest moves are already generally searched more deeply, because such positions have the side on move in check. Positions with few GOOD moves are much more difficult to recognize, however. And when you throw in pseudo-legal moves, this becomes even more problematic.

This is not saying the idea won't work. But evidence suggests that the only way to recognize things that need to be extended is by searching them, not by some static criterion such as "< N siblings vs > N siblings".
Pruning, reductions, and extensions can still be done with a node-count-based approach, by not allocating nodes equally. That's no harder or easier than figuring out what to reduce and extend in a depth-based framework, since all these extension and reduction/pruning decisions still need to be made statically before searching (TT, killers, SEE, etc). Or they can be done after a first pass, and that can be done with a node-count-based approach as well (re-searches, etc).

So here we are talking about "default" depth reductions (always 1 in most modern engines). This is our starting point for how deeply to search each move, before any heuristic is applied. If we have other information (TT, killers, SEE, history, and whatnot), we can tweak each individual depth/node count using that.

If we have no prior information about the moves, the more moves there are, the lower the chance of each one being the best is.

Checking is not the only way to drastically reduce number of moves. Trading queens can drastically reduce number of moves as well, for example.

Say we have a root position with 2 legal moves: trade queens, or don't.

Using a uniform-depth approach, we would be spending much more time on the non-trading sub-tree, even though, absent any other information, we don't know that it's the more likely continuation.

This would also apply to open vs closed games for example. If you have the option of keeping the game closed vs trading something to open it, does it make sense to spend much more time on the opening subtree, even though keeping it closed is equally likely to be the better option?

Pseudo-legal moves would make it problematic, yes, but that's just an implementation detail. I switched to legal move generation in my engine because when your evaluator is a huge neural network that brings your NPS down to 40k, the legal move generation overhead is negligible :).
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Time assignment to children

Post by hgm »

Funny that you brought this up, as just an hour or so later I sort of hinted at the same thing in the 'dedicated mate solver' thread, without having read your post yet.

And yes, what you write makes very much sense. In fact a coarse version of it has been a well-known trick for decades: some engines double the remaining depth (or give a hefty extension) when the last non-Pawn is captured, to search more deeply in the Pawn-ending branches than in the others. As the branching factor is a lot lower in Pawn endings, this is not too costly. Especially in the days when we could not reach very high depths, this was very important to avoid naive conversion to an obviously lost Pawn ending, just because the depth wasn't enough to see the actual promotion.

I have proposed this solution on earlier occasions, in the context of those problems where one side has many trapped pieces, and there is a quite long path to a win that keeps the pieces trapped, while most branches let the pieces escape and the branching ratio explodes as a consequence.

Note that the log of the node count doesn't have to be very precise, and whether you use 8 or 10 bits to store the draft in the TT has virtually no impact. (Halving the number of entries in the TT typically reduces the search speed by only the 12th root of 2, i.e. 6%, for a 6-Elo drop in rating.)

If you could statically recognize 'apparently poor' moves (like exposing a high piece to capture by a lower one, or an unprotected piece to capture, or leaving a piece attacked more often than it is protected), you could use the count of the remaining moves to reduce the depth.

One problem I foresee is handling situations where one side has many moves, but the other just a few. The side with few moves really needs good verification that what he plans to play won't run into an unexpected bad end, as when it does, he has little opportunity to find an escape. But in the meantime his opponent is burning the depth budget.
Edoardo Manino
Posts: 70
Joined: Thu Jul 12, 2012 12:43 am
Location: Turin, Italy

Re: Time assignment to children

Post by Edoardo Manino »

Hi,
while the idea sounds interesting to me for a minimax search, I don't understand how it applies to an iterative-deepening alpha/beta framework. Usually the majority of nodes are searched in the subtrees belonging to the PV; does your scheme imply that the PV is constantly undersearched (depth-wise) w.r.t. the other (less important) branches?
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Time assignment to children

Post by mvk »

matthewlai wrote:This is a very high level philosophical idea about shaping the search tree.
I have been working on this also. What I try to do is a 3-step algorithm to expand the tree:

Step 1. Back-propagate the probability densities from the leaves (or dropout moves in my case) to the root.
1.a. For this you need to estimate/calculate the probability density function of the max of several gaussians.
1.b. At the same time you can calculate the odds that each move is the best in its own node.

Step 2. Forward-propagate the "PV likelihood" from the root towards the dropout moves using these odds.

Step 3. Select the leaf or dropout move with the highest values found in step 2 and expand those nodes. I expect you can safely expand a whole bunch of them in parallel, to avoid repeating step 1 and 2 too often.

(Then repeat from 1 until you run out of time or until the probability that the 1st root move is best exceeds a certain threshold.)

I use a two-parameter model for the nodes: the probability densities (or are they called likelihoods? I'm never sure) of the nodes are simple gaussians. I have a lot of data I can use to map search score (and depth) to a mu and sigma. This ignores non-gaussianness and skew. Also, the probability density function of the max is usually well-approximated by another gaussian, but not always (the typical nasty case is mu1=0, sigma1=1, mu2=1, sigma2=10). I ignore that now. I hope that in chess trees such cases shouldn't normally happen. (However, draw scores can do this)

In my prototype, step 1 is still too slow to be practical. But probably for the 2-move case (the most typical case) there is a faster calculation method than the integrals I calculate now. That is where I was stuck last week, looking for literature.

But it might be fast enough already to drive a distributed search during the game, because there the trees can be much smaller. I haven't tried that yet.

You throw away alpha-beta though, because alpha-beta assumes the sigmas are all zero. But in a way that is the whole point: Alpha-beta assumes the tree shape is already given, while it isn't. So to remedy that we have ended up with a cargo load of heuristics to win back the information that alpha-beta is throwing away. I figure that many of the well-known algorithms (iterative deepening, singular extensions, LMR, splitpoint selection, easy move, conspiracy search) can be seen as crude-but-fast approximations of this generic algorithm.
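For the 2-move case mentioned above, the odds in step 1.b actually have a closed form: the difference of two independent gaussians is itself gaussian, so P(X1 > X2) = Phi((mu1 - mu2) / sqrt(s1^2 + s2^2)), and no numerical integration is needed. (The max *density* of step 1.a is still non-gaussian in general.) A sketch, with hypothetical helper names:

```python
import math

# Closed form for "which of two gaussian-scored moves is best":
# X1 - X2 ~ N(mu1 - mu2, s1^2 + s2^2), so
# P(X1 > X2) = Phi((mu1 - mu2) / sqrt(s1^2 + s2^2)).

def phi(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_first_is_best(mu1, s1, mu2, s2):
    return phi((mu1 - mu2) / math.sqrt(s1 * s1 + s2 * s2))

print(prob_first_is_best(0.0, 1.0, 0.0, 1.0))  # 0.5: identical beliefs
print(prob_first_is_best(1.0, 1.0, 0.0, 1.0))  # ≈ 0.76: first move favored
```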
Last edited by mvk on Mon Jul 27, 2015 8:15 pm, edited 5 times in total.
[Account deleted]
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

hgm wrote:Funny that you brought this up, as just an hour or so later I sort of hinted at the same thing in the 'dedicated mate solver' thread, without having read your post yet.

And yes, what you write makes very much sense. In fact a coarse version of it has been a well-known trick for decades: some engines double the remaining depth (or give a hefty extension) when the last non-Pawn is captured, to search more deeply in the Pawn-ending branches than in the others. As the branching factor is a lot lower in Pawn endings, this is not too costly. Especially in the days when we could not reach very high depths, this was very important to avoid naive conversion to an obviously lost Pawn ending, just because the depth wasn't enough to see the actual promotion.

I have proposed this solution on earlier occasions, in the context of those problems where one side has many trapped pieces, and there is a quite long path to a win that keeps the pieces trapped, while most branches let the pieces escape and the branching ratio explodes as a consequence.

Note that the log of the node count doesn't have to be very precise, and whether you use 8 or 10 bits to store the draft in the TT has virtually no impact. (Halving the number of entries in the TT typically reduces the search speed by only the 12th root of 2, i.e. 6%, for a 6-Elo drop in rating.)

If you could statically recognize 'apparently poor' moves (like exposing a high piece to capture by a lower one, or an unprotected piece to capture, or leaving a piece attacked more often than it is protected), you could use the count of the remaining moves to reduce the depth.

One problem I foresee is handling situations where one side has many moves, but the other just a few. The side with few moves really needs good verification that what he plans to play won't run into an unexpected bad end, as when it does, he has little opportunity to find an escape. But in the meantime his opponent is burning the depth budget.
Good to know that I am not completely crazy :).

I just read your post on the mate finder thread, and that makes sense. I think that's how humans naturally search as well, when looking for mates.

I am not entirely sure whether I really want to do dynamic node budget changes or not. It's probably beneficial in some situations, but it's harder to reason about.

My current plan is to allocate the number of nodes statically (using TT, killers, SEE, etc). Then if we get a cutoff, we can return before actually having searched the number of nodes specified in the budget. This way, node budget becomes an upper-bound.

One simple way to allocate nodes would be something like assigning a number to each move, then normalizing them so they sum to 1.

For example, hash move can be given a score of 3, winning captures 2, other moves 1, losing captures 0.1, etc.
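A minimal sketch of that allocation scheme, using the example weights from the post; the function and its interface are hypothetical:

```python
# Split a node budget among moves in proportion to heuristic weights.
# The weights below are the example numbers given in the post.

def allocate_nodes(moves, budget):
    """moves: list of (move_name, weight) pairs. Returns {move_name: nodes}."""
    total = sum(w for _, w in moves)
    return {name: budget * w / total for name, w in moves}

moves = [("hash_move", 3.0), ("winning_capture", 2.0),
         ("quiet_move", 1.0), ("losing_capture", 0.1)]
print(allocate_nodes(moves, 1000))
# The hash move gets ~492 nodes, the losing capture only ~16.
```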

I will probably do that with machine learning, so I'm not going to think about this too much.

I will do a simple proof of concept using a scheme like that, though, and see what happens.

There are quite a few implementation details to figure out, but they aren't really that difficult. Just requires a bit of thinking (and a lot of coding).

Yeah, if the 2 players have very different numbers of legal moves, it could become a problem. I'm not sure if there's a solution to that.

The biggest missing piece in computer chess now, I think, is that computers still have no way to tell if 2 positions are "practically equivalent". That's something humans use all the time, and it's what significantly reduces effective branching factors in those kind of positions. I believe machine learning may be able to solve that, but I won't have time to investigate that before the thesis is due, so that will have to wait for now.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.