Time assignment to children

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

Edoardo Manino wrote:Hi,
while the idea sounds interesting to me for a minimax search, I don't understand how it applies to an iterative deepening alpha/beta framework. Usually the majority of nodes are searched in the subtrees belonging to the PV; does your scheme imply that the PV is constantly undersearched (depth-wise) w.r.t. the other (less important) branches?
Pretty much everything (ID and minimax/AB) will still work the same way. The only change is when to go into QSearch.

ID would be done using node budget instead. For example, first iteration you can have a node budget of 10, then 100, then 1000, etc.

The opposite is true regarding how much of the PV is searched. The PV will probably be significantly longer. Probably. Most of the time. Obviously that can never be guaranteed, otherwise the search problem would have already been solved.
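
For what it's worth, a minimal sketch of the node-budget iterative deepening described above (illustrative C++ only; Position, GenerateMoves, MakeMove and QSearch are hypothetical placeholders for whatever the engine provides, and the uniform budget split is just the no-prior-information case):

#include <cstdint>
#include <vector>

struct Position { /* engine-specific state */ };
std::vector<int> GenerateMoves(const Position& pos);    // provided by the engine
Position MakeMove(const Position& pos, int move);       // provided by the engine
int QSearch(const Position& pos, int alpha, int beta);  // provided by the engine

// Negamax where the node budget, not the depth, decides when to drop into QSearch.
int Search(const Position& pos, int alpha, int beta, uint64_t budget) {
    std::vector<int> moves = GenerateMoves(pos);
    if (moves.empty() || budget <= moves.size())
        return QSearch(pos, alpha, beta);               // budget exhausted: quiesce

    uint64_t childBudget = budget / moves.size();       // uniform split; priors could reweight this
    for (int move : moves) {
        Position child = MakeMove(pos, move);
        int score = -Search(child, -beta, -alpha, childBudget);
        if (score >= beta) return score;                // fail high; unspent budget is simply dropped
        if (score > alpha) alpha = score;
    }
    return alpha;
}

void IterativeDeepening(const Position& root, uint64_t maxBudget) {
    for (uint64_t budget = 10; budget <= maxBudget; budget *= 10)
        Search(root, -32000, 32000, budget);            // budgets of 10, 100, 1000, ...
}

The usual machinery between iterations (transposition table, move ordering, aspiration windows) carries over unchanged; only the termination test changes from depth to budget.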
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

mvk wrote:
matthewlai wrote:This is a very high level philosophical idea about shaping the search tree.
I have been working on this also. What I try to do is a 3-step algorithm to expand the tree:

Step 1. Back-propagate the probability densities from the leaves (or dropout moves in my case) to the root.
1.a. For this you need to estimate/calculate the probability density function of the max of several gaussians.
1.b. At the same time you can calculate the odds that each move is the best in its own node.

Step 2. Forward-propagate the "PV likelihood" from the root towards the dropout moves using these odds.

Step 3. Select the leaf or dropout moves with the highest values found in step 2 and expand those nodes. I expect you can safely expand a whole bunch of them in parallel, to avoid repeating steps 1 and 2 too often.

(Then repeat from 1 until you run out of time or until the probability that the 1st root move is best exceeds a certain threshold.)

I use a two-parameter model for the nodes: the probability densities (or are they called likelihoods? I'm never sure) of the nodes are simple gaussians. I have a lot of data I can use to map search score (and depth) to a mu and sigma. This ignores non-gaussianness and skew. Also, the probability density function of the max is usually well-approximated by another gaussian, but not always (the typical nasty case is mu1=0, sigma1=1, mu2=1, sigma2=10). I ignore that now. I hope that in chess trees such cases shouldn't normally happen. (However, draw scores can do this)

In my prototype, step 1 is still too slow to be practical. But probably for the 2-move case (the most typical case) there is a faster calculation method than the integrals I calculate now. That is where I was stuck last week, looking for literature.

But it might be fast enough already to drive a distributed search during the game, because there the trees can be much smaller. I haven't tried that yet.

You throw away alpha-beta though, because alpha-beta assumes the sigmas are all zero. But in a way that is the whole point: Alpha-beta assumes the tree shape is already given, while it isn't. So to remedy that we have ended up with a cargo load of heuristics to win back the information that alpha-beta is throwing away. I figure that many of the well-known algorithms (iterative deepening, singular extensions, LMR, splitpoint selection, easy move, conspiracy search) can be seen as crude-but-fast approximations of this generic algorithm.
I thought about doing something like this, too, but ended up settling on a much simpler version (described in the first post), where only priors are propagated down, and probability of each node is used for time allocation. But that's just because I need to finish my thesis by September, and can't afford to take on something so ambitious :).

Good luck with that!
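
(As an aside on steps 1.a/1.b above: for the two-move case there is a cheap closed-form alternative to the integrals, namely Clark's classic moment-matching approximation for the max of two gaussians. A minimal sketch, assuming independent gaussians as in the model described above - illustrative C++, not code from either engine:

#include <cmath>
#include <cstdio>

struct Gaussian { double mu, sigma; };

// Standard normal pdf and cdf.
double npdf(double x) { return 0.3989422804014327 * std::exp(-0.5 * x * x); }
double ncdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Step 1.b for two moves: probability that move 1's true value exceeds move 2's.
double ProbFirstIsBest(Gaussian a, Gaussian b) {
    double s = std::sqrt(a.sigma * a.sigma + b.sigma * b.sigma);
    return ncdf((a.mu - b.mu) / s);
}

// Step 1.a for two moves: gaussian approximation of max(X1, X2) by matching the
// first two moments (Clark, 1961). This replaces the numerical integration, but it
// is still only an approximation - it degrades in exactly the "nasty" case noted
// above, where the sigmas differ a lot.
Gaussian MaxOfTwo(Gaussian a, Gaussian b) {
    double s  = std::sqrt(a.sigma * a.sigma + b.sigma * b.sigma);
    double al = (a.mu - b.mu) / s;
    double m1 = a.mu * ncdf(al) + b.mu * ncdf(-al) + s * npdf(al);
    double m2 = (a.mu * a.mu + a.sigma * a.sigma) * ncdf(al)
              + (b.mu * b.mu + b.sigma * b.sigma) * ncdf(-al)
              + (a.mu + b.mu) * s * npdf(al);
    return { m1, std::sqrt(m2 - m1 * m1) };
}

int main() {
    Gaussian g = MaxOfTwo({0.0, 1.0}, {1.0, 10.0});    // the nasty case from the post
    std::printf("max ~ N(%.3f, %.3f^2), P(move 1 best) = %.3f\n",
                g.mu, g.sigma, ProbFirstIsBest({0.0, 1.0}, {1.0, 10.0}));
}

For more than two moves the same approximation can be folded in pairwise, at some further loss of accuracy.)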
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

matthewlai wrote:
bob wrote: I think that a simple example from the past shows why this doesn't work very well. At one point Harry Nelson noticed that in Cray Blitz, the node count for each root node (total sub-tree, not just number of siblings) contained useful information. For example, you have a best move M, but once you reach the final iteration move P will replace it as the best move. Harry noticed that for each successive iteration, the node count for the sub-tree below move P increased more than the node count for other moves (> than the expected value based on branching factor). So I wrote code to sort the move list after each iteration based on this. And it was an improvement.
That sounds like a useful heuristic. I'm not surprised it works. When a child has a larger sub-tree, that usually means something interesting is happening there (PV changed), so ordering moves based on that makes sense.

I don't really see how that's related to the discussion here, though.
Just because a move has fewer siblings doesn't say a thing about how important each of those siblings might be. I can't see any reasonable justification for searching each of 3 siblings more deeply than you do when a position has 9 siblings, other than that the position is somehow more forced. Singular extensions is an idea that has merit, but it doesn't extend the search on a move just because there is only one move; it extends when there is one good move and the rest are bad, regardless of whether there are 2 others or 20 others to choose from. The positions with the fewest moves are already generally searched more deeply, because such positions have the side on move in check. Positions with few GOOD moves are much more difficult to recognize, however. And when you throw in pseudo-legal moves, this becomes even more problematic.

This is not saying the idea won't work. But evidence suggests that the only way to recognize things that need to be extended is by searching them, not by some static criterion such as "< N siblings vs > N siblings".
Pruning, reductions, and extensions can still be done using a node-count based approach, by not allocating nodes equally. That is neither more difficult nor easier than figuring out what to reduce and extend in a depth-based framework, since all these "extension" and "reduction/pruning" decisions still need to be made statically before searching (TT, killers, SEE, etc). Or they can be done after a first pass, and that can be done with a node-count based approach as well (re-searches, etc).

So here we are talking about "default" depth reductions (always 1 in most modern engines). This is our starting point for how deeply to search each move, before any heuristic is applied. If we have other information (TT, killers, SEE, history, and whatnot), we can tweak each individual depth/node count using that.

If we have no prior information about the moves, then the more moves there are, the lower the chance that any particular one of them is the best.

Checking is not the only way to drastically reduce the number of moves. Trading queens can drastically reduce the number of moves as well, for example.

Suppose we have a root position with 2 legal moves - to trade queens or not.

Using a uniform-depth approach, we would spend much more time on the non-trading sub-tree, even though, absent any other information, we don't know that it is the more likely outcome.

This would also apply to open vs closed games, for example. If you have the option of keeping the game closed vs trading something to open it, does it make sense to spend much more time on the subtree where the position opens up, even though keeping it closed is equally likely to be the better option?

Pseudo-legal moves would make it problematic, yes, but that's just an implementation detail. I switched to legal move generation in my engine because when your evaluator is a huge neural network that brings your NPS down to 40k, the legal move generation overhead is negligible :).
You mentioned number of successors at a node. This is a more accurate estimate of the effort spent below each root node. And note it does NOT work with today's searches. With all the extensions and reductions and forward pruning that is going on, node counts for each root node (except for the first, which is somewhere around 90% of the total effort expended now) vary wildly, with the first node having a huge count and the rest having counts that appear to have nothing to do with how close they are to failing high...


As far as spending more time on non-trading sub-trees, that is a characteristic of the game. You only have 15 pieces you can trade, yet games typically last 50-60 moves. So 3/4 of the good moves played were not captures. You are therefore spending more of your time on the right branches, not the wrong ones.

You say the more moves there are, the less likely one of them is to be the best move. But you ignore the problem that except at ALL nodes, one of them WILL BE the best move. Do you want to spend less effort on that best move just because it has more siblings than another? The two ideas don't really seem to fit together very well to me.

In fact, quite the opposite seems to work better. Why do you extend a check? Because it is a forcing move and might also push something beyond the horizon? There are lots of such moves we don't recognize, and that is one of the things singular extensions (the real HSU singular extensions) does, focus more effort on important moves and less on the rest.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote: You mentioned number of successors at a node. This is a more accurate estimate of the effort spent below each root node. And note it does NOT work with today's searches. With all the extensions and reductions and forward pruning that is going on, node counts for each root node (except for the first, which is somewhere around 90% of the total effort expended now) vary wildly, with the first node having a huge count and the rest having counts that appear to have nothing to do with how close they are to failing high...
I only mentioned it because you mentioned it. It's not what I am trying to do here.
As far as spending more time on non-trading sub-trees, that is a characteristic of the game. You only have 15 pieces you can trade, yet games typically last 50-60 moves. So 3/4 of the good moves played were not captures. You are therefore spending more of your time on the right branches, not the wrong ones.
Most of the positions also don't have a reasonable trade opportunity. Of positions that do, I'm guessing the trade is the best move close to 50% of the time.
You say the more moves there are, the less likely one of them is to be the best move. But you ignore the problem that except at ALL nodes, one of them WILL BE the best move. Do you want to spend less effort on that best move just because it has more siblings than another? The two ideas don't really seem to fit together very well to me.
It makes a lot of sense. Yes, there will be a best move. But if this best move has more siblings, there's a lower chance that this move is in the PV, which, in the end, is all we care about.
In fact, quite the opposite seems to work better. Why do you extend a check? Because it is a forcing move and might also push something beyond the horizon? There are lots of such moves we don't recognize, and that is one of the things singular extensions (the real HSU singular extensions) does, focus more effort on important moves and less on the rest.
As I explained in the last post, this is irrelevant. Extensions, reductions, and prunings can all still happen with a node-limited search.

Yes, the idea is to focus more effort on important moves. Of course, we don't know which moves are important before searching, otherwise we wouldn't need to search at all.

So instead, we focus more effort on moves that seem more likely to be important.

With no additional information, when you have 2 children, with 9 and 3 children each, each of the 9 children is much less likely to be important than each of the 3 children.
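
(To make the arithmetic concrete, assuming a uniform prior at each level - the no-additional-information case: each of the two children is best with probability 1/2, so each of the 9 grandchildren carries 1/2 * 1/9 ~= 5.6% of the probability of lying on the best line, while each of the 3 carries 1/2 * 1/3 ~= 16.7%, three times as much.)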
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

matthewlai wrote:
bob wrote: You mentioned number of successors at a node. This is a more accurate estimate of the effort spent below each root node. And note it does NOT work with today's searches. With all the extensions and reductions and forward pruning that is going on, node counts for each root node (except for the first, which is somewhere around 90% of the total effort expended now) vary wildly, with the first node having a huge count and the rest having counts that appear to have nothing to do with how close they are to failing high...
I only mentioned it because you mentioned it. It's not what I am trying to do here.
As far as spending more time on non-trading sub-trees, that is a characteristic of the game. You only have 15 pieces you can trade, yet games typically last 50-60 moves. So 3/4 of the good moves played were not captures. You are therefore spending more of your time on the right branches, not the wrong ones.
Most of the positions also don't have a reasonable trade opportunity. Of positions that do, I'm guessing the trade is the best move close to 50% of the time.
You say the more moves there are, the less likely one of them is to be the best move. But you ignore the problem that except at ALL nodes, one of them WILL BE the best move. Do you want to spend less effort on that best move just because it has more siblings than another? The two ideas don't really seem to fit together very well to me.
It makes a lot of sense. Yes, there will be a best move. But if this best move has more siblings, there's a lower chance that this move is in the PV, which, in the end, is all we care about.
In fact, quite the opposite seems to work better. Why do you extend a check? Because it is a forcing move and might also push something beyond the horizon? There are lots of such moves we don't recognize, and that is one of the things singular extensions (the real HSU singular extensions) does, focus more effort on important moves and less on the rest.
As I explained in the last post, this is irrelevant. Extensions, reductions, and prunings can all still happen with a node-limited search.

Yes, the idea is to focus more effort on important moves. Of course, we don't know which moves are important before searching, otherwise we wouldn't need to search at all.

So instead, we focus more effort on moves that seem more likely to be important.

With no additional information, when you have 2 children, with 9 and 3 children each, each of the 9 children is much less likely to be important than each of the 3 children.
Your guess about trading is wrong. IE take a simple 1. e4 e5, 2. Nf3 Nc6, 3. Bb5 a6 4. Ba4. For a while the Bxc6 trade is going to be possible, but it is usually not the best move. If you were going to play Bxc6 you should have done it after a6. Positions where there are no captures are pretty rare. But most of the time a capture is not the best move. You can convince yourself of this by simply modifying your search to count the statistics. The only significant thing about a capture is that it is a forcing move, since your opponent has to recapture immediately, except in the more unusual cases where he has other intervening moves that are even more forcing, preventing you from retreating the capturing piece and ending up ahead.

So in short, most of the time a capture is not the best move, otherwise the game would be over quickly.

Your point about "the lower the probability that one of these moves will be in the PV" is either (a) obvious or (b) irrelevant or (c) both of the above. You can't make decisions on how important a move is until you know how important it is. If this node is part of the PV, then whether there is one move or 100 moves to try, why would I alter the way I search here, since it is a critical node?

Conspiracy number search sort of encapsulates this idea, but it has never really panned out compared to alpha/beta. We search later moves at a ply less deeply (LMR) because we believe in our move ordering: if we haven't gotten a cutoff already, each successive move has a lower probability of causing one and can therefore be searched less deeply, with a risk associated with doing this.

Now, to these two moves, one with 3 children, one with 9. You are really going to want to spend more time on the move with 3 children than the one with 9? How do you know which of the two moves will be your PV move? Not knowing that, how can you arbitrarily decide that the one with fewer descendants is better? This sounds like a check-loving scheme, since a checking move will always leave the opponent with fewer moves than a non-checking move, yet many realized, after a study I did a few years ago, that blindly extending all checks is not such a good idea. IE Qxh6+ gxh6. The recapture might be the only legal reply, but it wins the queen for a pawn.

While the idea should always be investigated, I am skeptical of this being a real way to focus search effort along more important parts of the tree. Just my $.02 of course...
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote: Your guess about trading is wrong. IE take a simple 1. e4 e5, 2. Nf3 Nc6, 3. Bb5 a6 4. Ba4. For a while the Bxc6 trade is going to be possible, but it is usually not the best move. If you were going to play Bxc6 you should have done it after a6. Positions where there are no captures are pretty rare. But most of the time a capture is not the best move. You can convince yourself of this by simply modifying your search to count the statistics. The only significant thing about a capture is that it is a forcing move, since your opponent has to recapture immediately, except in the more unusual cases where he has other intervening moves that are even more forcing, preventing you from retreating the capturing piece and ending up ahead.

So in short, most of the time a capture is not the best move, otherwise the game would be over quickly.
Yes, but captures don't need to be the best move for this scheme to make sense. They only need to have at least the same probability of being the best move as other moves. In fact, if we use your statistical method, it shows that captures probably have a higher chance of being the best move.

A typical game lasts 50-60 moves, and each position has an average of 35 legal moves, with only a few (much fewer than 35/4 ~= 9) captures. If a capture is the best move 1/4 of the time, that means that in any given position, any individual capture is more likely to be the best move than any individual non-capture.

If that's the case, why do we spend less time on the captures?
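
(To put rough numbers on it: with 35 legal moves of which, say, 5 are captures - a made-up but plausible count - and a capture being best 1/4 of the time, each individual capture is best with probability about 0.25/5 = 5%, versus 0.75/30 = 2.5% for each individual non-capture.)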
Your point about "the lower the probability that one of these moves will be in the PV" is either (a) obvious or (b) irrelevant or (c) both of the above. You can't make decisions on how important a move is until you know how important it is. If this node is part of the PV, then whether there is one move or 100 moves to try, why would I alter the way I search here, since it is a critical node?

Conspiracy number search sort of encapsulates this idea, but it has never really panned out compared to alpha/beta. We search later moves at a ply less deeply (LMR) because we believe in our move ordering: if we haven't gotten a cutoff already, each successive move has a lower probability of causing one and can therefore be searched less deeply, with a risk associated with doing this.

Now, to these two moves, one with 3 children, one with 9. You are really going to want to spend more time on the move with 3 children than the one with 9? How do you know which of the two moves will be your PV move? Not knowing that, how can you arbitrarily decide that the one with fewer descendants is better? This sounds like a check-loving scheme, since a checking move will always leave the opponent with fewer moves than a non-checking move, yet many realized, after a study I did a few years ago, that blindly extending all checks is not such a good idea. IE Qxh6+ gxh6. The recapture might be the only legal reply, but it wins the queen for a pawn.
Ah! I see where the misunderstanding is now.

No, I would not spend more time on the move with 3 children. I would spend equal time on the move with 3 children (A) as the move with 9 (B).

However, since the time spent on the move with 3 children only needs to be split among the 3 children, each child of A gets more time than each child of B.

A depth-limited search would spend much more time on B. What's the justification for that if, as shown above, A is more likely to happen? Or even if A is as likely to happen? Or even if A is only more than 1/3 as likely to happen?
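
(As a concrete illustration, with made-up numbers: give the parent a budget of 1800 nodes and split it equally, so A and B get 900 each. A's 3 children then get 300 nodes apiece, while B's 9 children get 100 apiece. A uniform-depth search instead gives every grandchild subtree roughly the same effort, which hands B's subtree about three times the total effort of A's.)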
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Time assignment to children

Post by wgarvin »

matthewlai wrote:Now obviously, we don't actually do uniform-depth searches anymore. We do singular extensions, check extensions, recapture extension, one-reply extension, etc, and LMR. All of those techniques have been successfully applied in some program at some point.

What do they all have in common? They all force the search to spend more time on positions with fewer legal moves (or fewer reasonable moves).

In fact, I believe this is the most important effect of LMR. LMR makes the search spend much less time on positions with a lot of moves than it otherwise would, relative to positions with fewer reasonable moves.
I interpret what you're saying as "without LMR, wide nodes would take a much bigger share of the time than their narrow neighbors get."

It doesn't make sense to me that a "wide node" subtree and "narrow node" subtree deserve the same amount of time (or even some weighted subdivision of the available time).

Instead, what makes sense to me is that we should try to search every line of play consisting entirely of interesting moves to the same nominal depth, but spend less time on (i.e. search to a shorter depth) lines of play containing one or more un-interesting moves. (Ignore things like check extensions here, and temporarily assume that all "good" lines of play are worth understanding to about the same depth, though that is not obviously true).

LMR takes advantage of the observation that in a "wide node", most of the moves are usually un-interesting, and the most interesting move(s) are usually at the front of the move list. This crude heuristic lets us reduce so many crap moves that the saved time more than compensates for the occasional mistake where it reduces a good move. But the crude heuristic only works in "wide nodes"--in a "narrow node" with only 2-3 moves, it's just too hard to know which ones are un-interesting.

So that's why LMR saves time mainly in positions with "a lot of moves": because those are the only ones where it's safe to use the crude heuristic. The narrow nodes contain a bigger fraction of "useful" nodes, but they can have some crappy moves too... knowing which of them is safe to reduce is a hard problem.

Anyway, I guess I am arguing that "the most important effect of LMR" is just that it reduces a lot of crap moves. It's a really simple heuristic that has performed well in practice. Maybe something better is possible, but given how cheap LMR is, that is a difficult bar to clear.
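
For readers who haven't seen it spelled out, the "reduce late moves, more so at higher depth" rule being discussed usually boils down to something like the following (illustrative C++ with made-up constants, not any particular engine's tuning):

#include <algorithm>
#include <cmath>

// Reduction in plies for the move at index moveIndex (0-based, in the ordered move list).
int LmrReduction(int depth, int moveIndex, bool isPvNode) {
    if (depth < 3 || moveIndex < 4)      // never reduce the first few (presumably well-ordered) moves
        return 0;
    double r = 0.5 + std::log(depth) * std::log(moveIndex) / 2.5;  // ramps up with depth and move index
    if (isPvNode)
        r -= 1.0;                        // be more conservative on expected-PV nodes
    return std::max(0, static_cast<int>(r));
}

// Typical use: search the move to depth - 1 - LmrReduction(...), and re-search at full
// depth if the reduced search unexpectedly returns a score above alpha.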
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

wgarvin wrote:I interpret what you're saying as "without LMR, wide nodes would take a much bigger share of the time than their narrow neighbors get."

It doesn't make sense to me that a "wide node" subtree and "narrow node" subtree deserve the same amount of time (or even some weighted subdivision of the available time).

Instead, what makes sense to me is that we should try to search every line of play consisting entirely of interesting moves to the same nominal depth, but spend less time on (i.e. search to a shorter depth) lines of play containing one or more un-interesting moves. (Ignore things like check extensions here, and temporarily assume that all "good" lines of play are worth understanding to about the same depth, though that is not obviously true).
Your interpretation is correct. That is the point of debate. The chess community has held that belief for many decades now.

I have provided justification for spending equal time on each branch (vs equal depth). What are the justifications for equal depth?

We are ignoring "uninteresting" moves for now because that's an orthogonal issue and unrelated to the topic being discussed.

If one node has 3 interesting moves, and another has 9, why should we search them to the same depth?

Let's say we have an extreme case where one node has 1 possible move, and the other node has 200 possible moves. If we assume the best move is equally likely to be in the first node as in the second node, does it make sense to spend almost no time on the first node?
LMR takes advantage of the observation that in a "wide node", most of the moves are usually un-interesting, and the most interesting move(s) are usually at the front of the move list. This crude heuristic lets us reduce so many crap moves that the saved time more than compensates for the occasional mistake where it reduces a good move. But the crude heuristic only works in "wide nodes"--in a "narrow node" with only 2-3 moves, it's just too hard to know which ones are un-interesting.
That is LMR as it was introduced, and what it's intended to do. My hypothesis is that the power of LMR is really in its side effect of making time sharing more "fair" between narrow and wide nodes.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

matthewlai wrote:
bob wrote: Your guess about trading is wrong. IE take a simple 1. e4 e5, 2. Nf3 Nc6, 3. Bb5 a6 4. Ba4. For a while the Bxc6 trade is going to be possible, but it is usually not the best move. If you were going to play Bxc6 you should have done it after a6. Positions where there are no captures are pretty rare. But most of the time a capture is not the best move. You can convince yourself of this by simply modifying your search to count the statistics. The only significant thing about a capture is that it is a forcing move, since your opponent has to recapture immediately, except in the more unusual cases where he has other intervening moves that are even more forcing, preventing you from retreating the capturing piece and ending up ahead.

So in short, most of the time a capture is not the best move, otherwise the game would be over quickly.
Yes, but captures don't need to be the best move for this scheme to make sense. They only need to have at least the same probability of being the best move as other moves. In fact, if we use your statistical method, it shows that captures probably have a higher chance of being the best move.

A typical game lasts 50-60 moves, and each position has an average of 35 legal moves, with only a few (much fewer than 35/4 ~= 9) captures. If a capture is the best move 1/4 of the time, that means that in any given position, any individual capture is more likely to be the best move than any individual non-capture.

If that's the case, why do we spend less time on the captures?

I don't follow your question? I treat 'em all the same until the q-search, where almost all of the time is spent on them.
Your point about "the lower the probability that one of these moves will be in the PV" is either (a) obvious or (b) irrelevant or (c) both of the above. You can't make decisions on how important a move is until you know how important it is. If this node is part of the PV, then whether there is one move or 100 moves to try, why would I alter the way I search here, since it is a critical node?

Conspiracy number search sort of encapsulates this idea, but it has never really panned out compared to alpha/beta. We search later moves at a ply less deeply (LMR) because we believe in our move ordering: if we haven't gotten a cutoff already, each successive move has a lower probability of causing one and can therefore be searched less deeply, with a risk associated with doing this.

Now, to these two moves, one with 3 children, one with 9. You are really going to want to spend more time on the move with 3 children than the one with 9? How do you know which of the two moves will be your PV move? Not knowing that, how can you arbitrarily decide that the one with fewer descendants is better? This sounds like a check-loving scheme, since a checking move will always leave the opponent with fewer moves than a non-checking move, yet many realized, after a study I did a few years ago, that blindly extending all checks is not such a good idea. IE Qxh6+ gxh6. The recapture might be the only legal reply, but it wins the queen for a pawn.
Ah! I see where the misunderstanding is now.

No, I would not spend more time on the move with 3 children. I would spend equal time on the move with 3 children (A) as the move with 9 (B).

However, since the time spent on the move with 3 children only needs to be split among the 3 children, each child of A gets more time than each child of B.

A depth-limited search would spend much more time on B. What's the justification for that if, as shown above, A is more likely to happen? Or even if A is as likely to happen? Or even if A is only more than 1/3 as likely to happen?
That depends. With LMR one won't spend much more time on B, because as you step through the moves, you ramp up the reductions.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote: I don't follow your question? I treat 'em all the same until the q-search, where almost all of the time is spent on them.
Well, it's a followup question from our discussion earlier. Captures lead to simplifications. In a depth-limited search, less time is spent on simplifying moves. They are treated the same depth-wise, not time-wise or node-wise.
That depends. With LMR one won't spend much more time on B, because as you step through the moves, you ramp up the reductions.
And that's what I meant when I originally said I believe the power of LMR is in the side effect of shaping the search to be more like what I proposed.

But if our goal is to achieve what I proposed, there are simpler and more theoretically sound ways than LMR (for example, using node-limited search).

LMR doesn't make sense. That has been well established. It just works because it happens to be right more often than it happens to be wrong. It's a very crude technique. It's something that doesn't work in theory but works in practice.

We have a position with 35 legal moves. 5 of them are hash moves, SEE winning captures, and killers. 5 of them are SEE losing captures or losing non-captures. We still have 25 moves left. These 25 moves are ordered pretty much randomly. How does it make any sense that we start reducing and ramp up the reduction halfway through those 25 moves?
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.