Time assignment to children

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

matthewlai wrote:
bob wrote: First they are not ordered randomly. IE the history heuristic for starters.
And like you said before, history heuristic is basically random noise.
But the main key is that most everyone uses PVS. Which means you search one move or you search all moves. Once you have searched more than one, the probability of this being an ALL node increases rapidly. And if that is the case, reducing effort has no risk at all, except when move ordering breaks down and a very late move suddenly fails high unexpectedly. But since this is rare, the idea of LMR is quite sound. You can prove that for perfectly ordered trees it is just as correct as pure alpha/beta when compared to minimax. So you toss an unfair coin, and most of the time you win. But you will lose some here and there. That's LMR in a nutshell.
It doesn't change the fact that in PVS, once we run out of known good moves (hash move, killers, etc), we know very little about the rest of the moves. The 6th move and the 20th move have about the same (low) probability of failing high. Why do we search the 6th deeper than the 20th? If we want to reduce, wouldn't it make sense to reduce all these moves equally?
So to me, LMR is quite natural and sensible. Most people, given a task to do the same thing N times will find that the last instance of the task is done more quickly and more sloppily than the first. Watch someone paint on house trim for a while, and you'll see what I mean.
Yes, that's more human. But humans are bad at chess. Why are we emulating humans?
Moves are ordered in several ways. There are ways to get better history values. One can also use the evaluation (à la Stockfish and others) to order moves based on how they affect the score. I think the basic assumption that "move ordering is random" is not a given.

I ran a few simple tests quite a few months back, where instead of just counting fail highs on the first move, I counted them move by move. The numbers were far from random (i.e. the 20th counter was not the same as the 15th move counter). This was done with history + a simple evaluation to sort by.

Move ordering is a topic by itself that deserves a lot of study, since LMR, LMP, et al. depend on it even more than alpha/beta does.
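
For illustration, a minimal sketch of that kind of per-move fail-high instrumentation (invented names, not Crafty's actual code): tally fail highs by the index the move had in the ordered move list, then dump the distribution after a test run.

Code: Select all

// Hypothetical instrumentation: count fail highs by move-ordering index.
#include <array>
#include <cstdio>

constexpr int MAX_MOVES = 64;
std::array<long, MAX_MOVES> fail_high_by_index{};   // zero-initialized

// Call from the search whenever a move fails high.
void record_fail_high(int move_index) {
    if (move_index < MAX_MOVES)
        ++fail_high_by_index[move_index];
}

// If ordering were truly random past the first few moves, the later
// counters would all be roughly equal.
void print_fail_high_stats() {
    long total = 0;
    for (long c : fail_high_by_index) total += c;
    if (total == 0) return;
    for (int i = 0; i < MAX_MOVES; ++i)
        if (fail_high_by_index[i])
            std::printf("move %2d: %8ld (%.2f%%)\n", i + 1,
                        fail_high_by_index[i],
                        100.0 * fail_high_by_index[i] / total);
}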
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Time assignment to children

Post by mvk »

Rein Halbersma wrote:
mvk wrote:
matthewlai wrote:This is a very high level philosophical idea about shaping the search tree.
I have been working on this also. What I try to do is a 3-step algorithm to expand the tree:

Step 1. Back-propagate the probability densities from the leaves (or dropout moves in my case) to the root.
1.a. For this you need to estimate/calculate the probability density function of the max of several gaussians.
1.b. At the same time you can calculate the odds that each move is the best in its own node.

Step 2. Forward-propagate the "PV likelihood" from the root towards the dropout moves using these odds.

Step 3. Select the leaf or dropout move with the highest values found in step 2 and expand those nodes. I expect you can safely expand a whole bunch of them in parallel, to avoid repeating step 1 and 2 too often.

(Then repeat from 1 until you run out of time or until the probability that the 1st root move is best exceeds a certain threshold.)
This scheme (especially the back-propagation of probabilities) sounds a lot like some sort of Monte Carlo tree search. Was this a source of inspiration for you?
I figure that many of the well-known algorithms (iterative deepening, singular extensions, LMR, splitpoint selection, easy move, conspiracy search) can be seen as crude-but-fast approximations of this generic algorithm.
Monte Carlo tree search converges to the minimax search result, see e.g. this review paper.
Thanks for the article. At first glance it looks like Monte Carlo tree search can indeed approximate the probabilistic tree shaping algorithm. I have to read it in more detail though, I just skimmed over it today. I think one difference is that I plan relatively expensive leaf evaluation (e.g. more than a couple of million nodes per leaf or dropout move), to drive down the initial sigmas, and to take advantage of the fast-but-crude approximations from the current algorithms. (And to make it scalable in a distributed system.) And after having made that expensive evaluation, why then probe the tree randomly if you can approximate the exact PV likelihoods with comparably little effort instead?

A follow-up idea is to adjust the initial sigmas based on the stability of the terminal searches. There is a lot of room for experimenting there.

The inspiration for this approach is in fact much older than this. It was seeded by a usenet posting many years ago, certainly before 2000. At that time it wasn't feasible yet. The trigger to try a prototype came from working on the opening book.

BTW, I have by now found the relevant literature I was looking for. pdf(max) estimation is a common operation in circuit simulations for signal propagation at gates. A simple algorithm was already described in 1961 and is known as the Clark algorithm. I will use that for the time being.
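
For reference, a minimal sketch of Clark's 1961 moment-matching approximation for the max of two jointly Gaussian variables (my own code and naming, not from any engine). The cdf term ncdf(alpha) doubles as the estimated probability that the first variable is the larger one, which is what step 1.b needs; more than two moves can be handled by applying the operation pairwise.

Code: Select all

// Clark (1961): approximate max(X1, X2) of two correlated Gaussians by a
// Gaussian with the exact first two moments.
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Gaussian { double mu, sigma; };

static double npdf(double x)   // standard normal density
{ return std::exp(-0.5 * x * x) / std::sqrt(2.0 * std::acos(-1.0)); }

static double ncdf(double x)   // standard normal cumulative
{ return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Moment-matched Gaussian for max(a, b) given correlation rho; optionally
// also returns P(a >= b), i.e. the odds that the first move is the best.
Gaussian clark_max(Gaussian a, Gaussian b, double rho, double *p_a_best = nullptr) {
    double theta = std::sqrt(a.sigma * a.sigma + b.sigma * b.sigma
                             - 2.0 * rho * a.sigma * b.sigma);
    double alpha = theta > 0.0 ? (a.mu - b.mu) / theta : (a.mu >= b.mu ? 1e9 : -1e9);
    double m1 = a.mu * ncdf(alpha) + b.mu * ncdf(-alpha) + theta * npdf(alpha);
    double m2 = (a.mu * a.mu + a.sigma * a.sigma) * ncdf(alpha)
              + (b.mu * b.mu + b.sigma * b.sigma) * ncdf(-alpha)
              + (a.mu + b.mu) * theta * npdf(alpha);
    if (p_a_best) *p_a_best = ncdf(alpha);
    return { m1, std::sqrt(std::max(0.0, m2 - m1 * m1)) };
}

int main() {
    double p;
    Gaussian m = clark_max({0.20, 0.50}, {0.00, 0.30}, 0.0, &p);
    std::printf("max ~ N(%.3f, %.3f^2), P(first is best) = %.3f\n", m.mu, m.sigma, p);
}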
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Time assignment to children

Post by mvk »

matthewlai wrote:The problem with LMR is it reduces some of those crappy moves more than others, with no good reason. It would make more sense to reduce them all the same amount. Perhaps an amount based on how wide the node is? Cue my approach :).
I think LMR's benefits have nothing to do with making the time distribution "more fair", and everything to do with not spending huge amounts of time searching subtrees for crappy moves. :lol:
I conjecture that progressive LMR works because the number of moves that have already failed low in a node correlates with the score gap to the PV, or to alpha. We never know that gap exactly, because alpha-beta reduces the information we learn about it to a single boolean. The further down the list we are (and given that all earlier moves failed low!), the higher the likelihood that we are far away, score-wise, from the PV. Since all we need to prove (with the same confidence as for the other moves) is that the move is below alpha, we can get away with an increasingly inaccurate search. Or in other words: we can allow a higher sigma from that search. Hence a progressive reduction within the node is an acceptable way to get to the next iteration quicker, and thus back to searching near the PV sooner.
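
For what it's worth, a minimal sketch of such a progressive reduction schedule; the shape (log × log) and the constants are purely illustrative, not taken from any particular engine.

Code: Select all

// Illustrative progressive LMR schedule: later moves at larger remaining
// depth get reduced more.  All constants are invented for the example.
#include <algorithm>
#include <cmath>

int lmr_reduction(int depth, int move_number, bool is_pv_node) {
    if (depth < 3 || move_number < 4)   // leave early moves and shallow nodes alone
        return 0;
    double r = 0.5 + std::log(depth) * std::log(move_number) / 2.25;
    if (is_pv_node)                     // be more conservative on the PV
        r -= 1.0;
    return std::max(0, (int)r);
}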
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote: moves are ordered in several ways. There are ways to get better history values. One can also use the evaluation (ala' stockfish and others) to order moves based on how they affect the score. I think the basic assumption that "move ordering is random" is not a given.

I ran a few simple tests quite a few months back, where instead of just counting fail highs on the first move, I counted them move by move. The numbers were far from random (i.e. the 20th counter was not the same as the 15th move counter). This done with history + a simple evaluation to sort by.

Move ordering is a topic by itself that deserves a lot of study, since LMR, LMP, et. al. depend on it even more than alpha/beta.
In that case, sure, the ordering is not random. However, using move count to decide on the amount of reduction is still crude.

Say you have two positions: one has 3 relatively good moves and 7 relatively bad moves, and the other has 7 relatively good moves and 3 relatively bad moves, all based on history heuristics or eval. Why would you follow the same reduction schedule for both?
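
As an illustration of the alternative, a hypothetical sketch that derives the reduction from an estimated move quality (history score or eval swing, normalised to 0..1) rather than from the move's index; the scaling constants are made up.

Code: Select all

// Hypothetical quality-based reduction: a position with 7 decent moves is
// treated differently from one with 3, because the reduction follows the
// per-move quality estimate instead of the move count.
#include <algorithm>
#include <cmath>

int quality_based_reduction(int depth, double move_quality /* 0 = best .. 1 = worst */) {
    if (depth < 3)
        return 0;
    // Scale a maximum reduction of ~3 plies by how poor the move looks.
    double r = 3.0 * move_quality * std::log((double)depth) / std::log(20.0);
    return std::max(0, (int)r);
}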

In any case, this is getting further and further off topic.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Time assignment to children

Post by bob »

You keep mentioning "probability". But I think you are overlooking one key component of probability, that is the probability that an ALL node is really an ALL node. In a strongly-ordered tree, fail-highs happen on the first move about 90% of the time. Which means all the other moves are in the 0.10 probability range for causing a cutoff. If you search 5 moves (good captures, hash move, killers, maybe counter-moves) then the probability this is an all node is close to 0.99, and reducing remaining moves will have no effect on accuracy even though it will have a huge impact on total search time. So the idea of reducing later moves more is purely based on the probability that this specific node is an ALL node, or vice-versa, based on the probability that this is not a CUT node. The more moves you search, the greater that probability and the less likely any of the remaining moves will fail high.
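
To put rough numbers on that reasoning, a back-of-the-envelope sketch; the prior and the "cutoff seen within k moves" figures are illustrative, chosen in the spirit of the 90% first-move statistic above.

Code: Select all

// Posterior probability that a node is an ALL node after k moves have been
// searched without a fail high.  All input numbers are illustrative.
#include <cstdio>

int main() {
    double p_cut        = 0.5;   // prior: how often we expect a CUT node here
    double cutoff_by[6] = { 0.00, 0.90, 0.95, 0.97, 0.98, 0.99 };
    // cutoff_by[k]: if this really is a CUT node, probability that the
    // cutoff has already shown up within the first k moves.
    for (int k = 0; k <= 5; ++k) {
        double p_all_prior   = 1.0 - p_cut;
        double p_no_cut_seen = p_all_prior + p_cut * (1.0 - cutoff_by[k]);
        std::printf("after %d moves without a fail high: P(ALL) = %.3f\n",
                    k, p_all_prior / p_no_cut_seen);
    }
}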

I've been working on move ordering issues here lately, in fact, since LMR can be as aggressive as your move ordering can tolerate...
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

bob wrote:You keep mentioning "probability". But I think you are overlooking one key component of probability, that is the probability that an ALL node is really an ALL node. In a strongly-ordered tree, fail-highs happen on the first move about 90% of the time. Which means all the other moves are in the 0.10 probability range for causing a cutoff. If you search 5 moves (good captures, hash move, killers, maybe counter-moves) then the probability this is an all node is close to 0.99, and reducing remaining moves will have no effect on accuracy even though it will have a huge impact on total search time. So the idea of reducing later moves more is purely based on the probability that this specific node is an ALL node, or vice-versa, based on the probability that this is not a CUT node. The more moves you search, the greater that probability and the less likely any of the remaining moves will fail high.

I've been working on move ordering issues here lately, in fact, since LMR can be as aggressive as your move ordering can tolerate...
If all the other moves have 0.1 probability for causing a cutoff, each move would still have a lower probability if there are more of them.

Obviously there are still some cutoffs happening, otherwise we should be able to just not search those moves at all.

So everything we have been talking about still applies, just that all the probabilities are scaled by 0.01.

When we have a position with 30 "other" moves and another position with 10 "other" moves, the chance of the 8th move in each causing a cutoff would still be very different, even though they may both be very small. LMR doesn't take that fact into account.
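
Under the simplest assumption, spreading the residual ~0.01 cutoff probability uniformly over the remaining moves, the per-move figure already depends on how wide the node is; a toy calculation:

Code: Select all

// Toy calculation: residual cutoff probability (0.01, the figure from the
// quoted post) split uniformly over the remaining "other" moves.
#include <cstdio>

int main() {
    const double residual = 0.01;
    for (int others : {10, 30})
        std::printf("%2d other moves -> %.5f cutoff probability per move\n",
                    others, residual / others);
}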
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

Since I'm bored and taking a break from coding.

Here is a quick illustration of the difference between node-count-constrained search vs depth-constrained search.

[Image: tree diagram comparing depth-constrained and node-count-constrained search; red = searched by both, blue = depth-constrained only, yellow = node-count-constrained only, node labels = PV probabilities]

In this example, both approaches will search the red nodes. Depth-based search will search the blue nodes, and node-count-based search will search the yellow nodes.

Numbers are probabilities that they are in the PV. Obviously all children's probabilities have to sum up to their parent's.

With a depth-constrained search, we are skipping the yellow nodes to look at the blue nodes, even though, without any additional information, yellow nodes are much more likely to be useful.
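
One way to turn that picture into an allocation rule (my sketch, not Giraffe's actual code) is to give each child a share of its parent's node budget proportional to its estimated PV probability.

Code: Select all

// Sketch: split a parent's node budget among its children in proportion to
// the estimated probability that each child lies on the PV.  Where the
// estimates come from (ordering heuristics, eval) is a separate question.
#include <cstdint>
#include <vector>

std::vector<std::uint64_t> allocate_budget(std::uint64_t parent_budget,
                                           const std::vector<double> &pv_prob) {
    std::vector<std::uint64_t> budget;
    if (pv_prob.empty()) return budget;
    double total = 0.0;
    for (double p : pv_prob) total += p;
    budget.reserve(pv_prob.size());
    for (double p : pv_prob)
        budget.push_back(total > 0.0
                             ? (std::uint64_t)(parent_budget * (p / total))
                             : parent_budget / pv_prob.size());
    return budget;
}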
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Time assignment to children

Post by hgm »

Proof-number search expands the smallest set of nodes that would be needed to prove the target, irrespective of what ply level these nodes are at. This strategy is difficult to use on an alpha-beta tree, however, as normally one of the sides plays only cut moves, and in the ideal case only the cut move is searched in a cut node. So it is really only the side playing the all nodes that has any choice.

Nevertheless, if you have a refutation tree that proves that the alternatives to the PV move in a certain PV node end in leaves with a lower score than the PV score... The purpose of doing a better search is first to find a single path of 'all-moves' that makes the current choice of cut-moves fail low, and second to show that the cut-node side cannot repair this by changing his cut moves.

Now the second part of this would get a lot easier / have a better chance of success if there were only a few alternatives to the cut moves. E.g. in the hypothetical situation that you would have a 6-ply fixed-depth refutation tree, where all cut nodes have 10 legal moves, except the three lying along a single branch, which all have only a single legal move. All that is needed to turn the fail low into a fail high is now to expand the leaf of that particular branch, and find an evaluation/QS score there that fails high. That fail high would immediately be backed up to the 'root' (actually the PV node from which the refutation tree branched off), as the other side has no opportunity to alter his cut moves anywhere along the branch. To achieve the same starting with any other move would require you to also sprout refutation trees from all 9 alternatives to the cut moves at 3 different levels.

It makes sense to spend more search effort where it would matter, i.e. to search deeper along branches where the cut moves have few alternatives. Preferably only 'reasonable moves' should be counted as alternatives.
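
A crude sketch of acting on that idea (the thresholds are illustrative, not from any engine): adjust the depth by how many 'reasonable' alternatives the side playing the cut moves has in the node.

Code: Select all

// Illustrative depth adjustment based on the number of reasonable
// alternatives to the cut move.  What counts as "reasonable" (e.g. moves
// within a margin of the best static score) is up to the engine.
int depth_adjustment(int reasonable_alternatives) {
    if (reasonable_alternatives <= 1) return +1;   // (near-)forced: search deeper
    if (reasonable_alternatives <= 3) return  0;   // few options: keep full depth
    return -1;                                     // wide node: allow a reduction
}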
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

And here are some cluster test results. Time control is 40 moves in 1 second, with 0.5s increment. So each game takes about 1 minute each side.

Code: Select all

Rank Name          Elo    +    - games score oppo. draws 
   1 Giraffe dcs    30    6    6  3196   58%   -30   15% 
   2 Giraffe ncs   -30    6    6  3196   42%    30   15% 
This is my first implementation of node-constrained search (ncs). Right now it's 60 +- 12 Elo weaker than the original.

Both are very simple implementations. DCS is done with no reductions and no extensions at all. NCS is done with uniform division of node budget.

The only things beyond basic alpha-beta are PVS (same between DCS and NCS), killers (same), TT (almost the same, but in the NCS case it uses node count instead of depth, obviously), null move (NOT the same, have to decide how much to reduce for NCS), IID (NOT the same, have to figure out how much to reduce), and ID (NOT the same, have to decide how much to increase the node budget by every iteration).

I didn't think there would be a big difference since most positions do not have subtrees with very different numbers of moves, and indeed there wasn't a very big difference.

I believe the current difference is from not having figured out how to do NM, IID, and ID. Right now I just have fixed constants, which I drew out of thin air (0.01 for null move reduction, 0.1 for IID, and 32 for ID). Lots of room for improvement. If anyone has ideas, please let me know!
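
To make the bookkeeping concrete, here is roughly how that description reads as code (my own sketch using the constants quoted above, not the actual Giraffe source).

Code: Select all

// Rough sketch of NCS budget bookkeeping with the "out of thin air"
// constants from the post.  Not Giraffe's real code.
#include <cstdint>

constexpr double NULL_MOVE_FRACTION = 0.01;  // budget fraction for the null-move search
constexpr double IID_FRACTION       = 0.10;  // budget fraction for internal iterative deepening
constexpr int    ID_GROWTH          = 32;    // root budget multiplier per iteration

std::uint64_t child_budget(std::uint64_t node_budget, int num_children) {
    return num_children > 0 ? node_budget / num_children : 0;   // uniform division
}

std::uint64_t null_move_budget(std::uint64_t node_budget) {
    return (std::uint64_t)(node_budget * NULL_MOVE_FRACTION);
}

std::uint64_t iid_budget(std::uint64_t node_budget) {
    return (std::uint64_t)(node_budget * IID_FRACTION);
}

std::uint64_t next_root_budget(std::uint64_t root_budget) {
    return root_budget * ID_GROWTH;
}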

I will stick with NCS regardless because my primary motivation for doing this is that it makes the machine learning much easier to reason about. Hopefully ML will bring out more performance.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Time assignment to children

Post by matthewlai »

My suspicion was correct. With null move disabled (for both versions), the ncs version now pulls ahead.

Code: Select all

Rank Name          Elo    +    - games score oppo. draws 
   1 Giraffe ncs    26    6    6  3189   57%   -26   17% 
   2 Giraffe dcs   -26    6    6  3189   43%    26   17% 
Now I just have to figure out how to do NM with a node-constrained search.

But I think this shows that NCS is probably at least as good as DCS, and the standard way to search is really not the only reasonable way.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.