Best move statistics

bob · Post by **bob** » Tue Sep 13, 2016 1:29 am

matthewlai wrote:
bob wrote: Personally, I have been studying this issue for a LONG time now. I have specifically been looking at the history counters and how they can be updated, and how reliable they are. Bottom line is "they are almost random noise". IE better moves do often have better history counter values, but the actual value itself doesn't correlate with anything I can find doing statistical analysis. I'd love to be able to find something that will produce numbers that can directly influence reductions and pruning, but I have found no such approach. The idea that several use today came from my testing this stuff when Fruit first came out (so-called history pruning back then). I hit upon the idea of increasing the counter on a fail high, and then later I added the idea of reducing all the counters for moves that failed low before the move that failed high. But it STILL gives lousy values if you just look at the numbers.

Seems to me there SHOULD be a way to get an idea of how good or bad a move is on an absolute scale, but so far - not that I have found.

I continue to look, but I am afraid it is similar to a Sasquatch hunt.. Lots of places to look, but nothing to find.
It does seem like history counters are basically random noise. I wonder if (back when they helped), they were basically discovering static information that doesn't change between games. In fact, that's what motivated this study - to see if we can get that information in explicit form.

We do already have some very good knowledge - for example, we know that positive and neutral SEE captures are about 8x as likely to be the best move as quiet, non-losing moves, and about the same ratio from quiet non-losing moves to negative SEE moves (except for negative SEE captures, which are better than non-captures).

The problem is there's no good way to use that knowledge.

Right now most engines only use that knowledge indirectly through LMR + move ordering. If you reduce after 3rd move for example, in most positions, you are basically reducing all quiet moves. This makes sense when you have a position with a hash move and maybe 1 or 2 +SEE captures, but doesn't make any sense when all moves are quiet.

It would be more principled to explicitly reduce quiet moves and -SEE moves instead. The only problem is when there are only quiet moves. Then do you reduce them all, or not reduce them? Conversely, if there are multiple +SEE moves, do you search them all to high depth?

There is no clean solution in a depth-based search framework, but can be solved cleanly in a probability-based one, and that's what I am trying to do. I just have to assign each move a score proportional to their probability of being best, and then renormalize so they sum to one. This is a clean, principled, and theoretically sound approach, to achieve more or less what LMR achieves, hopefully more accurately.

Even if we have the knowledge, how would you use them in a depth-based search? Let's say you know there are two classes of moves, A and B, and moves in class A are twice as likely to be best moves than moves in class B, how can you use that knowledge? You can change their depths, but by how much? Let's say you decide to reduce moves in class B by one ply -

If you are doing a depth=3 search and have
ABBB, you can search them to d=3, d=2, d=2, d=2.

Now what if you have AAAB? d=3, d=3, d=3, d=2.

But that doesn't make sense, because the probability of each move in class A being the best is now reduced, just because there are so many moves in class A. Why are we spending more total time in this parent node now, even though there's still just one best move?

LMR is a workaround for this, but it's a pretty crude one. It makes the assumption that the "probability decay" curve in each node to be the same. That is obviously not true. AAAAB and ABBBB should be reduced very differently.

The fact that extending and reducing children affect total time spent in the parent node is a (or even "the") fundamental problem in depth-based search. When we decide a node is worth x amount of time, the distribution of moves below that node shouldn't affect it, since we know there is exactly one best move among them.

I believe the fundamentally correct solution is to stop using depths, and start using probabilities instead. Everything is nice and logical with probabilities. And, as shown by Giraffe, it's perfectly possible to make a chess engine without using depth-based search.

Depth-based search made sense back in the days with no extensions/reductions heuristics, and we had uniform-depth trees. When we start playing with essentially probabilities, probabilistic search is much more natural. We should stop putting more and more patches and hacks into the depth-based framework, and start exploring totally different approaches, even if they may take a while to catch up.

I think the reason for their early success was lack of depth. When Jonathan wrote that paper, he was doing _five_ ply searches to evaluate their effectiveness. Which was taking 2-3 minutes per move or so on hardware he was using back then (very early sun workstations).

As far as LMR goes, I have mentioned to Mike and Tracy for several years now, that we have found something that works for the average case, but not for specific cases. In fact, probably not for any SPECIFIC position at all. That was why I spent so much time this past year playing with as many different history approaches as I could come up with... I'd like to have SOMETHING that controls reductions based on the quality of the move, not where it appears in the search order, even if assisted by pseudo-random history counters. My personal feeling is that there MUST be some way of dynamically assessing each move and making reduction/pruning/extension decisions based on some real measurable quality of some sort.

One clue that history counters are poor is that I experimented with the typical <piece><to> indexing scheme, and also with <piece><to><from> which OUGHT to be more accurate. And ignoring any minor cache issues and looking ONLY at node counts, the 15 bit (latter) approach is no better or worse than the 9 bit (former) approach. To me, that makes no sense and suggests something is wrong with the approach. I'd even thought about adding one more dimension to the array, history[ply][piece][to][from] to keep history counters more local to a specific ply. It showed promise but obviously has a drawback that even killers avoids, if you allow killers at current ply to be tried earlier and later in the tree.

BTW my 15 bit history was really 16 bits, as side-to-move was factored in to keep black and white counters separate.

Sergei S. Markoff · Post by **Sergei S. Markoff** » Wed Sep 14, 2016 4:03 am

Another thing that should be kept in a mind is tree size reduction depending on captured piece and a possible recapture. It's better to try most tree-reducing captures first if there are more than one almost certain cutoff candidates.

I hope one day the tree search will be generalised and everything: depth extension/reducing amount, move order and leaf node consideration will be driven by some neat model)

tysen2k · Post by **tysen2k** » Wed Sep 14, 2016 4:22 am

I think something that may be of use is looking at the cases where the quiet move was the best and identifying key features besides SEE for those moves.

Best move statistics

Re: Best move statistics

Re: Best move statistics

Re: Best move statistics