book move selection logic

jdart · Post by **jdart** » Sat Jun 16, 2018 7:40 pm

Arasan has had its own opening book since its inception. Currently I select moves from the book based on a combination of frequency and result information, with some manual tweaking to avoid moves that are actually bad but frequent (usually older lines that have been refuted).

I have been looking more recently at a more formal approach and connecting this with some kind of learning system (I used to have a form of book learning but it is not in current versions and was never very effective).

If you just look at results and not frequency, then of course each position that comes from a game has a set of win/loss/draw counts associated with it. This also has a result score, which you can compute as (wins + (0.5+contempt factor)*draws)/total count.

If you have this info for each move from the position (or at least all moves known to the book, may be a subset of possible moves), then of course a simple approach is to take the move with the best result score. However, there is some uncertainty about the scores: what you really have is an estimate of future scores based on past scores, and another move that now has a lower score might actually give the best score in future.

This can be thought of as an example of the "multi-armed bandit" problem, about which there is a large literature. (Note: technically it might be a case of the "adversarial bandit" problem, since you are playing an opponent who wants to minimize your score; but I assuming for right now game outcomes can just be though of in terms of random probabilities).

One simple and effective approach is based on what is called Thompson sampling. You model the distribution of results (in this case it is a gamma distribution) for each move, and you take a random sample from each distribution. Then you compute the scores for the samples, and you take the move whose sample has the highest score. (This is one algorithm: the other that is frequently cited is called UCB1).

This introduces some variability. You will tend to get the highest-scoring moves played more often, but once in a while the sample from another move's distribution will score higher, and you will select that move. Initially you get more variability but over time, if you keep doing this, the process will zero in on the move that really has the best scoring percentage. There are some nice convergence proofs that show this is an effective approach.

There are a couple of problems though in the chess context. One is that the score is independent of move frequency. If you just turn this algorithm loose and let it run, it may pick something like 1. e3 almost as often as 1. e4, because they might have similar stats and because there won't be a lot of 1. e3 games to go on and that distribution will have a large standard deviation. Maybe not a problem, but if have knowledge of prior games from strong players then move frequency does tend to be an indicator, even if imperfect, of move quality. So maybe you want to add at least a frequency filter when doing real games and drop very infrequent moves. For offline learning you could turn the filter off.

The other problem is as mentioned, eventually the algorithm tends to converge on one move. Suppose 1. e4 scores 55% for White and 1. d4 scores 53%. If enough games are available, the standard deviation of the two distributions will be very small and so the algorithm will start to play 1. e4 almost all the time. This is theoretically optimal, but in practical terms the score is scarcely different between the moves, and for variety and to increase the chances of tripping up an opponent, you might still want to play 1. d4 sometimes, even if this is not theoretically optimal.

To deal with this you could establish a tolerance level for the result difference, and play moves whose difference from the best move is within the tolerance limit, with decreasing frequency as the limit is approached. If the tolerance was zero you'd have pure Thompson sampling and with a non-zero tolerance you'd have a generalization of it with more exploration. (I have not seen this sort of thing described in the literature).

I am not sure how to handle this tolerance statistically. You would want to reduce the frequency of moves played as their (result mean-best result mean)/tolerance moves farther away from zero, but taking into account the standard deviation of the difference. Maybe a Monte Carlo approach?
The difference of results from two moves is distributed as the difference of two gamma distributions and that is hard to compute, although there are approximations to it (for example http://www.ucs.louisiana.edu/~kxk4695/GammaR2.pdf).

Comments appreciated. This is just a dump of my thinking on the problem so far.

--Jon

Ozymandias · Post by **Ozymandias** » Sun Jun 17, 2018 11:33 am

As is often the case when discussing book-building techniques, the data behind (or is it below?) is taken for granted. What does it matter if you choose frequency, score, draw rate or mini-maxed stats, when there's such a variety of players, time controls and playing sites? Not to mention contaminated data.

In other words, unless you have a really good DB, your experiment results will be tainted.

jdart · Post by **jdart** » Sun Jun 17, 2018 4:07 pm

I am very aware that human games in particular are suspect data sources because of the relatively high blunder rate even among high-rated players.

But I am looking to improve over an ad-hoc approach and looking forward to a learning system, which over time should be able to overcome even bad initial data.

--Jon

MOBMAT · Post by **MOBMAT** » Sun Jun 17, 2018 5:39 pm

Personal anecdotal story follows...

Back in the day, I created an opening book for Othello using the min-maxed method. Pain in the rear-end dealing with transposition issues and I eventually dropped the idea and switched to a technique which is today known as a PolyGlot book (position based with weighed move options). This wasn't any better because of the same reasons outlined in this thread, but it was better than nothing. Before I switched, the min-maxed book did come through, though. I had a batch process that would add new games, and because there weren't that many GM level games back then (mid 90's), I would add games played between the top Othello programs from an online Othello server. Logistello (written by Michael Buro) was king of the hill, but there were a few others that played very well. One of them was Keyano (written by Mark Brockington). Prior to a tournament, I added recently played games to my DB and one of them was a game where Log had beaten Keyano. In the tournament, my program YingYang was matched up with Keyano. My opening book stored moves down to 20 remaining pieces (60 moves max in an Othello game), which was the threshold where my program could easily find a forced win. To my amazement, Keyano played the same moves it had played against Log and my program played the entire first 40 moves out of book and then found the win on the next move.

Michael Buro wrote a paper on the topic of opening books back then...
https://skatgame.net/mburo/ps/book.pdf
Perhaps it might be helpful.

Ozymandias · Post by **Ozymandias** » Sun Jun 17, 2018 6:03 pm

jdart wrote: ↑Sun Jun 17, 2018 4:07 pm I am very aware that human games in particular are suspect data sources because of the relatively high blunder rate even among high-rated players.

But I am looking to improve over an ad-hoc approach and looking forward to a learning system, which over time should be able to overcome even bad initial data.

--Jon

As long as you know what you're dealing with... just don't think human games are the most problematic. A relatively simple blunder check can clean a human games DB rather easily, lower quality engine games aren't so easy to "fix". Then there's the problem with playchess games, and last but not least, a quite noticeable amount of correspondence games with wrong results.

jdart · Post by **jdart** » Sun Jun 17, 2018 6:21 pm

I already have a filter program (https://github.com/jdart1/arasan-chess/ ... ychess.cpp) that cleans playchess and correspondence games, and drops those whose end eval is way different from the reported result (mostly these are losses on time).

--Jon

Ozymandias · Post by **Ozymandias** » Sun Jun 17, 2018 10:28 pm

That looks like a useful tool, it would substitute a variety of programs I'm currently forced to use. Any binaries available?

jdart · Post by **jdart** » Mon Jun 18, 2018 12:58 am

Ozymandias wrote: ↑Sun Jun 17, 2018 10:28 pm That looks like a useful tool, it would substitute a variety of programs I'm currently forced to use. Any binaries available?

I put a Windows binary here: https://arasanchess.org/util/playchess.exe.

--Jon

Ozymandias · Post by **Ozymandias** » Mon Jun 18, 2018 6:48 am

Thx, I'll try that at the first opportunity.

book move selection logic

book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic

Re: book move selection logic