Position learning and opening books

Zenmastur · Post by **Zenmastur** » Mon May 11, 2015 4:32 am

I've have been looking at position learning as a way to improve engine performance. After looking at large game databases it became obvious that general position learning is a waste of time because the average position isn't repeated with sufficient frequency to justify the time and resources needed to store the data. The two obvious exceptions are during the opening and endgames. The low hanging fruit in both areas have already been harvested by 6-man EGTB and easy to implement but "stupid" opening books. I decided that opening books are the least well developed and present a "good" opportunity for improvement.

Since the opening precedes all other phases of the game it would seem to be the most important. If all else is equal, a failure in the opening will have negative consequences for all that follows it. And yet, the engine programming community as a whole has abdicated control over this phase of the game. They have, in effect, relegated control over the most import part of the game to third parties. This is a rather ridiculous state of affairs.

Even more ridiculous is most tournaments start the game at or near the end of the opening with a position that is judged to be equal. This robs the engine of any chance to gain an advantage in the most important part of the game. And then, everyone laments the high number of draws in computer vs. computer matches. I often wonder what these people are thinking. At the start of a normal game white has a small but clear advantage. Given the common practice of starting the engines 4 to 12 moves into the game, do these people really expect a large(r) number of decisive results given that a large and important fraction of the game is outside the control of the engine and the positions selected to start the game are equal?

A way to solve this problem is to have tournaments in which each engine has to play the entire game. If it has an opening book it can use it. Of course most will adopt the standard but "stupid" opening books that are commonly used by most engines. So this change will have an effect on tournaments but these will be minor when first implemented.

I suspect that it would only take one program to adopt a "robust" learning book for things to change. By "robust" I mean a book that takes advantage of all available information to produce the best game results possible while learning. (Notice that learning is the last thing mentioned in this sentence. There is a reason for this.) I believe that such a book can consistently produce higher average game scores than any of the hand-tuned books in use today and provide more variety of play in tournaments than is provided by the hand picked positions commonly used in many tournaments.

The current crop of opening books have several problems. e.g. Human intervention is needed on a regular basis to correct errors, remove sub-standard moves and enter newly discovered lines of play. This process is error prone and requires an inordinate amount of time to enter the data and then test it. With current books, the engines don't have the ability to exit the book early. The book formats used don't allow the engine to analyze the positions and then store the results for future reference. Instead the engine is left with the option of re-searching the position every time it's seen or relying on the book builder to select lines that suit the playing style of the engine and then playing the chosen line until it ends or the opponent makes a move not in the book.

Although most books allow the storage of statistical data, it's inadequate to do the job properly. Example, the win-lose-draw information that is stored is a conglomeration of third party information, the engine results and the results of the engine's opponents. In general these three categories shouldn't be intermingled. Another example, there is no "last game" information stored in the position records or information about the opponents play from the same position etc.

The lack of useful information stored in the book, the ridged and less than optimal move selection algorithms employed and inflexibility of these books make them problematic in practice. This has caused a proliferation of different books for different purposes further compounding the problem of their maintenance.

My take on this is this:

If as much effort/resources were put into designing and producing a "real" opening book and the algorithms to use them as has been put into EGTBs the rewards would be far greater.

Your thoughts on the subject?

Regards,

Zen

Dann Corbit · Post by **Dann Corbit** » Mon May 11, 2015 6:14 am

For each position in the book store the following:
1. Centipawn evaluation of the engine used to analyze the position
2. Depth in plies of the analysis
3. Wins for the position from your database
4. Losses for the position from your database
5. Draws for the position from your database
6. Blitz wins for your chess engine
7. Blitz losses for your chess engine
8. Blitz draws for your chess engine
9. Standard wins for your chess engine
10. Standard losses for your chess engine
11. Standard draws for your chess engine
12. Correspondence wins for your chess engine
12. Correspondence losses for your chess engine
12. Correspondence draws for your chess engine

Whenever you play a game that contains the position, update the stats when it ends. If the position is not in the book, add it along with the evaluation, depth, etc.

You will have two measures of excellence, the chess engine's opinion and the statistical opinion.

Any place that the two disagree is an excellent place for study (e.g. Maybe it deserves a full weekend of engine analysis. Maybe we need to research the position in chess books for the reasoning of GMs, etc).

Every so often, mini-max the whole thing.

I recommend an in-memory transactional database for storage. FastDB and Gigabase are good for simple use, and MonetDB for a more heavy duty approach. The two setbacks of FastDB and Gigabase are that you can only have one writer and that the interface is not standards based SQL or ODBC, etc. On the other hand, maintenance is very simple.

My two cents.

Zenmastur · Post by **Zenmastur** » Mon May 11, 2015 7:47 am

Dann Corbit wrote:For each position in the book store the following:
1. Centipawn evaluation of the engine used to analyze the position
2. Depth in plies of the analysis
3. Wins for the position from your database
4. Losses for the position from your database
5. Draws for the position from your database
6. Blitz wins for your chess engine
7. Blitz losses for your chess engine
8. Blitz draws for your chess engine
9. Standard wins for your chess engine
10. Standard losses for your chess engine
11. Standard draws for your chess engine
12. Correspondence wins for your chess engine
12. Correspondence losses for your chess engine
12. Correspondence draws for your chess engine

It looks like you forgot a few things.

A Zorbist key would be nice. The move the search returned as best. Since this is to be a self-leaner it might be nice to have the engines second choice along with the evaluation and depth of search. It would be helpful to have a metric for the amount of searching done to obtain the given score.

Since this is a learning book it will contain entries that weren't there when the book was created. This can lead to unresolved problems if there is no way to determine what's in the book after it has been modified by the program. Being able to see which board position have been added, why they were added, and what moves are deemed best is needed. This isn't really a problem with current books since most can't add positions after they're built. This requires that the exact board position be stored along with move clock, EP square, castling rights etc. This information would also allow combining of books into one even if the books were used by different engines. It would also allow extracting learned information and/or transferring it to less sophisticated book formats etc.

There are several other things that need to be added as well, but the exact details of what needs to be in the records and how they are stored or retrieved should be dealt with only after it's been determined exactly how it's to be used. Things that need to be known first are, what its move selection algorithm will be based on and details on how and what it will learn.

One of the things I hate is having a large number of books for different purposes/GUI's/Engines. I want one book and one algorithm to handle "all" normal needs including develop, testing, and serious tournaments. In short, I want a "Universal" book.

In addition, how it should behave for testing purposes, casual play, analysis, tournament play etc should be determined in advance. Example, most people deplore repetitive play. There are several ways to solve this problem. One way is to store the last move played by the program and its opponent for every opening position along with a game counter. Then when the engine arrives at a position it will know what it played the last time it saw this position and how many games ago this occurred. The same goes for the opponents responses. It then becomes a very simple matter to adjust variety of play depending on the type of play desired. e.g. if both engines are to play both sides of the same line in successive games this is easily accomplished with out outside intervention assuming both are using a "Universal" book.

I already know what I want out of the book. I'm just trying to find out what everybody else wants or thinks would be a good addition AND what the would and wouldn't give as far as resource to get such a book. e.g. if it takes 50 GB of storage space to get this book is it too much. I'm guessing not since many people have 6-man EGTB's. But you never know unless you ask. I'm sure there are other people who have some very good ideas on the subject some of which I probably haven't though of.

So, I'm open to suggestions and new / different ideas.

Regards,

Zen

Ferdy · Post by **Ferdy** » Mon May 11, 2015 9:02 am

Zenmastur wrote:I've have been looking at position learning as a way to improve engine performance. After looking at large game databases it became obvious that general position learning is a waste of time because the average position isn't repeated with sufficient frequency to justify the time and resources needed to store the data. The two obvious exceptions are during the opening and endgames. The low hanging fruit in both areas have already been harvested by 6-man EGTB and easy to implement but "stupid" opening books. I decided that opening books are the least well developed and present a "good" opportunity for improvement.

Since the opening precedes all other phases of the game it would seem to be the most important. If all else is equal, a failure in the opening will have negative consequences for all that follows it. And yet, the engine programming community as a whole has abdicated control over this phase of the game. They have, in effect, relegated control over the most import part of the game to third parties. This is a rather ridiculous state of affairs.

Creating a decent opening book at this time is easier compared in the past - do you know IDEA of aquarium gui? Testing the engine on different positions is more important to me than finding the position to suit the engine. The third party must have its own reason.

Zenmastur wrote:Even more ridiculous is most tournaments start the game at or near the end of the opening with a position that is judged to be equal. This robs the engine of any chance to gain an advantage in the most important part of the game. And then, everyone laments the high number of draws in computer vs. computer matches. I often wonder what these people are thinking. At the start of a normal game white has a small but clear advantage. Given the common practice of starting the engines 4 to 12 moves into the game, do these people really expect a large(r) number of decisive results given that a large and important fraction of the game is outside the control of the engine and the positions selected to start the game are equal?

A way to solve this problem is to have tournaments in which each engine has to play the entire game. If it has an opening book it can use it. Of course most will adopt the standard but "stupid" opening books that are commonly used by most engines. So this change will have an effect on tournaments but these will be minor when first implemented.

I suspect that it would only take one program to adopt a "robust" learning book for things to change. By "robust" I mean a book that takes advantage of all available information to produce the best game results possible while learning. (Notice that learning is the last thing mentioned in this sentence. There is a reason for this.) I believe that such a book can consistently produce higher average game scores than any of the hand-tuned books in use today and provide more variety of play in tournaments than is provided by the hand picked positions commonly used in many tournaments.

The current crop of opening books have several problems. e.g. Human intervention is needed on a regular basis to correct errors, remove sub-standard moves and enter newly discovered lines of play. This process is error prone and requires an inordinate amount of time to enter the data and then test it. With current books, the engines don't have the ability to exit the book early. The book formats used don't allow the engine to analyze the positions and then store the results for future reference. Instead the engine is left with the option of re-searching the position every time it's seen or relying on the book builder to select lines that suit the playing style of the engine and then playing the chosen line until it ends or the opponent makes a move not in the book.

Although most books allow the storage of statistical data, it's inadequate to do the job properly. Example, the win-lose-draw information that is stored is a conglomeration of third party information, the engine results and the results of the engine's opponents. In general these three categories shouldn't be intermingled. Another example, there is no "last game" information stored in the position records or information about the opponents play from the same position etc.

The lack of useful information stored in the book, the ridged and less than optimal move selection algorithms employed and inflexibility of these books make them problematic in practice. This has caused a proliferation of different books for different purposes further compounding the problem of their maintenance.

My take on this is this:

If as much effort/resources were put into designing and producing a "real" opening book and the algorithms to use them as has been put into EGTBs the rewards would be far greater.

Your thoughts on the subject?

Regards,

Zen

hMx · Post by **hMx** » Mon May 11, 2015 3:29 pm

Zenmastur wrote: So, I'm open to suggestions and new / different ideas.

I very much like the idea of a program, sitting around "idle", and since not fighting against any opponent just now, pondering over its book.

I suggest you carefully tag all entries to your book with the source of the data/information: who did it, when and how. On later integration of newer data, do not throw away old data carelessly. We may later detect that some of our data sources weren't of as good a quality as we thought, so that we want to reconsider their integration.

mvk · Post by **mvk** » Mon May 11, 2015 3:34 pm

Zenmastur wrote: Your thoughts on the subject?

Regards,

Zen

Many thoughts, as it has been a part of the work in my current program. I have posted about the methods before.
http://talkchess.com/forum/viewtopic.ph ... 2&start=16
http://talkchess.com/forum/viewtopic.ph ... 121#550121

In short: the book graph is minimaxed with dropout/exclusion searches in every node. It expands in various ways. Dropout expansion is one. Another one is learning from PGN collections, for example from players playing a line from the current repertoire lines, yet ending up in a poor position. ('Repertoire' is defined by the path errors in each node). Yet another expansion method is learning from own games. Last is opponent adaptation by changing affinity among book moves depending on prior results against the same opponent (least significant of all, but works well on the servers).

Basically, it is a big custom made 'IDEA'-like project in python, holding several million analysed book nodes, and some computers working on updates.

I find it very rewarding to work on the book side-project. It gives plenty of new ideas for the main search as well.

syzygy · Post by **syzygy** » Mon May 11, 2015 10:17 pm

Zenmastur wrote:A way to solve this problem is to have tournaments in which each engine has to play the entire game. If it has an opening book it can use it. Of course most will adopt the standard but "stupid" opening books that are commonly used by most engines. So this change will have an effect on tournaments but these will be minor when first implemented.

I think the problem with this is that creating a good opening book really is an art of its own. In addition, opening books cannot really be "part" of engines as we known them. (Sure, an engine may come with a default opening book for the convenience of the user, but that's it, just for convenience.)

If, say, SF came with a strong opening book, its opponents could download it, find its weaknesses, and easily beat SF in the opening. Whereas TBs contain "absolute truth", the knowledge contained in opening books is very relative.

So the opening book used in a tournament would have to be kept secret and could not be distributed. That means SF could only be entered into a tournament by the person maintaining the "official SF tournament opening book". Users wouldn't be able to run the "full SF" including its opening book. Testers wouldn't be able to test the "full SF" including its opening book.

I suspect that it would only take one program to adopt a "robust" learning book for things to change. By "robust" I mean a book that takes advantage of all available information to produce the best game results possible while learning. (Notice that learning is the last thing mentioned in this sentence. There is a reason for this.) I believe that such a book can consistently produce higher average game scores than any of the hand-tuned books in use today and provide more variety of play in tournaments than is provided by the hand picked positions commonly used in many tournaments.

It would be a nice thing for engines playing tens of thousands of games on a chess server. For a normal tournament a bit of learning won't do much (unless it serves to prevent duplicate losses).

But I do agree there is room for tools that allow users to build and improve opening books in a (semi-)automated manner.

The somewhat depressing thing is that objectively it makes little sense to build a book using anything less than the best engine available. If I build a tool to generate a book for my own engine to play on FICS or so, I'm afraid I am not going to use my own engine to do it. I'd be using SF for the book building and then let my engine play with the result.

syzygy · Post by **syzygy** » Mon May 11, 2015 10:29 pm

So, basically, an engine + opening book is an individual player that cannot / should not be "cloned" anymore. Perfect for online play.

Quite many users would like to have their engine as configurable as possible so that they can "personalise" it. It seems to me the best personalisation possibilities lie in the opening book.

Zenmastur · Post by **Zenmastur** » Tue May 12, 2015 1:59 am

Ferdy wrote: Creating a decent opening book at this time is easier compared in the past - do you know IDEA of aquarium gui? Testing the engine on different positions is more important to me than finding the position to suit the engine. The third party must have its own reason.

I know of IDEA in Aquarium. The problem is that it's not part of the engine its part of the GUI. It's also not transparent, meaning that it's not clear what algorithm is used to grow the tree etc. This doesn't help the engine much, if at all, and it's not automatic. I want something that is specifically design to be linked into the engine when compiled and is "on" all the time. IDEA may be good for humans but that's not what I'm after.

mvk wrote:
Many thoughts, as it has been a part of the work in my current program. I have posted about the methods before.
http://talkchess.com/forum/viewtopic.ph ... 2&start=16
http://talkchess.com/forum/viewtopic.ph ... 121#550121

In short: the book graph is minimaxed with dropout/exclusion searches in every node. It expands in various ways. Dropout expansion is one. Another one is learning from PGN collections, for example from players playing a line from the current repertoire lines, yet ending up in a poor position. ('Repertoire' is defined by the path errors in each node). Yet another expansion method is learning from own games. Last is opponent adaptation by changing affinity among book moves depending on prior results against the same opponent (least significant of all, but works well on the servers).

Basically, it is a big custom made 'IDEA'-like project in python, holding several million analysed book nodes, and some computers working on updates.

I find it very rewarding to work on the book side-project. It gives plenty of new ideas for the main search as well.

I read through the posts you linked and have read Buro's paper. I haven't read Lincke's thesis yet (this may take me a while), so I'm not sure what a dropout/exclusion search or dropout expansion is referring to.

I have read several papers on Multi-Armed Bandit problems (MABs) and I concluded that they are isomorphic to chess positions in general. While a single chess position doesn't constitute an opening book all chess positions with more than a single legal move are, in fact, MAB problems. A rose by any other name...

Therefore its natural to set up an opening book as a series of related MAB's. The only problems I see is that most MAB problems assume no knowledge of the rewards given by each arm (move) and some assume no knowledge of the reward distribution either. For chess positions both of these assumptions are a mistake and will lead to slow or very slow learning and sub-optimal average game scores if not properly addressed.

After reading the other threads, I noticed several people have the false impression that they needed (or wanted) to pre-search the book positions before making them into a book. I'm not sure what they were thinking about, but I don't think this will help and isn't needed. Better to let the engine do this while it's playing real games. This saves a lot of work up front and is one of the reasons to use a learning book.

I also noted you mentioned your move selection algorithm. I'm kind of curios about this. The reason I'm curios is this is the "KEY" element that will "make" or "break" the book IMO. I can't thing of anything worse than a learning book that uses a move selection algorithm that doesn't take advantage of the data it went to trouble of learning. Your book could learn at light speed and it wouldn't make any difference if the move selection algorithm can't turn the learned data in game points. I did some research into MAB move selection algorithms and found this to be a problem with the "standard" indexes used, a solvable problem, but a problem none the less.

Regards,

Zen

Ferdy · Post by **Ferdy** » Tue May 12, 2015 3:24 am

Zenmastur wrote:
Ferdy wrote: Creating a decent opening book at this time is easier compared in the past - do you know IDEA of aquarium gui? Testing the engine on different positions is more important to me than finding the position to suit the engine. The third party must have its own reason.
I know of IDEA in Aquarium. The problem is that it's not part of the engine its part of the GUI. It's also not transparent, meaning that it's not clear what algorithm is used to grow the tree etc. This doesn't help the engine much, if at all, and it's not automatic. I want something that is specifically design to be linked into the engine when compiled and is "on" all the time. IDEA may be good for humans but that's not what I'm after.

For opening preparation be it for engine and human, it is an excellent tool.
But it is indeed a challenge for learning not necessarily for opening moves. There are positions that would look a strong engine like a beginner, and that would be a nice thing to solve and explore, problems related to NMP, fortresses, and bishop of opposite color endings.

Position learning and opening books

Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books

Re: Position learning and opening books