mvk wrote:Zenmastur wrote: After reading the other threads, I noticed several people have the false impression that they needed (or wanted) to pre-search the book positions before making them into a book. I'm not sure what they were thinking, but I don't think this will help, and it isn't needed.
I understand the desire to be confident that the book won't lead the engine astray. My solution to this problem is simple: unlike current books, never blindly follow book lines.
Until a move is fully vetted by the engine using the book, i.e. it's no longer in the early phases of active learning, search results are always checked before any decision about a move is made. This requires that engine search results be stored in each position record, and these should always be consulted first. This does take some time the first few times a position is seen by the engine, but after that it reverts to using game statistics. The point of doing it this way instead of pre-searching all moves to some depth is that the engine may never encounter some of these positions, in which case the time would have been wasted.
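The flow above might be sketched like this. This is a minimal sketch with assumed names (`BookNode`, `VET_THRESHOLD`, the search callback); none of these come from an actual implementation:

```python
# Sketch of "never blindly follow book lines": consult a stored search
# result until the node is vetted, then revert to game statistics.
# All names and the threshold value are illustrative assumptions.

VET_THRESHOLD = 3  # visits before the node counts as "fully vetted"

class BookNode:
    def __init__(self, position):
        self.position = position
        self.visits = 0
        self.stored_search = None   # (best_move, score, depth) from the engine

def choose_move(node, engine_search, stats_best_move):
    node.visits += 1
    if node.visits <= VET_THRESHOLD:
        # Early phase of active learning: always check (and cache) a search.
        if node.stored_search is None:
            node.stored_search = engine_search(node.position)
        return node.stored_search[0]
    # Vetted: ordinary game-statistics selection takes over.
    return stats_best_move
```

Note the search is only paid for when the position is actually reached, which is the whole point of deferring the work.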
Thanks for your explanation, it helps to understand your proposal. I hope you succeed with the project.
One of the main tenets I adopted very early when I was considering this design project was this: because there will always be more work to be done than there are resources or time with which to do it, I want to defer work as long as possible, or eliminate it completely in some cases. This is why pre-searching isn't used. Positions will get searched eventually, if and when they are seen, which eliminates any wasted effort. It has another effect: positions that are seen very commonly will be searched deeper than most. They can be searched deeper than even the longest time controls would normally allow, which is great for removing even very subtle tactical errors from your book. Unfortunately it doesn't do much else. Although somewhat deeper searching should be done on very common positions, the time used for very deep searches is probably better spent elsewhere.
Learning by most algorithms is slow. Sometimes it appears absolutely glacial in its pace. This is a problem, since the average branching factor of the search tree is 30+. I don't think the algorithms are necessarily to blame for slow learning. The major problem is that most of the useful information that could be used to learn is simply discarded because it occurs outside the domain of the learning algorithm. The implementations don't use all available information to adjust what the learning algorithm is processing. This causes a lot of time to be wasted learning information that is either worthless, already known, or previously learned and discarded. I want to eliminate this wasteful behavior.
A lot of what needs to be learned about a position is purely tactical in nature. Simply storing engine evaluations will remove about 90% of the need to learn, and it is at least three orders of magnitude faster than trying to learn the same information by playing games and collecting statistics about them. A good example is when one side is threatened with great loss without compensation and does nothing to rectify the situation. The algorithm shouldn't voluntarily test this type of position by playing games just to gain statistical knowledge about what will happen. There is already good statistical evidence that loss of material without compensation is bad, so there is no reason to re-demonstrate this on every position in the opening book by playing games from bad positions to collect statistical data on the outcomes. And yet, this is exactly what will happen if the implementation blindly submits all positions to the same learning process.
I've tested several different MAB indexing algorithms, and they all do about the same thing in this regard: they will play all legal moves to see what happens in each case. The frequency of the bad moves drops quickly as more games are played, but by this time the damage is already done. The average game score has dropped, and a large amount of time was wasted on games that no computer would ever play if it had no book, or a more conventional one.
The net effect of all this wasted effort is that USEFUL learning is slowed because USELESS learning dominates the limited resources available. The question is: what's the best way to fix this problem?
One might think pre-filtering the moves that the statistical learning algorithm gets to see is a good method. I'm not so sure this is best. The UCB MAB indexing algorithm can be absolutely fantastic at providing variety of play, and it does a very good job of balancing exploration with exploitation in most cases.
The underlying problem is that the MAB indexing algorithm knows nothing about chess positions, so it doesn't know that loss of material is bad. Further, it sees every position as a completely different problem about which it has no knowledge unless statistical information is available. The only statistics it understands are W-L-D game statistics, the total number of times the position has been encountered, and the number of times each individual move has been played from the position. So another way to solve the problem is to generate W-L-D statistics for each legal move on the fly, using the engine's search results and a logistic equation that matches the engine's performance curve over the full range of advantages and disadvantages. These "false" statistics would be added to any real statistics already gathered. The net effect of this ruse is to stop the playing of useless exploratory games caused by the algorithm's lack of chess knowledge. Since there are considerably more bad moves than good ones in most positions, this will cause a large shift in where the learning takes place, and the "apparent" rate of learning for the much smaller pool of candidate moves should increase proportionately. The average game score produced will increase accordingly.
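The score-to-statistics conversion could look something like the sketch below. The logistic scale `K` and the virtual-game count are placeholder values that would need tuning against the real engine's performance curve:

```python
# Sketch of the "false statistics" idea: map an engine score (centipawns)
# to an expected game score with a logistic curve, then seed the move with
# a small number of virtual games. K and VIRTUAL_GAMES are assumptions.

K = 400.0          # logistic scale: roughly, +400cp ~ 90% expected score
VIRTUAL_GAMES = 8  # kept small so real statistics take over quickly

def expected_score(cp):
    """Logistic mapping from centipawn advantage to expected score in [0,1]."""
    return 1.0 / (1.0 + 10.0 ** (-cp / K))

def pseudo_wdl(cp, n=VIRTUAL_GAMES):
    """Turn an eval into virtual (wins, draws, losses) totalling n games."""
    s = expected_score(cp)
    wins = round(n * s)
    losses = n - wins   # crude: this sketch generates no virtual draws
    return wins, 0, losses
```

These virtual counts would simply be added to whatever real W-L-D totals the node already holds before the MAB index is computed, so a move the engine scores as a clean piece loss starts life looking like a move that already lost most of its games.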
mvk wrote:Two things about this:
1. In my case the book is always a fully negamaxed graph with dropout searches in every node. But just because a position is present in the book/graph/file, it doesn't mean it is always associated with a deep dropout search. Only the repertoire nodes, which are the subset in which the program can end up using the move selection algorithm, need deep dropout searches, for the reason I gave. But non-repertoire moves in the file will usually be covered by significantly shallower searches. (Unless they have been repertoire in the past. This is a bit of a simplification and I won't bore anyone with the edge cases of this.)
I thought about negamaxing the engine's search results to obtain better scores for each node. I think Bob Hyatt tried this without any additional search information and he described the results as "iffy". It's considerably more work because you need search results from a lot of descendants, and these may not be available even if you're storing all your searches. A modified version of LMR could be used to lessen the burden somewhat, but even with this addition I wonder if it's worth the effort. Most book lines aren't that deep; the average is between 15 and 25 plies (depending on the size of the book, of course), and a single search can be deeper than this. The nodes that will benefit the most from this are the ones nearest the root, while the positions that need the most help are the ones near the leaf nodes. There's not much you can do for these other than search deeper, since they will have very few layers of descendants in the tree. In addition, these nodes are seen considerably less often on average, and there are a lot more of them, so searching them much deeper is going to require a lot of effort.
The other problem is that the evaluations returned from searches will only have a dominating influence on move selection for a short period of time if statistics are initially lacking. This influence wanes as more statistics become available; after that happens, they won't have much influence unless a radical change in the engine's evaluation occurs.
What REALLY needs to be negamaxed, IMO, is the game statistics. This solves a longstanding problem with using game statistics for move selection. When a new move is found deeper in the tree that refutes a line of play, the program won't be fooled by the older and better-established game statistics that show the line is sound. It will immediately head for the less well established but higher scoring new move that seems to refute the older lines of play. It will do this because the MAB index for the new move suddenly jumps: given the large imbalance in game counts between the established moves and a new move with few games behind it, the indexing algorithm gives the new move a much higher index because it hasn't been played enough considering its score. The change in behavior this causes is huge! It means that, unlike older books where you had to repeatedly beat the hell out of a well-established move before the statistics would change enough to make the program try something else, the learning book will gladly change its direction on a dime.
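The index jump can be illustrated with made-up numbers and the standard UCB1 formula (the exact indexing equation and exploration constant used here are assumptions for illustration, not necessarily the tuned one described above):

```python
import math

# Illustration of the MAB-index jump: a new, rarely played move with a
# high score gets a larger UCB1 index than a heavily played main line,
# so selection switches to it immediately. Numbers are made up.

def ucb1(avg_score, move_games, total_games, c=1.4):
    """Standard UCB1 index: exploitation term plus exploration bonus."""
    return avg_score + c * math.sqrt(math.log(total_games) / move_games)

total = 510  # games played from this position
established = ucb1(0.55, 500, total)  # old main line, heavily played
novelty     = ucb1(0.70, 10, total)   # new refutation, very few games
# novelty's index exceeds established's: the book changes direction
```

The exploration bonus shrinks as a move's game count grows, so the balance is restored only after the new line has been played enough, exactly the behavior described above.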
This could potentially cause a MAJOR paradigm shift in the way engines play the openings. It will make the book have a massive affinity for playing new theory if it's good for the engine. It won't necessarily avoid old lines that it now believes are worse, but it will play them considerably less often, at least until the MAB indexes are brought back into balance through sufficient play of the new lines. I think this type of behavior has been heretofore unseen in computer opening play. It means that computers would have a much greater influence on current opening theory. Currently they contribute little if anything to opening theory because of the screwed-up way the opening is handled.
mvk wrote:Also, in expanding the book, that is adding new nodes, the effort is focussed on nodes that likely affect the repertoire. ...
Precisely the reason the game statistics should be negamaxed. One note I should add, though: if you decide to try this, you don't actually want to move the game statistics from one node to another, and you definitely don't want to overwrite the "real" statistics for any position. Instead, you simply negamax a pointer to the node that contains the pertinent statistics.
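A hedged sketch of the pointer idea, with an assumed node layout (the field names, and the choice to chain the pointer all the way to the backing leaf of the best line, are my illustration, not a description of any existing book format):

```python
# "Negamax a pointer": propagate which descendant's real statistics back
# each position, without moving or overwriting any statistics in place.

class Node:
    def __init__(self, score, children=None):
        self.score = score            # score from the side-to-move's view
        self.children = children or []
        self.stats_ptr = self         # by default, a node backs itself

def negamax_stats(node):
    """Return (score, pointer) for node and record the pointer in place."""
    if not node.children:
        return node.score, node
    best_score, best_ptr = None, None
    for child in node.children:
        child_score, child_ptr = negamax_stats(child)
        score = -child_score          # negamax: flip sign for the parent
        if best_score is None or score > best_score:
            best_score, best_ptr = score, child_ptr
    node.stats_ptr = best_ptr         # node with the pertinent statistics
    return best_score, best_ptr
```

The key property is that every node's own W-L-D record is left untouched; only the pointer changes when a new refutation alters the backed-up best line.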
mvk wrote:...Given that, may I ask what in your proposal is the essential difference between "a book move that is still in the early phases of learning" (and therefore won't be blindly played if encountered during play), and "a candidate move that is not in the book file at all" (for example, the move that the program finds on its own, a novelty, so to speak)? Is there a difference in treatment in your scheme between such moves? From your description it isn't really clear. Or is it just a carrier of statistical information?
Well, I'm not sure exactly what you are asking since our terminology differs significantly, but I'll give it a shot. All moves that are not leaf nodes are candidate moves. If you just feed all legal moves to the UCB MAB indexing equation, it will spit out a set of indexes, and the one with the highest index is the one played. It will play every move with some non-zero probability, even moves that lead to it getting mated (assuming this isn't inhibited in some way). Moves that produce good average game scores get played considerably more often, of course, and moves that are winning are played with high probability. The equation can be tuned: it can be made VERY greedy, or it can be made to explore all moves with equally high probability, or anything in between these two extremes.
Since the leaf nodes don't have sufficient statistical information stored to be useful for decision making, the engine just does a normal search and makes a move, assuming a search result of sufficient depth isn't already available in that node. The primary difference is that while in-book, the book routine calls the search routine and then records the appropriate information in the leaf record before making the move. If this move pushes the leaf node over the threshold, then a new record is added to the main part of the book. Leaf nodes are stored separately, in a different file, and have a different record format than main book records. When a new record is added to the book, by this method or any other, all of its descendants get added to the leaf node file. These records are small and are stored as a group to save space.
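The two record formats might look something like the following; all field names, types, and the promotion threshold are illustrative assumptions rather than the actual file layout:

```python
# Sketch of the two-file scheme: compact leaf records kept in a separate
# file, fuller records in the main book once a leaf crosses the threshold.

from dataclasses import dataclass, field

@dataclass
class LeafRecord:            # small; stored as a group with its siblings
    move: str
    eval_cp: int = 0         # last stored search evaluation (centipawns)
    depth: int = 0           # depth of that search
    seen: int = 0            # times the engine reached this position

@dataclass
class BookRecord:            # full record in the main book file
    position_key: int
    wins: int = 0
    draws: int = 0
    losses: int = 0
    per_move_counts: dict = field(default_factory=dict)

PROMOTE_AFTER = 4  # assumed threshold for promoting a leaf into the book

def maybe_promote(leaf, position_key):
    """Promote a leaf to a main-book record once it is seen often enough."""
    if leaf.seen >= PROMOTE_AFTER:
        return BookRecord(position_key=position_key)
    return None
```

In this sketch, promotion would also trigger writing the new record's own descendants to the leaf-node file, mirroring the behavior described above.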
The only real difference between the early stages of learning and the later stages is which statistics dominate the move selection process. Third-party statistics, if they exist for a particular node, are discounted in such a way that they may dominate the decision making process early on, but will not dominate it for long. As more games are played, the balance shifts to favor the engine's statistics. If all statistics are sparse, then the engine's search evaluations will be used to generate false game statistics. (It actually does this regardless.) These are scaled so they are relatively small in magnitude and will only dominate the decision process for a short time while enough real statistics are generated.
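One possible shape for that discounting, as a sketch (the half-life constant and the specific decay curve are assumptions for illustration, not the scheme's actual tuning):

```python
# Sketch of discounting third-party statistics: they dominate early, but
# their weight decays as the engine's own games accumulate.

HALF_LIFE = 50.0  # engine games at which third-party weight drops to 1/2

def blended_score(third_party_score, third_party_games,
                  engine_score, engine_games):
    """Weighted average; third-party influence decays with engine games."""
    discount = HALF_LIFE / (HALF_LIFE + engine_games)
    w_tp = third_party_games * discount
    total = w_tp + engine_games
    if total == 0:
        return 0.5  # no information at all: assume an even position
    return (w_tp * third_party_score + engine_games * engine_score) / total
```

Early on the third-party score dominates; once the engine has played many more games than the half-life, the blend is essentially the engine's own score.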
I'm not sure if this answers your questions or not.
mvk wrote:Let's say that how I do that is that I keep unsearched moves stored in the PGN archives, separate from the graph/book, available when one of the PGN-based expansion algorithms sees fit for search and inclusion in the graph. From what I can see the difference in approach doesn't seem fundamental?
I'm not sure I completely understand, but assuming I do, I'm not sure that I agree.
Zenmastur wrote:This is one of the reasons I want the book to be integrated into the engine and always "on". Right now this isn't the case so when I'm done with the analysis I have to manually enter it into a book.
mvk wrote:This other main idea, of harvesting search results from other places than dedicated searches, is fine of course. I think that is an orthogonal concept. My system doesn't exclude that possibility for example, I just don't do it now. At any point I could harvest engine logs and decide if what I see there can be used as new book nodes. (In practice I would still want to perform dropout searches, because every inner node has at least two move scores, of course, as negamax doesn't work on a book graph otherwise. During game play you only search the best move. This is probably the reason I haven't gone in that direction.)
Concluding, as far as I can see, placing moves in the book file that you don't use during play seems merely a semantic difference, only observable in file size. Not a difference observable in learning capabilities of the system.
All moves that are in the main book file have a non-zero probability of being played unless special code is inserted dictating otherwise. All the move statistics matter, even those for moves that aren't played very often. If you change or remove any of these statistics, it will have an immediate effect on all future move selections until the MAB indexes are brought back into balance. I can give you a real example if you need further explanation of why and how this happens.
Zenmastur wrote:I would note that there is nothing in the dropout expansion approach that would preclude you from adopting a more traditional learning format in addition to what you have already done.
mvk wrote:I agree. I'm not tied to one expansion algorithm, not even to one move selection algorithm (although I will likely stick to the path error mechanism for the latter). I was thinking for example of doing some kind of "trap detection", and steer the affinity of repertoire moves towards the more difficult parts of the tree using that.
I would recommend a version of UCB MAB to you if you kept game statistics. But without them that's not really possible.
Zenmastur wrote:One thing I forgot to mention. 219 ELO over a no book configuration is nothing to scoff at. I suspect that an additional 100 to 200 ELO is possible with additions to the learning features already in your book.
mvk wrote:Thank you! The cynic in me just attributes that number to the quality of opening play of the underlying engine
I think the cynic in you is dead wrong. I think it's all the book!
Regards,
Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.