Bookbuilding 101

bob · Post by **bob** » Mon Aug 06, 2012 11:28 pm

Don wrote:
bob wrote:
Don wrote:
jdart wrote:
But there is a drawback. The only book lines you consider are the book lines you choose to include by whatever means.
This is a result of the fact that most book formats (including mine) are based on a fixed size book that is basically a persistent hash table. A real learning book would be able to dynamically add new moves that the program's opponents have played, which would require something closer to a real database interface - in fact I have considered using a SQL db like mySQL for this. Then you are not limited to a fixed move set but can add lines that have been played against you. I don't know why this is not more commonly done, except that it is more complex to implement.
I think a general solution is to to make sure that the computer would play one of those moves on it's own given the time control of the tournament the book is for. If not, the move the computer would play should be added. By doing this there are no blatant holes in the book where the only responses are losing move. Of course as you say you should have to option to add a move that is not in the book but could be critical. Also, if a move is playable you would want to be prepared against it, even if it's not of major interest. Any time your program is surprised by a move not in book it should be added and given proper treatment unless of course it's a blunder.
What about all the current opening moves a computer would not play by itself, but which are the ONLY way to avoid losing a game???
Please note that when I say, "put the move in the book" I don't mean that is what it should play. The context here is figuring out some automatic or semi-automatic way to build a custom book for a given program so you have to start with something that is as comprehensive as possible and then figure out how to narrow down the choices to moves your program will do well with.

If you KNOW there is only 1 move and everything else loses, then of course you would want some mechanism to specify this so that time is not wasted with the computer having to learn that it's bad. On other other hand you probably WANT that move in the book if it's an opponent option, if your program thinks it is good chances are other programs and perhaps humans too will think so. You would want your program to be able to refute it.

You sound like you are describing a simple sequential file update where you add what you played to the end in something like PGN format? Something you could later filter any way you want to create moves to add to a real book?

My book learning stuff was written specifically to eliminate that part of the process, and it works to a point. But it can't learn from "surprise moves" until they are actually encountered OTB, which is a problem...

Don · Post by **Don** » Tue Aug 07, 2012 12:27 am

bob wrote:
Don wrote:
bob wrote:
Don wrote:
jdart wrote:
But there is a drawback. The only book lines you consider are the book lines you choose to include by whatever means.
This is a result of the fact that most book formats (including mine) are based on a fixed size book that is basically a persistent hash table. A real learning book would be able to dynamically add new moves that the program's opponents have played, which would require something closer to a real database interface - in fact I have considered using a SQL db like mySQL for this. Then you are not limited to a fixed move set but can add lines that have been played against you. I don't know why this is not more commonly done, except that it is more complex to implement.
I think a general solution is to to make sure that the computer would play one of those moves on it's own given the time control of the tournament the book is for. If not, the move the computer would play should be added. By doing this there are no blatant holes in the book where the only responses are losing move. Of course as you say you should have to option to add a move that is not in the book but could be critical. Also, if a move is playable you would want to be prepared against it, even if it's not of major interest. Any time your program is surprised by a move not in book it should be added and given proper treatment unless of course it's a blunder.
What about all the current opening moves a computer would not play by itself, but which are the ONLY way to avoid losing a game???
Please note that when I say, "put the move in the book" I don't mean that is what it should play. The context here is figuring out some automatic or semi-automatic way to build a custom book for a given program so you have to start with something that is as comprehensive as possible and then figure out how to narrow down the choices to moves your program will do well with.

If you KNOW there is only 1 move and everything else loses, then of course you would want some mechanism to specify this so that time is not wasted with the computer having to learn that it's bad. On other other hand you probably WANT that move in the book if it's an opponent option, if your program thinks it is good chances are other programs and perhaps humans too will think so. You would want your program to be able to refute it.
You sound like you are describing a simple sequential file update where you add what you played to the end in something like PGN format? Something you could later filter any way you want to create moves to add to a real book?

sort of .... see my next comment

My book learning stuff was written specifically to eliminate that part of the process, and it works to a point. But it can't learn from "surprise moves" until they are actually encountered OTB, which is a problem...

I am thinking about using sqlite for the book because it's dirt simple to manipulate but that is an unimportant detail - it could easily be done with a polyglot-like hash table format.

But the concept is that the book is expandable and contains everything, even moves the program may never play (or may only play in some mode where you allow off-beat and weaker lines of play as a user option.) But by some undefined mechanism - perhaps off-line processing - the computer is constantly inspecting and improving it's opening book choices. This likely would include data from games it has won and lost.

I don't know how masters do it, but I learned the openings I played primarily be losing games. I never made the same mistake twice and when I lost a game (or even got into a difficult game that I did not lose) I figured out where I could have improved.

A funny thing happened when I was weaker too which illustrates a real problem. Sometimes the computer would throw something at me (right out of book) that I could not handle but was technically not a good move. I would work to figure out how to refute it thus improving my results against the computer which I was usually evenly matched with (this was back in the fidelity days.) But for fun I would try these same moves on my friends at the club in speed-chess and would usually get easy wins. I would watch them go through the same process I did of figuring out what was wrong with the move, then I would stop playing it.

But the computer would not have that flexibility. If you set it up to play on icc and maximize it's wins, it may be learning what works under those conditions but not necessarily be improving. I don't know if there is anything to be done about this, but it's a possibility with any kind of learning I can think of that doesn't have the ability to recursively modify itself. But if the opponents don't find a refutation for a long time the machine may take a long time to unlearn some bad habit.

Which reminds me of another anecdote - when I was a kid at school, I watched one kid always play 1. e4 e5 2. Qh5 when he could against all the other kids. He won game after game because almost everyone without thinking at all would play g6 to attack the king as a gut reaction. A couple of kids figured out to defend the pawn with Nc6 or d6 but he would play Bc4 and 90% of the time his opponent didn't see the checkmate threat. I never played 1... e5 but after watching him I went home and analyzed this like crazy, figuring out all the things he might try and what I would do. I came back the next day and played 1 ... e5 just so that I could face the "intimidating" Qh5!

For dramatic effect I pretended like I was going to play g6 but stopped just short of touching the pawn and pretended that I saw the problem at the last possible moment, then played Nc6.
I pretended that I did not know what I was doing.

He continued with his usual Bc4 and I had Qe7 prepared - I don't really know to this day if it's best but it worked for me and it frustrated him because it solves the mate threat. He played Nf3 and I knew that he planned to gang up on f7 again with his knight - I knew his style already. I responded with Nf6 and he played the stupid Qh3 move and I easily won a piece and the game.

This is the first time I had ever analyzed a game or really saw the beauty of chess and got my first inkling that chess was more than just attacking a piece and hoping the opponent didn't see it. Even though I never lost to this kid I learned a lot from him - and I think I went from being much weaker than him to being much stronger than him in a single 24 hour period. It was one of those experiences that you never forget because it makes something click inside you and changes you.

But the real point of this is that book learning could produce books that play like this! By nature it is more interested in what works than in what is actually best!

Dan Honeycutt · Post by **Dan Honeycutt** » Tue Aug 07, 2012 12:29 am

Thanks to all for your thoughts and comments. Lots of good food for thought. A few random thoughts of mine -

The more I read and think about it the less I think of popularity as a metric for determining the worth of a move. Maybe some (fairly small) minimum times played consideration but that's about all I think I might do.

@Matthew - I don't know whether it is better to use the engine the book is for (so it comes out of the book more often in a position it likes) or the best engine available to get the most accurate score. I plan to make the engine easily selected so you can do what you want.

@ Jon - I thought about scoring only the leaves and backing those up but decided against that because I can't know if the best move for each position is in my book. I could have some line that I think is about equal based on the leaf scores but, in fact, there is a sockdolager available at some intermediate position that I never considered. The only solution I could see was to cut the engine loose on every position in the book. Maybe you know of another way?

@Bob - I think current engines can evaluate the soundness of a lot of gambits though there certainly some beyond their comprehension. I plan to have hand tuning so you can go to any position and put in the moves and scores as you think they should be.

@Harvey - The book for my program Bruja has about 1000 positions which, in turn, give about 5000 paths a book opening can take. That's a pretty small book but it gave me enough diversity to run 1000 game matches without duplicates being much of a problem. I'm thinking 30 seconds per position should give a decent score as well as pick up any outright blunders. For 1000 positions that's about 8 hours. So scoring could take weeks or months but I wouldn't think years unless it is a really big book.

@Don - you wrote:

At node N move A is played 5 times and scores 70%
move B is played 15 times and score 60%

What should be score of node N?

The score for node N is the winning percentage for that node - which could be higher than 70 or lower than 60. Move A doesn't score 70% but rather it leads to a position where the winning percentage (actually the losing percentage if we're talking about node N) is 70%. If A and B are the only moves for node N then the percentage is 62.5.

@Tim - yes, the formula will be the key to how well it works. From some very preliminary scoring and comparing with on-line database percentages it looks like a 1% win percentage equates to about 1/10 pawn. I see this value as key to the formula weights. What I need is to get Adam interested in this question then we could get a real good number.

Another factor in the formula is the fact that Black is typically behind in both position evaluation and win percentages. I don't want this to result in fewer (or worse, no) book moves to choose from. The simplest way to deal with this seems to be to give all black moves a little bonus, a win percent or two and maybe 15 centipawns.

Best
Dan H.

jdart · Post by **jdart** » Tue Aug 07, 2012 1:18 am

I am thinking about using sqlite for the book because it's dirt simple to manipulate but that is an unimportant detail - it could easily be done with a polyglot-like hash table format.

You might consider a B-tree or similar system. I was going to recommend Berkeley DB, http://www.oracle.com/technetwork/produ ... index.html, which is designed to be an embeddable storage engine, but it has a copyleft license which might not suit you. There are a few similar products such as Dx (http://www.dss.bc.ca/dx/) which has a BSD-style license.

--Jon

Dan Honeycutt · Post by **Dan Honeycutt** » Tue Aug 07, 2012 1:18 am

Don wrote:In the book I talked about I was horrified to see that 1. h4 made it into the book.

According to Chessbase 1. h4 is the 16 th most popular move, played a paltry 453 times in over 6 million games. Much more popular is the Polish Opening, 1. b4, 8th on the list and played some 24 thousand times.

Try an experiment. Play both 1. h4 and 1. b4 against Komodo with no book. Give it enough time for the score to settle and see which move it likes best (or least since it is black).

Best
Dan H.

jdart · Post by **jdart** » Tue Aug 07, 2012 1:28 am

Be careful with playchess games. A lot of them are lost by forfeit or on time, and the other side might actually be winning. I have a filter that sanity checks the game result by doing a shallow search at the end position to eliminate these.

I do not think there is any magic way to get an excellent book automatically although this thread is full of good suggestions. I still have a hand-tuned book file that I use to override some of the lines that my PGN-derived book has.

--Jon

Norm Pollock · Post by **Norm Pollock** » Tue Aug 07, 2012 4:05 am

jdart wrote:Be careful with playchess games. A lot of them are lost by forfeit or on time, and the other side might actually be winning. I have a filter that sanity checks the game result by doing a shallow search at the end position to eliminate these.

I do not think there is any magic way to get an excellent book automatically although this thread is full of good suggestions. I still have a hand-tuned book file that I use to override some of the lines that my PGN-derived book has.

--Jon

Hi Jon,

Is your "sanity checker" available to download?

-Norm

jdart · Post by **jdart** » Tue Aug 07, 2012 5:35 am

It is one of several utility programs I don't distribute with the Arasan source. But I don't mind if you use it.

I have uploaded a Windows executable to

http://www.arasanchess.org/playchess2.exe

It is a filter so it takes a PGN as input and outputs a subset (the valid PGNs) to stdout. It can handle comments but not annotated games with variations.

Caveat: its eval is only as good the Arasan search engine that is embedded in it. Also the margins for considering a game valid are hard coded.

Source code follows, in case you are interested or want it on a non-Windows platform. You need the Arasan source (14.3) to build it.

Code: Select all

// Copyright 2010, 2011 by Jon Dart. All Rights Reserved.
#include "board.h"
#include "notation.h"
#include "legal.h"
#include "hash.h"
#include "globals.h"
#include "chessio.h"
#include "util.h"
#include "search.h"
#include <iostream>
#include <fstream>
#include <ctype.h>

static const int MARGIN = 70;
static const int max_ply = 500;

int CDECL main&#40;int argc, char **argv&#41;
&#123;
   Bitboard&#58;&#58;init&#40;);
   initOptions&#40;argv&#91;0&#93;);
   Attacks&#58;&#58;init&#40;);
   Scoring&#58;&#58;init&#40;);
   if (!initGlobals&#40;argv&#91;0&#93;, false&#41;) &#123;
      cleanupGlobals&#40;);
      exit&#40;-1&#41;;
   &#125;
   atexit&#40;cleanupGlobals&#41;;
   Hash&#58;&#58;initHash&#40;&#40;size_t&#41;64000000&#41;;

   options.book.book_enabled = options.log_enabled = 0;

   int arg = 1;
   unsigned minMoves = 30;
   int minELO = 0;

   if &#40;argc ==1&#41; &#123;
      cerr << "usage&#58; playchess.com <input file>" << endl;
      return -1;
   &#125;
   else &#123;
      for (;arg < argc;++arg&#41; &#123;
         if &#40;argv&#91;arg&#93;&#91;0&#93; == '-') &#123;
         &#125;
         else
            break;
      &#125;
      SearchController *searcher = new SearchController&#40;);
      Statistics stats;
      for (;arg < argc;arg++) &#123;
         ifstream pgn_file&#40; argv&#91;arg&#93;, ios&#58;&#58;in&#41;;
         int c;
         int count = 0;
         hash_t game_hash;
         ColorType side;
         string result, white, black;
         if (!pgn_file.good&#40;)) &#123;
            cerr << "could not open file " << argv&#91;arg&#93; << endl;
            exit&#40;-1&#41;;
         &#125;
         else &#123;
            mainloop&#58;
            ArasanVector<StringPair> hdrs;

            Board board;
            while (!pgn_file.eof&#40;)) &#123;

               long first;
               MoveArray moves;
               board.reset&#40;);
               side = White;
               while &#40;pgn_file.good&#40;) && &#40;c = pgn_file.get&#40;)) != EOF&#41; &#123;
                  if &#40;c=='&#91;') &#123;
                     pgn_file.putback&#40;c&#41;;
                     break;
                  &#125;
               &#125;
               hdrs.removeAll&#40;);
               ChessIO&#58;&#58;collect_headers&#40;pgn_file,hdrs,first&#41;;

               game_hash = 0;
               float last_score = -1;
               int mate = 0;
               int valid = 1;
               int have_scores =  0;
               int var = 0;
               for (;;) &#123;
                  static char buf&#91;2048&#93;;
                  static char num&#91;20&#93;;
                  ChessIO&#58;&#58;Token tok = ChessIO&#58;&#58;get_next_token&#40;pgn_file,buf,2048&#41;;
                  if &#40;tok == ChessIO&#58;&#58;Eof&#41;
                     break;
                  if &#40;tok == ChessIO&#58;&#58;Number&#41; &#123;
                    if &#40;var&#41; continue;
                    strncpy&#40;num,buf,20&#41;;
                  &#125;
                  else if &#40;tok == ChessIO&#58;&#58;GameMove&#41; &#123;
                     if &#40;var&#41; continue; 
                     // parse the move
                     Move m;
                     m = Notation&#58;&#58;value&#40;board,board.sideToMove&#40;),buf&#41;;
                     if &#40;IsNull&#40;m&#41; ||
                         !legalMove&#40;board,StartSquare&#40;m&#41;, DestSquare&#40;m&#41;)) &#123;
                                                  // echo to both stdout and stderr
                        cerr << "Illegal move&#58; " << buf << endl;
                        cout << "Illegal move&#58; " << buf << endl;
                        valid = 0;

                     &#125;
                     else if &#40;valid&#41; &#123;
                        char result&#91;40&#93;;
                        Notation&#58;&#58;image&#40;board,m,result&#41;;
                        BoardState bs = board.state;
                        moves.add_move&#40;board,bs,m,result&#41;;

                        board.doMove&#40;m&#41;;
                     &#125;
                     side = OppositeColor&#40;side&#41;;
                  &#125;
                  else if &#40;tok == ChessIO&#58;&#58;Unknown&#41; &#123;
                     if &#40;strcmp&#40;buf,"(") == 0&#41;
                          ++var;
                     else if &#40;strcmp&#40;buf,")")==0&#41; &#123;
                          --var;   
                     &#125; else &#123;
                       cerr << "Unrecognized text&#58; " << buf << endl;
                       valid = 0;
                     &#125;
                  &#125;
                  else if &#40;tok == ChessIO&#58;&#58;Comment&#41; &#123;
                  &#125;
                  else if &#40;tok == ChessIO&#58;&#58;Result&#41; &#123;
                      result = buf;
                      if &#40;result == "#") &#123;
                          last_score = 10000.0F;
                      &#125;
                      else if &#40;result == "1/2-1/2" &&
                                   Scoring&#58;&#58;isDraw&#40;board&#41;) &#123;
                          last_score = 0.0;
                          break;
                      &#125; else &#123;
                          stats.clear&#40;);
                          searcher->findBestMove&#40;board,
                                                        FixedDepth,
                                                        10,
                                                        0,
                                                        6, false, false,
                                                        stats,
                                                        Silent&#41;;
                     
                          last_score = &#40;float&#41;stats.value*100;
                     &#125;
                     if &#40;board.sideToMove&#40;) != White&#41; last_score = -last_score;
                     break;
                  &#125;
               &#125;
               // done with a game
               if &#40;moves.num_moves&#40;) < minMoves || !valid&#41; &#123;
                  continue;
               &#125;
               if &#40;strcmp&#40;result.c_str&#40;),"1/2-1/2")==0&#41; &#123;
                  if &#40;last_score >50.0 || last_score <-50.0&#41; &#123;
                     continue;
                  &#125;
               &#125;
               else if &#40;strcmp&#40;result.c_str&#40;),"1-0")==0&#41; &#123;
                  if &#40;last_score <= 300.0&#41; continue;
               &#125;
               else if &#40;strcmp&#40;result.c_str&#40;),"0-1")==0&#41; &#123;
                  if &#40;last_score >= -300.0&#41; continue;
               &#125;

               string eco;
               ChessIO&#58;&#58;get_header&#40;hdrs,"ECO",eco&#41;;

               // output header
               if &#40;valid&#41; &#123;
                   string whiteELOStr,blackELOStr;
                   if &#40;minELO > 0&#41; &#123;
                       if &#40;ChessIO&#58;&#58;get_header&#40;hdrs,"WhiteElo",whiteELOStr&#41; &&
                           ChessIO&#58;&#58;get_header&#40;hdrs,"BlackElo",blackELOStr&#41;) &#123;
                           int whiteElo=0, blackElo=0;
                           if &#40;sscanf&#40;whiteELOStr.c_str&#40;),"%d",&whiteElo&#41; &&
                               sscanf&#40;blackELOStr.c_str&#40;),"%d",&blackElo&#41;) &#123;
                               if &#40;whiteElo < minELO || blackElo < minELO&#41;
                                   continue;
                           &#125;
                       &#125;
                       else                            // no ELO info available
                           continue;
                   &#125;
                   ChessIO&#58;&#58;store_pgn&#40;cout, moves, result, hdrs&#41;;
               &#125;
            &#125;
            pgn_file.close&#40;);
         &#125;
      &#125;
   &#125;

   return 0;
&#125;

Codesquid · Post by **Codesquid** » Tue Aug 07, 2012 8:25 am

Don wrote: I am thinking about using sqlite for the book because it's dirt simple to manipulate but that is an unimportant detail - it could easily be done with a polyglot-like hash table format.

Good choice. Sqlite's ACID properties (atomic, consistent, isolated, durable) have saved me more than once. Every time I write my own file format I inadvertently do something that corrupts the file.

The biggest drawback with using SQLite for a book is the size overhead. A specially tailored file format will produce much smaller files.

Rein Halbersma · Post by **Rein Halbersma** » Tue Aug 07, 2012 9:38 am

bob wrote:
jdart wrote:This is basically correct but you need to start evaluating at the leaf nodes of the book and back up the scores, using the minimax algorithm. For a large book and a lot of nodes, that is a lot of CPU time.

--Jon
I did this in early versions of Crafty. CPU for minimaxing is basically not an issue. The I/O however, is expensive if you use a large book.

But the problem is in the minimaxing. You are choosing an opening line based on whatever info you get at the end of that line, minimaxed with all of the alternatives along that line. That caused me great grief and I gave up on the idea. You choose lines based on a position that might have resulted from really lousy play. I eventually tried to minimax based on GM analysis, but I found positions where "white is clearly winning, but black had a forced mate" and gave up on that as well.

I even tried the idea of doing a search at each book position to be sure that there are no tactical errors along interior book paths. But shallow searches are confused by gambits and you can't trust them. Long searches on a large book becomes computationally intractable. Another approach I have tried (with better success) is to create a book, use book learning, and play millions of games against a variety of opponents and let the learning weed out the bad lines. But gambits are again a problem, since shallow searches won't understand the reason behind the gambit every time...

In short, most likely, hand-tuned books will still be the optimal solution for quite a while longer.

In checkers/draughts/othello/amazons/awari, people very successfully use dropout-expansion, a book-building technique developed a decade ago by Thomas Lincke, see chapter 3 of:
http://e-collection.library.ethz.ch/ese ... 905-02.pdf

Martin Fierz has written an old post at CCC about his experiences with dropout-expansion in checkers and chess:
http://www.stmintz.com/ccc/index.php?id=201669

A decade ago, the main problem for chess seemed the proliferation of slightly weaker but irrelevant moves. It would be interesting to see what modern engines think of the examples he gave.

Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101

Re: Bookbuilding 101