Understanding polyglot books

chesskobra
Posts: 348
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Understanding polyglot books

Post by chesskobra »

I am trying to get some insight into making polyglot books, especially for casual human use, not for engine testing. I did a few experiments, but mainly used the following general procedure.

Started with the recent version of caissabase, and filtered games by the following criteria: kept games after 1980, with minimum game length 50 plies (to eliminate games with early errors), with both players rated 2400+. Exported these games from scid. Cleaned the pgn with the following command
  • pgn-extract --fixresulttags -e --nosetuptags input.pgn -o clean.pgn
Then I constructed white and black books, and merged the books, as follows:
  • polyglot make-book -pgn clean.pgn -bin white.bin -min-game 32 -max-ply 100 -min-score 45 -only-white
  • polyglot make-book -pgn clean.pgn -bin black.bin -min-game 32 -max-ply 100 -min-score 40 -only-black
  • polyglot merge-book -in1 white.bin -in2 black.bin -out m32.s45-40.bin
The book in this example has 9964 white lines and 7597 black lines (significantly fewer black lines). I found that I get significantly fewer black lines when I set even a modest minimum black score like 35-40.

Or I simply created a book in one pass as follows:
  • polyglot make-book -pgn clean.pgn -bin m32.bin -min-game 32
This book has 10683 white lines and 10833 black lines (similar numbers of white and black lines).

I also did 'polyglot info-book' on the 4 books that are distributed with scid. I noticed that the books Elo2400 and gm2600 have similar numbers of white and black lines, while the books varied and performance contain significantly fewer black lines. In fact by the procedure I have described above, I could obtain comparable books (in terms of the numbers of lines and book sizes).

I don't want to make books that are too big because I already have a few big books, and I suspect that they, with the exception of Cerebellum3Merge, contain many very weak lines.

So my request to people familiar with the scid books, or more generally to people who know the dark art of book binding, is this: could you comment on how the scid books were made, and do you have suggestions for improving the procedure described above? I may be using polyglot with a lot of misunderstanding. I am not a pro, and for me the scid books are somewhat like black boxes, so I don't know what to expect from them. But if I could produce similar books by setting appropriate parameters, I would get some insight into the general properties of these books.
chesskobra
Posts: 348
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

polyglot -min-score

Post by chesskobra »

I am trying to understand the -min-score option. The manual says

-min-game (default: 3)
Specifies the minimum number of games that have to contain this move for it to be included in the book.

-min-score (default: 0.0)
Specifies the minimum score (or weight) this move should have received for it to be included in the book. The score is
2*(wins)+(draws), globally scaled to fit into 16 bits.

Here the score 2*(wins) + (draws) confuses me. Is it the absolute number? What should happen if I give -min-score 40? It would require at least 20 games (and in practice many more) for some position to be included. But what I see is that a book created with "-min-game 3 -min-score 40" is much larger than a book created with "-min-game 8 -min-score 40", even though in both cases we would need at least 20 games to get the score of 40.

It would be useful to specify -min-score as % score, i.e., (wins + draws/2)*100/(number of games). In fact that is what I had interpreted it to mean until I read the manual.

Does it make sense to use the two options -min-game and -min-score together?
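
To make the two readings concrete, here is a small sketch that computes both the raw score as worded in the manual, 2*(wins)+(draws), and the percentage score suggested above. The function names and the win/draw/loss counts are made up for illustration; nothing here is taken from polyglot's code, and the manual's "globally scaled to fit into 16 bits" step is left out.

```python
def raw_score(wins, draws):
    """Score as worded in the manual: 2*(wins) + (draws)."""
    return 2 * wins + draws


def percent_score(wins, draws, losses):
    """Percentage score: (wins + draws/2) * 100 / (number of games)."""
    games = wins + draws + losses
    return (wins + draws / 2) * 100 / games


# Hypothetical (wins, draws, losses) counts for two moves.
samples = {
    "rare but successful": (3, 0, 0),
    "popular but poor": (30, 20, 50),
}

for name, (w, d, l) in samples.items():
    print(f"{name}: raw={raw_score(w, d)} percent={percent_score(w, d, l):.1f}")
```

With a threshold of 40, the raw reading would keep only the second move (raw 80 against raw 6), while the percentage reading would keep both (40.0 and 100.0). The two readings therefore select quite different sets of moves.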
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Understanding polyglot books

Post by JoAnnP38 »

If I'm not mistaken, polyglot uses the -min-score setting as a minimum threshold on the score of a move. The score is calculated by looking at how many times that move is made from that position in the entire PGN and summing the results, i.e. (2 * wins + draws) / (number of games it was played in), or some other scaling constant. This way polyglot can filter out book moves that lead to lost games.
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Understanding polyglot books

Post by phhnguyen »

For understanding, you may need to look inside the Polyglot book you have created before going further ;)

https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
chesskobra
Posts: 348
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Understanding polyglot books

Post by chesskobra »

Thanks @phhnguyen, I will take a look at banksia.

My question is precisely this. If I create two books - book A with -min-game 3 -min-score 40, and book B with -min-game 8 -min-score 40, then 2W+D >= 40 should force the books A and B to be identical, because positions that appear in fewer than 20 games will be excluded anyway, i.e., if -min-score is large, then -min-game 3 or 8 should not make a difference. Then why is book A much larger than B?

It does not seem that -min-score option is used only to set weights, and not to exclude or include positions, because if I create 2 books - book A with -min-game 3 and no -min-score option, and B with -min-game 3 -min-score 40, then A is bigger than B. This shows that -min-score option does influence inclusion/exclusion of positions.
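
One way to see why the reading of -min-score matters for the A/B comparison is to apply both readings to the same toy data. The sketch below is not polyglot's implementation: the `keep` function, its `mode` parameter, and the move statistics are all hypothetical. It only illustrates that under the absolute 2W+D reading, -min-game 3 and -min-game 8 keep the same entries once -min-score is 40, whereas under a percentage reading the smaller -min-game keeps more.

```python
def keep(wins, draws, losses, min_game, min_score, mode):
    """Toy inclusion filter; `mode` selects how min_score is interpreted."""
    games = wins + draws + losses
    if games < min_game:
        return False
    if mode == "raw":        # absolute 2*(wins) + (draws)
        score = 2 * wins + draws
    elif mode == "percent":  # (wins + draws/2) * 100 / games
        score = (wins + draws / 2) * 100 / games
    else:
        raise ValueError(f"unknown mode: {mode}")
    return score >= min_score


# Hypothetical (wins, draws, losses) statistics, one tuple per move.
moves = [(2, 1, 1), (3, 2, 1), (5, 4, 3), (15, 12, 10), (30, 20, 50), (4, 2, 14)]


def book_size(min_game, min_score, mode):
    """Count how many of the toy moves survive the filter."""
    return sum(keep(w, d, l, min_game, min_score, mode) for w, d, l in moves)


for mode in ("raw", "percent"):
    a = book_size(3, 40, mode)  # book A: -min-game 3 -min-score 40
    b = book_size(8, 40, mode)  # book B: -min-game 8 -min-score 40
    print(f"{mode}: book A keeps {a}, book B keeps {b}")
```

On this data the raw reading keeps 2 moves in both books, while the percentage reading keeps 5 moves in book A and 3 in book B. That pattern is the same as the one observed with the real books, which is suggestive but not conclusive; checking the source remains the reliable way to settle it.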
phhnguyen
Posts: 1524
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: Understanding polyglot books

Post by phhnguyen »

chesskobra wrote: Wed Jun 07, 2023 12:12 pm Thanks @phhnguyen, I will take a look at banksia.

My question is precisely this. If I create two books - book A with -min-game 3 -min-score 40, and book B with -min-game 8 -min-score 40, then 2W+D >= 40 should force the books A and B to be identical, because positions that appear in fewer than 20 games will be excluded anyway, i.e., if -min-score is large, then -min-game 3 or 8 should not make a difference. Then why is book A much larger than B?

It does not seem that -min-score option is used only to set weights, and not to exclude or include positions, because if I create 2 books - book A with -min-game 3 and no -min-score option, and B with -min-game 3 -min-score 40, then A is bigger than B. This shows that -min-score option does influence inclusion/exclusion of positions.
I can't answer your question since I have never used Polyglot nor looked into its code. Perhaps the logic of the program differs from what you expect, or it has some bugs in the implementation. Almost all people just use it without thinking about or checking it deeply ;)

You may find the answer yourself by looking at the code. The other way is by trial and error: create some very small books (using very few games with only a few plies), then look into those books to understand the branches and weights, and compare them.
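
For the trial-and-error route, a short script can print the raw entries of a small book so that weights can be compared between runs. This is a minimal sketch, assuming the commonly documented polyglot .bin layout: 16-byte big-endian entries holding a 64-bit position key, a 16-bit move, a 16-bit weight, and a 32-bit learn field. The helper names (`decode_move`, `read_entries`, `dump`) are hypothetical and not part of polyglot.

```python
import struct

# One book entry: key (uint64), move (uint16), weight (uint16), learn (uint32).
ENTRY = struct.Struct(">QHHI")
FILES = "abcdefgh"
PROMO = ["", "n", "b", "r", "q"]


def decode_move(m):
    """Turn a 16-bit polyglot move into coordinate text such as 'e2e4'."""
    to_file, to_row = m & 7, (m >> 3) & 7
    from_file, from_row = (m >> 6) & 7, (m >> 9) & 7
    promo = (m >> 12) & 7
    suffix = PROMO[promo] if promo < len(PROMO) else "?"
    return (FILES[from_file] + str(from_row + 1)
            + FILES[to_file] + str(to_row + 1) + suffix)


def read_entries(data):
    """Yield (key, move_text, weight, learn) for each 16-byte entry in `data`."""
    usable = len(data) - len(data) % ENTRY.size  # ignore a trailing partial entry
    for offset in range(0, usable, ENTRY.size):
        key, move, weight, learn = ENTRY.unpack_from(data, offset)
        yield key, decode_move(move), weight, learn


def dump(path, limit=20):
    """Print the first `limit` entries of the book stored at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    for i, (key, move, weight, learn) in enumerate(read_entries(data)):
        if i >= limit:
            break
        print(f"{key:016x} {move} weight={weight} learn={learn}")
```

Calling `dump("m32.bin")` on two books built with different options and comparing the output is one way to see which moves were dropped and how the weights changed. Note that in this format castling appears as the king moving to the rook's square (for example e1h1), and that the keys are position hashes, so mapping them back to positions needs a chess library or a GUI.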
chesskobra
Posts: 348
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: Understanding polyglot books

Post by chesskobra »

phhnguyen wrote: Tue Jun 06, 2023 3:25 pm For understanding, you may need to look inside the Polyglot book you have created before going further ;)
Why does banksia show only first 2-3 moves of the book? Is there a setting to change that? Also, is there a way to export a bin book to pgn?