I started with a recent version of Caissabase and filtered the games by the following criteria: played after 1980, at least 50 plies long (to eliminate games with early errors), and both players rated 2400+. I exported these games from Scid and cleaned the PGN with pgn-extract, then built separate white and black books with polyglot and merged them:
- pgn-extract --fixresulttags -e --nosetuptags input.pgn -o clean.pgn
- polyglot make-book -pgn clean.pgn -bin white.bin -min-game 32 -max-ply 100 -min-score 45 -only-white
- polyglot make-book -pgn clean.pgn -bin black.bin -min-game 32 -max-ply 100 -min-score 40 -only-black
- polyglot merge-book -in1 white.bin -in2 black.bin -out m32.s45-40.bin
Alternatively, I simply created a book in one pass as follows:
- polyglot make-book -pgn clean.pgn -bin m32.bin -min-game 32
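For anyone who wants to try other thresholds, the two-colour build above can be wrapped in a small script so that the numbers live in one place. This is only a sketch: the variable names (MIN_GAME, WHITE_SCORE, BLACK_SCORE, DRY_RUN) and the helper functions are made up for illustration, and the polyglot options are exactly the ones from the commands above. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: same polyglot commands as above, with the thresholds as variables.
# MIN_GAME, WHITE_SCORE, BLACK_SCORE and DRY_RUN are hypothetical names for this example.
MIN_GAME=${MIN_GAME:-32}
WHITE_SCORE=${WHITE_SCORE:-45}
BLACK_SCORE=${BLACK_SCORE:-40}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = print and run them

# Derive the merged book name from the thresholds, e.g. m32.s45-40.bin
book_name() {
    printf 'm%s.s%s-%s.bin\n' "$1" "$2" "$3"
}

# Print a command, and execute it unless DRY_RUN is 1
run() {
    printf '%s\n' "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Build the white book, the black book, then merge them
build_books() {
    out=$(book_name "$MIN_GAME" "$WHITE_SCORE" "$BLACK_SCORE")
    run polyglot make-book -pgn clean.pgn -bin white.bin \
        -min-game "$MIN_GAME" -max-ply 100 -min-score "$WHITE_SCORE" -only-white
    run polyglot make-book -pgn clean.pgn -bin black.bin \
        -min-game "$MIN_GAME" -max-ply 100 -min-score "$BLACK_SCORE" -only-black
    run polyglot merge-book -in1 white.bin -in2 black.bin -out "$out"
}

build_books
```

Encoding the thresholds in the output name, as in m32.s45-40.bin, makes it easier to keep several experimental books side by side.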
I also ran 'polyglot info-book' on the 4 books that are distributed with scid. The books Elo2400 and gm2600 have similar numbers of white and black lines, while the books varied and performance contain significantly fewer black lines. With the procedure described above, I could obtain comparable books (in terms of the number of lines and book size).
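The comparison can be repeated with a short loop like the one below. It is a sketch under a few assumptions: BOOK_DIR is a hypothetical location for the books, the .bin file names are guessed from the book names, and I am assuming info-book takes the book through -bin in the same way make-book does. Books that are not found are skipped instead of causing an error.

```shell
#!/bin/sh
# Sketch: print info-book statistics for each of the four scid books.
# BOOK_DIR is a hypothetical path; the .bin file names are assumed from the book names.
BOOK_DIR=${BOOK_DIR:-./books}

# List the expected book files, one per line
book_files() {
    for name in Elo2400 gm2600 varied performance; do
        printf '%s/%s.bin\n' "$BOOK_DIR" "$name"
    done
}

# Run info-book on every book that exists; report the ones that are skipped
show_info() {
    book_files | while read -r book; do
        if [ -f "$book" ] && command -v polyglot >/dev/null 2>&1; then
            echo "== $book"
            # -bin is assumed here by analogy with make-book
            polyglot info-book -bin "$book" </dev/null
        else
            echo "skipped $book (file or polyglot not found)"
        fi
    done
}

show_info
```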
I don't want to make books that are too big: I already have a few big books, and I suspect that, with the exception of Cerebellum3Merge, they contain many very weak lines.
So my request goes to people familiar with the scid books, or in general to people who know the dark art of book binding: could you comment on how the scid books were made, and do you have suggestions to improve the procedure described above? I may be using polyglot with a lot of misunderstanding. I am not a pro, and for me the scid books are somewhat like black boxes: I don't know what to expect from them. But if I could produce similar books by setting appropriate parameters, I would gain some insight into the general properties of these books.