Creating Books from .PGN files

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 5:48 am

Creating Books from .PGN files

Post by Dave_N » Tue Dec 20, 2011 2:48 pm

I had a look at the Polyglot format and found that it doesn't give a win/loss/draw percentage and instead gives a weight, so I decided to have another look at creating books. My first thought was to modify the polyglot book creation to store white wins in the "weight" variable and draws or black wins in the "learn" variable. I decided to ignore polyglot and look at making my own format from my own pgn tree code...

At the moment I have written a code that will merge games into 1 game with variations, however I cannot detect transpositions. So for example if 2 games take a Ruy Lopez line with the Marshall attack and one line has black playing b5 after a6 and the other has Nf6 first (as in the mainline) then I don't have the games rejoining after the positions become equal, and instead the merged game is a variation of the first.

Obviously I could not compare FEN in a realistic time to detect the transposition so I would need some kind of hash table (probably) unless I create a FEN tree ... I am sure this has been tried before but is it practical?

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Creating Books from .PGN files

Post by hgm » Tue Dec 20, 2011 4:56 pm

Polyglot format is what you could call a 'cooked' book, which does as much as possible of the procesing in advance. In the end the only thing that is relevant (and observable) is the probability with which everymove is played. In some ('raw') formats, these probabilities are calculated in the probing code, from the basic statistics, but Polylot books store those probabilities directly, as the weights. Polyglot's book building function sets the weights as the number of hlf-points scored with the move, which is a dubious method, which, IMO, could be much improved on.

Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 5:48 am

Re: Creating Books from .PGN files

Post by Dave_N » Tue Dec 20, 2011 5:08 pm

Yeah I read that the weight is (wins+losses)/2 and the improvement I considered would be at least to score more highly for the win percentage, I like to use win/draw as separate percentages, and if the learn variable isn't used I would try to overwrite this.

(I realized the FEN tree would be ridiculous).

I also googled about mathematical techniques for computing hash keys and so far I don't have a solution to computing the linear independence without brute force ...

I don't know if any book formats use a pgn tree to compute the openings but I am sure its a huge amount of time for larger books.

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Creating Books from .PGN files

Post by hgm » Tue Dec 20, 2011 5:53 pm

No, weight = wins + draws/2. But it makes no sense for me to prefer a move that scored 2 out of 100 over one that cored 1 out of 2 by 2 to 1. It is true that the playing frequency is more significant than the percentage (if the games are taken from sufficiently strong players), but not infinitely more significant. At someoint the algorithm must be able to decide a move that always loses is no good...

kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 3:48 am

Re: Creating Books from .PGN files

Post by kbhearn » Wed Dec 21, 2011 5:34 am

You can eliminate false positive hash hits in a book by simply storing the full position (4 bits per square = 32 bytes, plus either a byte for flags or encode them into the squares - only 4-5 64 bit compares to validate a position). You'd still use a hash key of some sort to spread out the positions evenly amongst the buckets (you don't want to have to check every position for a match), just wouldn't have to worry about the odd pair of positions that cause a false positive :)

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Creating Books from .PGN files

Post by hgm » Wed Dec 21, 2011 7:56 am

With 64-bit key, like Polyglot books use, worrying about collissions seems a bit paranoic. Even with 4 billion positions in the book it would be unlikely there was a single collission. Doubling the size of the book by using stronger keys would probably hurt more than accepting you play an illegal opening move once every quadrillion games...

Polyglot (and XBoard) probing code don't even use hashing, but a binary search. (The moves are sorted by key in the book.)

Michel
Posts: 2048
Joined: Sun Sep 28, 2008 11:50 pm

Re: Creating Books from .PGN files

Post by Michel » Wed Dec 21, 2011 8:10 am

Polyglot (and XBoard) probing code don't even use hashing, but a binary search. (The moves are sorted by key in the book.)
For really large books the binary search should probably be replaced by interpolation
search. That would dramatically reduce the number of seeks necessary as interpolation search is O(log log N) and binary search is O(log N).

I always wanted to implement this but never got around to it. Part of the reason is that there are currently some "limitations" in Polyglot which prevent you from making large books. That should be fixed first.

Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 5:48 am

Re: Creating Books from .PGN files

Post by Dave_N » Wed Dec 21, 2011 9:40 am

Perhaps detecting collisions during book creation and having a resolution code ... I don't understand how the book is binary searched, and the interpolation isn't something I can implement, however I am going to attempt some modifications - if I make any progress I'll post some source

Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 5:48 am

Re: Creating Books from .PGN files

Post by Dave_N » Wed Dec 21, 2011 10:10 am

well I can get this version
http://alpha.uhasselt.be/Research/Algeb ... t-release/
(the first tar.gz)
to compile, I can't see the problem with book size at first read through book_make.cpp

User avatar
hgm
Posts: 23723
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

Re: Creating Books from .PGN files

Post by hgm » Wed Dec 21, 2011 10:14 am

Binary search is simply bisection: you starting the middle, to see if the key there is smaller or larger than the one you are searching, and then you know if it should be in the first or second half of the book, and you look in the middle of that half, etc., untill the section the searched entry must be in has <= 1 entry in it.

Post Reply