CookieCat's opening book implementation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

CookieCat's opening book implementation

Post by sje »

CookieCat's Pascal's opening book implementation is not yet finished, but it will be essentially the same as Symbolic's C++ implementation. The code is mostly ready except for the PGN game file parser.

Construction:

1) Open a PGN game file with lots of high quality games.

2) For each game, read the moves and create a position hash plus move record and store this in a binary tree with the hash as a key. Each position/move is normalized for White to move, so color reversed positions are also covered. To save space, only the first N ply moves per games are stored. A good value for N is 48.

3) Scan the tree in left/self/right order and write a binary file of sorted hash/move/usage records. This is the book.

Usage:

1) At start time, load the entire book into memory as a simple vector.

2) A probe is done by a simple binary search of the book vector with the position hash as the key. Before each probe, a BTM position is flipped to the corresponding WTM position.

3) Each probe looks at all records with a matching key and picks a move based on usage frequency with some help from a random number generator. For a BTW probe, the picked move is flipped.

Having the book in memory makes the probe operation much, much faster than is the case with an external file. Each record uses 16 bytes, so a book with 64 K moves takes only 1 MB of RAM. The slowest part of the operation is the BTW flip of the position; a new hash is generated by scanning the man locus bitboard and most of the men are still on the board during the opening. This slowdown can be eliminated by having a separate book just for BTM positions.

Elaborations:

1) Allow the user to supply a different book while the program is running.

2) Allow two separate books (or book pairs); one for each player independent of player color.

3) Probe the book at some nodes other than at the root for assistance with move ordering.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: CookieCat's opening book implementation

Post by Dave_N »

Interesting thanks for sharing. I did a some research into memory mapped files last night can't see a quicker way to actually scan them for information other than looping with the pointer to the start of the file, I don't know how practical this is for very large files. After the initial scan I can probably create indexes/pointers to the actual games, this is how I think chessbase does it. There is a separate file with lists of indexes into the large file, so the database file can be accessed from Virtual Memory faster than in RAM. I think the one drawback is the ability to map several databases at once, my current program neatly concatenates the games from new pgn's loaded but this might be impractical with Vmem.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: CookieCat's opening book implementation

Post by mar »

Dave_N wrote:Interesting thanks for sharing. I did a some research into memory mapped files last night can't see a quicker way to actually scan them for information other than looping with the pointer to the start of the file, I don't know how practical this is for very large files. After the initial scan I can probably create indexes/pointers to the actual games, this is how I think chessbase does it. There is a separate file with lists of indexes into the large file, so the database file can be accessed from Virtual Memory faster than in RAM. I think the one drawback is the ability to map several databases at once, my current program neatly concatenates the games from new pgn's loaded but this might be impractical with Vmem.
Memory mapped files can't exceed 2G under Win32, don't know about other systems but don't expect you'll be able to memory map anything larger or close to 4G on a 32-bit system. Sure there's plenty of virtual space on 64-bit systems.
According to my own experiments some time ago, I found memory mapping slower than using a sufficiently large buffer and reading blocks of file into that block of memory.
Sure you can access memory mapped file just using a pointer and going back and forth.
If I understand properly, memory mapping works the way that the system creates a continuous chunk of pages in virtual memory of the calling process, intially unmapped (so i guess it will generate page fault exceptions on first access which then force the system to load a specific page from file and cache it for later use; similar to how page files [swap] work).

EDIT: another drawback is that memory mapping api will be different for Windows and other systems while fopen/fread will work everywhere.

Martin
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: CookieCat's opening book implementation

Post by Dave_N »

Yes perhaps some kind of buffered read also occurred to me, this also gives the ability to index the file for random access, however I would guess it would be less convenient for very large files depending on system RAM. For accessing .PGN files I would guess that reading in chunks into a 1KB buffer and storing the offsets to the start of the headers and the game text start and finish and then freeing the chunk could possibly work ... the headers and indexes to the game text are all that is really needed.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: CookieCat's opening book implementation

Post by mar »

Dave_N wrote:Yes perhaps some kind of buffered read also occurred to me, this also gives the ability to index the file for random access, however I would guess it would be less convenient for very large files depending on system RAM. For accessing .PGN files I would guess that reading in chunks into a 1KB buffer and storing the offsets to the start of the headers and the game text start and finish and then freeing the chunk could possibly work ... the headers and indexes to the game text are all that is really needed.
There's certainly no other way than to scan through the whole pgn for the first time, storing indices to separate index file. Each index should probably contain a 64-bit file pointer (offset) and a 16-bit or 32-bit game size, perhaps even last pgn file modification date so that you could rebuild the index if the pgn changed since the index was last built.
The index file will be significantly smaller but in theory might as well go beyond 2G. IMO accessing the index file without keeping it in memory should do. That way you save a lot of memory (and virtual address space as well). System disk cache should do great job here.
Considering the buffer size for reading the pgn, I would go well above 1KB, I would try 16 or 32KB for sure. On x86 systems, page size is 4KB.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: CookieCat's opening book implementation

Post by Sven »

Martin and David,

can one of you please enlighten me why this "memory mapped file" issue is related to the opening book creation? For the latter you open a PGN file and scan it once from top to bottom, no need to store the whole PGN somehow, so I don't quite get the point. Why would you need any kind of indexing into the PGN file to create a book?

Sven
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: CookieCat's opening book implementation

Post by mar »

Sven Schüle wrote:Martin and David,

can one of you please enlighten me why this "memory mapped file" issue is related to the opening book creation? For the latter you open a PGN file and scan it once from top to bottom, no need to store the whole PGN somehow, so I don't quite get the point. Why would you need any kind of indexing into the PGN file to create a book?

Sven
Hi Sven,
I was simply sharing my experience with memory mapping with David. I used to memory map the whole PGN into memory and then create a book from it. I sure don't need any index here and I think it's better not to use memory mapping at all. But I thought David is perhaps writing a PGN viewer or something like that. Sure it's a bit off topic so I'm sorry for that.

Martin
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: CookieCat's opening book implementation

Post by Sven »

mar wrote:
Sven Schüle wrote:Martin and David,

can one of you please enlighten me why this "memory mapped file" issue is related to the opening book creation? For the latter you open a PGN file and scan it once from top to bottom, no need to store the whole PGN somehow, so I don't quite get the point. Why would you need any kind of indexing into the PGN file to create a book?

Sven
Hi Sven,
I was simply sharing my experience with memory mapping with David. I used to memory map the whole PGN into memory and then create a book from it. I sure don't need any index here and I think it's better not to use memory mapping at all. But I thought David is perhaps writing a PGN viewer or something like that. Sure it's a bit off topic so I'm sorry for that.
No problem at all, I just worried whether I had missed an important point.

Sven
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: CookieCat's opening book implementation

Post by Dave_N »

Sven Schüle wrote:
mar wrote:
Sven Schüle wrote:Martin and David,

can one of you please enlighten me why this "memory mapped file" issue is related to the opening book creation? For the latter you open a PGN file and scan it once from top to bottom, no need to store the whole PGN somehow, so I don't quite get the point. Why would you need any kind of indexing into the PGN file to create a book?

Sven
Hi Sven,
I was simply sharing my experience with memory mapping with David. I used to memory map the whole PGN into memory and then create a book from it. I sure don't need any index here and I think it's better not to use memory mapping at all. But I thought David is perhaps writing a PGN viewer or something like that. Sure it's a bit off topic so I'm sorry for that.
No problem at all, I just worried whether I had missed an important point.

Sven
I must have misread the original post, I thought virtual memory was mentioned, and this is what I was working on all night.
Dave_N
Posts: 153
Joined: Fri Sep 30, 2011 7:48 am

Re: CookieCat's opening book implementation

Post by Dave_N »

mar wrote:
Sven Schüle wrote:Martin and David,

can one of you please enlighten me why this "memory mapped file" issue is related to the opening book creation? For the latter you open a PGN file and scan it once from top to bottom, no need to store the whole PGN somehow, so I don't quite get the point. Why would you need any kind of indexing into the PGN file to create a book?

Sven
Hi Sven,
I was simply sharing my experience with memory mapping with David. I used to memory map the whole PGN into memory and then create a book from it. I sure don't need any index here and I think it's better not to use memory mapping at all. But I thought David is perhaps writing a PGN viewer or something like that. Sure it's a bit off topic so I'm sorry for that.

Martin
Yes I am writing (completing) a viewer project, basically at a stage of optimzing for larger .pgn files to create opening books.