A PGN parser

Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A PGN parser

Post by Sven »

jwes wrote:
Sven Schüle wrote:
mcostalba wrote: I have seen that although move generation is needed, it can be restricted to the sensible subset just by looking at the SAN move. For instance, if the PGN move contains 'x' then it is a capture and you can generate just captures.
Wouldn't it be sufficient to only generate moves to the given target square, and also only for pieces of the given type? For instance "e4" -> only pawn moves to e4 or "Ng5" -> only knight moves to g5? A simple intersection of two bitboards in the latter case ...
If you need to check for legality, you would also need to check whether the moving piece is pinned. You may also need to check for pins to disambiguate moves.
Right, of course. I did not write about the legality check (which is required) but only about the amount of move generation needed (which can be kept to a minimum). The legality check will not cost a lot, though, since only a small subset of pseudo-legal moves can be illegal. Whether a move given in SAN is pseudo-legal is a direct result of the minimal move generation step. And if I assume bitboards, then detecting pins is very cheap as well.
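
As a rough illustration of the target-square idea for a move like "Ng5", here is a minimal C++ sketch. The square numbering (a1 = 0 ... h8 = 63) and the names `makeSquare`, `knightAttacks` and `knightCandidates` are assumptions made for this example, not code from any particular engine. It only finds pseudo-legal candidates; the pin check would still have to follow.

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = std::uint64_t;

// Square index convention (an assumption): a1 = 0, b1 = 1, ..., h8 = 63.
constexpr int makeSquare(char file, char rank) {
    return (rank - '1') * 8 + (file - 'a');
}

constexpr Bitboard squareBB(int sq) { return Bitboard(1) << sq; }

// All squares attacked by a knight on `sq`. The file masks discard
// shifts that would wrap around the edge of the board.
Bitboard knightAttacks(int sq) {
    const Bitboard notA  = 0xFEFEFEFEFEFEFEFEULL;
    const Bitboard notAB = 0xFCFCFCFCFCFCFCFCULL;
    const Bitboard notH  = 0x7F7F7F7F7F7F7F7FULL;
    const Bitboard notGH = 0x3F3F3F3F3F3F3F3FULL;
    const Bitboard b = squareBB(sq);
    return ((b << 17) & notA)  | ((b << 15) & notH)
         | ((b << 10) & notAB) | ((b <<  6) & notGH)
         | ((b >> 17) & notH)  | ((b >> 15) & notA)
         | ((b >> 10) & notGH) | ((b >>  6) & notAB);
}

// Knights of the side to move that could play "N<target>". Knight moves are
// symmetric, so these are the own knights standing on squares attacked from
// the target square: one intersection of two bitboards.
Bitboard knightCandidates(Bitboard ownKnights, int target) {
    assert(target >= 0 && target < 64);
    return knightAttacks(target) & ownKnights;
}
```

With knights on f3 and b1, `knightCandidates` for g5 returns only the f3 bit. If more than one bit is set, the SAN disambiguation character (or a pin check) has to pick the right knight.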
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

Sven Schüle wrote: Wouldn't it be sufficient to only generate moves to the given target square, and also only for pieces of the given type?
Yes, it is possible. The real saving for me is reducing the number of generated moves that have to be compared with the given one in SAN format.

With the above optimization I am below 5 seconds now, at about 26K games per second.

Code:

$ ../parser/parser fics.pgn

Analizing...done
Processing...done
Sorting...done
Writing to files...done

Games: 129207
Moves: 8186293
Games/second: 26091
Moves/second: 1653128
MBytes/second: 20.6267
Size of positions index (MB): 93.6847
Size of games index (MB): 1.97154
Positions index: fics.pgn.kidx
Games index: fics.pgn.gidx
Processing time (ms): 4952
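
To make the "look at the SAN move first" idea concrete, here is a minimal sketch of pulling the piece type, target square, capture flag and disambiguation hints out of a SAN token before any move generation happens. `SanInfo` and `parseSan` are hypothetical names for this example and are not taken from the parser's source. Castling and malformed tokens are deliberately out of scope.

```cpp
#include <cassert>
#include <string>

// What can be read off a SAN token without looking at the board.
struct SanInfo {
    char piece = 'P';      // one of P N B R Q K
    bool capture = false;  // token contains 'x'
    char fromFile = 0;     // disambiguation (or pawn-capture) file, 0 if absent
    char fromRank = 0;     // disambiguation rank, 0 if absent
    std::string target;    // destination square, e.g. "g5"
    char promotion = 0;    // promotion piece letter, 0 if absent
};

SanInfo parseSan(std::string san) {
    SanInfo info;
    // Drop check, mate and annotation suffixes.
    while (!san.empty() && std::string("+#!?").find(san.back()) != std::string::npos)
        san.pop_back();
    // Promotion suffix such as "=Q".
    if (san.size() >= 2 && san[san.size() - 2] == '=') {
        info.promotion = san.back();
        san.erase(san.size() - 2);
    }
    assert(san.size() >= 2 && "castling and malformed tokens are out of scope");
    // The last two characters are always the destination square.
    info.target = san.substr(san.size() - 2);
    san.erase(san.size() - 2);
    // A leading capital letter names the piece; otherwise it is a pawn move.
    std::size_t i = 0;
    if (!san.empty() && std::string("NBRQK").find(san[0]) != std::string::npos)
        info.piece = san[i++];
    // Whatever remains is a capture marker and/or disambiguation.
    for (; i < san.size(); ++i) {
        const char c = san[i];
        if (c == 'x') info.capture = true;
        else if (c >= 'a' && c <= 'h') info.fromFile = c;
        else if (c >= '1' && c <= '8') info.fromRank = c;
    }
    return info;
}
```

For "Rac3" this yields piece R, origin file a and target c3, so only rook moves from the a-file to c3 need to be generated and compared.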
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: A PGN parser

Post by jwes »

mcostalba wrote:
Sven Schüle wrote: Wouldn't it be sufficient to only generate moves to the given target square, and also only for pieces of the given type?
Yes it is possible, the real saving for me is to reduce the number of moves to compare with the given one in san format.

How much time do you spend reading and parsing the file? It may be that move validation is no longer the slowest part.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

jwes wrote: How much time do you spend reading and parsing the file? It may be that move validation is no longer the slowest part.
The slowest part was generating the moves and comparing them with the PGN move in SAN format. I have now heavily optimized this part by generating only the needed moves, so that the SAN comparison is done only once per move in almost all cases. Actually making the move is starting to matter now; in the above example the tool makes 8,186,293 moves.

Anyhow, with further optimizations I think I now have one of the fastest PGN parsers around, at about 40K games/sec:

Code:

$ ../parser/parser fics.pgn

Analizing...done
Processing...done
Sorting...done
Writing to files...done

Games: 129207
Moves: 8186293
Unique positions: 85%
Games/second: 39927
Moves/second: 2529756
MBytes/second: 31.5647
Size of index file (MB): 114866528
Index file: fics.pgn.idx
Processing time (ms): 3236
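
The throughput lines in this output are consistent with a plain truncating division of the count by the elapsed time. The helper below is a hypothetical reconstruction from the printed numbers, not taken from the tool's source:

```cpp
#include <cassert>
#include <cstdint>

// Whole items processed per second, given a count and the elapsed time in
// milliseconds. Integer division truncates toward zero.
std::uint64_t perSecond(std::uint64_t count, std::uint64_t elapsedMs) {
    assert(elapsedMs > 0);
    return count * 1000 / elapsedMs;
}
```

With 129207 games and 8186293 moves in 3236 ms this gives 39927 games/second and 2529756 moves/second, matching the figures above.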
hgm
Posts: 27791
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A PGN parser

Post by hgm »

mcostalba wrote: The slowest part was to generate the moves and compare them with the PGN move in SAN format.
Of course. The slowest part of generating checks is to go through all 4096 (fromSqr, toSqr) combinations, to test if they make a valid pseudo-legal move.

It is always possible to do things in a way that wastes a horrendous amount of work.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: A PGN parser

Post by Dann Corbit »

Will you publish the code for it?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

Dann Corbit wrote:Will you publish the code for it?
Well the parser by itself is not so useful, so I have written a small Polyglot book building application that uses it.

You can download sources from here:

https://github.com/mcostalba/chess_db/archive/v0.1.zip

This tool uses the same Makefile as Stockfish, so the compilation steps are the same, for instance:

Code:

make build ARCH=x86-64-modern
The binary is named parser (or parser.exe under Windows); you run it as:

Code:

parser myfile.pgn
It will create myfile.bin, which is the Polyglot book built from the PGN.

The tool is relatively forgiving in the input moves in SAN format, but requires:

- Correct disambiguation of moves like Rac3; in particular, if only one move is legal, no disambiguation should be given

- Castling must be written O-O, not 0-0 (capital letter O, not the digit zero)
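
For PGN files that do use zeros, a small pre-processing step can rewrite the castling tokens before they reach the parser. This is a minimal sketch; `normalizeCastling` is a hypothetical helper and not part of the tool:

```cpp
#include <cassert>
#include <string>

// Returns the SAN token with digit zeros replaced by capital O when the
// token is castling written as 0-0 or 0-0-0 (a trailing + or # is kept).
// Any other token is returned unchanged.
std::string normalizeCastling(const std::string& san) {
    const bool zeroLong  = san.compare(0, 5, "0-0-0") == 0;
    const bool zeroShort = san.compare(0, 3, "0-0") == 0;
    if (!zeroLong && !zeroShort) return san;
    std::string out = san;
    const std::size_t len = zeroLong ? 5 : 3;
    assert(out.size() >= len);
    // Positions 0, 2 (and 4 for long castling) hold the zeros.
    for (std::size_t i = 0; i < len; i += 2) out[i] = 'O';
    return out;
}
```

Tokens that already use the letter O, and ordinary moves such as Nf3, pass through untouched.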


Please report any issues you may find.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: A PGN parser

Post by Dann Corbit »

To create a PGO build, I needed to take the following steps:
1. Create a folder called syzygy and put the syzygy interface stuff in it.
2. Change the redirect from /dev/null to nul (for Windows) {maybe not necessary, if msys2 has a device /dev/null}
3. Copy a PGN file to the working directory of the binary and rename it "bench"

Code:

Step 2/4. Running benchmark for pgo-build ...
./parser bench > nul

Processing...done
Sorting...done
Writing Polygot book...done

Games: 224
Moves: 31989
Unique positions: 90%
Games/second: 2986
Moves/second: 426520
MBytes/second: 92.612
Size of index file (MB): 469552
Book file: bench.bin
Processing time (ms): 75


Step 3/4. Building final executable ...
make ARCH=x86-64-modern COMP=mingw gcc-profile-use
make[1]: Entering directory '/f/project/dcorbit/chess_db-0.1/parser'
make ARCH=x86-64-modern COMP=mingw \
EXTRACXXFLAGS='-fprofile-use -fno-peel-loops -fno-tracer' \
EXTRALDFLAGS='-lgcov' \
all
make[2]: Entering directory '/f/project/dcorbit/chess_db-0.1/parser'
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -fno-peel-loops -fno-tracer -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -fno-peel-loops -fno-tracer -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -fno-peel-loops -fno-tracer -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -fno-peel-loops -fno-tracer -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT   -c -o parser.o parser.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -fno-peel-loops -fno-tracer -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT   -c -o position.o position.cpp
g++ -o parser bitboard.o main.o misc.o parser.o position.o -lgcov -static
make[2]: Leaving directory '/f/project/dcorbit/chess_db-0.1/parser'
make[1]: Leaving directory '/f/project/dcorbit/chess_db-0.1/parser'

Step 4/4. Deleting profile data ...
make ARCH=x86-64-modern COMP=mingw gcc-profile-clean
make[1]: Entering directory '/f/project/dcorbit/chess_db-0.1/parser'
make[1]: Leaving directory '/f/project/dcorbit/chess_db-0.1/parser'

Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: A PGN parser

Post by Dann Corbit »

F:\project\dcorbit\chess_db-0.1\parser>parser \pgn\ccrl\nocomments\ccrl-4040-bare.pgn

Processing...done
Sorting...done
Writing Polygot book...done

Games: 657560
Moves: 89142420
Unique positions: 82%
Games/second: 20956
Moves/second: 2841011
MBytes/second: 23.54
Size of index file (MB): 1193956144
Book file: \pgn\ccrl\nocomments\ccrl-4040-bare.bin
Processing time (ms): 31377

Book is bigger than the PGN file:
2016-11-02 12:54 AM 1,193,956,144 ccrl-4040-bare.bin
2016-10-06 06:31 PM 738,613,036 ccrl-4040-bare.pgn
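
A plausible reason, assuming the output uses the standard Polyglot layout of 16 bytes per entry (64-bit key, 16-bit move, 16-bit weight, 32-bit learn): the file above would then hold 74,622,259 entries, or about 13.4 bytes of book per move played, against roughly 8.3 bytes of PGN text per move. The struct and function names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// In-memory view of one standard Polyglot book entry. On disk the fields are
// stored big-endian, so this struct should not be written out directly on a
// little-endian machine.
struct PolyglotEntry {
    std::uint64_t key;     // Zobrist hash of the position
    std::uint16_t move;    // packed from/to/promotion
    std::uint16_t weight;  // relative move quality
    std::uint32_t learn;   // learning data, often zero
};
static_assert(sizeof(PolyglotEntry) == 16, "Polyglot entries are 16 bytes");

// Number of entries in a book file of the given size in bytes.
std::uint64_t entryCount(std::uint64_t fileBytes) {
    assert(fileBytes % sizeof(PolyglotEntry) == 0);
    return fileBytes / sizeof(PolyglotEntry);
}
```

Under that assumption the book size scales with the number of stored positions rather than with the length of the PGN text, so a bare PGN without comments can easily end up smaller than its book.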
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

Dann Corbit wrote: In order to create a pgo compile, I needed to make the following steps:
This is really not meant to be PGO-compiled; it lacks the bench part.

You should compile with:

Code:

make build ARCH=x86-64-modern COMP=gcc
I have further improved the speed (huge jump) and now I think it is the fastest PGN parser out there :-)

Please give it another spin:
https://github.com/mcostalba/chess_db/archive/v0.2.zip