A PGN parser

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

A PGN parser

Post by mcostalba »

For a new project I am starting to work on, I need a batch PGN parser, i.e. a tool that reads a PGN file and parses the moves in SAN notation, checking them for legality. Note that this means actually making each move on the board; this is a necessary side effect of converting the move into an internal representation compatible with the move generator.

I would like to load very big files, hundreds of MB, in a few seconds.

I didn't find anything available, although there are very nice parsers in Python, like the cool python-chess library.

So I wrote it myself and tested on fics games DB:

Code: Select all

./parser ficsgamesdb_201601_standard_nomovetimes_1410669.pgn 
Size: 102143524 bytes

Processing...

Elpased time: 22356ms
Games: 129207
Moves: 8186293
Lines: 5070862
Games/second: 5779
Moves/second: 366178
MBytes/second: 4.56895
I don't have a very fast PC, but I think this is quite fast. I would be interested in hearing from people who have been through this already or who can suggest an alternative that I may have missed. Thanks.
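
The throughput figures in the output above follow directly from the totals and the elapsed time. A small Python sketch of that arithmetic, assuming the rates are truncated to integers and that 1 MB means 10^6 bytes (both inferred from the printed numbers, not from the parser's source; the function name is hypothetical):

```python
def throughput(size_bytes, games, moves, elapsed_ms):
    """Return (games/s, moves/s, MB/s) for one parsing run."""
    seconds = elapsed_ms / 1000.0
    games_per_s = int(games / seconds)      # truncated, as the output appears to be
    moves_per_s = int(moves / seconds)
    mb_per_s = size_bytes / 1e6 / seconds   # assumption: 1 MB = 10**6 bytes
    return games_per_s, moves_per_s, mb_per_s


# Totals taken from the run shown above.
g, m, mb = throughput(102143524, 129207, 8186293, 22356)
print(g, m, round(mb, 5))
```

With the totals from the run above, this gives the same games/second, moves/second and MBytes/second values that the parser printed.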
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: A PGN parser

Post by Ferdy »

The versatile pgn-extract can also check move legality. It processed 104600 games in 28 s, which is slower than yours.

I ran it on the game below after intentionally inserting an illegal move at Black's 2nd move (2... Bf3); it detected the move and reported it with some diagnostics.
[Event "?"]
[Site "?"]
[Date "2016.06.28"]
[Round "1"]
[White "asmFishW_2016-06-25_popcnt"]
[Black "asmFish_16.06.2016"]
[Result "1-0"]
[ECO "D00"]
[Opening "Queen's pawn, Mason Variation"]
[TimeControl "30+0.1"]
[Termination "adjudication"]
[PlyCount "110"]

1. d4 d5 2. Bf4 Bf3 3. c4 Bxb1 4. Rxb1 e6 5. Nf3 Bb4+ 6. Bd2 Bxd2+ 7. Qxd2
c6 8. e3 Nf6 9. Bd3 O-O 10. O-O Nbd7 11. b4 Qe7 12. Qb2 Rfe8 13. Rfd1 e5
14. dxe5 Nxe5 15. Nxe5 Qxe5 16. Qxe5 Rxe5 17. Be2 dxc4 18. Bxc4 Ree8 19. b5
cxb5 20. Bxb5 Red8 21. Be2 b6 22. Rbc1 g6 23. f3 Rac8 24. Rxd8+ Rxd8 25. e4
Rd2 26. Bb5 a6 27. Bxa6 Rxa2 28. Bf1 Rd2 29. Rc7 Nd7 30. h4 Kg7 31. Bb5 Ne5
32. Re7 Kf6 33. Rb7 Rb2 34. Rxb6+ Ke7 35. Rb7+ Kd6 36. Bc6 Rxb7 37. Bxb7
Kc5 38. Kf2 Kd4 39. Ba6 f6 40. Bb5 g5 41. h5 Nf7 42. Kg3 Ke3 43. Be8 Nh6
44. Bc6 Kd4 45. Ba4 Ke5 46. Bc2 f5 47. exf5 Nxf5+ 48. Kf2 Kf4 49. Bxf5 Kxf5
50. Ke3 Ke5 51. g3 Kd5 52. Kd3 Ke5 53. Kc4 Kf6 54. Kd5 g4 55. f4 Kf5 1-0

Code: Select all

C:\chess\Tools>pgn-extract-17-21 -s -obig1.pgn big.pgn
No bishop move possible to f3.
File big.pgn: Line number: 22
Failed to make move 2... Bf3 in the game:
rnbqkbnr
ppp.pppp
........
...p....
...P.B..
........
PPP.PPPP
RN.QKBNR

asmFishW_2016-06-25_popcnt - asmFish_16.06.2016 ? ? 2016.06.28

Games: 104600
elapsed: 28s
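
To illustrate the kind of check behind "No bishop move possible to f3", here is a minimal Python sketch that reads the board diagram printed above and tests whether any bishop of a given colour has a clear diagonal to a target square. It is not how pgn-extract implements the check; the function names are hypothetical, and the test is pseudo-legal only (pins and checks are ignored).

```python
DIAGRAM = """\
rnbqkbnr
ppp.pppp
........
...p....
...P.B..
........
PPP.PPPP
RN.QKBNR"""

FILES = "abcdefgh"


def parse_diagram(text):
    """Map square names such as 'f4' to piece letters; the first row is rank 8."""
    board = {}
    for row, line in enumerate(text.splitlines()):
        for col, ch in enumerate(line.strip()):
            if ch != ".":
                board[FILES[col] + str(8 - row)] = ch
    return board


def bishop_can_reach(board, bishop, target):
    """True if some `bishop` ('B' white, 'b' black) has a clear diagonal to `target`."""
    occupant = board.get(target)
    if occupant is not None and occupant.isupper() == bishop.isupper():
        return False  # cannot capture a piece of the same colour
    tf, tr = FILES.index(target[0]), int(target[1])
    for square, piece in board.items():
        if piece != bishop:
            continue
        f, r = FILES.index(square[0]), int(square[1])
        df, dr = tf - f, tr - r
        if df == 0 or abs(df) != abs(dr):
            continue  # target is not on a diagonal of this bishop
        sf, sr = df // abs(df), dr // abs(dr)
        # Squares strictly between the bishop and the target must be empty.
        between = (FILES[f + i * sf] + str(r + i * sr) for i in range(1, abs(df)))
        if all(sq not in board for sq in between):
            return True
    return False


board = parse_diagram(DIAGRAM)
print(bishop_can_reach(board, "b", "f3"))  # the rejected 2... Bf3
print(bishop_can_reach(board, "B", "e5"))  # white bishop on f4 stepping to e5
```

For the position above, the first call returns False (neither black bishop shares a diagonal with f3) and the second returns True.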
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: A PGN parser

Post by Rein Halbersma »

What's the speed of raw reading + parsing, but without move legality validation (i.e. no move generation and copying to internal data structures) for you? I have a raw parser that does this file in 11 seconds, so if it's less than 50% to validate...
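
As a rough illustration of what "raw reading + parsing" without validation can look like, here is a minimal Python sketch that counts games and SAN-shaped tokens without touching a board. It is an assumption-laden toy, not either parser from this thread: the names are hypothetical, a game is counted at each result token, and moves inside variations "(...)" would be counted as well.

```python
import re

COMMENT_RE = re.compile(r"\{[^}]*\}|;[^\n]*")   # brace comments and rest-of-line comments
MOVE_NUMBER_RE = re.compile(r"^\d+\.+")          # "12." or "12..." prefix
SAN_RE = re.compile(
    r"(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?[!?]*"
)
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def count_games_and_moves(pgn_text):
    """Count games and SAN-shaped tokens; no legality checking at all."""
    games = moves = 0
    text = COMMENT_RE.sub(" ", pgn_text)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or line.startswith("%"):
            continue  # skip blank lines, tag pairs and escape lines
        for token in line.split():
            if token in RESULTS:
                games += 1  # movetext ends with a result token
                continue
            token = MOVE_NUMBER_RE.sub("", token)
            if SAN_RE.fullmatch(token):
                moves += 1
    return games, moves


SAMPLE = """[Event "?"]
[Result "1-0"]

1. d4 d5 2. Bf4 {a comment mentioning e4} Bf5 3. c4 1-0

[Event "?"]
[Result "*"]

1. e4 e5 2. Nf3 *
"""
print(count_games_and_moves(SAMPLE))
```

A pass like this only recognises the shape of each token, which is why it can be much cheaper than making every move on a board.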
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: A PGN parser

Post by Milos »

I wrote a Brainfish bin book parser based on SF's move generator. It is more or less the reverse of what you are doing. In the process I generate all the legal moves for each position, not just check the legality of a single move, and I get around 80K positions/second checked and around 15K PGN games/second written. All figures are single-core.

I guess the quickest option for you is to just use the position function from SF. It will give you an idea of the maximum speed, even though it does not do a full legality check.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

Rein Halbersma wrote: What's the speed of raw reading + parsing, but without move legality validation (i.e. no move generation and copying to internal data structures) for you? I have a raw parser that does this file in 11 seconds, so if it's less than 50% to validate...
Less than half a second...

Code: Select all

./parser ficsgamesdb_201601_standard_nomovetimes_1410669.pgn 
File size (bytes): 102143524

Analizing...done.

Elpased time (ms): 482
Games: 129207
Moves: 8186293

Processing...done.
Sorting...done.

Elpased time (ms): 21746
Size of positions index (MB): 124.913
Size of games index (MB): 2.95731
Games/second: 5941
Moves/second: 376450
MBytes/second: 4.69712
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: A PGN parser

Post by Fulvio »

mcostalba wrote: I don't have a very fast PC, but I think this is quite fast. I would be interested in hearing from people who have been through this already or who can suggest an alternative that I may have missed. Thanks.
On my PC (Q6600 + SSD), Scid (scid.sourceforge.net) takes 10 seconds for a file of similar size.

To run it without a GUI, a custom script, parser.tcl, is needed:

Code: Select all

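# parser.tcl: import a PGN file into Scid's clipbase without the GUI
set pgnfile [lindex $argv 0]
if {[catch {sc_base import [sc_info clipbase] "$pgnfile"} result]} {
    # Inner quotes must be escaped, otherwise Tcl ends the string early
    puts stderr "Error importing \"$pgnfile\": $result"
    exit 1
}
# The result is a list: number of imported games, then any warnings
set numImported [lindex $result 0]
set warnings [lindex $result 1]
puts "Imported $numImported games from $pgnfile"
puts $warnings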
and use it with:

Code: Select all

./tkscid parser.tcl test.pgn
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: A PGN parser

Post by mcostalba »

Fulvio wrote: On my pc (q6600 + SSD) Scid (scid.sourceforge.net) takes 10 seconds for a file of similar size.
Does it check and validate every move?
G.B. Harms
Posts: 42
Joined: Sat Aug 08, 2009 2:18 pm
Location: Almere

Re: A PGN parser

Post by G.B. Harms »

You can try

Parser
https://github.com/Bobcat/bobcat/blob/master/src/Pgn.h

Derived class that checks legality
https://github.com/Bobcat/bobcat/blob/m ... gnPlayer.h

My own use of it
https://github.com/Bobcat/bobcat/blob/master/src/Tune.h

Only tested on PGNs from LittleBlitzer.

Speed seems a bit faster than yours, but that is with an SSD and a fast computer.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: A PGN parser

Post by Fulvio »

mcostalba wrote:
Does it check and validate every move?
Yes; it also converts the moves into Scid format (avoiding the conversion improves the speed by another 30%).
G.B. Harms
Posts: 42
Joined: Sat Aug 08, 2009 2:18 pm
Location: Almere

Re: A PGN parser

Post by G.B. Harms »

Oh, I misread the opening post in my hurry. First, I see it takes a little more than 22 seconds, in which case my parser is probably much faster. Second, though, I generate pseudo-legal moves only and don't make/unmake to check legality. However, the move generator can take a LegalOnly flag, and then it makes/unmakes in some edge cases only, so it is probably still fast enough. I'm also pretty sure that the SF move generator is quite a bit faster than mine.
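
The difference between pseudo-legal and legal generation can be shown with a toy Python sketch. It is not Bobcat's or SF's code: it knows only kings and rooks, all names are hypothetical, and it makes each move on a copy of the board instead of doing make/unmake in place. A pseudo-legal move becomes legal only if the mover's king is not attacked afterwards.

```python
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def rook_rays(board, src):
    """Yield squares a rook on `src` slides over, including the first blocker."""
    for df, dr in ROOK_DIRS:
        f, r = src[0] + df, src[1] + dr
        while 0 <= f < 8 and 0 <= r < 8:
            yield (f, r)
            if (f, r) in board:
                break
            f, r = f + df, r + dr


def pseudo_legal_rook_moves(board, white):
    """Rook moves that obey movement rules but may leave the own king attacked."""
    rook = "R" if white else "r"
    for src, piece in list(board.items()):
        if piece != rook:
            continue
        for dst in rook_rays(board, src):
            target = board.get(dst)
            if target is None or target.isupper() != white:
                yield src, dst


def king_attacked_by_rook(board, white):
    """True if the king of the given colour is on an open line with an enemy rook."""
    king, enemy_rook = ("K", "r") if white else ("k", "R")
    king_sq = next(sq for sq, p in board.items() if p == king)
    return any(board.get(sq) == enemy_rook for sq in rook_rays(board, king_sq))


def legal_rook_moves(board, white):
    """Filter pseudo-legal moves by making each one on a copy of the board."""
    for src, dst in pseudo_legal_rook_moves(board, white):
        after = dict(board)
        after[dst] = after.pop(src)
        if not king_attacked_by_rook(after, white):
            yield src, dst


# Squares are (file, rank), 0-based: white Ke1 and Re2, black re8 and ka8.
# The white rook is pinned on the e-file.
board = {(4, 0): "K", (4, 1): "R", (4, 7): "r", (0, 7): "k"}
print(len(list(pseudo_legal_rook_moves(board, True))))
print(len(list(legal_rook_moves(board, True))))
```

In this position the pinned rook has 13 pseudo-legal moves but only 6 legal ones (those that stay on the e-file). A generator that restricts the extra check to such edge cases can avoid most of the filtering cost.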