For a new project I am starting to work on I need a batch PGN parser, i.e. a tool that reads a PGN file and parses the moves in SAN notation, checking them for legality. Note that this means actually making each move, which is a necessary side effect of converting the move into an internal representation compatible with the move generator.
I would like to load very big files, hundreds of MB, in a few seconds.
I didn't find anything available, although there are very nice parsers in Python, like the cool python-chess library.
I don't have a very fast PC, but I think this is quite fast. I would be interested in hearing from people who have been through this already or who can suggest an alternative that I may have missed. Thanks.
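To make the "parse SAN" step concrete, here is a minimal sketch of the purely syntactic half: splitting a SAN token into its parts before it is matched against the move generator. The names SanMove and decode_san are made up for this illustration and are not taken from any engine or library; the legality check itself (finding the one legal move that fits these parts) is not shown.

```python
import re
from typing import NamedTuple, Optional


class SanMove(NamedTuple):
    """Syntactic parts of one SAN token (hypothetical type for illustration)."""
    piece: str                 # one of "KQRBNP"
    from_file: Optional[str]   # disambiguation file, e.g. "b" in "Nbd7"
    from_rank: Optional[str]   # disambiguation rank, e.g. "1" in "R1a3"
    capture: bool
    target: str                # e.g. "e4"; empty string for castling
    promotion: Optional[str]   # e.g. "Q" in "e8=Q"
    castle: Optional[str]      # "O-O", "O-O-O" or None


_SAN_RE = re.compile(
    r"^(?P<piece>[KQRBN])?"
    r"(?P<from_file>[a-h])?(?P<from_rank>[1-8])?"
    r"(?P<capture>x)?"
    r"(?P<target>[a-h][1-8])"
    r"(?:=?(?P<promotion>[QRBN]))?"
    r"[+#]?$"
)


def decode_san(token: str) -> SanMove:
    """Split a SAN token into parts. This checks syntax only, not legality."""
    core = token.rstrip("!?")      # drop annotation glyphs such as "!?"
    bare = core.rstrip("+#")       # drop check / mate markers
    if bare in ("O-O", "O-O-O"):
        return SanMove("K", None, None, False, "", None, bare)
    m = _SAN_RE.match(core)
    if m is None:
        raise ValueError(f"not a SAN move: {token!r}")
    return SanMove(
        piece=m.group("piece") or "P",   # no piece letter means a pawn move
        from_file=m.group("from_file"),
        from_rank=m.group("from_rank"),
        capture=m.group("capture") is not None,
        target=m.group("target"),
        promotion=m.group("promotion"),
        castle=None,
    )
```

A real batch parser would then look up, among the legal moves in the current position, the single move whose piece type, target square and optional origin file/rank agree with these fields, and reject the game if there is none or more than one.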
mcostalba wrote: For a new project I am starting to work on I need a batch PGN parser, i.e. a tool that reads a PGN file and parses the moves in SAN notation, checking them for legality. Note that this means actually making each move, which is a necessary side effect of converting the move into an internal representation compatible with the move generator.
The versatile pgn-extract can also check move legality: 104600 games in 28 s, which is slower than yours.
I ran it after intentionally inserting an illegal move at Black's 2nd move; it detected the error and reported it with some comments.
mcostalba wrote: For a new project I am starting to work on I need a batch PGN parser, i.e. a tool that reads a PGN file and parses the moves in SAN notation, checking them for legality. Note that this means actually making each move, which is a necessary side effect of converting the move into an internal representation compatible with the move generator.
What's the speed of raw reading + parsing, but without move legality validation (i.e. no move generation and copying to internal data structures) for you? I have a raw parser that does this file in 11 seconds, so if it's less than 50% to validate...
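For reference, "raw reading + parsing" in the sense above could look like the following sketch: it splits a PGN stream into games and reduces each movetext to a list of SAN tokens, with no move generation at all. The function names iter_movetexts and san_tokens are hypothetical, and the tag detection is simplified (it assumes tag pairs sit on their own lines and that no movetext line consists solely of a bracketed string), so treat it as a baseline for timing rather than a complete PGN reader.

```python
import re
from typing import Iterable, Iterator, List

_TAG_LINE = re.compile(r"^\s*\[.*\]\s*$")      # e.g. [Event "x"]
_RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def iter_movetexts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the movetext of each game; a tag line ends the previous game."""
    buf: List[str] = []
    for line in lines:
        if _TAG_LINE.match(line):
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.strip():
            buf.append(line.rstrip("\n"))
    if buf:
        yield "\n".join(buf)


def san_tokens(movetext: str) -> List[str]:
    """Return the main-line SAN tokens, without any legality checking."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # {brace comments}
    text = re.sub(r";[^\n]*", " ", text)         # ; rest-of-line comments
    kept: List[str] = []
    depth = 0
    for ch in text:                              # drop (nested) variations
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
            kept.append(" ")                     # keep neighbours separated
        elif depth == 0:
            kept.append(ch)
    tokens: List[str] = []
    for tok in "".join(kept).split():
        tok = re.sub(r"^\d+\.+", "", tok)        # strip "12." / "12..." prefix
        if not tok or tok in _RESULTS or tok.startswith("$"):
            continue                             # move numbers, results, NAGs
        tokens.append(tok)
    return tokens
```

Timing this loop over a large file, and then timing the same loop with each token also fed to the move generator, gives the validation overhead asked about here.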
mcostalba wrote: For a new project I am starting to work on I need a batch PGN parser, i.e. a tool that reads a PGN file and parses the moves in SAN notation, checking them for legality. Note that this means actually making each move, which is a necessary side effect of converting the move into an internal representation compatible with the move generator.
I wrote the Brainfish bin book parser based on SF's move generator. It is kind of the reverse of what you are doing. In the process I generate all the legal moves for each position, rather than just checking legality for single moves, and I get around 80K positions/second checked and write around 15K PGN games/second. All figures are single core.
I guess the quickest option for you is just to use the position function from SF. It will give you an idea of the maximum speed, even though it does not do a full legality check.
Rein Halbersma wrote: What's the speed of raw reading + parsing, but without move legality validation (i.e. no move generation and copying to internal data structures) for you? I have a raw parser that does this file in 11 seconds, so if it's less than 50% to validate...
mcostalba wrote:
I don't have a very fast PC, but I think this is quite fast. I would be interested in hearing from people who have been through this already or who can suggest an alternative that I may have missed. Thanks.
On my PC (Q6600 + SSD) Scid (scid.sourceforge.net) takes 10 seconds for a file of similar size.
Running it without a GUI requires a custom script, parser.tcl:
Oh, I misread the opening post in my hurry. First, I see it takes a little more than 22 seconds, and in that case my parser is probably much faster, although, secondly, I generate pseudo-legal moves only and don't make/unmake to check legality. However, the move generator can take a LegalOnly flag, and then it makes/unmakes in some edge cases only, so it is probably still fast enough. I'm also pretty sure that the SF move generator is quite a bit faster than mine.