Hello,
I'm preparing a collection of games truncated to 3 moves for testing purposes. I used Norm Pollock's PGN collections (thanks Norm).
I plan to check the balance of the positions, but for now I have a PGN with duplicated final positions (sometimes the same games, sometimes a different move order), so I need a utility to delete games with the same final position.
An EPD collection would not do, because the first 3 moves would be lost and I'd like complete games.
I tried pgnscanner, pgn-extract, CDB ... but I didn't get satisfactory results.
If there is no utility to delete PGN games with the same final position, maybe someone could easily make a script or something to do it?
Any hint will be highly appreciated!
Utility to delete PGN games with same final position?
-
- Posts: 812
- Joined: Tue Jun 16, 2009 10:09 am
- Location: Spain
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Utility to delete PGN games with same final position?
I don't know what PGN utilities are already available, but if you have a library that can read PGN, you could write the utility yourself:
(1) It needs to read games from the PGN file one by one (keep the block of text for the game around in a buffer, in case you need to write it to the output file)
(2) Use the board representation from your own chess engine (or someone else's). Set up the initial position, and make the first 3 moves in the PGN.
(3) After 3 moves, store the Zobrist key of the position in a big hash table. If it was already in the hash table, it's a duplicate, so you can move on to the next game. If it was not already in the hash table, write out a copy of the game into the output PGN file.
Parsing PGN properly is the hardest part, but someone probably has a library that can do that. The best case would be if you have source for an engine that can read PGN games and continue them. You could use its board, its Zobrist hash and its PGN reading stuff.
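Step (3) above could be sketched roughly like this in Python. This is only an illustration under assumptions: it supposes that some PGN library (not shown) has already replayed each game and returned the position after 3 moves as a FEN string, and the names `zobrist_from_fen` and `unique_games` are made up for this example, not taken from any engine.

```python
import random

PIECES = "PNBRQKpnbrqk"
rng = random.Random(2010)  # fixed seed so the keys are reproducible between runs

# One random 64-bit number per (piece, square), plus side to move,
# castling rights and en passant file.
PIECE_KEYS = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)}
SIDE_KEY = rng.getrandbits(64)
CASTLE_KEYS = {c: rng.getrandbits(64) for c in "KQkq"}
EP_KEYS = {f: rng.getrandbits(64) for f in "abcdefgh"}


def zobrist_from_fen(fen):
    """Zobrist key built from the first four FEN fields (move counters are ignored)."""
    placement, side, castling, ep = fen.split()[:4]
    key = 0
    sq = 0  # squares are counted in FEN order: a8..h8, a7..h7, ...
    for ch in placement:
        if ch == "/":
            continue
        if ch.isdigit():
            sq += int(ch)  # run of empty squares
        else:
            key ^= PIECE_KEYS[(ch, sq)]
            sq += 1
    if side == "b":
        key ^= SIDE_KEY
    if castling != "-":
        for c in castling:
            key ^= CASTLE_KEYS[c]
    # Caveat: some tools write an en passant square after every double pawn
    # push, which can make two otherwise equal positions hash differently.
    if ep != "-":
        key ^= EP_KEYS[ep[0]]
    return key


def unique_games(games):
    """games: iterable of (pgn_text, fen_after_truncation) pairs.

    Keeps the text of the first game seen for each position key.
    """
    seen = set()
    kept = []
    for text, fen in games:
        key = zobrist_from_fen(fen)
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

A Python `set` plays the role of the "big hash table" here. Two different positions could in principle collide on the same 64-bit key, which is usually accepted as negligible for collections of this size.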
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: Utility to delete PGN games with same final position?
If in one game, the game ends with the same KQK mating position as prior game, that game you want to filter out. If another game ending with a unique KQK mate occurs, that game you want to keep. I'm curious as to how such a distinction can be useful?
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Utility to delete PGN games with same final position?
rjgibert wrote: If in one game, the game ends with the same KQK mating position as prior game, that game you want to filter out. If another game ending with a unique KQK mate occurs, that game you want to keep. I'm curious as to how such a distinction can be useful?
He wants to have starting positions for games. In fact, the first three moves.
Finding duplicate games does not work well enough. You may end up with different transpositions to the same starting position.
I need this tool too, so I may eventually write it myself if no one finds a solution.
Miguel
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: Utility to delete PGN games with same final position?
I would aim for a unique move 40 position or last position if game is shorter. This would save going to the end of the game and be more reliable. There are opening variations that go past move 30, so a smaller number might not be enough. Using last position of game will filter out a lot of games you want to keep.
EDIT: This isn't enough either. Sometimes a game will have a position repetition inserted. The problem will require a bit more thought.
Last edited by rjgibert on Thu Jun 10, 2010 7:53 am, edited 1 time in total.
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Utility to delete PGN games with same final position?
rjgibert wrote: I would aim for a unique move 40 position or last position if game is shorter. This would save going to the end of the game and be more reliable. There are opening variations that go past move 30, so a smaller number might not be enough. Using last position of game will filter out a lot of games you want to keep.
Sorry, I was not clear. The algorithm is:
1) Get a PGN collection.
2) Truncate all games to <n> plies with pgn-extract.
3) Discard games that end up in the same position (retain only one). In this case, it is the position after <n> plies, because after that, the moves have been deleted.
That is what I meant and I am almost sure that is what Aser needs.
Miguel
PS: In my case, for 2) I am trying to build an opening book for my engine: get two instances, have them play each other, and make one of them resign after <n> plies. The difference is that the games will be chosen if the moves are statistically sound.
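For illustration only, step 2) of the algorithm above (truncating to <n> plies) could look something like the following if done by hand in Python instead of with pgn-extract. It assumes plain SAN movetext with no comments, NAGs or variations, and the function name `truncate_movetext` is made up for this sketch.

```python
import re

# Matches move-number tokens such as "1.", "3." or a stray "." left over
# from "3..." after the dots have been split apart.
MOVE_NUMBER = re.compile(r"^\d*\.+$")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def truncate_movetext(movetext, n_plies):
    """Return the first n_plies half-moves, renumbered, ending with '*'.

    Assumption: the movetext contains only move numbers, SAN moves and a
    result token (no {comments}, $NAGs or (variations)).
    """
    # Put a space after every dot so "1.e4" and "1. e4" tokenize the same way.
    tokens = movetext.replace(".", ". ").split()
    moves = [t for t in tokens if not MOVE_NUMBER.match(t) and t not in RESULTS]
    out = []
    for i, move in enumerate(moves[:n_plies]):
        if i % 2 == 0:  # White's move: emit the move number first
            out.append(f"{i // 2 + 1}.")
        out.append(move)
    out.append("*")  # the truncated game has no known result
    return " ".join(out)


print(truncate_movetext("1.e4 e5 2. Nf3 Nc6 3.Bc4 Bc5 4.c3 Nf6 1-0", 6))
# prints: 1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 *
```

The sketch only touches the movetext. The `Result` tag in the game header would need to be set to `*` as well to stay consistent.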
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: Utility to delete PGN games with same final position?
See my EDIT of my prior post.
-
- Posts: 317
- Joined: Mon Jun 26, 2006 9:44 am
Re: Utility to delete PGN games with same final position?
michiguel wrote:
rjgibert wrote: I would aim for a unique move 40 position or last position if game is shorter. This would save going to the end of the game and be more reliable. There are opening variations that go past move 30, so a smaller number might not be enough. Using last position of game will filter out a lot of games you want to keep.
Sorry, I was not clear. The algorithm is
1) Get a pgn collection
2) truncate all games to <n> plies with pgnextract.
3) discard games that end up in the same position (retain only one). In this case, it is the position after <n> plies, because after that, the moves have been deleted.
That is what I meant and I am almost sure that is what Aser needs.
Miguel
PS: In my case, for 2) I am trying to build an opening book for my engine, get two instances , make them play to each other, and make one of them to resign after <n> plies. The difference is that the games will be chosen if the moves are statistically sound.

In 3), you need to have a Zobrist hash table of all the end positions and use this to compare the Zobrist hash key of all the positions of the candidate game after <m> moves to <n>. This will filter otherwise identical games that differ by the insertion of a repetition. It will also filter games that reach the same position via a longer non-repeating sequence. IOW, you need to check a range of moves of the candidate game.
Some openings offer the option to repeat the position. Other openings, like the Sveshnikov Sicilian, can reach the same position by inserting the moves e6 by Black and Bf4 by White, followed by a later e5 by Black and Bg5 by White: a longer sequence that does not include a repetition but reaches the same position.
With these modifications, it will still not be perfect, but perhaps good enough for practical purposes.
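A rough Python sketch of the range check described in this post. It assumes the position keys for every ply of each game have already been computed elsewhere (for example as Zobrist keys); the function name `filter_by_range` and the input layout are invented for the example.

```python
def filter_by_range(games, m, n):
    """Keep games whose positions between plies m and n match no stored end position.

    games: list of (pgn_text, keys) pairs, where keys[p] is the position key
    after p plies (keys[0] is the starting position). This layout is an
    assumption of the sketch.
    """
    end_positions = set()  # keys of the end positions of the games kept so far
    kept = []
    for text, keys in games:
        last = min(n, len(keys) - 1)  # a short game ends before ply n
        first = min(m, last)          # always check at least the end position
        window = keys[first:last + 1]
        if any(k in end_positions for k in window):
            continue  # some position in the range was already reached by a kept game
        end_positions.add(keys[last])
        kept.append(text)
    return kept


# Small integers stand in for real hash keys.
games = [
    ("B", [0, 1, 2, 1, 2, 3]),   # repeats positions 1 and 2, ends in 3
    ("A", [0, 1, 2, 3, 4, 5]),   # passes through 3 on the way to 5
    ("C", [0, 9, 8, 7, 6, 10]),  # unrelated line
]
print(filter_by_range(games, 3, 5))  # prints: ['B', 'C']
```

The outcome depends on the order of the games: if "A" came first, its end position 5 would be stored and "B" would not be caught, which fits the remark that the approach is not perfect.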
-
- Posts: 812
- Joined: Tue Jun 16, 2009 10:09 am
- Location: Spain
Re: Utility to delete PGN games with same final position?
michiguel wrote:
rjgibert wrote: I would aim for a unique move 40 position or last position if game is shorter. This would save going to the end of the game and be more reliable. There are opening variations that go past move 30, so a smaller number might not be enough. Using last position of game will filter out a lot of games you want to keep.
Sorry, I was not clear. The algorithm is
1) Get a pgn collection
2) truncate all games to <n> plies with pgnextract.
3) discard games that end up in the same position (retain only one). In this case, it is the position after <n> plies, because after that, the moves have been deleted.
That is what I meant and I am almost sure that is what Aser needs.

That is exactly what I'm looking for: read a PGN collection, find the final position of each game (or the position at ply 6 or X), and keep only one game per position. Doing this manually is a lot of work!
Since a utility that truncates a PGN to a given ply exists, I would expect a utility that deletes final duplicates to exist too ...

-
- Posts: 812
- Joined: Tue Jun 16, 2009 10:09 am
- Location: Spain
Re: Utility to delete PGN games with same final position?
A colleague at CCRL has made a script using well-known utilities such as pgn-extract, pgnscanner, etc., and it seems to work well, because Ferdinand Mosca got the same results with a different method. Thanks to both. PM me if you are interested.
I also want to thank Miguel Ballicora for his efforts to incorporate such a utility into Gaviota.
Best regards.