Hello, I have a pgn file of games truncated to 10 moves. I want to remove duplicate positions. What is the easiest way to do this? I have Norm Pollock's programs but seem to be stuck. I have used Chess Assistant to save the pgn file as an epd file. Will this work with Norm's "Single" EPD command to remove the duplicates?
Many thanks in advance,
Gerald
Removing duplicate position from a pgn file
Re: Removing duplicate position from a pgn file
Hi Gerald,
I have had to figure out ways to do this. I know of more than one way, depending on the size of the file and whether the resulting file needs to be in pgn or epd format. For a situation where the resulting file can be in epd format, Norm's epdSingle utility works great.
Adam
Re: Removing duplicate position from a pgn file
To remove duplicate positions from a pgn like Gerald wants to do, try Gaviota. It has a function, enddups, that stores the final positions found in a pgn and removes duplicate positions. It is handy for making pgns of starting positions.
From Gaviota 0.85.1:
enddups <pgn input> <output with no-duplicates> <output duplicates> <megabytes>
Divides a pgn file into two pgn outputs.
The first output contains the games from the input, but with no duplicates.
The second output contains only the duplicates.
Duplicates are determined only by positions at the end.
<megabytes> is the number of megabytes reserved for this process.
The more megabytes, the less chances there is to overlook a duplicate.
Remember to set hash memory to a very low number so you can use
more for this process.
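The thread does not describe how Gaviota implements enddups, so the following is only a hypothetical Python sketch of one way a fixed memory budget could cause duplicates to be overlooked. It assumes each game has already been reduced to a string describing its final position; the names signature, split_duplicates and slots are made up for illustration and are not Gaviota's.

```python
import hashlib

def signature(position):
    """Return a 64-bit fingerprint of a position string (stable across runs)."""
    digest = hashlib.blake2b(position.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def split_duplicates(games, slots):
    """Split (game, final_position) pairs into (unique, duplicates).

    Assumption: a fixed-size table with `slots` entries stands in for the
    <megabytes> budget. Each slot remembers only the most recent fingerprint
    stored there, so an older position can be forgotten and a later copy of
    it overlooked. Two different positions sharing a full 64-bit fingerprint
    would be wrongly flagged, which is very unlikely but possible.
    """
    table = [None] * slots
    unique, duplicates = [], []
    for game, final_position in games:
        sig = signature(final_position)
        index = sig % slots
        if table[index] == sig:
            duplicates.append(game)   # same final position seen before
        else:
            table[index] = sig        # overwrite whatever was in the slot
            unique.append(game)
    return unique, duplicates
```

With a table of one slot, the sequence A, B, A is reported as three unique games because B evicts A before the second A arrives; a larger table makes such evictions rarer, which matches the help text's advice to reserve more memory.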
Re: Removing duplicate position from a pgn file
Depending on what you mean by duplicate positions, there are a couple of 40H-EPD tools to do the job. They both work in conjunction with "pgn-extract" by David Barnes.
If you want to get the position after the 20th half-move (ply), then use "epdPly".
pgn-extract -Wepd -s -otemp.epd alpha.pgn
epdPly temp.epd 20
epdList outY.epd
The last line could be
epdList outY.epd bare
if you do not care about en passant or castling rights in the position.
epdPly will extract the positions after the 20th ply, and epdList will tell you which are duplicates, indicating the line number, which is equal to the game number.
To find duplicate positions at the end of the game, use epdFin instead of epdPly. Instructions are similar but the output file has a different name.
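For readers who want to see what a duplicate listing like this amounts to, here is a rough Python sketch. It is not the 40H-EPD code. It assumes one position per EPD line, and it assumes that "bare" means comparing only piece placement and side to move; the function names are invented for the example.

```python
from collections import defaultdict

def position_key(epd_line, bare=False):
    """Return the part of an EPD line used to compare positions.

    An EPD record starts with four fields: piece placement, side to move,
    castling rights and en passant square. With bare=True the last two are
    ignored (an assumption about what the "bare" option does).
    """
    fields = epd_line.split()
    return " ".join(fields[:2] if bare else fields[:4])

def find_duplicates(epd_lines, bare=False):
    """Map each repeated position to the 1-based line numbers where it occurs."""
    seen = defaultdict(list)
    for number, line in enumerate(epd_lines, start=1):
        if line.strip():  # blank lines are skipped but still counted
            seen[position_key(line, bare)].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}

if __name__ == "__main__":
    sample = [
        "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3",
        "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -",
        "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3",
    ]
    for key, numbers in find_duplicates(sample).items():
        print(numbers, key)
```

Because each line of the EPD file comes from one game, the reported line numbers point back to the games in the original pgn.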