Removing duplicate position from a pgn file

Amstaff

Removing duplicate position from a pgn file

Post by Amstaff »

Hello, I have a pgn file of games truncated to 10 moves. I want to remove duplicate positions. What is the easiest way to do this? I have Norm Pollock's programs but seem to be stuck. I have used Chess Assistant to save the pgn file as an epd file. Will this work with Norm's "Single" EPD command to remove the duplicates?
Many thanks in advance,
Gerald
Adam Hair

Re: Removing duplicate position from a pgn file

Post by Adam Hair »

Amstaff wrote: Hello, I have a pgn file of games truncated to 10 moves. I want to remove duplicate positions. What is the easiest way to do this? I have Norm Pollock's programs but seem to be stuck. I have used Chess Assistant to save the pgn file as an epd file. Will this work with Norm's "Single" EPD command to remove the duplicates?
Many thanks in advance,
Gerald
Hi Gerald,

I have had to figure out ways to do this. I know of more than one way, depending on the size of the file and whether the resulting file needs to be in pgn or epd format. For a situation where the resulting file can be in epd format, Norm's epdSingle utility works great.
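
For anyone who wants to see what removing duplicates from an epd file amounts to, here is a minimal Python sketch. It is not epdSingle and makes no claim about how that utility works internally; it assumes one position per line and treats two lines as the same position when their first four EPD fields (piece placement, side to move, castling rights, en passant square) match. The function name unique_positions and the sample lines are made up for illustration.

```python
def unique_positions(epd_lines):
    """Return the EPD lines with later repeats of a position removed."""
    seen = set()
    kept = []
    for line in epd_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Assumption: the first four EPD fields identify the position
        # (piece placement, side to move, castling rights, en passant).
        # Anything after them (opcodes such as id) is ignored.
        key = tuple(fields[:4])
        if key not in seen:
            seen.add(key)
            kept.append(line.strip())
    return kept


# Hypothetical sample data: lines 1 and 3 hold the same position.
sample = [
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 id "game 1";',
    'rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3 id "game 2";',
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 id "game 3";',
]
deduped = unique_positions(sample)
for line in deduped:
    print(line)
```

The first occurrence of each position is kept and the original order is preserved, so the line from "game 3" is the one dropped here.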

Adam
Adam Hair

Re: Removing duplicate position from a pgn file

Post by Adam Hair »

To remove duplicate positions from a pgn, as Gerald wants to do, try Gaviota. Its enddups function stores the final position of each game in a pgn and separates out the games whose final position has already been seen. It is handy for making pgns of starting positions.

From Gaviota 0.85.1:

enddups <pgn input> <output with no-duplicates> <output duplicates> <megabytes>
Divides a pgn file into two pgn outputs.
The first output contains the games from the input, but with no duplicates.
The second output contains only the duplicates.
Duplicates are determined only by positions at the end.
<megabytes> is the number of megabytes reserved for this process.
The more megabytes, the less chances there is to overlook a duplicate.
Remember to set hash memory to a very low number so you can use
more for this process.
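
The remark that more megabytes means fewer overlooked duplicates suggests a fixed-size table of stored positions. The following Python sketch shows one way such a table could behave; it is only a guess at the idea, not Gaviota's actual implementation. The function name split_by_final_position, the slots parameter (standing in for memory), and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib


def split_by_final_position(final_positions, slots):
    """Split game indices into (unique, duplicates) with a fixed-size table.

    Assumption: each slot remembers only the most recent position hashed
    to it, so a small table can forget a position and miss a duplicate.
    """
    table = [None] * slots
    unique, duplicates = [], []
    for index, position in enumerate(final_positions):
        slot = zlib.crc32(position.encode("utf-8")) % slots
        if table[slot] == position:
            duplicates.append(index)
        else:
            table[slot] = position  # overwrites whatever was stored here
            unique.append(index)
    return unique, duplicates


# Hypothetical final positions of three games; games 0 and 2 end the same.
games = ["pos-A", "pos-B", "pos-A"]

# With a single slot, "pos-B" evicts "pos-A", so the repeat goes unnoticed.
unique_small, dups_small = split_by_final_position(games, slots=1)
print(unique_small, dups_small)
```

With slots=1 every game is reported as unique, because the second position overwrites the first before the repeat arrives. A larger table makes that kind of eviction less likely, which would match the advice in the help text to give the process as much memory as possible.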
Norm Pollock

Re: Removing duplicate position from a pgn file

Post by Norm Pollock »

Depending on what you mean by duplicate positions, there are a couple of 40H-EPD tools to do the job. They both work in conjunction with "pgn-extract" by David Barnes.

If you want to get the position after the 20th half-move (ply), then use "epdPly".

pgn-extract -Wepd -s -otemp.epd alpha.pgn

epdPly temp.epd 20

epdList outY.epd

The last line could be

epdList outY.epd bare

if you do not care about en passant or castling rights in the position.

epdPly will extract the positions after the 20th ply, and epdList will tell you which are duplicates, giving the line number of each, which is equal to the game number.
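
As a rough illustration of the kind of report described above, here is a small Python sketch that groups repeated positions by line number. It is not epdList and does not reproduce its output format. It assumes one position per line, and it assumes that a "bare" comparison looks only at piece placement and side to move (the first two EPD fields), while the default compares castling rights and the en passant square as well. The function name find_duplicates and the sample positions are hypothetical.

```python
from collections import defaultdict


def find_duplicates(epd_lines, bare=False):
    """Map each repeated position to the 1-based line numbers where it occurs."""
    # Assumption: "bare" ignores castling rights and the en passant square.
    width = 2 if bare else 4
    occurrences = defaultdict(list)
    for number, line in enumerate(epd_lines, start=1):
        fields = line.split()
        if len(fields) < width:
            continue  # skip blank or malformed lines
        occurrences[" ".join(fields[:width])].append(number)
    # Keep only the positions that occur on more than one line.
    return {pos: nums for pos, nums in occurrences.items() if len(nums) > 1}


# Hypothetical sample: line 3 differs from lines 1 and 4 only in en passant.
lines = [
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3",
    "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3",
]
full = find_duplicates(lines)
bare = find_duplicates(lines, bare=True)
print(full)
print(bare)
```

In the full comparison only lines 1 and 4 count as the same position; in the bare comparison line 3 joins them. Since each line comes from one game, the line numbers double as game numbers.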

To find duplicate positions at the end of the game, use epdFin instead of epdPly. Instructions are similar but the output file has a different name.