Consolidating PGN test suites into an EPD file

Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Post by Norm Pollock »

I came across 4 opening test suite files in PGN form. It is my understanding that each of them is in the public domain. Each suite consists of mini games in PGN format. I decided to consolidate them, remove duplicate ending positions, and put the consolidated positions into a single EPD file.

In an opening test suite only the final position is important because it becomes the starting position for the testing. For that reason, an EPD file is appropriate.

The test suites I used were created by Albert Silver (50 games), Salvo Spitaleri (50 games), Harry Schnapp (220 games) and Sedat Canbaz (200 games), for 520 games in total. Each file consisted of mini PGN games of at most 12 moves, though the majority were only 8 moves. I removed 21 duplicated positions, leaving 499 unique positions in the EPD file.
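A minimal Python sketch of the consolidation step, not the method actually used for "4S.epd". It assumes each suite has already been reduced to one EPD line per final position, and that two lines are duplicates when their first four EPD fields (board, side to move, castling, en passant) match. The names `position_key` and `consolidate` are made up for illustration.

```python
def position_key(epd_line):
    """Board, side to move, castling and en passant: the first four EPD fields."""
    return " ".join(epd_line.split()[:4])


def consolidate(suites):
    """Merge several lists of EPD lines, keeping the first copy of each position.

    Returns the merged list and the number of duplicate lines dropped.
    """
    seen = set()
    merged = []
    removed = 0
    for suite in suites:
        for raw in suite:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            key = position_key(line)
            if key in seen:
                removed += 1
            else:
                seen.add(key)
                merged.append(line)
    return merged, removed


# Sanity check on the counts in the post: 50 + 50 + 220 + 200 = 520 final
# positions, and 520 - 21 duplicates = 499 unique positions.
```

Keeping the first copy preserves the order of the source suites; sorting the result is a separate choice.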

The file is named "4S.epd" and is just 35K. The download link is:

http://www.mediafire.com/?ydjne0znthj

"4S.epd" can be used for engine testing and tournaments where EPD opening suites are accepted, such as in Arena 2.0.1.

I would like to know of any other opening test suites that could be added.
Updated links for 40H Tools and Databases
http://40Hchess.epizy.com
http://nk-qy.info/40h
Ron Murawski
Posts: 397
Joined: Sun Oct 29, 2006 4:38 am
Location: Schenectady, NY

Re: Consolidating PGN test suites into an EPD file

Post by Ron Murawski »

Norm Pollock wrote: I would like to know of any other opening test suites that could be added.
download
http://computer-chess.org/doku.php?id=c ... load:index

info
http://computer-chess.org/doku.php?id=c ... d_contents

The contents of the above download are somewhat dated. I put them up at a time when Dann Corbit's site was down. It hasn't been updated in a while.

Dann Corbit collects many positions, some of them from the opening stage.

Ron
Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Consolidating PGN test suites into an EPD file

Post by Norm Pollock »

I wrote a utility that, when used in conjunction with pgn-extract by David Barnes, enables a user to take a pgn file of games and create an epd file containing the final position of each game.

My new utility is "epdFilter.exe" and can be downloaded from

http://www.mediafire.com/?gtuyzrtnzie

Here is how it works:

In a command window:

pgn-extract -s -Wepd filename.pgn > filename.epd

epdfilter filename.epd

The two output files are outC.epd (the final position of each game, in the original order) and outD.epd (the same positions with duplicates removed and sorted alphanumerically).

This procedure is only recommended for small pgn files. The intermediate epd file produced by pgn-extract will be substantially larger than the pgn file. For example, a 3.4M pgn file that I used for testing produced a 54M epd file. However, the final output files outC.epd and outD.epd were each only 0.5M.
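The filtering step can be sketched in Python as follows. This is not the source of epdFilter, only an illustration. It assumes the intermediate epd file holds one position per line, with blank lines separating successive games. The function name `final_positions` is hypothetical.

```python
def final_positions(lines):
    """Return the last non-blank line of each blank-line-separated block."""
    finals = []
    last = None  # most recent position seen in the current game
    for raw in lines:
        line = raw.strip()
        if line:
            last = line
        elif last is not None:
            # A blank line closes the current game.
            finals.append(last)
            last = None
    if last is not None:
        # The file may end without a trailing blank line.
        finals.append(last)
    return finals
```

Because the function only remembers one line at a time, it can be fed an open file object directly, which matters when the intermediate file is tens of megabytes.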
istolacio
Posts: 13
Joined: Thu Aug 31, 2006 12:29 am
Location: Valencia, Spain

Re: Consolidating PGN test suites into an EPD file

Post by istolacio »

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Consolidating PGN test suites into an EPD file

Post by Michael Sherwin »

All positions in the sherwin50.pgn file have all the pieces still on the board and at most one pawn captured for each side. I tried to have all the most important pawn formations represented. The positions range from really simple to very complicated. All but a very few positions have an incredible (or at least I hope so) variety of play. It was made with engine programmers in mind. I would welcome some feedback on my selections if anyone believes that it can or should be improved.

The 900-plus (977?) positions in EasyWay.pgn are all the columns from the book "Chess Openings the Easy Way" by Nick de Firmian. The columns have been corrected. However, the file is intended as a source for people to create their own test sets. In other words, the ending positions are not all suitable for a test set.
Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Consolidating PGN test suites into an EPD file

Post by Norm Pollock »

I started writing a collection of utility tools for "epd" files. Currently there are 2 tools in the collection. This will be a separate collection from my utility tools for "pgn" files. The download is at:

http://www.hoflink.com/~npollock/chess.html

Here is an excerpt from the readme which explains the 2 tools that are in the collection:

USAGE INSTRUCTIONS:

=========================(1) epdFilter =============================

"epdFilter" is used in conjunction with the excellent utility
"pgn-Extract" by David Barnes. Together they input a ".pgn" file
and produce an ".epd" file containing the final line (position) of 
each game in the input file.

"pgn-Extract" is free to download. Its current download site is:

  http://www.cs.kent.ac.uk/people/staff/djb/extract.html

First, "pgn-Extract" produces an intermediary ".epd" file, which
contains the positions encountered in the input ".pgn" file. It uses 
one line for each position and it contains a blank line to separate 
each pair of successive games.

Second, "epdFilter" inputs the intermediary ".epd" file and produces
"outC.epd" which contains the final line (position) of each game in
the input file. There are no blank lines within "outC.epd".

Usage:  pgn-extract -Wepd -s alpha.pgn > temp.epd

        epdFilter temp.epd

Output: outC.epd

Comments:

     1. The intermediary file can be extremely large.

=========================(2) epdUnique =============================

"epdUnique" inputs an ".epd" file. It produces "outA.epd" which
contains the input lines (positions) sorted alphanumerically, and
without duplicates.

"outB.epd" is also produced. It contains the duplicate positions that 
were removed.

Blank lines within the input ".epd" file are ignored.

Usage:  epdUnique alpha.epd

Output: outA.epd, outB.epd 

Comments:

     1. The lines (positions) in "outB.epd" may have different 
        comments than their "duplicate" lines in "outA.epd".

====================================================================

".bat" file example: 

pgn2epd.bat :

pgn-extract -Wepd -s %1 > temp.epd
epdFilter temp.epd
epdUnique outC.epd

Usage:

pgn2epd alpha.pgn

====================================================================
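The behaviour described for "epdUnique" can be sketched in Python as follows. This is an illustration only, not the tool's source. It assumes that lines are duplicates when their first four EPD fields match (so comments may differ), and that "sorted alphanumerically" means plain string order. The function name `split_unique` is hypothetical.

```python
def split_unique(lines):
    """Split EPD lines into (sorted unique lines, removed duplicate lines)."""
    kept = {}        # position key -> first line seen with that position
    duplicates = []  # later lines whose position was already seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines in the input are ignored
        key = " ".join(line.split()[:4])
        if key in kept:
            duplicates.append(line)
        else:
            kept[key] = line
    # The first list plays the role of outA.epd, the second of outB.epd.
    return sorted(kept.values()), duplicates
```

Returning the removed lines separately makes it possible to check whether a dropped duplicate carried a comment worth keeping.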


Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Consolidating PGN test suites into an EPD file

Post by Norm Pollock »

I have now written 4 EPD utilities. Here is an excerpt from the overview:

In this document, lines in an ".epd" file that have the same position 
but different comments are considered to be "duplicate" lines.

1. "epdBest" is used in conjunction with the excellent graphical user 
interface, "Arena 2.0.1" by Martin Blume. The "Automatic Analysis"
feature of "Arena" calculates a suggested "best move" for each line 
(position) in the input ".epd" file, with the use of a user chosen 
computer chess engine. "epdBest" then takes the file produced by 
"Arena" and writes each suggested "best move" into the ".epd" file as
a "bm" comment. All pre-existing comments are erased.

2. "epdFilter" is used in conjunction with the excellent utility
"pgn-Extract" by David Barnes. Together they input a ".pgn" file
and produce an ".epd" file containing the final line (position) of
each game in the input file.

3. "epdList" inputs an ".epd" file. It produces an ".epd" file that
lists the positions in decreasing order of occurrence. Original
comments are removed and are replaced with a comment indicating the
number of occurrences of that position.

4. "epdUnique" inputs an ".epd" file. It produces an ".epd" file that
does not have "duplicate" lines and whose lines are sorted
alphanumerically. It also produces another ".epd" file that contains
the "duplicate" lines that were removed.
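The counting idea behind "epdList" can be sketched in Python with `collections.Counter`. This is only an illustration of the idea, not the tool's source. It assumes positions are compared on their first four EPD fields, and the `c0` comment used for the count is an assumption, since the overview does not say which format the tool writes. The function name `list_by_occurrence` is hypothetical.

```python
from collections import Counter


def list_by_occurrence(lines):
    """Return one EPD line per position, most frequent first.

    Original comments are dropped; the occurrence count is written
    as a c0 comment (an assumed format).
    """
    counts = Counter(
        " ".join(line.split()[:4]) for line in lines if line.strip()
    )
    return [
        f'{position} c0 "{count}";'
        for position, count in counts.most_common()
    ]
```

Run on the full pgn-extract output rather than on the final positions only, a listing like this shows which positions a collection of games passes through most often.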


Richard Allbert
Posts: 795
Joined: Wed Jul 19, 2006 9:58 am

Re: Consolidating PGN test suites into an EPD file

Post by Richard Allbert »

Hi Norman,

I have something similar as an internal class in my engine. Pgn file in -> epd file out.

It can also take command line parameters to specify what kind of positions it should write to the output file

e.g. material balance, total pieces, pawns, no capture made for x moves, etc.

If you want, I can send you this.

Richard
Norm Pollock
Posts: 1079
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Consolidating PGN test suites into an EPD file

Post by Norm Pollock »

Richard Allbert wrote: Hi Norman,

I have something similar as an internal class in my engine. Pgn file in -> epd file out.

It can also take command line parameters to specify what kind of positions it should write to the output file

e.g. material balance, total pieces, pawns, no capture made for x moves, etc.

If you want, I can send you this.

Richard
Hi Richard,

From your description, I recommend that you extract the code from your engine and make it into a stand-alone "epd" utility. It would be a useful tool.

From what I have seen, the pgn-to-epd utilities out there convert ALL positions from the pgn file. As a result, a 3 MB pgn file creates a 54 MB epd file, for example. As I understand your code, only selected positions would be output to the epd file, and the selection would be parameter based. As a result, the output epd file would have a size similar to the input pgn file.
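A minimal Python sketch of what parameter-based selection could look like. It is not Richard's code, which is not shown in this thread. It assumes the criteria can be read from the board field of each EPD line, and the names `piece_counts`, `select`, `min_pieces` and `min_pawns` are made up for illustration.

```python
def piece_counts(epd_line):
    """Return (total pieces, total pawns) from the board field of an EPD line."""
    board = epd_line.split()[0]
    pieces = sum(1 for ch in board if ch.isalpha())  # letters are pieces, digits are gaps
    pawns = sum(1 for ch in board if ch in "Pp")
    return pieces, pawns


def select(lines, min_pieces=0, min_pawns=0):
    """Keep only the EPD lines that meet the given material thresholds."""
    selected = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        pieces, pawns = piece_counts(line)
        if pieces >= min_pieces and pawns >= min_pawns:
            selected.append(line)
    return selected
```

For example, `select(lines, min_pieces=30)` would keep only positions in which at most two pieces or pawns have been captured. Criteria such as "no capture made for x moves" need the move history and cannot be derived from a single EPD line.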

Please let me know if you make this utility. I would be glad to add a reference to it, or whatever. But your code should definitely make for an excellent stand-alone "epd" utility.

One more thing. I'm sure you have a wide range of parameters. But one parameter, outputting the epd for the last line of each game, should definitely be included.

-Norm
Richard Allbert
Posts: 795
Joined: Wed Jul 19, 2006 9:58 am

Re: Consolidating PGN test suites into an EPD file

Post by Richard Allbert »

Hi Norman,

Ok, I'll look at doing that over the next few days - I'm away next weekend, so it could take a couple of weeks to make a standalone version.

Have a nice weekend

Richard