Added 2 more PGN utilities

Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Added 2 more PGN utilities

Post by Norm Pollock »

I wrote 2 new PGN utility programs for my collection named "40H". It now has 32 programs. A download link and the readme.txt are available at:

http://www.hoflink.com/~npollock/chess.html

The first new program, "nameFix", reformats player names to conform to the PGN Standard (section 8.1.1). It inserts a space after the comma if one is missing and puts a period after each initial. With names in a standard format, a sorted list no longer shows anomalies such as "Jones, Mary" listing before "Jones,Alice", which happens because the space character sorts ahead of letters.
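
For readers who want to see the idea in code, here is a minimal Python sketch of the two rules described above. It is not the source of "nameFix"; the function name fix_name is hypothetical, and the sketch assumes ASCII capital initials and names written as "Surname, Forename".

```python
import re

def fix_name(name: str) -> str:
    """Normalize a player name toward the "Surname, Forename" convention.

    Hypothetical helper, not the actual nameFix implementation.
    """
    # Rule 1: exactly one space after each comma.
    name = re.sub(r",\s*", ", ", name.strip())
    # Rule 2: a period after a lone capital letter (an initial).
    # The letter must follow the start, whitespace or a comma, and must be
    # followed by whitespace or the end, so "O'Brien" is left untouched.
    name = re.sub(r"(?<![^\s,])([A-Z])(?=\s|$)", r"\1.", name)
    return name.rstrip()

names = ["Jones, Mary", "Jones,Alice"]
print(sorted(names))                 # the space sorts before "A"
print(sorted(map(fix_name, names)))  # alphabetical after normalizing
```

The first print shows "Jones, Mary" ahead of "Jones,Alice"; after normalizing, "Jones, Alice" comes first.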

The second new program, "nameSimilar", lists the player names that share the same surname. This helps locate players who appear under more than one name variation in the PGN file.
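
The surname grouping can be sketched in a few lines of Python. This is only an illustration under assumptions, not how "nameSimilar" is written: the helper names player_names and group_by_surname are made up, the surname is taken to be everything before the first comma, and comparison is case-insensitive.

```python
import re
from collections import defaultdict

# Matches [White "..."] and [Black "..."] tag pairs, but not WhiteElo etc.
NAME_TAG = re.compile(r'^\[(?:White|Black)\s+"(.*?)"\]', re.M)

def player_names(pgn_text: str) -> list[str]:
    """Collect every White/Black name found in the PGN text."""
    return NAME_TAG.findall(pgn_text)

def group_by_surname(names) -> dict[str, list[str]]:
    """Return only the surnames that occur with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        surname = name.split(",", 1)[0].strip().lower()
        groups[surname].add(name)
    return {s: sorted(v) for s, v in groups.items() if len(v) > 1}

sample = '[White "Jones, Mary"]\n[Black "Jones, M."]\n[WhiteElo "2400"]\n'
print(group_by_surname(player_names(sample)))
```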
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Added 2 more PGN utilities

Post by Engin »

Hello Norm!

Is there a utility that removes all comments and variations?

It should also correct move numbers from "21.Nf3" to "21. Nf3" (insert a space after the dot "." that follows the move number).

I need a utility that cleans a PGN so that only the main moves remain, and maybe also works on FRC PGNs, to make generating opening books easy.

My Tornado has some problems with comments and variations.

Where can I find a cleaning utility?

regards,
Engin
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Added 2 more PGN utilities

Post by Norm Pollock »

Try "trim" in the "40H" package.
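
For anyone curious what such a cleanup involves, here is a rough Python sketch. It is not the code of "trim"; the function name strip_movetext is hypothetical, and it assumes it is given only the movetext of a game (not the tag section, where braces or parentheses inside tag values would be mangled).

```python
import re

def strip_movetext(movetext: str) -> str:
    """Drop {...} and ';' comments and (...) variations, keep the main line."""
    out = []
    depth = 0                # nesting depth of ( ... ) variations
    in_brace = False         # inside a { ... } comment
    in_line_comment = False  # inside a ';' comment that runs to end of line
    for ch in movetext:
        if in_brace:
            in_brace = ch != "}"
        elif in_line_comment:
            if ch == "\n":
                in_line_comment = False
                out.append(ch)
        elif ch == "{":
            in_brace = True
        elif ch == ";":
            in_line_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif depth == 0:
            out.append(ch)
    text = "".join(out)
    # "21.Nf3" -> "21. Nf3"; black move numbers such as "21...Nf6" are left alone.
    text = re.sub(r"(\d+)\.(?!\.)\s*", r"\1. ", text)
    return " ".join(text.split())

print(strip_movetext("1.e4 {best by test} e5 2.Nf3 (2.f4 exf4) 2...Nc6 1-0"))
# prints: 1. e4 e5 2. Nf3 2...Nc6 1-0
```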
Engin
Posts: 918
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Added 2 more PGN utilities

Post by Engin »

Norm Pollock wrote:
Try "trim" in the "40H" package.
That's exactly what I want, thank you very much ;-)
Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 4:12 am

Re: Added 2 more PGN utilities

Post by Kirill Kryukov »

Hi Norm! There is one function that I am missing. I thought you might like to do it, or suggest the program that does it, if it already exists.

The problem is: take a given PGN and divide it into a number of smaller PGN files, each containing only the games between one particular pair of opponents. Suppose you have an input PGN for a round-robin of four participants, A, B, C and D. The result of running this program would be six PGN files: one with only games A-B, another with only games A-C, then A-D, B-C, B-D, and C-D.
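
The count of six follows from choosing 2 of the 4 participants, n(n-1)/2 in general. A small Python illustration (the function name pairings is just for this example):

```python
from itertools import combinations

def pairings(players):
    """All unordered pairs of opponents, in sorted order."""
    return list(combinations(sorted(players), 2))

pairs = pairings(["A", "B", "C", "D"])
print(len(pairs))  # 6
print(pairs)       # A-B, A-C, A-D, B-C, B-D, C-D
```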

The names of the output files could include the engine names, or just be numbered; it does not matter to me. If the number of output files is too large, say more than 100, the program could issue a warning before going on.

This would simplify the work resolving various issues with the CCRL game database. Thanks for considering!

(I know that a similar effect could be achieved by extracting the names from a PGN and then extracting the games for each particular name. But sometimes I need just this function, and a faster way to do it would be very helpful.)

Best,
Kirill
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: Added 2 more PGN utilities

Post by Norm Pollock »

Hi Kirill,

You could try "pairExtract" in the "40H" package. It doesn't do all you want, but it might be an improvement. First you specify a pair of engines by writing the 2 engine names, one per line, in a text file named "pairs", then run the command:

pairExtract pairs inputfile.pgn

"pairExtract" will extract all the games in that pairing. You would have to create a "pairs" text file for each pairing, and then rename each output file.

The exact program you are asking for would take some time to develop. It would first have to name and open a large number of output files, then read the input file and drop each game into the appropriate output file.

I also don't know if a large number of output files (over 100) in each execution would be damaging to the hard drive, even if each file is small.

-Norm
Kirill Kryukov
Posts: 492
Joined: Sun Mar 19, 2006 4:12 am

Re: Added 2 more PGN utilities

Post by Kirill Kryukov »

Norm Pollock wrote:
You could try "pairExtract" in the "40H" package. It doesn't do all you want, but it might be an improvement. First you specify a pair of engines by writing the 2 engine names, one per line, in a text file named "pairs", then run the command:

pairExtract pairs inputfile.pgn

"pairExtract" will extract all the games in that pairing. You would have to create a "pairs" text file for each pairing, and then rename each output file.
Thanks, yes I am aware of this function. Very useful indeed!
Norm Pollock wrote:The exact program you are asking for would take some time to develop. It would first have to name and open a large number of output files, then read the input file and drop each game into the appropriate output file.
I would do it by reading the whole PGN into a memory buffer, then extracting the list of pairs, then going through the buffer as many times as necessary, each time creating one output file. So only one output file needs to be open at any given moment.

The only limitation would be with very large PGN files that you can't cache into a memory buffer. Failing for those files would be totally fine with me as I don't plan to process 2 GB PGN files. Although a perfectionist could implement processing such files part by part, appending the result to the output files.
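
To make the idea concrete, here is a rough Python sketch of that approach: read everything into memory, collect the pairs, then make one pass per pair with a single output file open at a time. All names (split_games, pair_of, split_by_pair, the pair_NNN.pgn file names) are made up for illustration, and it assumes every game starts with an [Event ...] tag and that the file can be read as Latin-1.

```python
import re
from pathlib import Path

NAME_TAG = re.compile(r'^\[(White|Black)\s+"(.*?)"\]', re.M)

def split_games(pgn_text):
    """Split the buffer into games; assumes each game begins with [Event ...]."""
    parts = re.split(r"^(?=\[Event\s)", pgn_text, flags=re.M)
    return [p for p in parts if p.strip()]

def pair_of(game):
    """Unordered pair of opponents, so A-B and B-A land in the same file."""
    tags = dict(NAME_TAG.findall(game))
    return tuple(sorted((tags.get("White", "?"), tags.get("Black", "?"))))

def split_by_pair(pgn_path, out_dir):
    """Write one numbered PGN per pairing; returns [(path, pair), ...]."""
    text = Path(pgn_path).read_text(encoding="latin-1")  # whole file in memory
    games = split_games(text)
    pairs = sorted({pair_of(g) for g in games})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, pair in enumerate(pairs, 1):
        path = out / f"pair_{i:03d}.pgn"
        # One pass over the buffer per pair; only this file is open.
        with path.open("w", encoding="latin-1") as f:
            for g in games:
                if pair_of(g) == pair:
                    f.write(g.rstrip() + "\n\n")
        written.append((path, pair))
    return written
```

Numbered file names sidestep the question of which characters in engine names are legal in file names; the returned list says which pairing went into which file.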
Norm Pollock wrote:I also don't know if a large number of output files (over 100) in each execution would be damaging to the hard drive, even if each file is small.
Over 1000 files in one directory could slow down some directory operations, but I don't think the hard drive would be in danger. A warning or a [Y/N] confirmation would help in such cases (though ideally there should be a command-line option to disable the warning, for batch execution).
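
Something along these lines is what I have in mind, sketched in Python. The names confirm_many_files and build_parser and the options --yes and --limit are hypothetical, not options of any existing "40H" program.

```python
import argparse

def confirm_many_files(count, limit=100, assume_yes=False, ask=input):
    """Return True if it is OK to create `count` output files."""
    if count <= limit or assume_yes:
        return True  # small job, or warning disabled for batch runs
    answer = ask(f"This will create {count} output files. Continue? [Y/N] ")
    return answer.strip().lower() in ("y", "yes")

def build_parser():
    """Command line for a hypothetical pair-splitting tool."""
    p = argparse.ArgumentParser(description="Split a PGN by pair of opponents")
    p.add_argument("pgn", help="input PGN file")
    p.add_argument("-y", "--yes", action="store_true",
                   help="skip the confirmation (for batch execution)")
    p.add_argument("--limit", type=int, default=100,
                   help="ask before creating more than this many files")
    return p

args = build_parser().parse_args(["games.pgn", "--yes"])
print(confirm_many_files(150, limit=args.limit, assume_yes=args.yes))  # True
```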

Anyway, I know it would take some effort to do. Thanks for thinking about it!

Best,
Kirill