I wrote 2 new PGN utility programs for my collection named "40H". It now has 32 programs. A download link and the readme.txt are available at:
http://www.hoflink.com/~npollock/chess.html
The first new program is called "nameFix". It changes the format of player name data to bring it into conformity with the PGN Standard (article 8.1.1). Basically it inserts a space after the comma if needed, and it also puts a period after each initial. Once the format of player names is standardized, a sorted list will not have abnormalities like "Jones, Mary" listing before "Jones,Alice" because the space character sorts ahead of letters.
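To illustrate the two rules described above (space after the comma, period after each initial), here is a minimal Python sketch. It is not the code of "nameFix"; the function name `fix_name` is made up for this example, and it only treats a lone ASCII capital letter as an initial.

```python
import re

def fix_name(name: str) -> str:
    """Normalize a player name toward the 'Surname, F. M.' style (illustrative only)."""
    # Rule 1: exactly one space after the comma, none before it.
    name = re.sub(r"\s*,\s*", ", ", name.strip())
    # Rule 2: a lone capital letter (preceded by start, space, or period and
    # followed by space or end of string) is an initial missing its period.
    name = re.sub(r"(?<![^\s.])([A-Z])(?=\s|$)", r"\1.", name)
    return name.strip()

raw = ["Jones,Alice", "Jones, Mary"]
print(sorted(raw))                         # space sorts before letters
print(sorted(fix_name(n) for n in raw))    # consistent format, expected order
```

With the raw names, "Jones, Mary" sorts first because the space (0x20) compares lower than "A"; after normalization both names have the space and the order follows the first names.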
The second new program is called "nameSimilar". It lists those player names that share the same surname. This is helpful in locating players who have more than one name variation in the PGN file.
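The idea of listing names that share a surname can be sketched as follows. This is only an assumption about how such a listing might work, not the actual "nameSimilar" code; `similar_names` is a hypothetical helper, and the surname is taken to be the text before the first comma, compared case-insensitively.

```python
from collections import defaultdict

def similar_names(names):
    """Group distinct player names by surname; keep surnames with 2+ spellings."""
    by_surname = defaultdict(set)
    for name in names:
        # Surname = everything before the first comma (assumed convention).
        surname = name.split(",", 1)[0].strip().lower()
        by_surname[surname].add(name)
    return {s: sorted(v) for s, v in by_surname.items() if len(v) > 1}

players = ["Jones, Mary", "Jones, M.", "Smith, Al", "Jones, Mary"]
for surname, variants in similar_names(players).items():
    print(surname, variants)
```

Names that appear only once per surname are dropped, so the output points straight at the candidates for duplicate spellings.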
Added 2 more PGN utilities
-
- Posts: 1056
- Joined: Thu Mar 09, 2006 4:15 pm
- Location: Long Island, NY, USA
-
- Posts: 918
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: Added 2 more PGN utilities
Hello Norm!
Is there a utility that removes all comments and variations?
It should also correct move numbers such as "21.Nf3" to "21. Nf3" (insert a space after the dot that follows the move number).
I need a utility that cleans a PGN so that only the main-line moves remain, ideally also for FRC PGNs, to make generating opening books easy.
My Tornado has some problems with comments and variations.
Where can I find a cleaning utility?
Regards,
Engin
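For readers who want to see what such a cleanup involves, here is a rough Python sketch of the request above. It is not one of the "40H" programs; `clean_pgn` is a name invented for this example. It assumes tag pairs sit on their own lines starting with "[", that brace comments do not nest, and it leaves NAGs such as "$1" untouched.

```python
import re

def clean_pgn(text: str) -> str:
    """Drop {comments}, ;comments and (variations); write '21.Nf3' as '21. Nf3'."""
    out = []
    in_brace = False   # currently inside a { ... } comment
    depth = 0          # nesting depth of ( ... ) variations
    for line in text.splitlines():
        if not in_brace and depth == 0:
            if line.startswith("[") or not line.strip():
                out.append(line)       # tag pair or blank separator: keep as is
                continue
        kept = []
        for ch in line:
            if in_brace:
                in_brace = ch != "}"
            elif ch == "{":
                in_brace = True
            elif ch == ";":
                break                  # rest-of-line comment
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth = max(0, depth - 1)
            elif depth == 0:
                kept.append(ch)
        moves = "".join(kept)
        # Insert a space after a move number ("21." or "21...") glued to a move.
        moves = re.sub(r"(\d+\.+)(?=[^\s.])", r"\1 ", moves)
        moves = re.sub(r"\s+", " ", moves).strip()
        if moves:
            out.append(moves)
    return "\n".join(out)
```

Tag lines (including a FEN tag in an FRC game) pass through unchanged, so the sketch does nothing FRC-specific beyond not damaging the starting position.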
Re: Added 2 more PGN utilities
Try "trim" in the "40H" package.
Re: Added 2 more PGN utilities
That's exactly what I want, thank you very much.
-
- Posts: 492
- Joined: Sun Mar 19, 2006 4:12 am
Re: Added 2 more PGN utilities
Hi Norm! There is one function that I am missing. I thought you might like to write it, or suggest a program that does it, if one already exists.
The problem is to take a given PGN and divide it into a number of smaller PGN files, each containing only the games between a particular pair of opponents. Suppose you have an input PGN for a round-robin of four participants, A, B, C and D. Running this program would produce six PGN files: one with only the A-B games, another with only the A-C games, then A-D, B-C, B-D, and C-D.
The names of the output files could include the engine names or just be numbered; it does not matter to me. If the number of output files would be too large, say more than 100, the program could issue a warning before going on.
This would simplify the work of resolving various issues with the CCRL game database. Thanks for considering!
(I know that a similar effect could be achieved by extracting the names from a PGN and then extracting the games for a particular name. But sometimes I need just this function, and a faster way to do it would be very helpful.)
Best,
Kirill
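The count of six files in the example above is just the number of unordered pairs of four players. A short Python illustration (the player labels come from the post; the helper name `pairings` is only for this example):

```python
from itertools import combinations

def pairings(players):
    """All unordered pairs of opponents, in the order A-B, A-C, ..."""
    return list(combinations(players, 2))

print(pairings(["A", "B", "C", "D"]))
# n participants give n * (n - 1) / 2 pairings, i.e. output files.
for n in (4, 14, 15):
    print(n, "participants ->", n * (n - 1) // 2, "files")
```

By this count, the suggested warning threshold of 100 files would first be exceeded with 15 participants in a complete round-robin.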
Re: Added 2 more PGN utilities
Hi Kirill,
You could try "pairExtract" in the "40H" package. It doesn't do all you want, but it might be an improvement. First you specify a pair of engines by writing the 2 engine names, one per line, in a text file named "pairs", then run the command:
pairExtract pairs inputfile.pgn
"pairExtract" will extract all the games in that pairing. You would have to create a "pairs" text file for each pairing, and then rename each output file.
The exact program you are asking for would take some time to develop. It would first have to name and open a large number of output files, then read the input file and drop each game into the appropriate output file.
I also don't know if a large number of output files (over 100) in each execution would be damaging to the hard drive, even if each file is small.
-Norm
Re: Added 2 more PGN utilities
Norm Pollock wrote: You could try "pairExtract" in the "40H" package.
Thanks, yes, I am aware of this function. Very useful indeed!
Norm Pollock wrote: The exact program you are asking for would take some time to develop. It would first have to name and open a large number of output files, then read the input file and drop each game into the appropriate output file.
I would do it by reading the whole PGN into a memory buffer, then extracting the list of pairs, then going through the buffer as many times as necessary, each time creating one output file. So only one output file needs to be open at any given moment.
The only limitation would be with very large PGN files that can't be cached in a memory buffer. Failing for those files would be totally fine with me, as I don't plan to process 2 GB PGN files. A perfectionist could process such files part by part, appending the results to the output files.
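The in-memory approach described above could look roughly like the Python sketch below. It is an illustration under stated assumptions, not an existing "40H" program: the function names, the `pair_NNNN.pgn` file naming, and the `max_files`/`force` parameters are all invented here. It also groups the games in a single pass instead of rescanning the buffer once per pair, but it still opens only one output file at a time. Every game is assumed to start with an [Event tag at the beginning of a line.

```python
import re
from collections import defaultdict
from pathlib import Path

PLAYER_TAG = re.compile(r'^\[(White|Black)\s+"(.*)"\]\s*$', re.MULTILINE)

def split_games(text):
    """Split PGN text into games; each game is assumed to start at an [Event tag."""
    starts = [m.start() for m in re.finditer(r"^\[Event\s", text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].rstrip() + "\n\n" for a, b in zip(starts, ends)]

def group_by_pair(text):
    """Map each unordered (player, player) pair to its list of games."""
    groups = defaultdict(list)
    for game in split_games(text):
        tags = dict(PLAYER_TAG.findall(game))
        pair = tuple(sorted((tags.get("White", "?"), tags.get("Black", "?"))))
        groups[pair].append(game)
    return groups

def write_pair_files(pgn_path, out_dir, max_files=100, force=False):
    """Write one numbered PGN per pairing; refuse if too many files unless forced."""
    # latin-1 maps every byte to a character, so the text round-trips unchanged.
    text = Path(pgn_path).read_text(encoding="latin-1")
    groups = group_by_pair(text)
    if len(groups) > max_files and not force:
        raise RuntimeError(f"{len(groups)} output files needed; use force=True to proceed")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for i, (pair, games) in enumerate(sorted(groups.items()), start=1):
        path = out / f"pair_{i:04d}.pgn"
        # The file is opened, written and closed inside this one call.
        path.write_text("".join(games), encoding="latin-1")
        written[path.name] = pair
    return written
```

Numbered file names avoid trouble with engine names that contain characters not allowed in file names; the returned mapping tells which file holds which pairing. The `force` flag plays the role of the option for skipping the warning in batch runs.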
Norm Pollock wrote: I also don't know if a large number of output files (over 100) in each execution would be damaging to the hard drive, even if each file is small.
Over 1000 files in one directory could slow down some directory operations, but I don't think the hard drive would be in danger. A warning or a [Y/N] confirmation would help in such cases (though ideally there should be a command-line option to disable the warning, for batch execution).
Anyway, I know it would take some effort to do. Thanks for thinking about it!
Best,
Kirill