Tool for splitting white and black moves into diff. files?

Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Tool for splitting white and black moves into diff. files?

Post by Guenther »

Does such a tool exist? I would like to do some stats on a test sample, which needs the moves separated per player for better usage.
The stats will be done on eval/depth/time info, thus it should keep those of course.

Guenther
https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Tool for splitting white and black moves into diff. file

Post by Dann Corbit »

You can turn a PGN game into EPD records.
For instance, pgn2fen can do this.
Like this:
pgn2fen pgnfile.pgn -e -l > epdfile.epd

Then you can easily pick out the moves by color by filtering on the records.
You could use grep or sed or python or whatever you are comfortable with.
For instance:

grep " w " epdfile.epd > white.epd
grep " b " epdfile.epd > black.epd
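For anyone who prefers Python to grep, the same filter could be sketched roughly as below. It assumes each EPD record starts with the usual position fields, so that the second whitespace-separated field is the side to move. The function name `split_epd_by_side` and the sample records are made up for this illustration and are not actual pgn2fen output.

```python
def split_epd_by_side(lines):
    """Split EPD records into (white_to_move, black_to_move) lists."""
    white, black = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # The second field of an EPD/FEN record is the side to move.
        if fields[1] == "w":
            white.append(line)
        elif fields[1] == "b":
            black.append(line)
    return white, black


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",
        "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -",
    ]
    white, black = split_epd_by_side(sample)
    print(len(white), "white-to-move,", len(black), "black-to-move")
```

Checking the side-to-move field directly avoids accidental matches on a " w " or " b " that might appear elsewhere in a record.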
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Tool for splitting white and black moves into diff. file

Post by Guenther »

Dann Corbit wrote:You can turn a PGN game into EPD records.
For instance, pgn2fen can do this.
Like this:
pgn2fen pgnfile.pgn -e -l > epdfile.epd

Then you can easily pick out the moves by color by filtering on the records.
You could use grep or sed or python or whatever you are comfortable with.
For instance:

grep " w " epdfile.epd > white.epd
grep " b " epdfile.epd > black.epd
Hi Dann, this won't help me, because I am really only interested in the score/depth/time info for each move, assigned to the right player.
The move itself is not interesting for my stats, just the move number.

I think Ferdinand once showed an example which had such an output,
but this was in preparation for a tool which is not released yet.
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Tool for splitting white and black moves into diff. file

Post by Ajedrecista »

Hello Guenther:
Guenther wrote: Hi Dann, this won't help me, because I am really only interested in the score/depth/time info for each move, assigned to the right player.
The move itself is not interesting for my stats, just the move number.

I think Ferdinand once showed an example which had such an output,
but this was in preparation for a tool which is not released yet.
Maybe regex? I have tried a made-up example with Notepad++. If you have a PGN of one game, you can get rid of tags with:

Search mode: Regular expression

Find:
^\[.*

Replace:

(Replace with nothing).
Then, if the info is between curly brackets, you can delete anything outside {} with:

Search mode: Regular expression

Find:
.*?(\{.*?\}).*?

Replace:
$1
Then delete the blank lines with:

Search mode: Extended

Find:
\r

Replace:

(Replace with nothing).
Then put each {} on its own line:

Search mode: Extended

Find:
{

Replace:
\r\n{
Delete the first blank line, then keep only the odd lines (white moves) or only the even lines (black moves) and save them to different files:

Delete every other line in notepad++
Jasper wrote:Open the replace menu, fill in ([^\n]*\n)[^\n]*\n in the "Find what" box and $1 in the "Replace with" box. Then select regular expression for the search mode, click replace all and every second line is deleted.

You can build similar regexes if you want to do something similar. For example, (([^\n]*\n){a})[^\n]*\n will replace every nth line if you replace a by n - 1 and [^\n]*\n([^\n]*\n) will let you keep even lines instead of odd ones.
You can delete { and } later. The info will be split between the white side and the black side, but the score/depth/time info will stay together.

I know it is too much work for only one game. I said one game rather than multiple games because the method works for the first game, but a game can end with either a white move or a black move, so it cannot be assured that every odd line is a white move in the remaining games.

Furthermore, I have not tested what happens if { starts on one line and } ends on the next... and I delete the move numbers, but within one game they could be recovered somehow from the line number.

I know, too many complications for just one game. There must be simpler solutions. Good luck!
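One possibly simpler route is to let a short script count the moves instead of relying on line parity. The following is only a sketch under several assumptions: the movetext of a single game is passed in (tags already removed), the game starts from the normal position with White to move, there are no variations or NAGs, and the engine info is the first {} comment after each move. The name `split_comments` is made up for this example.

```python
import re

# A token is a brace comment, a move number, a game result, or a SAN move.
TOKEN = re.compile(
    r"\{(?P<comment>[^}]*)\}"            # {...}, may span several lines
    r"|(?P<number>\d+\.(?:\.\.)?)"        # "12." or "12..."
    r"|(?P<result>1-0|0-1|1/2-1/2|\*)"    # game termination marker
    r"|(?P<san>[^\s{}]+)"                 # anything else counts as a move
)


def split_comments(movetext):
    """Return (white, black) lists of (move_number, comment) pairs."""
    white, black = [], []
    ply = 0          # number of moves seen so far
    pending = False  # True while the last move has no comment yet
    for m in TOKEN.finditer(movetext):
        if m.group("san"):
            ply += 1
            pending = True
        elif m.group("comment") is not None and pending:
            # Collapse whitespace so comments wrapped over lines are joined.
            text = " ".join(m.group("comment").split())
            move_no = (ply + 1) // 2
            (white if ply % 2 == 1 else black).append((move_no, text))
            pending = False  # later comments (e.g. adjudication) are skipped
    return white, black


if __name__ == "__main__":
    game = "1. c4 {book} c5 {book} 2. Nf3 {-0.23/25\n13s} Nc6 {+0.20/21 3.1s} 1-0"
    white, black = split_comments(game)
    for number, info in white:
        print(f"{number}.{{{info}}}")
    for number, info in black:
        print(f"{number}...{{{info}}}")
```

Because the side is derived from the move count rather than from the line number, it does not matter whether a game ends on a white or a black move, as long as each game is processed separately.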

Regards from Spain.

Ajedrecista.
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Tool for splitting white and black moves into diff. file

Post by Guenther »

Ajedrecista wrote:Hello Guenther:
Guenther wrote: Hi Dann, this won't help me, because I am really only interested in the score/depth/time info for each move, assigned to the right player.
The move itself is not interesting for my stats, just the move number.

I think Ferdinand once showed an example which had such an output,
but this was in preparation for a tool which is not released yet.
Maybe regex? I have tried a made-up example with Notepad++. If you have a PGN of one game, you can get rid of tags with:

...

I know, too many complications for just one game. There must be simpler solutions. Good luck!

Regards from Spain.

Ajedrecista.
Thanks for the interest, Jesus! I found a good workaround that makes it easier and lets me omit a few regex steps: preprocessing the PGN file in Scid.
There are several options for the PGN filter export, e.g. adding "..." plus the move number for Black's moves, and two-column output.
After that it was quite easy to achieve my (current) goal. (BTW, I always use an old version of UltraEdit for regex operations. It is quite good at loading big files and has a lot of options.)

e.g.
outW.pgn

[Event "asmFish 191017 4CPU 40/4 Gauntlet"];
[Site "Dual X5670"];
[Date "2017.10.29"];
[Round "1"];
[White "asmFish 191017 64-bit 4CPU"];
[Black "Andscacs 0.92 64-bit 4CPU"];
[Result "1-0"];
[ECO "A35"];
[Opening "English"];
[PlyCount "138"];
[Termination "adjudication"];
[TimeControl "40/120"];
[Variation "Symmetrical Variation"];
1.{book};
2.{book};
3.{book};
4.{book};
5.{book};
6.{book};
7.{book};
8.{book};
9.{book};
10.{-0.23/25 13s};
11.{-0.30/26 4.2s};
12.{-0.30/24 0.48s};
13.{-0.24/27 3.3s};
14.{-0.26/24 3.6s};
15.{-0.26/25 6.6s};
16.{-0.25/26 5.1s};
17.{-0.21/25 0.51s};
18.{-0.25/27 4.5s};
19.{-0.23/22 0.45s};
20.{-0.16/26 4.1s};
21.{-0.10/27 6.8s};
22.{-0.22/29 9.5s};
23.{-0.23/25 0.52s};
24.{-0.13/25 1.6s};
25.{-0.08/25 2.3s};
26.{-0.08/24 1.3s};
27.{-0.02/25 1.9s};
28.{-0.02/25 4.1s};
29.{0.00/27 2.3s};
30.{0.00/28 2.3s};
31.{0.00/27 0.75s};
32.{0.00/28 2.9s};
33.{0.00/30 2.0s};
34.{0.00/33 3.5s};
35.{0.00/32 0.97s};
36.{0.00/34 3.2s};
37.{+0.19/26 2.9s};
38.{+0.13/28 1.3s};
39.{+0.08/34 7.7s};
40.{+0.08/34 3.6s};
41.{+0.24/23 2.2s};
42.{+0.32/27 6.2s};
43.{+0.18/29 6.5s};
44.{+0.50/29 3.5s};
45.{+0.48/26 0.61s};
46.{+0.58/31 7.3s};
47.{+0.75/27 3.3s};
48.{+0.96/25 5.5s};
49.{+1.17/24 2.1s};
50.{+2.64/27 4.0s};
51.{+2.31/30 5.7s};
52.{+2.86/30 3.4s};
53.{+3.03/28 1.2s};
54.{+3.06/31 2.3s};
55.{+4.38/30 7.0s};
56.{+4.52/24 0.67s};
57.{+4.87/30 2.1s};
58.{+4.95/29 0.83s};
59.{+6.61/28 1.7s};
60.{+60.07/24 1.8s};
61.{+132.65/35 2.0s};
62.{+132.68/40 4.2s};
63.{+6.07/17 0.65s};
64.{+132.70/43 2.3s};
65.{+132.73/46 10s};
66.{+132.74/48 4.6s};
67.{+132.75/52 1.5s};
68.{+M19/48 2.0s};
69.{+M17/55 0.75s};
{White wins by adjudication};
1-0
outB.pgn

[Event "asmFish 191017 4CPU 40/4 Gauntlet"];
[Site "Dual X5670"];
[Date "2017.10.29"];
[Round "1"];
[White "asmFish 191017 64-bit 4CPU"];
[Black "Andscacs 0.92 64-bit 4CPU"];
[Result "1-0"];
[ECO "A35"];
[Opening "English"];
[PlyCount "138"];
[Termination "adjudication"];
[TimeControl "40/120"];
[Variation "Symmetrical Variation"];
1...{book};
2...{book};
3...{book};
4...{book};
5...{book};
6...{book};
7...{book};
8...{book};
9...{book};
10...{+0.20/21 3.1s};
11...{+0.09/21 4.4s};
12...{+0.27/19 2.6s};
13...{+0.35/19 5.0s};
14...{+0.30/21 2.9s};
15...{+0.30/21 2.7s};
16...{+0.43/22 3.3s};
17...{+0.41/23 3.4s};
18...{+0.39/22 2.9s};
19...{+0.47/22 3.5s};
20...{+0.44/24 3.4s};
21...{+0.39/21 3.8s};
22...{+0.51/21 4.2s};
23...{+0.43/22 7.6s};
24...{+0.42/22 2.8s};
25...{+0.29/22 3.3s};
26...{+0.30/22 2.6s};
27...{+0.37/21 3.5s};
28...{+0.33/20 2.6s};
29...{+0.32/20 3.4s};
30...{+0.12/20 2.8s};
31...{+0.34/21 4.8s};
32...{+0.01/26 4.5s};
33...{+0.01/28 4.1s};
34...{+0.01/29 2.4s};
35...{-0.01/26 2.0s};
36...{+0.01/25 4.7s};
37...{0.00/26 4.6s};
38...{-0.01/26 2.3s};
39...{-0.01/28 4.1s};
40...{-0.01/28 6.1s};
41...{-0.29/28 3.0s};
42...{-0.26/26 1.9s};
43...{0.00/26 3.5s};
44...{-0.25/25 3.5s};
45...{-0.39/25 3.1s};
46...{-0.69/24 3.6s};
47...{-0.80/24 3.6s};
48...{-0.88/24 3.8s};
49...{-0.87/24 3.6s};
50...{-1.60/21 3.6s};
51...{-1.87/25 2.1s};
52...{-1.95/25 3.9s};
53...{-1.89/27 3.7s};
54...{-2.22/26 3.5s};
55...{+0.01/33 2.3s};
56...{-0.01/35 2.4s};
57...{-2.21/25 3.7s};
58...{-2.79/26 3.3s};
59...{-4.72/23 3.1s};
60...{-5.36/27 2.9s};
61...{-6.16/25 2.9s};
62...{-7.43/23 3.0s};
63...{-9.15/23 2.9s};
64...{-8.23/24 2.9s};
65...{-11.12/24 3.5s};
66...{-81.59/25 3.6s};
67...{-81.24/22 3.3s};
68...{-M18/30 3.1s};
69...{-M16/28 2.8s};
{White wins by adjudication};
1-0
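Files in this shape can then be read into numbers for the stats. The sketch below is one possible way, assuming the line format shown above (move number, then {score/depth time} with the time in seconds, then an optional semicolon). Mate scores such as +M19 are kept as strings, and {book} lines, tags and the result line are skipped. `parse_line` is a name invented for this example.

```python
import re

LINE = re.compile(
    r"^(?P<move>\d+)\.(?:\.\.)?"             # "10." or "10..."
    r"\{(?P<score>[+-]?(?:M\d+|\d+\.\d+))"   # "-0.23", "0.00" or "+M19"
    r"/(?P<depth>\d+)\s+"                    # search depth
    r"(?P<time>\d+(?:\.\d+)?)s\};?$"         # time in seconds, e.g. "13s"
)


def parse_line(line):
    """Return (move_no, score, depth, seconds), or None for non-data lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None  # tags, {book}, the adjudication comment, the result
    score = m.group("score")
    # Mate scores stay as strings; ordinary scores become floats.
    value = score if "M" in score else float(score)
    return (int(m.group("move")), value,
            int(m.group("depth")), float(m.group("time")))


if __name__ == "__main__":
    sample = ["9.{book};", "10.{-0.23/25 13s};", "11.{-0.30/26 4.2s};"]
    rows = [r for r in map(parse_line, sample) if r is not None]
    mean_time = sum(r[3] for r in rows) / len(rows)
    print(f"{len(rows)} moves, mean time {mean_time:.2f}s")
```

The same function handles both outW.pgn and outB.pgn, since the optional "..." after the move number is matched but not stored.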
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Tool for splitting white and black moves into diff. file

Post by Ajedrecista »

Hello again:
Guenther wrote:[...]

After that it was quite easy to achieve my (current) goal.
Great that you managed that!
Guenther wrote:(BTW I always use an old version UltraEdit for regex operations - it's quite good in loading big files and has a lot of options)
Just for your information, I use a programme called Find And Replace Text, which is also very fast at plain-text replacements:

http://fart-it.sourceforge.net/

https://sourceforge.net/projects/fart-it/

https://sourceforge.net/projects/fart-i ... t/download

Find And Replace Text  v1.99b                         by Lionello Lunesu

Usage: FART [options] [--] <wildcard>[,...] [find_string] [replace_string]

Options:
 -h, --help          Show this help message (ignores other options)
 -q, --quiet         Suppress output to stdio / stderr
 -V, --verbose       Show more information
 -r, --recursive     Process sub-folders recursively
 -c, --count         Only show filenames, match counts and totals
 -i, --ignore-case   Case insensitive text comparison
 -v, --invert        Print lines NOT containing the find string
 -n, --line-number   Print line number before each line (1-based)
 -w, --word          Match whole word (uses C syntax, like grep)
 -f, --filename      Find (and replace) filename instead of contents
 -B, --binary        Also search (and replace) in binary files (CAUTION)
 -C, --c-style       Allow C-style extended characters (\xFF\0\t\n\r\\ etc.)
     --cvs           Skip cvs dirs; execute "cvs edit" before changing files
     --svn           Skip svn dirs
     --remove        Remove all occurences of the find_string
 -a, --adapt         Adapt the case of replace_string to found string
 -b, --backup        Make a backup of each changed file
 -p, --preview       Do not change the files but print the changes
It runs from the command prompt, and I have managed something like 40M replacements in 30 seconds at most on TXT files of about 0.5 GB. However, I do not see an option for regular expressions. Anyway, I am posting it here in case it is useful to someone.

Regards from Spain.

Ajedrecista.