Looking for a tool that finds ups and downs in an engine's eval

Carlos777
Posts: 1730
Joined: Sun Dec 13, 2009 6:09 pm

Looking for a tool that finds ups and downs in an engine's eval

Post by Carlos777 »

Is there a tool that analyzes commented PGNs and finds ups and downs in the eval scores of a game? Something like the ValDrop and ValJump options in PGNScanner; the problem with that useful tool is that it only reports eval drops or jumps when both engines agree. I'd like a tool that does this using the eval of a single engine, and saves, for example, all the matching games to one text file and, to another text file, the positions where that engine's eval goes up or down by 2.00 (200 centipawns) or more.

Thanks in advance,

Carlos
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Looking for a tool that find ups and downs in engine's e

Post by Ferdy »

Carlos777 wrote:Is there a tool that analyzes commented PGN and finds ups and downs in the eval score of a game? For example like ValDrop and ValJump options in PGNScanner, the problem with this useful tool is that it only takes the eval's drops or jumps if both engines think so. I'd like a tool that can do this, checking the eval of a single engine and save in a text file for example all the games and in other text file the positions where the engines' eval ups and downs 2.00 (200 centipawns) or more.

Thanks in advance,

Carlos
If the PGN is output from cutechess-cli or WinBoard, I can probably make such a tool.
Saving the games is straightforward, but saving FENs is more work: since I am only using a script, I have to run the pgn2fen tool to get each FEN, using the move number remembered while scanning the move comments and comparing move scores.
Carlos777
Posts: 1730
Joined: Sun Dec 13, 2009 6:09 pm

Re: Looking for a tool that find ups and downs in engine's e

Post by Carlos777 »

Ferdy wrote:
If the pgn is an output from cutechess-cli or winboard, I can probably make such a tool.
Saving game is normal but saving the fen is more work as I am only using a script, I have to run the pgn2fen tool to get the fen via remembered move number from scanning the move comments and comparing move scores.
Hi Ferdinand,

Thanks for your time. It is fine if the selected games are saved to a pgn.

I usually use WinBoard, so the output looks like this:

Code:

[Event "Computer Chess Game"]
[Site "HOME"]
[Date "2015.01.14"]
[Round "7"]
[White "Deuterium v14.3.34.130"]
[Black "Tornado 6"]
[Result "1-0"]
[TimeControl "1200+3"]
[Annotator "9. +0.49   12... -0.67"]

1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Nf3 c6 5. e3 Nbd7 6. Bd3 dxc4 7. Bxc4 b5 8.
Bd3 Bb7 9. O-O {+0.49/21 1:13} a6 10. e4 {+0.43/21 1:23} b4 11. Na4
{+0.97/22 1:47} Be7 12. e5 {+1.13/20 1:28} Nd5 {-0.67/22 57} 13. Bd2
{+0.79/21 1:05} h5 {-0.77/21 55} 14. Qb3 {+1.02/19 30} a5 {-0.61/20 53} 15.
Rfc1 {+0.90/19 29} h4 {-0.64/23 50} 16. h3 {+0.99/20 29} Kf8 {-0.79/23 48}
17. a3 {+1.16/21 1:14} Kg8 {-1.00/22 46} 18. Be4 {+1.18/21 25} Qf8
{-1.08/22 44} 19. Bg5 {+1.23/20 25} Bxg5 {-1.10/23 42} 20. Nxg5
{+1.14/19 24} Rh5 {-1.23/22 41} 21. Nf3 {+1.60/21 23} Ra6 {-1.26/22 39} 22.
Ne1 {+1.67/19 22} Ba8 {-1.43/20 37} 23. axb4 {+2.36/21 30} Nxb4
{-1.88/23 36} 24. Nd3 {+2.70/21 29} Nxd3 {-2.03/24 34} 25. Qxd3
{+2.61/24 20} Bb7 {-2.02/25 33} 26. Nc5 {+2.60/22 19} Nxc5 27. Rxc5
{+2.63/22 19} Rb6 {-2.04/24 31} 28. Qc3 {+2.62/23 18} g6 {-2.10/25 30} 29.
Raxa5 {+2.59/22 18} Qd8 {-2.02/25 29} 30. b4 {+2.97/24 24} Kg7
{-2.61/26 28} 31. Ra7 {+2.84/23 23} Qb8 {-2.82/23 27} 32. Rca5
{+2.58/22 22} Rh8 {-3.00/25 26} 33. Qf3 {+3.99/24 37} Rf8 {-3.72/24 25} 34.
Qf6+ {+3.85/24 20} Kh7 {-3.81/27 24} 35. Qxh4+ {+3.84/23 14} Kg7
{-3.73/28 23} 36. Qe7 {+3.88/23 13} Kg8 {-3.98/25 22} 37. g3 {+3.83/22 13}
Re8 {-2.74/26 21} 38. Qc5 {+3.82/25 13} Qd8 39. h4 {+3.84/22 12} Kg7
{-2.74/25 20} 40. f4 {+3.87/23 18} Re7 {-2.80/25 19} 41. Kf2 {+3.85/21 12}
Rd7 {-2.92/23 19} 42. Ke3 {+3.54/22 11} Re7 {-2.84/25 18} 43. Bd3
{+4.04/24 15} Rd7 {-3.40/23 17} 44. Bc4 {+4.47/22 11} Re7 {-3.40/23 17} 45.
Ra1 {+4.48/21 14} Rd7 {-3.50/22 16} 46. g4 {+4.66/21 21} Kh7 {-3.46/20 15}
47. h5 {+5.29/20 9} gxh5 {-4.58/22 15} 48. Rh1 {+6.92/22 9} h4
{-4.13/16 1.4} 49. g5 {+7.86/23 9} h3 {-5.93/23 14} 50. Rxh3+ {+9.23/21 9}
Kg7 {-6.67/23 14} 51. Rh4 {+10.16/20 9} Bc8 {-8.40/23 13} 52. Ra1
{+11.63/20 8} Ba6 {-10.33/21 13} 53. Rah1 {+13.32/20 8} Qg8 {-17.49/23 12}
{Xboard adjudication} 1-0
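For reference, here is a minimal Python sketch of pulling the scores out of WinBoard movetext like the above. It assumes that each engine comment starts with `score/depth` (e.g. `{+0.49/21 1:13}`) from the moving side's point of view, that only the movetext (no header tags, no parenthesized variations) is passed in, and it simply skips `{ book }` comments and mate scores such as `-M24/21`. `parse_winboard_scores` is a hypothetical name, not part of any existing tool.

```python
import re

# A {comment}, a move number such as "12." or "12...", or any other word (SAN move or result).
TOKEN = re.compile(r"\{[^}]*\}|\d+\.(?:\.\.)?|[^\s{}]+")
# Assumed WinBoard score comment: "+0.49/21 1:13" -> score +0.49 pawns at depth 21.
SCORE = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)/(\d+)")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_winboard_scores(movetext):
    """Return (move_number, side, san, score, depth) for every move with a numeric score."""
    rows = []
    ply = 0      # half-moves played before the next SAN token
    last = None  # (move_number, side, san) of the latest move, until its score is seen
    for tok in TOKEN.findall(movetext):
        if tok.startswith("{"):
            m = SCORE.match(tok[1:-1])
            if m and last is not None:
                rows.append(last + (float(m.group(1)), int(m.group(2))))
                last = None
        elif tok in RESULTS or tok.startswith("$"):
            continue  # game results and NAGs are not moves
        elif tok[0].isdigit():
            # Resynchronise on move numbers: "12." is White's 12th move, "12..." Black's.
            n = int(tok.rstrip("."))
            ply = 2 * (n - 1) + (1 if tok.endswith("...") else 0)
        else:
            side = "white" if ply % 2 == 0 else "black"
            last = (ply // 2 + 1, side, tok)
            ply += 1
    return rows


sample = """9. O-O {+0.49/21 1:13} a6 10. e4 {+0.43/21 1:23} b4 11. Na4
{+0.97/22 1:47} Be7 12. e5 {+1.13/20 1:28} Nd5 {-0.67/22 57} 1-0"""
rows = parse_winboard_scores(sample)
for row in rows:
    print(row)
```

Black moves in WinBoard output usually carry no "12..." prefix, which is why the sketch counts half-moves itself and only uses move numbers to resynchronise.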
I guess Fritz games would be more difficult, because Fritz saves them in this format:

Code:

[Event "Fritzgt1501"]
[Site "Microsoft"]
[Date "2015.01.15"]
[Round "1.2"]
[White "Naum 4.6"]
[Black "Deep Fritz 14"]
[Result "1-0"]
[ECO "D48"]
[Annotator "0.38;0.09"]
[PlyCount "81"]
[EventDate "2015.01.15"]
[EventType "simul"]
[TimeControl "1200+3"]

{Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz 3093 MHz  W=15.9 plies; 1,644kN/s;
Perfect2014t.ctg  B=17.8 plies; 1,252kN/s; Perfect2014t.ctg} 1. d4 {[%eval 0,0]
[%emt 0:00:00]} Nf6 {[%eval 0,0] [%emt 0:00:00]} 2. c4 {[%eval 0,0] [%emt 0:00:
00]} e6 {[%eval 0,0] [%emt 0:00:00]} 3. Nf3 {[%eval 0,0] [%emt 0:00:00]} c6 {
[%eval 0,0] [%emt 0:00:00]} 4. Nc3 {[%eval 0,0] [%emt 0:00:00]} d5 {[%eval 0,0]
[%emt 0:00:00]} 5. e3 {[%eval 0,0] [%emt 0:00:00]} Nbd7 {[%eval 0,0] [%emt 0:
00:00]} 6. Bd3 {[%eval 38,18] [%emt 0:00:48]} dxc4 {[%eval 9,21] [%emt 0:00:40]
} 7. Bxc4 {[%eval 38,19] [%emt 0:00:23]} b5 {[%eval 4,21] [%emt 0:00:25] (Bd6)}
8. Bd3 {[%eval 37,17] [%emt 0:00:29]} Bb7 {[%eval 0,21] [%emt 0:01:11] (Bd6)}
9. O-O {[%eval 35,18] [%emt 0:01:21] (Ne4)} a6 {[%eval 0,21] [%emt 0:01:20] 
(Bd6)} 10. e4 {[%eval 18,18] [%emt 0:01:33] (a4)} c5 {[%eval 0,20] [%emt 0:00:
40]} 11. d5 {[%eval 18,19] [%emt 0:00:30]} Bd6 {[%eval 0,19] [%emt 0:01:35] 
(c4)} 12. dxe6 {[%eval 21,15] [%emt 0:00:26]} fxe6 {[%eval 0,0] [%emt 0:00:00]}
13. Bxb5 {[%eval 29,16] [%emt 0:00:15]} Bxh2+ {[%eval 18,19] [%emt 0:00:28]}
14. Nxh2 {[%eval 15,17] [%emt 0:00:37]} axb5 {[%eval 18,0] [%emt 0:00:00]} 15.
Nxb5 {[%eval 15,18] [%emt 0:00:12]} O-O {[%eval 9,21] [%emt 0:00:20]} 16. Nd6 {
[%eval 21,18] [%emt 0:00:11]} Qc7 {[%eval 21,21] [%emt 0:01:43] (Ba6)} 17. Re1
{[%eval 55,17] [%emt 0:00:16]} Bc6 {[%eval 21,20] [%emt 0:00:23] (Rfd8)} 18. e5
{[%eval 45,17] [%emt 0:00:19]} Nd5 {[%eval 22,20] [%emt 0:00:45]} 19. Qh5 {
[%eval 45,17] [%emt 0:00:19] (Qg4)} Ne7 {[%eval 26,18] [%emt 0:00:30]} 20. Re3
{[%eval 51,17] [%emt 0:00:28]} g6 {[%eval 32,18] [%emt 0:00:31]} 21. Qh3 {
[%eval 51,17] [%emt 0:00:09] (Rg3)} Bd5 {[%eval 25,19] [%emt 0:00:20] (Nf5)}
22. Ng4 {[%eval 51,16] [%emt 0:00:21]} Nc6 {[%eval 21,19] [%emt 0:00:48]} 23.
Nh6+ {[%eval 56,14] [%emt 0:00:15]} Kg7 {[%eval 16,19] [%emt 0:01:42]} 24. Ndf7
{[%eval 67,16] [%emt 0:00:33]} Nd4 {[%eval 21,18] [%emt 0:00:25]} 25. Ng5 {
[%eval 75,16] [%emt 0:00:19]} Ra6 {[%eval 55,17] [%emt 0:01:48]} 26. Bd2 {
[%eval 98,13] [%emt 0:00:24]} Qb7 {[%eval 58,16] [%emt 0:00:57] (Qb6)} 27. b3 {
[%eval 101,14] [%emt 0:00:19]} Qc6 {[%eval 88,16] [%emt 0:01:19]} 28. Qh2 {
[%eval 186,14] [%emt 0:00:23] (g4)} Qc7 {[%eval 144,15] [%emt 0:00:43]} 29. Bc3
{[%eval 192,14] [%emt 0:00:16]} Rc6 {[%eval 207,15] [%emt 0:00:58] (Raa8)} 30.
Rae1 {[%eval 404,13] [%emt 0:00:18] (Rd1)} Rh8 {[%eval 338,13] [%emt 0:00:10]}
31. Rh3 {[%eval 443,15] [%emt 0:00:12]} Nf8 {[%eval 464,17] [%emt 0:00:35]} 32.
Ngf7 {[%eval 458,16] [%emt 0:00:07]} Nf5 {[%eval 491,17] [%emt 0:00:30]} 33.
Nd6 {[%eval 573,16] [%emt 0:00:12]} Qd8 {[%eval 491,0] [%emt 0:00:00]} 34.
Nhxf5+ {[%eval 601,14] [%emt 0:00:11] (Nhf7)} gxf5 {[%eval 500,16] [%emt 0:00:
03]} 35. Nxf5+ {[%eval 604,15] [%emt 0:00:12]} Kf7 {[%eval 481,0] [%emt 0:00:
00]} 36. Nd6+ {[%eval 612,15] [%emt 0:01:09] (Qf4)} Kg8 {[%eval 648,15] [%emt
0:00:08]} 37. Rg3+ {[%eval 675,15] [%emt 0:00:22]} Ng6 {[%eval 648,0] [%emt 0:
00:00]} 38. f4 {[%eval 778,14] [%emt 0:00:37] (Bd2)} Qb6 {[%eval 679,14] [%emt
0:00:14] (Rc7)} 39. f5 {[%eval 795,13] [%emt 0:00:38]} Rc7 {[%eval 712,15] 
[%emt 0:00:08]} 40. Qh6 {[%eval 1089,16] [%emt 0:00:16] (fxg6)} Rg7 {[%eval
1067,13] [%emt 0:00:08] (c4+)} 41. Ne8 {[%eval 1382,16] [%emt 0:00:10] (f6)}
1-0
Is there perhaps a tool to convert that format into WinBoard's? Then, before running the script, all the games in the PGN would have the same type of output.
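A sketch of the per-move conversion being asked about, in Python. It assumes (as Ferdy notes further down) that the Fritz `[%eval cp,depth]` value is in centipawns from White's point of view, and that depth 0 entries such as `[%eval 0,0]` (book or instant moves) carry no usable search result. It turns one Fritz comment into a WinBoard-style `{score/depth}` comment from the moving side's point of view. `fritz_to_winboard` is a hypothetical helper, not Ligsprecomp or any other existing converter.

```python
import re

# Fritz stores "[%eval <centipawns>,<depth>]" and may break the line inside the tag,
# so any whitespace (including newlines) is allowed between the parts.
FRITZ_EVAL = re.compile(r"\[%eval\s+(-?\d+)\s*,\s*(\d+)\s*\]")


def fritz_to_winboard(comment, mover_is_white):
    """Convert one Fritz move comment into a WinBoard-style '{score/depth}' comment.

    Assumes the Fritz score is White's POV in centipawns; the result is the
    moving side's POV in pawns. Returns None if there is no eval or depth is 0.
    """
    m = FRITZ_EVAL.search(comment)
    if m is None:
        return None
    cp, depth = int(m.group(1)), int(m.group(2))
    if depth == 0:
        return None  # assumption: no search behind this move (book / instant reply)
    if not mover_is_white:
        cp = -cp  # flip White's POV to Black's own POV
    return "{%+.2f/%d}" % (cp / 100, depth)


print(fritz_to_winboard("[%eval 38,18] [%emt 0:00:48]", True))              # 6. Bd3
print(fritz_to_winboard("[%eval 9,21] [%emt 0:00:40]", False))              # 6... dxc4
print(fritz_to_winboard("[%eval\n1067,13] [%emt 0:00:08] (c4+)", False))    # 40... Rg7
```

Because Fritz wraps lines in the middle of tags, the comment text should be taken as a whole (newlines included) before matching.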
Carlos777
Posts: 1730
Joined: Sun Dec 13, 2009 6:09 pm

Re: Looking for a tool that find ups and downs in engine's e

Post by Carlos777 »

I forgot to say that an eval range should be established before running the script. For example:

lowest limit: -2.00
highest limit: 3.00

I want to use this tool to spot engine mistakes that lose games. For example, an eval change from 4.00 to 7.00 is not interesting, because the game is most probably already won.
Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Looking for a tool that find ups and downs in engine's e

Post by Steve Maughan »

This could probably be coded easily using this Python chess library.

https://pypi.python.org/pypi/python-chess
http://www.chessprogramming.net - Maverick Chess Engine
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Looking for a tool that find ups and downs in engine's e

Post by Ferdy »

WinBoard output can be loaded into the Game analyzer GUI to easily detect suspect positions.

Fritz output is from White's POV, so it will not work in Game analyzer unless it is converted.

I will try to make a tool for WinBoard output first. Fritz is possible but will take more time.
Carlos777
Posts: 1730
Joined: Sun Dec 13, 2009 6:09 pm

Re: Looking for a tool that find ups and downs in engine's e

Post by Carlos777 »

Ferdy wrote:Winboard output can be used by Game analyzer gui to easily detect suspect positions.
I did not know about Game analyzer.
Ferdy wrote:Fritz output is white POV, and will not work in Game analyzer when not converted.

I will try to make a tool from winboard first. Fritz is possible but will take more time.
Thanks!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Looking for a tool that find ups and downs in engine's e

Post by Ferdy »

Carlos777 wrote:I forgot to say that an eval range should be established before running the script. For example:

lowest limit: -2.00
highest limit: 3.00

I want to use this tool to spot engine's mistakes that make them lose the game. If, for example, the variation of eval goes from 4.00 to 7.00 is not interesting, because the game is most probably won.
I have started testing this now. When the evaluated engine blunders in a game, that game is recorded in, for example, sf5-blunder.pgn.

A blunder is flagged when the score of the current move is greater than the score of the next move by at least the threshold, meaning the score has dropped, and the current score is within the window defined by the user. See also the sample run below.
Example.
min = -2.0
max = 3.0
th = 1.5
... 18. d4 {1.0/15} 18... Nc5 {1.6/14} 19. Nf7 {-1.5/16}...
white move 18, score 1.0
white move 19, score -1.5
So the score has dropped from 1.0 to -1.5: delta = 1.0 - (-1.5) = 2.5.
Since delta >= th (2.5 >= 1.5) and the starting score 1.0 is inside the window, there must be a blunder somewhere, so the game is saved.
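The rule above fits in a short Python function. This is only a sketch of the described rule, not Ferdy's script: it assumes `scores` already holds one engine's consecutive move scores, in pawns from that engine's own POV, with book and mate scores removed. `find_drops` is a hypothetical name.

```python
def find_drops(scores, lo, hi, th):
    """scores: [(move_number, score), ...] for ONE engine's consecutive moves (side POV, pawns).

    Flags a drop when the current score lies inside [lo, hi] and the next score
    is lower by at least th.
    """
    drops = []
    for (m1, s1), (m2, s2) in zip(scores, scores[1:]):
        delta = round(s1 - s2, 2)  # positive when the score has dropped; centipawn precision
        if lo <= s1 <= hi and delta >= th:
            drops.append((m1, s1, m2, s2, delta))
    return drops


# The example above: min = -2.0, max = 3.0, th = 1.5
print(find_drops([(18, 1.0), (19, -1.5)], -2.0, 3.0, 1.5))
# A drop that starts outside the window is ignored
print(find_drops([(40, 4.0), (41, 1.0)], -2.0, 3.0, 1.5))
# Moves 65/66 from the sample run below
print(find_drops([(65, -0.65), (66, -6.83)], -2.0, 3.0, 1.5))
```

Rounding the difference to centipawns keeps floating-point noise from pushing a change of exactly th just below the threshold.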

I don't know whether you want rising or dropping scores, but for blunders it should be a dropping score.

The script can read cutechess-cli and WinBoard PGN output, and perhaps Arena output too (without the PV comment), but I have not tried that yet.

Saving FENs is possible, but not in this version. To record FENs I need a PGN with a comment on every move, like
1. e4 {0.2/14} e5 { book} ...
but not
1. e4 e5 2. Nf3 Nf6 3. Nxe5 {0.3/18}...
Output from cutechess-cli is a good candidate for including a FEN blunder list, but I will not do that right now; it will probably come when I get some time.
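As a hypothetical illustration of going from a remembered move number to a position (not necessarily how Ferdy's script or pgn2fen do it): convert the move number and side into the number of half-moves already played, then replay that many moves from the start with whatever PGN/FEN tool is at hand. Both helper names are made up for this sketch.

```python
def plies_before(move_number, white_to_move):
    """Half-moves played before the given move: 1. e4 -> 0, 1... e5 -> 1, 18. d4 -> 34."""
    return 2 * (move_number - 1) + (0 if white_to_move else 1)


def moves_to_replay(san_moves, move_number, white_to_move):
    """SAN moves leading to the position in which the flagged move was played."""
    return san_moves[:plies_before(move_number, white_to_move)]


game = ["e4", "e5", "Nf3", "Nf6", "Nxe5"]  # 1. e4 e5 2. Nf3 Nf6 3. Nxe5
print(moves_to_replay(game, 3, True))   # moves before 3. Nxe5
print(moves_to_replay(game, 2, False))  # moves before 2... Nf6
```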

The script assumes that the scores are from the moving side's POV.
If you have no other requests, I will upload the script.

Sample run.

Code:

found wb.pgn !!

Players:
Sf5
Fire_4_x64
Houdini 4.0 x64
Sf6
Gull 3 x64

num_games 76

input min score in pawn value? -2.0
input max score in pawn value? 3.0
input blunder threshold? 1.5
enable debug (1 or 0)? 1
window [-2.00, 3.00]
blunder threshold 1.50

Evaluating Sf5 ...

Reading game 1 ...
wp Sf5
bp Fire_4_x64
res 1/2-1/2
player Sf5 is found
1. { book }
2. { book }
3. { book }
4. { book }
5. { book }
6. { book }
7. { book }
8. { book }
9. { +0.03 }
white score (0.03) is inside the window [-2.00, 3.00]
10. { +0.06 }
white score (0.06) is inside the window [-2.00, 3.00]

[...]

119. { +0.00 }
white score (0.00) is inside the window [-2.00, 3.00]
120. { +0.00 }
white score (0.00) is inside the window [-2.00, 3.00]

Compare scores and check for blunders
without blunder!! in game 1 for player Sf5

Reading game 2 ...
wp Fire_4_x64
bp Sf5
res 0-1
player Sf5 is found
1... { book }
2... { book }

[...]

64. { -0.65 }
white score (-0.65) is inside the window [-2.00, 3.00]
65. { -0.65 }
white score (-0.65) is inside the window [-2.00, 3.00]
66. { -6.83 }
67. { -6.98 }
68. { -44.17 }
69. { -44.26 }
70. { -44.30 }
71. { -M24/21 }
72. { -44.45 }

Compare scores and check for blunders
65 -0.65, 66 -6.83
with blunder!! in game 69 for player Sf5
actual score drop 6.18, ref. threshold 1.50

Reading game 70 ...
wp Houdini 4.0 x64
bp Sf5
res 0-1
player Sf5 is found
1... { book }
2... { book }
3... { book }

[...]

black score (0.00) is inside the window [-2.00, 3.00]
119... { +0.00 }
black score (0.00) is inside the window [-2.00, 3.00]
120... { +0.00 }
black score (0.00) is inside the window [-2.00, 3.00]

Compare scores and check for blunders
without blunder!! in game 76 for player Gull 3 x64

Done!! t = 306.3s (5.1m)
press enter to exit
Carlos777
Posts: 1730
Joined: Sun Dec 13, 2009 6:09 pm

Re: Looking for a tool that find ups and downs in engine's e

Post by Carlos777 »

Ferdy wrote:A blunder is considered when the score of the current move is greater than or equal to the score of the next move by a value defined by threshold, meaning the score has dropped. And that the current score should be within the window defined by user. See sample run below too.
Example.
min = -2.0
max = 3.0
th = 1.5

Quote:
... 18. d4 {1.0/15} 18... Nc5 {1.6/14} 19. Nf7 {-1.5/16}...

white move 18, score 1.0
white move 19, score -1.5
So it has dropped from 1.0 to -1.5, delta = 1 - (-1.5) = 2.5
Since the delta >= th or delta >= 1.5, there must be a blunder somewhere. In this case the game will be saved.
I tested Game analyzer yesterday; great tool. However, I found an error in the case where move x+1 was positive (+2.00) and move x was negative (-1.00): it calculated the change as 2.00 - 1.00 = 1.00 instead of 2.00 - (-1.00) = 3.00. So your program is going to be an improvement over it in those cases.
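Written out with s_x as the score at move x, the change has to be taken on the signed scores; the 1.00 result is what you get if the sign of the negative score is lost:

```latex
\Delta = s_{x+1} - s_x = (+2.00) - (-1.00) = +3.00
\qquad\text{rather than}\qquad
|s_{x+1}| - |s_x| = 2.00 - 1.00 = 1.00
```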
Ferdy wrote: I don't know if you want a rising score or a dropping score. But for blunders, it should be dropping score.
Actually, I'd like the program to also report rising scores.
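Reporting both directions mostly means keeping the sign of the change. A Python sketch, assuming one engine's consecutive scores in pawns from its own POV and the same window-plus-threshold rule Ferdy describes; `find_swings` is a hypothetical name. Using the signed difference also avoids the sign problem described for Game analyzer.

```python
def find_swings(scores, lo, hi, th):
    """Report drops and rises of at least th pawns between consecutive scores
    of one engine (side POV), when the earlier score lies inside [lo, hi]."""
    swings = []
    for (m1, s1), (m2, s2) in zip(scores, scores[1:]):
        delta = round(s2 - s1, 2)  # signed change in pawns, rounded to centipawns
        if lo <= s1 <= hi and abs(delta) >= th:
            kind = "rise" if delta > 0 else "drop"
            swings.append((kind, m1, s1, m2, s2, delta))
    return swings


scores = [(10, -1.0), (11, 2.0), (12, 2.1), (13, 0.2)]
for swing in find_swings(scores, -2.0, 3.0, 1.5):
    print(swing)
```

The -1.00 to +2.00 case from the post is reported as a rise of 3.0, not 1.0.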
Ferdy wrote: The script assumed that the score is side POV.
If you don't have other request I will upload this script.
This is great.

Besides including rising scores, I'd like to know whether you have tested Stefano Gemma's tool (Ligsprecomp) and whether it can convert Fritz's output to WinBoard's, so that your program could analyze those games too.

Thank you very much for all your work.

Best,
Carlos
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Looking for a tool that find ups and downs in engine's e

Post by Ferdy »

Carlos777 wrote:I'd like to know if you have tested Steffano Gemma's tool (Ligsprecomp) and if it can convert Fritz' output to Winboard's in order that your program could analyze those games too.
I tried it on 4 games and it seemed to convert them correctly.