Expected performance and eval of Komodo 8 and SF 6

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Thanks to a PGN tool written by Ferdinand Mosca and to high-quality PGN databases of intermediate time control games (either 960''+9.6'' or 600''+3'') from Andreas Strangmuller and Andrey Chilantiev, I extracted some numerical results from about 100,000 moves. In cutechess-cli PGNs each move has the eval attached, and the outcome of the game is recorded as well.

I wanted to map the eval of the engines to the expected result of the game (for equal opponents). I had to trim the PGN, because it became apparent that the mapping for the opening/middlegame is different from that for the endgame. So the games (and their moves) are trimmed to move 40. Endgames may be treated separately later, as Ferdinand has upgraded his PGN tool again.

I had an empirical model for the mapping:

Expected Score = (1+tanh[eval/a])/2 for equal opponents.

Here a is an empirical parameter, different for each engine. With that model I was unable to fit the data well, so the fitting model now has one more parameter:

Expected Score = (1+tanh[eval^b/a])/2 for equal opponents.

Now there are 2 empirical parameters, a and b, but they fit the experimental datapoints VERY well for both Komodo 8 and SF 6.
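
A minimal sketch of the two-parameter model in Python, for anyone who wants to play with it. The function name `expected_score` and the mirroring for negative evals are assumptions made for illustration; the post only gives the formula for the side that is ahead. The parameter values are the fits reported in this post.

```python
import math

def expected_score(eval_pawns, a, b=1.0):
    """Expected score for equal opponents: (1 + tanh(eval^b / a)) / 2.

    eval_pawns is the engine eval in pawns. With b = 1 this is the
    original one-parameter model. Negative evals are mirrored around 50%
    (an assumption: eval^b is undefined for negative eval and
    non-integer b).
    """
    magnitude = abs(eval_pawns) ** b / a
    return (1.0 + math.copysign(math.tanh(magnitude), eval_pawns)) / 2.0

# Fitted parameters as reported in this post.
print(round(expected_score(1.5, a=1.313, b=1.757), 3))    # Komodo 8
print(round(expected_score(1.5, a=1.7645, b=1.4707), 3))  # SF 6
```

An eval of 0.00 maps to exactly 50%, and an eval of -x maps to one minus the score at +x.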


1/ Komodo 8

Image

The blue line and dots are experimental data. The red line is the fitted model.
The fit is: Expected Score = (1+tanh[eval^1.757/1.313])/2
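
One possible way to obtain a and b from (eval, score) data points, sketched in Python with the standard library only. This is not necessarily the fitting procedure used for the plot: it linearises the model as ln(atanh(2*score - 1)) = b*ln(eval) - ln(a) and runs an ordinary linear regression, so it minimises the error in the transformed space rather than in score space. The function name `fit_tanh_model` is hypothetical, and the data below are synthetic points generated from the Komodo 8 fit above, not the original game data.

```python
import math

def fit_tanh_model(points):
    """Fit score = (1 + tanh(eval^b / a)) / 2 by log-log linear regression.

    points: iterable of (eval_in_pawns, observed_score); only points with
    eval > 0 and 0.5 < score < 1 can be used by this linearisation.
    Returns (a, b).
    """
    xs, ys = [], []
    for ev, score in points:
        if ev > 0 and 0.5 < score < 1.0:
            xs.append(math.log(ev))
            ys.append(math.log(math.atanh(2.0 * score - 1.0)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the regression line
    a = math.exp(b * mean_x - mean_y)    # intercept equals -ln(a)
    return a, b

# Synthetic, noise-free points from the Komodo 8 fit (NOT real data),
# only to show that the procedure recovers the parameters.
evals = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
points = [(ev, (1 + math.tanh(ev ** 1.757 / 1.313)) / 2) for ev in evals]
a, b = fit_tanh_model(points)
print(round(a, 3), round(b, 3))
```

With noisy real data, points near 50% or near 100% get a large weight in the transformed space, so a direct nonlinear least-squares fit in score space may be preferable.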


2/ SF 6

Image

The blue line and dots are experimental data. The red line is the fitted model.
The fit is: Expected Score = (1+tanh[eval^1.4707/1.7645])/2


3/ Comparison of expected scores for Komodo 8 and SF 6

Image

The largest difference seems to be between eval = 1.0 and eval = 2.0. For eval = 1.5 Komodo 8 has an expected score of 96% and SF 6 a score of 88%, so at eval = 1.5 the probability that something goes wrong is three times higher with SF 6 (12%) than with Komodo 8 (4%).
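
The arithmetic behind that comparison, as a short Python sketch. It treats "something goes wrong" loosely as one minus the expected score, which is a simplification since the expected score also counts draws as half points. The helper name `expected_score` is an assumption; the parameters are the two fits given above.

```python
import math

def expected_score(eval_pawns, a, b):
    # (1 + tanh(eval^b / a)) / 2, valid here for eval >= 0
    return (1.0 + math.tanh(eval_pawns ** b / a)) / 2.0

komodo = expected_score(1.5, a=1.313, b=1.757)
stockfish = expected_score(1.5, a=1.7645, b=1.4707)

# Ratio from the rounded percentages (96% and 88%).
rounded_ratio = (1 - 0.88) / (1 - 0.96)
# Ratio from the unrounded fit values.
fit_ratio = (1 - stockfish) / (1 - komodo)

print(round(rounded_ratio, 2), round(fit_ratio, 2))
```

The rounded percentages give a factor of three; the unrounded fit values give a ratio somewhat below three.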
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Isaac »

I remember Joseph Koss did a very similar study. He took over 30k games from the CCRL (40 moves in 40 minutes TC) from engines over 3000 elo. He then analyzed the position of each game at move 15, 30, 45, 60 and 75 for SF DD and Houdini 4 at a fixed depth (14 if I remember well). I think he also did that for Komodo 6 but I don't remember exactly.
Here are his results: https://public.bn1302.livefilestore.com ... png?psid=1, https://public.bn1.livefilestore.com/y2 ... png?psid=1.

As we can see, an eval of 1.0 at move 15 yields a higher expected result than at move 30, 45, 60 and 75. This holds for both engines, and the general trend is that a given eval yields a lower expected result at later moves than at earlier moves.

I wish Joseph would post here though, I may not have remembered everything correctly.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Isaac wrote:I remember Joseph Koss did a very similar study. He took over 30k games from the CCRL (40 moves in 40 minutes TC) from engines over 3000 elo. He then analyzed the position of each game at move 15, 30, 45, 60 and 75 for SF DD and Houdini 4 at a fixed depth (14 if I remember well). I think he also did that for Komodo 6 but I don't remember exactly.
Here are his results: https://public.bn1302.livefilestore.com ... png?psid=1, https://public.bn1.livefilestore.com/y2 ... png?psid=1.

As we can see, an eval of 1.0 at move 15 yields a higher expected result than at move 30, 45, 60 and 75. This holds for both engines, and the general trend is that a given eval yields a lower expected result at later moves than at earlier moves.

I wish Joseph would post here though, I may not have remembered everything correctly.
Thanks. I see that move 15 is almost identical to move 30, so trimming to move 40 was a pretty lucky choice for getting a stable result. I don't fully understand the method. The graph seems normalized (0.00 is 50%), but CCRL games are between unequal opponents, and the mapping from unequal opponents to equal opponents seems non-trivial.
Ajedrecista
Posts: 1952
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Expected performance and eval of Komodo 8 and SF 6.

Post by Ajedrecista »

Hello Kai:
Kai Laskos wrote:The largest difference seems to be between eval = 1.0 and eval = 2.0. For eval = 1.5 Komodo 8 has an expected score of 96% and SF 6 a score of 88%, so at eval = 1.5 the probability that something goes wrong is three times higher with SF 6 (12%) than with Komodo 8 (4%).
Rounding to centipawns, the maximum difference is at eval = 128 cp if I am not wrong. For this eval: µ_K8(eval = 1.28) ~ 91.3% and µ_SF6(eval = 1.28) ~ 83.61% according to your fits, so the difference is around 7.69%. It is curious how far apart the expected scores are.
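
A short Python sketch of how such a maximum can be located: scan the eval in whole centipawns and keep the value where the gap between the two fitted curves is largest. The helper names `expected_score` and `gap` and the scan range of 1 to 500 cp are assumptions for illustration; the parameters are Kai's two fits.

```python
import math

def expected_score(eval_pawns, a, b):
    # (1 + tanh(eval^b / a)) / 2, valid here for eval > 0
    return (1.0 + math.tanh(eval_pawns ** b / a)) / 2.0

def gap(cp):
    """Komodo 8 expected score minus SF 6 expected score at cp centipawns."""
    ev = cp / 100.0
    return expected_score(ev, 1.313, 1.757) - expected_score(ev, 1.7645, 1.4707)

# Brute-force scan over whole centipawns.
best_cp = max(range(1, 501), key=gap)
print(best_cp, round(100 * gap(best_cp), 2))
```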

@Isaac: thanks for your post.

Regards from Spain.

Ajedrecista.
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Thanks, interesting!

Post by Frank Quisinsky »

Hi Kai,

good work and interesting!
I like such stats!

Best
Frank
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Isaac wrote:I remember Joseph Koss did a very similar study. He took over 30k games from the CCRL (40 moves in 40 minutes TC) from engines over 3000 elo. He then analyzed the position of each game at move 15, 30, 45, 60 and 75 for SF DD and Houdini 4 at a fixed depth (14 if I remember well). I think he also did that for Komodo 6 but I don't remember exactly.
Here are his results: https://public.bn1302.livefilestore.com ... png?psid=1, https://public.bn1.livefilestore.com/y2 ... png?psid=1.

As we can see, an eval of 1.0 at move 15 yields a higher expected result than at move 30, 45, 60 and 75. This holds for both engines, and the general trend is that a given eval yields a lower expected result at later moves than at earlier moves.

I wish Joseph would post here though, I may not have remembered everything correctly.
Now I was curious about the issue, and performed the whole thing again (I plotted only Komodo 8 this time) for different stages of the game: moves 15, 25, 35, 50 and 70. Moves 15, 25 and 35 are almost indistinguishable; after that the curves diverge quite a bit, so endgames are a different matter.
I used a database of Andreas Strangmuller, a PGN with 6,000 games of Komodo 8 at 240''+2.4''. I fitted the data points to the model Expected Performance = (1+tanh[eval^b/a])/2, the fits went very well (as well as in the OP), and the plots are here:

Image

For those interested in fitting values:

a_15 = 1.2360692678117168;
b_15 = 1.39187206217893662;

a_25 = 1.2798656616700659;
b_25 = 1.337710753614952;

a_35 = 1.3216414009384804;
b_35 = 1.303028291611404;

a_50 = 1.6615917027683322;
b_50 = 1.518649656098744;

a_70 = 2.2563139040148186;
b_70 = 1.622861465630123;
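
For readers without the plot, a small Python sketch that tabulates the expected score from these fits at a few evals. The (a, b) pairs are the values listed above, rounded to four decimals; the dictionary name `FITS`, the helper `expected_score` and the choice of evals 0.5, 1.0 and 2.0 are assumptions for illustration.

```python
import math

# (a, b) per move number, rounded from the fitted values listed above.
FITS = {
    15: (1.2361, 1.3919),
    25: (1.2799, 1.3377),
    35: (1.3216, 1.3030),
    50: (1.6616, 1.5186),
    70: (2.2563, 1.6229),
}

def expected_score(eval_pawns, a, b):
    # (1 + tanh(eval^b / a)) / 2, valid here for eval > 0
    return (1.0 + math.tanh(eval_pawns ** b / a)) / 2.0

# Expected score in percent at eval = 0.5, 1.0 and 2.0 for each move number.
for move, (a, b) in FITS.items():
    row = [round(100 * expected_score(ev, a, b), 1) for ev in (0.5, 1.0, 2.0)]
    print(move, row)
```

At eval = 1.0 the exponent b drops out, so the score depends only on a, which grows steadily with the move number; the same eval is therefore worth less later in the game.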
nimh
Posts: 46
Joined: Sun Nov 30, 2014 12:06 am

Re: Expected performance and eval of Komodo 8 and SF 6

Post by nimh »

It seems obvious that the reason is that the reduced amount of material makes it harder to convert an advantage into a full point. Could you perform the analysis again to determine the relationship between material and expected scores?

You suggested I use a logistic function instead of centipawns for analyzing the quality of chess games. I think it would be useful to have some sort of formula to determine expected scores based on material as well.
Steve Maughan
Posts: 1218
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Steve Maughan »

Hi Kai,

Good stuff!

If I remember correctly, Houdini is tuned to win 80% of the games at +1 pawn. This matches Komodo's score profile almost perfectly.

Steve
http://www.chessprogramming.net - Maverick Chess Engine
aturri
Posts: 85
Joined: Wed Dec 30, 2009 11:35 pm

Re: Expected performance and eval of Komodo 8 and SF 6

Post by aturri »

Really interesting. Thank you very much for your study!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

nimh wrote:It is obvious that the reason is that the reduced amount of material makes it harder to convert advantage into full point. Could you perform the analysis again for determining the relationship between material and expected scores?
Yes, if Ferdinand writes such a script :), I am clumsy at that. I can only do some extrapolation of the move number dependency for a and b.
nimh wrote:You suggested I use a logistic function instead of centipawns for analyzing the quality of chess games. I think it would be useful to have a some sort of formula to determine expected scores based on material as well.
The logistic still fits the curve somewhat, say up to move 40, although not very well. Less material (on average after moves 35-40) changes the shape considerably.
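
As a side note on the logistic: the one-parameter model from the first post, (1+tanh[eval/a])/2, is algebraically the same curve as the logistic 1/(1+exp(-2*eval/a)), so "logistic" here presumably corresponds to the b = 1 case. A quick numerical check in Python; the function names and the value a = 1.3 are arbitrary choices for illustration, not fitted values.

```python
import math

def tanh_form(eval_pawns, a):
    # One-parameter model from the first post (b = 1).
    return (1.0 + math.tanh(eval_pawns / a)) / 2.0

def logistic_form(eval_pawns, a):
    # Standard logistic curve with scale a / 2.
    return 1.0 / (1.0 + math.exp(-2.0 * eval_pawns / a))

# The two forms agree to floating-point precision at every test eval.
for ev in (-2.0, -0.5, 0.0, 0.7, 1.5, 3.0):
    assert math.isclose(tanh_form(ev, 1.3), logistic_form(ev, 1.3), rel_tol=1e-9)
print("tanh form and logistic form agree")
```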