Expected performance and eval of Komodo 8 and SF 6

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Expected performance and eval of Komodo 8 and SF 6

Post by lkaufman »

Pio wrote:Hi Larry!

I guess the problem you have is that when adjusting the scores to better reflect the winning probabilities you will not search as deep into the simplified positions that are much easier to resolve to either a win, draw or a loss since the search is guided by the evaluation.

I guess you could fix this problem if you modify your search a little bit. I had an idea http://www.talkchess.com/forum/viewtopi ... 76&t=42677 how to do this.

What I want to say is that I think you should search a lot deeper in the simplified parts of the tree since it is cheaper but that you should not play those moves going into the simplified parts if you are not pretty sure that they are good.

Good luck!
I think your insight here is a very good one, and it gives me an idea of how to solve this problem in Komodo. We shall see. Thank you for suggesting this line of attack.
Komodo rules!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.
A reliable result at hyper-bullet:

Texel 1.05 eval   Expected score (move 30)   Expected score (move 70)
1.0               71%                        60%
1.5               80%                        70%
2.0               87%                        78%
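
A minimal sketch of how a table like this could be computed. It assumes (hypothetically) that each game has already been reduced to a pair: the engine's eval at the chosen move and the final result from that engine's point of view (1, 0.5 or 0). The sample format, the bin half-width and the function name are illustrative assumptions, not the procedure actually used for the numbers above.

```python
from typing import Iterable, Optional, Tuple


def expected_score_near(samples: Iterable[Tuple[float, float]],
                        target_eval: float,
                        half_width: float = 0.25) -> Optional[float]:
    """Average game result over samples whose eval lies within
    half_width of target_eval. Returns None if no sample qualifies."""
    results = [result for eval_, result in samples
               if abs(eval_ - target_eval) <= half_width]
    if not results:
        return None
    return sum(results) / len(results)
```

Calling it once per eval value (1.0, 1.5, 2.0) and once per move number would give one row and one column of the table at a time.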
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Adam Hair »

Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Adam Hair wrote:Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.
Hi, sure, I need 1,000-5,000 games against a pretty equal opponent, at blitz TC, in cutechess-cli.
petero2
Posts: 734
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Expected performance and eval of Komodo 8 and SF 6

Post by petero2 »

Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.
Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

petero2 wrote:
Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.
Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz
Great, I will produce plots for move 25 and move 70.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

petero2 wrote: Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz
With this database, here is the Expected Score of Texel 1.05 at move 25 and at move 70. The difference is substantial.

Move 25:
Image


Move 70:
Image


Combined fits:
Image
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

nimh wrote:It is obvious that the reason is that the reduced amount of material makes it harder to convert an advantage into a full point. Could you perform the analysis again to determine the relationship between material and expected scores?

You suggested I use a logistic function instead of centipawns for analyzing the quality of chess games. I think it would be useful to have some sort of formula to determine expected scores based on material as well.
Ferdinand wrote the necessary script, but I am a bit bogged down analyzing the data. I tried to make the logistic fit the data, and it's pretty much hopeless. An excellent fit is a slightly modified logistic:

Expected Score = (tanh[eval^b/a] + 1) / 2

Move 25:
Image
Here the blue line and dots are actual data, red line is (tanh[eval^b/a] + 1) / 2 fit, green line is the logistic fit. Logistic holds reasonably, but here:


Move 70:
Image
The logistic is pretty broken.

So, I abandoned the logistic fit for (tanh[eval^b/a] + 1) / 2, with a and b empirical parameters that differ for each engine (and may vary with time control or hardware). So far it has fitted all results VERY well: Komodo 8, SF6, Houdini 4, Texel 1.05. One thing to observe is that both the data and the fits have an inflection point (zero of the second derivative) at some eval>0, while the logistic has such a point only at eval=0.
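
A minimal sketch of the fit function and of the inflection-point remark. The extension to negative evals by symmetry is an assumption (only the formula itself is given above), and the function names are illustrative. Writing u = eval^b / a, setting the second derivative to zero reduces to u * tanh(u) = (b - 1) / (2b), which has a solution u > 0 only when b > 1; for b = 1 the function is exactly the logistic 1 / (1 + exp(-2 eval / a)).

```python
import math


def expected_score(eval_: float, a: float, b: float) -> float:
    """(tanh(eval^b / a) + 1) / 2, mirrored for negative evals (assumption)."""
    x = math.copysign(abs(eval_) ** b, eval_) / a
    return (math.tanh(x) + 1.0) / 2.0


def logistic_score(eval_: float, a: float) -> float:
    """Plain logistic; identical to expected_score with b = 1."""
    return 1.0 / (1.0 + math.exp(-2.0 * eval_ / a))


def inflection_eval(a: float, b: float) -> float:
    """Eval >= 0 at which the second derivative of expected_score is zero.

    Solves u * tanh(u) = (b - 1) / (2 b) by bisection, then converts
    u = eval^b / a back to an eval. Only defined here for b >= 1.
    """
    if b < 1.0:
        raise ValueError("this sketch only handles b >= 1")
    if b == 1.0:
        return 0.0  # logistic case: inflection at eval = 0
    target = (b - 1.0) / (2.0 * b)  # always in (0, 0.5)
    lo, hi = 0.0, 1.0               # u * tanh(u) is about 0.76 at u = 1
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if mid * math.tanh(mid) < target:
            lo = mid
        else:
            hi = mid
    u = (lo + hi) / 2.0
    return (a * u) ** (1.0 / b)
```

With any b > 1, inflection_eval returns a strictly positive eval, which matches the shape seen in the data; the a and b to plug in are whatever the fit for a given engine produces.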

Now, including material. With the usual counting 1,3,3,5,9 for pawn, knight, bishop, rook, queen, and using Ferdinand's script on moves 15,25,35,50,70 of the Komodo 8 games database, I got the following average material:

material(move 15)= 67.63;
material(move 25)= 57.95;
material(move 35)= 49.34;
material(move 50)= 33.12;
material(move 70)= 18.22;
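
A minimal sketch of the 1,3,3,5,9 counting, not Ferdinand's actual script (which is not shown here). It assumes the position is available as a FEN string and sums the material of both sides; that reading is itself an assumption, based on the move-15 figure being close to the starting total of 78.

```python
# Piece values from the usual 1,3,3,5,9 counting; kings count as 0.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_from_fen(fen: str) -> int:
    """Total material of both sides, read from the FEN piece-placement field."""
    placement = fen.split()[0]
    # Digits (empty squares) and '/' separators contribute 0.
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_from_fen(START_FEN))  # → 78
```

Averaging this value over all games at a fixed move number would give figures of the kind listed above.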


With Expected Score = (tanh[eval^b/a] + 1) / 2, fitting the dependence on material, I found that a is inversely proportional to material^0.6 and b is inversely proportional to material^0.3. With this scaling, normalized to the actual data, I got the following generalization of the fits that includes material:


Expected Score = (tanh[eval^(b1/material^0.3) / (a1/material^0.6)] + 1) / 2

Here a1=14.3684 and b1=4.36497 for the Komodo 8 fits to these blitz games. Other engines or other conditions will have different a1, b1.
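
A minimal sketch of this material-dependent formula, using the a1 and b1 quoted for Komodo 8. The function name and the symmetric treatment of negative evals are assumptions; the loop simply evaluates the formula at the average material values listed earlier.

```python
import math

A1 = 14.3684   # Komodo 8 blitz fit, value quoted in the post
B1 = 4.36497   # Komodo 8 blitz fit, value quoted in the post


def expected_score_material(eval_: float, material: float,
                            a1: float = A1, b1: float = B1) -> float:
    """(tanh[eval^(b1/material^0.3) / (a1/material^0.6)] + 1) / 2."""
    a = a1 / material ** 0.6
    b = b1 / material ** 0.3
    # Negative evals are mirrored around 0.5 (assumption, not in the post).
    x = math.copysign(abs(eval_) ** b, eval_) / a
    return (math.tanh(x) + 1.0) / 2.0


# Expected score for an eval of +1.0 at the average material of each move.
for move, material in [(15, 67.63), (25, 57.95), (35, 49.34),
                       (50, 33.12), (70, 18.22)]:
    print(move, round(expected_score_material(1.0, material), 3))
```

For a fixed positive eval the expected score drops as material comes off the board, which is the effect discussed in this thread.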

The Komodo fits for moves 15, 25, 35, 50, 70 are shown here:

Image

And they are very similar to the actual values I posted earlier:
http://www.talkchess.com/forum/viewtopi ... 10&start=5

For now I am a bit bogged down improving on this to include material.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Expected performance and eval of Komodo 8 and SF 6

Post by Laskos »

Adam Hair wrote:Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.
Hi Adam, thanks for the PGNs.
Here is the first Gaviota result; TBs are to follow. The data seem a bit coarse, but the result is pretty clear. It is interesting to note that Gaviota's eval is very close to logistic at both move 25 and move 70 (the exponent b in the fit (1+tanh[eval^b/a])/2 is close to 1).

Image


Image


Image