Expected performance and eval of Komodo 8 and SF 6

Laskos · Post by **Laskos** » Sat Feb 07, 2015 11:28 pm

petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.

I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.

lkaufman · Post by **lkaufman** » Sat Feb 07, 2015 11:38 pm

Pio wrote: Hi Larry!

I guess the problem you have is that when adjusting the scores to better reflect the winning probabilities you will not search as deep into the simplified positions that are much easier to resolve to either a win, draw or a loss since the search is guided by the evaluation.

I guess you could fix this problem if you modify your search a little bit. I had an idea http://www.talkchess.com/forum/viewtopi ... 76&t=42677 how to do this.

What I want to say is that I think you should search a lot deeper in the simplified parts of the tree since it is cheaper but that you should not play those moves going into the simplified parts if you are not pretty sure that they are good.

Good luck!

I think your insight here is a very good one, and it gives me an idea of how to solve this problem in Komodo. We shall see. Thank you for suggesting this line of attack.

Laskos · Post by **Laskos** » Sun Feb 08, 2015 1:50 am

Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.

A reliable result at hyper-bullet:

Eval           Move 30          Move 70
Texel 1.05     Expected score   Expected score

1.0            71%              60%
1.5            80%              70%
2.0            87%              78%

Adam Hair · Post by **Adam Hair** » Sun Feb 08, 2015 2:00 am

Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.

Laskos · Post by **Laskos** » Sun Feb 08, 2015 8:04 am

Adam Hair wrote: Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.

Hi, sure, I need 1,000-5,000 games against a roughly equal opponent, at blitz time control, played in cutechess-cli.

petero2 · Post by **petero2** » Sun Feb 08, 2015 8:14 am

Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.

Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz

Laskos · Post by **Laskos** » Sun Feb 08, 2015 8:30 am

petero2 wrote:
Laskos wrote:
petero2 wrote: I don't have a lot of blitz games but I do have lots of hyper-bullet (1s+0.08s/move) games: Here are 37100 such games: http://dl.dropboxusercontent.com/u/8968 ... s105a32.xz. These games are played under the same conditions as I use when tuning the evaluation function.

If you want to I can play some games at longer time control. Just specify the time control and the number of games you want.
I did see something at hyper-bullet, not sure it's relevant. 1,000 self-games of pretty equivalent versions at 60''+0.6'' would be very good. I will post soon some results on your hyper-bullet games.
Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz

Great, I will produce plots for move 25 and move 70.

Laskos · Post by **Laskos** » Sun Feb 08, 2015 10:14 am

petero2 wrote: Here are 2431 games played at time control 60+0.6: https://dl.dropboxusercontent.com/u/896 ... s105a35.xz

With this database, here is the Expected Score of Texel 1.05 at move 25 and at move 70. The difference is substantial.

Move 25:

Move 70:

Combined fits:

Laskos · Post by **Laskos** » Sun Feb 08, 2015 7:18 pm

nimh wrote: It is obvious that the reason is that the reduced amount of material makes it harder to convert an advantage into a full point. Could you perform the analysis again to determine the relationship between material and expected scores?

You suggested I use a logistic function instead of centipawns for analyzing the quality of chess games. I think it would be useful to have some sort of formula to determine expected scores based on material as well.

Ferdinand wrote the necessary script, but I am a bit bogged down analyzing data. I tried to make a logistic fit the data, but it's pretty much hopeless. The fit that works excellently is a slightly modified logistic:

Expected Score = (tanh[eval^b/a] + 1) / 2
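
For readers who want to try this form, here is a minimal Python sketch. The function names and the sample values of a and b are illustrative assumptions, not fitted parameters from this thread, and the handling of negative evals by symmetry is also an assumption. It shows that with b = 1 the form is exactly a logistic, since (tanh(x) + 1) / 2 = 1 / (1 + exp(-2x)).

```python
import math

def expected_score(eval_pawns, a, b):
    """Modified-logistic fit: (tanh(eval^b / a) + 1) / 2.

    eval_pawns is the engine evaluation in pawns. Negative evals are
    mirrored by symmetry (an assumption; the fit is described for eval >= 0).
    """
    magnitude = abs(eval_pawns) ** b / a
    signed = math.copysign(magnitude, eval_pawns)
    return (math.tanh(signed) + 1.0) / 2.0

def logistic(eval_pawns, scale):
    """Plain logistic 1 / (1 + exp(-eval / scale))."""
    return 1.0 / (1.0 + math.exp(-eval_pawns / scale))

if __name__ == "__main__":
    a = 1.5  # illustrative value only, not a fitted parameter
    for ev in (0.0, 0.5, 1.0, 2.0):
        # With b = 1 the two columns agree when the logistic scale is a / 2.
        print(ev, expected_score(ev, a, 1.0), logistic(ev, a / 2.0))
```

With b different from 1 the two curves no longer coincide, which is the extra freedom the modified form has over the logistic.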

Move 25:

Here the blue line and dots are the actual data, the red line is the (tanh[eval^b/a] + 1) / 2 fit, and the green line is the logistic fit. The logistic holds reasonably well, but here:

Move 70:

The logistic is pretty broken.

So I abandoned the logistic fit in favor of (tanh[eval^b/a] + 1) / 2, with a and b as empirical parameters that differ for each engine (and may vary with time control or hardware). It has fitted all results so far VERY well: Komodo 8, SF6, Houdini 4, Texel 1.05. One small thing to observe is that both the data and the fits have an inflection point (zero of the second derivative) at some eval>0, while the logistic has such a point only at eval=0.
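
The inflection point of the tanh form can be located numerically. The sketch below is only an illustration with made-up a and b, and the function names are hypothetical. Writing u = eval^b / a and setting the second derivative of (tanh(u) + 1) / 2 to zero gives the condition 2·b·u·tanh(u) = b - 1, which has a root u > 0 only when b > 1 and collapses to u = 0 when b = 1.

```python
import math

def fit(x, a, b):
    """Expected score (tanh(x^b / a) + 1) / 2 for x >= 0."""
    return (math.tanh(x ** b / a) + 1.0) / 2.0

def inflection_eval(a, b, tol=1e-12):
    """Eval >= 0 at which the second derivative of fit() vanishes.

    Solves 2*b*u*tanh(u) = b - 1 for u = eval^b / a by bisection.
    The root always lies in [0, 1] because 2*b*tanh(1) > b - 1.
    """
    if b < 1:
        raise ValueError("this sketch only covers b >= 1")
    if b == 1:
        return 0.0  # pure logistic case: inflection at eval = 0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 2.0 * b * mid * math.tanh(mid) < b - 1.0:
            lo = mid
        else:
            hi = mid
    u = (lo + hi) / 2.0
    return (a * u) ** (1.0 / b)  # invert u = eval^b / a

if __name__ == "__main__":
    # a and b here are illustrative, not fitted values from the thread.
    print(inflection_eval(1.5, 1.5))
```

Below the returned eval the curve is convex and above it the curve is concave, which matches the S-shape with a shifted inflection described above.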

Now, including material. With the usual counting of 1, 3, 3, 5, 9 for pawn, knight, bishop, rook, queen, and using Ferdinand's script on moves 15, 25, 35, 50, 70 of the Komodo 8 games database, I got the following material values:

material(move 15)= 67.63;
material(move 25)= 57.95;
material(move 35)= 49.34;
material(move 50)= 33.12;
material(move 70)= 18.22;
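
As an illustration of the 1, 3, 3, 5, 9 counting, here is a minimal Python sketch that sums the material of both sides from a FEN string. This is not Ferdinand's script (which is not shown in the thread); the function names are hypothetical, and averaging over positions taken at a given move number is an assumption about how the figures above could be produced.

```python
# Piece values as listed above; kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def total_material(fen):
    """Sum of piece values for both sides in a FEN position."""
    placement = fen.split()[0]  # first FEN field holds the board layout
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)

def mean_material(fens):
    """Average total material over a list of FEN positions."""
    if not fens:
        raise ValueError("need at least one position")
    return sum(total_material(f) for f in fens) / len(fens)

if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(total_material(start))
```

On this scale the starting position counts 78, so the value of 67.63 at move 15 corresponds to roughly ten points of material having left the board.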

Fitting a and b of Expected Score = (tanh[eval^b/a] + 1) / 2 against material, I found that a is inversely proportional to material^0.6 and b is inversely proportional to material^0.3. With this scaling, normalized to the actual data, I got the following generalization of the fits that includes material:

Expected Score = (tanh[eval^(b1/material^0.3) / (a1/material^0.6)] + 1) / 2

With a1=14.3684 and b1=4.36497 in the case of Komodo 8 fits for these blitz games. Other engines or other conditions will have different a1, b1.
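
The material-dependent formula can be evaluated directly. The Python sketch below uses the a1 and b1 quoted above for Komodo 8 and the material values listed for moves 15 to 70. The function name and the symmetric treatment of negative evals are assumptions made for illustration.

```python
import math

A1 = 14.3684   # Komodo 8 blitz value quoted above
B1 = 4.36497   # Komodo 8 blitz value quoted above

# Material at each move number, as listed earlier in this post.
MATERIAL_BY_MOVE = {15: 67.63, 25: 57.95, 35: 49.34, 50: 33.12, 70: 18.22}

def expected_score_material(eval_pawns, material, a1=A1, b1=B1):
    """(tanh[eval^(b1/material^0.3) / (a1/material^0.6)] + 1) / 2.

    Negative evals are mirrored by symmetry (an assumption; the fits
    in the thread are described for eval >= 0).
    """
    a = a1 / material ** 0.6
    b = b1 / material ** 0.3
    magnitude = abs(eval_pawns) ** b / a
    return (math.tanh(math.copysign(magnitude, eval_pawns)) + 1.0) / 2.0

if __name__ == "__main__":
    for move, material in MATERIAL_BY_MOVE.items():
        print(move, material, expected_score_material(1.0, material))
```

For a fixed eval of 1.0 the expected score falls as material comes off the board, which is the move 25 versus move 70 effect discussed in this thread.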

The fits for Komodo to moves 15,25,35,50,70 are shown here:

And they are very similar to the actual values I posted earlier:
http://www.talkchess.com/forum/viewtopi ... 10&start=5

For now I am a bit bogged down improving on this to include material.

Laskos · Post by **Laskos** » Mon Feb 09, 2015 4:10 pm

Adam Hair wrote: Hi Kai,

If you are interested, I will see if I have games played by Gaviota that are appropriate for your study. And if I don't, I do not mind spending a day or two of computer time producing them.

Hi Adam, thanks for the PGNs.
Here is the first Gaviota result; TBs are to follow. The data seems a bit coarse, but the result is pretty clear. It is interesting to note that Gaviota's eval is very close to logistic both at move 25 and at move 70 (the exponent b in the fit (1+tanh[eval^b/a])/2 is close to 1).
