TalkChess.com

Posted: **Mon Jan 14, 2013 10:57 am**

The Houdini site contains this interesting quote:

The engine evaluations have been carefully recalibrated so that +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time.

My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score? And of course, I am assuming that the calibration is a monotonic function of the "raw" eval score.

Posted: **Mon Jan 14, 2013 1:34 pm**

http://rybkaforum.net/cgi-bin/rybkaforu ... l?tid=6012

And elsewhere at RF, search for "winning percentage".

Posted: **Mon Jan 14, 2013 2:19 pm**

Rein Halbersma wrote: My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score?

Correct, the calibration was done with about 50,000 games.
Results will depend on the opponent and the time control, so you should read the percentages as no more than informed guesstimates.

Robert
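For readers who want to try the "log the score, tabulate the results" approach described above, here is a minimal Python sketch. It is only an illustration, not necessarily how the Houdini calibration was actually done. The function name `win_rate_by_score`, the `(score, result)` sample format and the 0.25-pawn bucket width are all assumptions made for this example.

```python
import math
from collections import defaultdict

def win_rate_by_score(samples, bin_width=0.25):
    """Tabulate the average game result per eval-score bucket.

    samples: iterable of (score_in_pawns, result) pairs, where result is
             1 (win), 0.5 (draw) or 0 (loss) from the point of view of the
             side whose score was logged. This format is an assumption.
    Returns a list of (bucket_center, mean_result, count), sorted by score.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket index -> [sum of results, count]
    for score, result in samples:
        # Round to the nearest bucket; floor(x + 0.5) avoids banker's rounding.
        idx = math.floor(score / bin_width + 0.5)
        totals[idx][0] += result
        totals[idx][1] += 1
    return [(idx * bin_width, total / n, n)
            for idx, (total, n) in sorted(totals.items())]
```

The resulting table is the empirical score-to-winning-percentage curve. With enough games per bucket it should be roughly monotonic, and it will differ by opponent and time control.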

Posted: **Mon Jan 14, 2013 2:59 pm**

Rein Halbersma wrote: The Houdini site contains this interesting quote:

The engine evaluations have been carefully recalibrated so that +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time.
My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score? And of course, I am assuming that the calibration is a monotonic function of the "raw" eval score.

The way I did it was to generate a few hundred high-quality games with 2 versions of Komodo that are slightly different but have the same basic strength.

A quick and dirty way is to find all the positions where the program scored between 0.95 and 1.05 and note the win percentage. Do the same thing for the negative case. Then you can compute what a pawn is worth for your program using the inverse formula on the wiki. Or you can just use the logistic formula and fish by trial and error to get the best fit.

It will come out differently for each program of course.
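The quick and dirty method above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Komodo's actual code. It assumes the logistic form W = 1 / (1 + 10^(-s/K)), where s is the score in pawns and K is a scale constant specific to the program. The function names and the `(score, result)` sample format are made up for the example.

```python
import math

def win_rate_in_band(samples, lo=0.95, hi=1.05):
    """Average result over positions whose score lies in [lo, hi].

    samples: iterable of (score_in_pawns, result), result 1 / 0.5 / 0
             (hypothetical log format).
    """
    results = [r for s, r in samples if lo <= s <= hi]
    if not results:
        raise ValueError("no positions with a score in the requested band")
    return sum(results) / len(results)

def scale_from_win_rate(score, win_rate):
    """Solve win_rate = 1 / (1 + 10**(-score / k)) for the scale k."""
    return score / math.log10(win_rate / (1.0 - win_rate))

def expected_score(score, k):
    """Assumed logistic model: expected result for a score in pawns."""
    return 1.0 / (1.0 + 10.0 ** (-score / k))
```

As a sanity check against the Houdini numbers quoted above: 80% at +1.00 gives K = 1 / log10(4), about 1.66. Plugging that back in gives about 94% at +2.00 and about 98.5% at +3.00, which is close to the quoted 95% and 99%.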

Posted: **Mon Jan 14, 2013 7:24 pm**

OK, thanks everyone for the explanation. So basically a given eval function that has been tuned by hand, CLOP or any other tool has its overall score calibrated to correspond to winning percentages, e.g. those of the logistic / normal distribution. This essentially maps total eval scores to Elo differences, but keeps the feature parameters in their original units.

I wonder if you could also do the reverse. E.g. log all the positions from those 50,000 games, and also create variables representing the various eval features present in those positions. One could then do a logistic regression of the game outcome on the variables representing the features. This would calibrate the eval features themselves to reflect Elo differences. Has anyone ever tried this? In Othello I know of one paper by Michael Buro (also the inventor of ProbCut) https://skatgame.net/mburo/ps/compoth.pdf
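To make the idea concrete, here is a minimal pure-Python sketch of such a regression. It is plain batch gradient ascent on the log-likelihood, not the method of the Buro paper. The feature vectors (e.g. material difference, mobility difference) are hypothetical. There is no intercept term, on the assumption that a position with all feature differences at zero should score 50%.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(features, outcomes, lr=0.1, epochs=2000):
    """Fit weights w so that sigmoid(w . x) approximates the game outcome.

    features: list of equal-length feature vectors, one per logged position
              (hypothetical eval features, from the side to move's view).
    outcomes: game result per position: 1 win, 0.5 draw, 0 loss.
    Returns the weights in logit (log-odds) units.
    """
    n = len(features[0])
    m = len(features)
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(features, outcomes):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(n):
                grad[j] += (y - p) * x[j]  # gradient of the log-likelihood
        for j in range(n):
            w[j] += lr * grad[j] / m
    return w

def logit_to_elo(weight):
    """Convert a log-odds weight to Elo points (Elo uses base 10, scale 400)."""
    return 400.0 * weight / math.log(10.0)
```

Each fitted weight, passed through `logit_to_elo`, then reads as "Elo points per unit of that feature". That is the feature-level calibration asked about above.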

Posted: **Wed Jan 16, 2013 3:59 pm**

This looks pretty close: http://www.ratio.huji.ac.il/dp_files/dp613.pdf


eval scale in Houdini
