eval scale in Houdini

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
eval scale in Houdini
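The Houdini site contains this interesting quote:
"The engine evaluations have been carefully recalibrated so that +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time."
My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score? And of course, I am assuming that the calibration is a monotonic function of the "raw" eval score.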
Re: eval scale in Houdini
http://rybkaforum.net/cgibin/rybkaforu ... l?tid=6012
And elsewhere at RF, search for "winning percentage".
Re: eval scale in Houdini
Rein Halbersma wrote:
My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score?
Correct, the calibration was done with about 50,000 games.
Results will depend on the opponent and the TC, so you should read the percentages as no more than informed guesstimates.
Robert
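The procedure confirmed above (log the score after every move, then compute winning percentage as a function of score) can be sketched roughly as follows. This is only an illustration, not Houdini's actual calibration code: the function name `win_rate_by_score`, the 0.25-pawn bin width, and the choice to count a draw as half a win are all assumptions.

```python
from collections import defaultdict

def win_rate_by_score(samples, bin_width=0.25):
    """Group logged positions by eval score and average the game results.

    samples: iterable of (score_in_pawns, result) pairs, where result is
             1.0 for a win, 0.5 for a draw (assumption) and 0.0 for a loss,
             seen from the side the score refers to.
    Returns {bin_center: (average_result, number_of_samples)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for score, result in samples:
        idx = round(score / bin_width)  # nearest bin index
        totals[idx][0] += result
        totals[idx][1] += 1
    return {idx * bin_width: (total / count, count)
            for idx, (total, count) in sorted(totals.items())}

# Hypothetical log entries: (score after a move, final result of that game).
log = [(1.0, 1.0), (1.05, 1.0), (0.95, 0.0), (1.1, 1.0), (0.0, 0.5)]
table = win_rate_by_score(log)
```

With enough games per bin, the resulting table is the empirical curve that a calibration would then try to match. It will shift with the opponent and the time control, as noted above.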
Re: eval scale in Houdini
Rein Halbersma wrote:
The Houdini site contains this interesting quote:
"The engine evaluations have been carefully recalibrated so that +1.00 pawn advantage gives a 80% chance of winning the game against an equal opponent at blitz time control. At +2.00 the engine will win 95% of the time, and at +3.00 about 99% of the time. If the advantage is +0.50, expect to win nearly 50% of the time."
My question is, how does one go about calibrating the eval scores to get such a scale? Just playing a lot of games, logging the score after every move and computing winning percentage as a function of score? And of course, I am assuming that the calibration is a monotonic function of the "raw" eval score.
The way I did it was to generate a few hundred high-quality games with two versions of Komodo that are slightly different but have the same basic strength.
A quick and dirty way is to find all the positions where the program scored between 0.95 and 1.05 and note the win percentage. Do the same thing for the negative case. Then you can compute what a pawn is worth for your program using the inverse formula on the wiki. Or you can just use the logistic formula and fish by trial and error to get the best fit.
It will come out differently for each program of course.
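As a rough sketch of both routes described above, assume a one-parameter logistic of the form W = 1 / (1 + 10^(-score / k)). The scale constant `k` and all function names here are assumptions made for illustration; they are not taken from Komodo or from the wiki page. The first helper inverts the curve from a single measured point, the second "fishes" over candidate values for the smallest squared error.

```python
import math

def win_prob(score, k):
    """Assumed logistic model: expected result for an eval of `score` pawns."""
    return 1.0 / (1.0 + 10.0 ** (-score / k))

def scale_from_point(score, win_rate):
    """Invert the logistic: the k that maps `score` to exactly `win_rate`."""
    return score / math.log10(win_rate / (1.0 - win_rate))

def fit_scale(points, candidates):
    """Trial and error: pick the candidate k with the least squared error.

    points: list of (score, observed_win_rate) pairs.
    """
    def squared_error(k):
        return sum((win_prob(s, k) - w) ** 2 for s, w in points)
    return min(candidates, key=squared_error)

# Single-point route, using the 80% at +1.00 from the Houdini quote.
k = scale_from_point(1.0, 0.80)

# Fishing route over a coarse grid, with made-up observations.
observed = [(0.5, win_prob(0.5, 2.0)), (1.0, win_prob(1.0, 2.0))]
best = fit_scale(observed, [1.0, 1.5, 2.0, 2.5])
```

Under this assumed curve, 80% at +1.00 implies about 94% at +2.00 and about 98.5% at +3.00, which is close to the quoted 95% and 99%. The quoted figure for +0.50 does not fit a one-parameter curve of this shape.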

 Posts: 685
 Joined: Tue May 22, 2007 9:13 am
Re: eval scale in Houdini
OK, thanks everyone for the explanation. So basically a given eval function that has been tuned by hand, CLOP or any other tool has its overall score calibrated to correspond to winning percentages, e.g. those of the logistic / normal distribution. This essentially maps total eval scores to Elo differences, but keeps the feature parameters in their original units.
I wonder if you could also do the reverse. E.g. log all the positions from those 50,000 games, and also create variables representing the various eval features present in those positions. One could then do a logistic regression of the game outcome on the variables representing the features. This would calibrate the eval features themselves to reflect Elo differences. Has anyone ever tried this? In Othello I know of one paper by Michael Buro (also the inventor of ProbCut) https://skatgame.net/mburo/ps/compoth.pdf
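A minimal sketch of the regression idea above, in plain Python with batch gradient ascent on the log-likelihood. It is not the method of the cited paper or of any engine. The feature layout (one row per logged position, each entry a feature difference from the scored side's point of view), the draw-as-0.5 target, the learning rate and the epoch count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_feature_weights(rows, outcomes, lr=0.1, epochs=2000):
    """Logistic regression of game outcome on eval features.

    rows:     list of feature vectors, one per logged position.
    outcomes: game result per position (1.0 win, 0.5 draw, 0.0 loss).
    Returns one weight per feature, in log-odds units.
    """
    n = len(rows[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(rows, outcomes):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(n):
                grad[j] += (y - p) * x[j]  # log-likelihood gradient
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def to_elo(weight):
    """Convert a log-odds weight to Elo points (400 / ln 10 per unit)."""
    return weight * 400.0 / math.log(10.0)

# Toy data: feature 0 is present in four positions (three wins, one loss),
# feature 1 in two positions (one win, one loss).
rows = [[1, 0]] * 4 + [[0, 1]] * 2
outcomes = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
weights = fit_feature_weights(rows, outcomes)
elo_per_feature = [to_elo(w) for w in weights]
```

In the toy data, the first feature scores 75% and the second 50%, so the fitted weights come out near ln 3 (about 191 Elo) and 0. On real logs, positions from the same game are correlated, so the fit would need far more care than this sketch takes.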

 Posts: 920
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
Re: eval scale in Houdini
This looks pretty close: http://www.ratio.huji.ac.il/dp_files/dp613.pdf