Idea for Automatic Calibration of Evaluation Function...
Moderators: hgm, Harvey Williamson, bob
 Steve Maughan
 Posts: 1055
 Joined: Wed Mar 08, 2006 7:28 pm
 Location: Florida, USA
 Contact:
Idea for Automatic Calibration of Evaluation Function...
Sadly, it would seem that I all too often find myself too busy for computer chess programming, so I'm going to share this idea in the hope that someone else may find it of value.
I've been giving some thought to how one might automatically calibrate an evaluation function. I also noticed that Stockfish has a random evaluation term that can be used to simulate human play. This triggered an idea based on simulated annealing (SA), a probabilistic optimization algorithm. For those who haven't come across SA: the algorithm randomly varies the parameters being optimized, keeps the best solution found so far, and slowly decreases the amount of randomness around the current best as time goes by and the system "cools".
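As a concrete illustration of plain SA (a toy one-dimensional minimization; the step size and cooling schedule here are arbitrary choices, not anything from the post):

```python
import math
import random

def simulated_annealing(cost, start, step=0.5, t0=10.0, cooling=0.995, iters=5000):
    """Minimize `cost` by random perturbation under a slowly cooling temperature."""
    random.seed(42)
    current = best = start
    t = t0
    for _ in range(iters):
        candidate = current + random.uniform(-step, step)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with probability exp(-delta/t),
        # which shrinks as the system "cools"
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        if cost(current) < cost(best):
            best = current
        t *= cooling
    return best

# Toy example: the minimum of (x - 3)^2 is at x = 3
x = simulated_annealing(lambda v: (v - 3.0) ** 2, start=0.0)
```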
So how about this for an evaluation annealing algorithm: create ten versions of an engine, each with a different set of evaluation coefficients, and let them play against one another. After each game, if an engine wins, decrease the randomness that you apply to adjust its coefficients; if it repeatedly wins, the randomness will approach zero. After an engine loses, increase the randomness of the variation in its coefficients and bias the change toward the coefficients of the engine that beat it (there are a number of ways to do this). After "many" games the coefficients "should" converge on a good set of values.
Any comments?
Naturally there are a zillion ways to implement and play around with how the coefficients are adjusted after each game.
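For what it's worth, here is a toy sketch of one such scheme. Real games are replaced by a hypothetical hidden "optimal" coefficient vector (the engine whose vector is nearer to it wins), so this illustrates only the update rules, not actual engine play; all the constants are arbitrary:

```python
import random

random.seed(1)
# Hidden "true" coefficients -- a stand-in for real game results, purely for illustration
OPTIMUM = [1.0, 3.0, 9.0]

def dist2(coeffs):
    return sum((c - o) ** 2 for c, o in zip(coeffs, OPTIMUM))

def wins(a, b):
    """Fake game: the coefficient vector nearer the hidden optimum wins."""
    return dist2(a) < dist2(b)

# Ten engines, each a [coefficients, randomness] pair
engines = [[[random.uniform(0, 10) for _ in range(3)], 2.0] for _ in range(10)]

for _ in range(2000):
    i, j = random.sample(range(10), 2)
    w, l = (i, j) if wins(engines[i][0], engines[j][0]) else (j, i)
    engines[w][1] *= 0.95                          # winner: randomness decays toward zero
    engines[l][1] = min(engines[l][1] * 1.1, 3.0)  # loser: randomness grows (capped)
    noise = engines[l][1] * 0.1
    # Loser: bias coefficients toward the winner's, plus fresh randomness
    engines[l][0] = [lc + 0.3 * (wc - lc) + random.gauss(0, noise)
                     for lc, wc in zip(engines[l][0], engines[w][0])]

best = min(engines, key=lambda e: dist2(e[0]))
```

Note that the current best engine never loses here, so its coefficients act as a slowly improving attractor for the rest of the population.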
Best regards,
Steve
 hgm
 Posts: 23152
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Idea for Automatic Calibration of Evaluation Function...
A nice idea, if you can afford a billion games, or so. Bob just showed that being off a full Pawn on the Queen value only costs about 10 Elo points. Most eval terms are likely to be much smaller than 100 cP.

 Posts: 1154
 Joined: Fri Jun 23, 2006 3:18 am
Re: Idea for Automatic Calibration of Evaluation Function...
hgm wrote: A nice idea, if you can afford a billion games, or so. Bob just showed that being off a full Pawn on the Queen value only costs about 10 Elo points. Most eval terms are likely to be much smaller than 100 cP.
Just to nitpick a little: you should not generalize too much from Bob's finding there, as he cherry-picked that value. I am sure being off by 100 cp would have a HUGE Elo effect on most eval terms. It also seems clear at this point that automated tuning of the eval function is actually worth quite a lot of Elo in general, more than was traditionally suspected, say, ten years ago.
Sam

 Posts: 718
 Joined: Fri Mar 20, 2009 7:59 pm
Re: Idea for Automatic Calibration of Evaluation Function...
Selfplay is tricky. You're optimizing the engine to beat itself, which doesn't always translate to beating other players...

 Posts: 653
 Joined: Wed Mar 08, 2006 7:08 pm
 Location: Orange County California
 Full name: Stuart Cracraft
 Contact:
Re: Idea for Automatic Calibration of Evaluation Function...
I would like to see a repeat of the KnightCap experiments on (F)ICS: autotuning of eval... in a context of external competitors chosen for being approximately the same current rating. Has that experiment been repeated?
I would like to see it repeated with all the positional term coefficients started at zero, all the piece weights started at a pawn, etc. After the physical piece values "settle", let the positional terms start changing in tandem.
Use whatever multicoefficient reinforcement learning/regression/annealing/etc. you like.

 Posts: 9491
 Joined: Wed Mar 08, 2006 7:57 pm
 Location: Redmond, WA USA
 Contact:
Re: Idea for Automatic Calibration of Evaluation Function...
Read this thread:
http://www.talkchess.com/forum/viewtopi ... 47&t=31667
The technique presented looks far more effective than TD-lambda and TD-leaf.

 Posts: 718
 Joined: Fri Mar 20, 2009 7:59 pm
Re: Idea for Automatic Calibration of Evaluation Function...
I don't recall what KnightCap used initially, but I do seem to recall KnightCap being in the 1300s on FICS when it was starting out (back when the average was closer to 1600). Or maybe I'm thinking of another engine. It was comical to watch, though: the learning algorithm figured out that a side that is never put in check also never loses, so it would sac material to push being checked over the horizon.

 Posts: 1122
 Joined: Sat Dec 13, 2008 6:00 pm
 Contact:
Re: Idea for Automatic Calibration of Evaluation Function...
Steve Maughan wrote: Any comments?
It's called PBIL and has been discussed at length here.
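For readers who haven't met it, here is a minimal sketch of PBIL (Population-Based Incremental Learning) on a toy "one-max" bitstring problem; the population size and learning rate are arbitrary choices:

```python
import random

random.seed(0)

def pbil(fitness, nbits, pop=20, lr=0.1, generations=200):
    """Population-Based Incremental Learning over bitstrings.

    Keeps a probability vector p; each generation samples a population
    from p and nudges p toward the bits of the fittest sample."""
    p = [0.5] * nbits
    for _ in range(generations):
        samples = [[1 if random.random() < pi else 0 for pi in p]
                   for _ in range(pop)]
        best = max(samples, key=fitness)
        p = [pi + lr * (b - pi) for pi, b in zip(p, best)]
    return p

# Toy fitness: count of one-bits, so p should drift toward all ones
p = pbil(lambda bits: sum(bits), nbits=16)
```

Unlike the tournament idea above, PBIL keeps no persistent individuals at all, only the probability distribution they are sampled from, which is what makes it cheap to apply to engine parameters encoded as bits.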