
I've been giving some thought to how one might automatically calibrate an evaluation function. I also noticed that Stockfish has a random evaluation term that can be used to simulate human play. This triggered an idea based on simulated annealing (SA), a probabilistic optimization algorithm. For those who haven't come across SA, the algorithm repeatedly applies random perturbations to the parameters being optimized, keeps the best solution found so far, and gradually shrinks the perturbations around that best solution as time goes by and the system "cools".
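For anyone who hasn't played with SA before, here is a minimal toy sketch in Python of the textbook form of the algorithm. The cost function, step sizes and cooling rate are all made-up illustration, nothing taken from Stockfish or any engine:

import math
import random

def simulated_annealing(cost, initial, step_size=1.0,
                        start_temp=1.0, cooling=0.995, iterations=5000):
    """Minimize cost(params) by random perturbation with a falling temperature."""
    current = list(initial)
    best = list(current)
    best_cost = current_cost = cost(current)
    temp = start_temp

    for _ in range(iterations):
        # Randomly perturb the parameters; the perturbation shrinks
        # as the temperature drops ("cools").
        candidate = [x + random.gauss(0, step_size * temp) for x in current]
        cand_cost = cost(candidate)

        # Always accept improvements; accept some worse moves early on,
        # with a probability that falls as the system cools.
        if cand_cost < current_cost or \
           random.random() < math.exp((current_cost - cand_cost) / max(temp, 1e-9)):
            current, current_cost = candidate, cand_cost
            if cand_cost < best_cost:
                best, best_cost = candidate, cand_cost

        temp *= cooling  # cool down

    return best, best_cost

# Toy usage: minimize a simple quadratic.
if __name__ == "__main__":
    params, value = simulated_annealing(lambda v: sum(x * x for x in v), [5.0, -3.0])
    print(params, value)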
So how about this for an evaluation annealing algorithm. Create ten versions of an engine, each with a different set of evaluation coefficients, and let them play against one another. After each game, if an engine wins, decrease the randomness that you apply when adjusting its coefficients; if it wins repeatedly, that randomness will approach zero. When an engine loses, increase the randomness of the variation in its coefficients and bias the change toward the coefficients of the engine that beat it (there are a number of ways to do this; a rough sketch follows below). After "many" games the coefficients "should" converge on a good set of values.
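To make the tournament idea concrete, here is one possible shape of the per-game update. The cooling/reheat factors, the bias term, and play_game() are placeholders of my own invention, just to show the mechanics:

import random

class Candidate:
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.temp = 1.0  # per-engine "randomness" level

def update_after_game(winner, loser, cooling=0.9, reheat=1.2, bias=0.25):
    # Winner: shrink its randomness so it stays near its current values.
    winner.temp *= cooling

    # Loser: increase randomness, pull its coefficients part of the way
    # toward the winner's, then add noise scaled by its temperature.
    loser.temp = min(loser.temp * reheat, 2.0)
    loser.coeffs = [
        c + bias * (w - c) + random.gauss(0, loser.temp)
        for c, w in zip(loser.coeffs, winner.coeffs)
    ]

def play_game(a, b):
    # Placeholder: a real implementation would play an actual game between
    # two engine builds using a.coeffs and b.coeffs and return the winner.
    return random.choice([a, b])

if __name__ == "__main__":
    pool = [Candidate([random.uniform(-10, 10) for _ in range(4)]) for _ in range(10)]
    for _ in range(1000):
        a, b = random.sample(pool, 2)
        winner = play_game(a, b)
        loser = b if winner is a else a
        update_after_game(winner, loser)

The bias term is what pulls a loser toward the engine that beat it; set it to zero and you get ten independent hill-climbers instead of a converging population.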
Any comments?
Naturally there are a zillion ways to implement and play around with how the coefficients are adjusted after each game.
Best regards,
Steve