Hi Peter & Miguel,
thanks for sharing this. I will definitely give it a try in combination with the genetic algorithm framework that I already have.
The GA will just do the optimization of the parameters, but I will use your method as an alternative fitness function. Probably, like Miguel, I will use quiet positions, as I then only need a fast eval call instead of a slower qsearch call. So it's a bit more pre-processing to save time later.
My current algorithm uses game play in combination with solving a test set of easy positional problems. A parameter set that solves significantly fewer positions than the winner of the last tournament is not allowed to play; it would not win the tournament anyway. This saves a bit of time in the early generations; in the later ones (where the entropy has already dropped) all parameter sets unfortunately clear that threshold.
This makes the fitness function very expensive, so the algorithm takes a lot of time (2-3 weeks for a reasonable convergence of about 500 bits in my evaluation weights).
When I first optimized my evaluation between iCE 0.3 and iCE 1.0 I gained more than 100 ELO; the gains from subsequent runs after eval modifications are now much smaller, so it is almost not worth the effort. Your approach might give it another boost.
My game-play fitness function is still beneficial for tuning the parameters that control the search (futility margins, LMR, null move margins etc.). I can't imagine an alternative to game play in that area.
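The gating scheme described above could be sketched like this (a minimal illustration with hypothetical callback names, not the actual framework code):

```python
def gated_fitness(candidate, champion_solved, test_set, solve, play_games,
                  threshold=0.9):
    """Fitness gate: a candidate must solve nearly as many easy test
    positions as the last tournament winner before it is allowed to
    enter the (expensive) game-play stage. `solve` and `play_games`
    are engine-supplied callbacks."""
    solved = sum(1 for pos in test_set if solve(candidate, pos))
    if solved < threshold * champion_solved:
        return 0.0  # too weak: skip the expensive game-play stage
    return play_games(candidate)
```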
Thomas...
Re: The texel evaluation function optimization algorithm
petero2 wrote: the pawn race bonus which is only 157cp even though it is supposed to trigger only when one side can promote a queen and the opponent can not. Investigating the cause for these and other odd looking parameter values may suggest further improvements to the evaluation function.

This one I think I understand, at the very least I've tried tuning the value of an unstoppable pawn several times (in more than one program) and it always converges to slightly below a minor for me too (so a bonus of ~150cp).
The reason is that the unstoppable pawn is not worth a queen, it is worth (slightly less than) the least valuable piece that the opponent is able to sacrifice to keep it from promoting. In general, that would be a minor piece.
This does suggest that the value should depend on the material remaining on the board, and the value should approach that of a queen when the pawn can never be captured at all, and it should approach the value of a rook when the opponent has no minors (but does have rooks). Experiments in that direction never produced any measurable improvements though...
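That reasoning can be made concrete with a small sketch (piece values and the margin are illustrative assumptions, not anyone's actual engine code): the bonus is capped by the cheapest piece the opponent could sacrifice to stop the pawn.

```python
# Illustrative centipawn piece values (not from any particular engine).
QUEEN, ROOK, MINOR = 950, 500, 325

def unstoppable_pawn_bonus(opp_minors, opp_rooks):
    """Value of an unstoppable passed pawn: slightly less than the
    cheapest piece the opponent can give up to stop promotion, and
    close to a queen if nothing can stop or trade off the new queen."""
    if opp_minors > 0:
        cap = MINOR
    elif opp_rooks > 0:
        cap = ROOK
    else:
        cap = QUEEN          # nothing can profitably stop the pawn
    return cap - 25          # "slightly less than": the margin is a guess
```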
Re: The texel evaluation function optimization algorithm
I only bother with unstoppable pawns when the opponent has no piece left. Unlike all my other eval code this one is dependent on the side to move (1 tempo can make the difference) so it interferes with my eval cache that stores evaluations independent of the stm.
Therefore I limit the positions where this code is executed and usually if no pieces are left my eval is pretty fast, so here it does not hurt a lot.
Thomas
Re: The texel evaluation function optimization algorithm
tpetzke wrote: I only bother with unstoppable pawns when the opponent has no piece left. Unlike all my other eval code this one is dependent on the side to move (1 tempo can make the difference) so it interferes with my eval cache that stores evaluations independent of the stm.

I keep track of whether any part of the evaluation would depend on side to move and if it does I skip the eval cache (or rather, I skip flipping the sign and storing the position with reverse colours).
Probably by the time this really matters the game is decided anyway...
Re: The texel evaluation function optimization algorithm
michiguel wrote: It looks almost identical to what I have been doing with Gaviota
http://www.talkchess.com/forum/viewtopi ... =&start=11

Interesting. I also have the impression that it has made my engine play more "wild" moves. Sometimes that leads to beautiful crushing wins, but unfortunately at other times it leads to humiliating losses where my engine looks like a patzer.
michiguel wrote: The technique has some other advantages. For instance, you can tell what parameters are really not doing anything and become candidates to be removed.

I have noticed this too, for example the protected passed pawn bonus in my engine is currently 0. In this case it is a candidate for further investigation, not removal, because I believe a protected passed pawn is supposed to be a good thing. Maybe the implementation is broken somehow.
Re: The texel evaluation function optimization algorithm
Evert wrote:
petero2 wrote: the pawn race bonus which is only 157cp even though it is supposed to trigger only when one side can promote a queen and the opponent can not. Investigating the cause for these and other odd looking parameter values may suggest further improvements to the evaluation function.

This one I think I understand, at the very least I've tried tuning the value of an unstoppable pawn several times (in more than one program) and it always converges to slightly below a minor for me too (so a bonus of ~150cp). The reason is that the unstoppable pawn is not worth a queen, it is worth (slightly less than) the least valuable piece that the opponent is able to sacrifice to keep it from promoting. In general, that would be a minor piece. This does suggest that the value should depend on the material remaining on the board, and the value should approach that of a queen when the pawn can never be captured at all, and it should approach the value of a rook when the opponent has no minors (but does have rooks). Experiments in that direction never produced any measurable improvements though...

My pawn race bonus is only used in pawn endgames. It is based on a description from Robert Hyatt of how the pawn race code in Crafty works. It takes a number of things into account:
1. The pawn distance to the promotion square for the most advanced white and black pawn.
2. The side to move.
3. If the opponent king is outside the "square" of the pawn.
4. If a king is blocking its own pawn.
5. If a pawn will give check when it promotes.
Things that are not taken into account:
1. Whether the promoted pawn will control the promotion square of the opponent's pawn.
2. After promotion the result could be a drawn KQKP endgame with a pawn on a2/c2/f2/h2.
3. Tactical stuff like in the Reti endgame study.
4. If one side has more than one passed pawn, only the most advanced pawn that "outsquares" the opponent king is considered. This may not be the best pawn to advance.
My assumption was that the ignored factors are rare enough not to matter, but maybe that is an invalid assumption.
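For reference, factor 3 in the list, the classic "square of the pawn" rule, can be sketched like this (the coordinate convention and tempo handling are a simplification, not Crafty's actual code):

```python
def outside_square(pawn_file, pawn_rank, king_file, king_rank,
                   pawn_is_white, defender_to_move):
    """True if the defending king cannot catch the pawn (rule of the
    square). Files and ranks are 0-7, ranks counted from White's side.
    A pawn on its start rank effectively stands one rank further
    forward because of the double step."""
    if pawn_is_white:
        if pawn_rank == 1:
            pawn_rank = 2            # account for the double step
        dist_to_promo = 7 - pawn_rank
        king_dist = max(abs(king_file - pawn_file), 7 - king_rank)
    else:
        if pawn_rank == 6:
            pawn_rank = 5
        dist_to_promo = pawn_rank
        king_dist = max(abs(king_file - pawn_file), king_rank)
    if defender_to_move:
        king_dist -= 1               # defender moves first: gains a tempo
    return king_dist > dist_to_promo
```

For example, a white pawn on a2 with the black king on h8 and White to move is outside the square (the king needs seven moves, the pawn only five).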
Re: The texel evaluation function optimization algorithm
AlvaroBegue wrote: There is another possible definition for the evaluation error, which is the cross-entropy of the results given the evaluation:
E = -1/N * sum(i=1..N, WhiteWon * Log(Sigmoid(qScore(pos))) + BlackWon * Log(1-Sigmoid(qScore(pos))))
where WhiteWon is 1 for a white victory, 1/2 for a draw and 0 for a black victory, and BlackWon is the obvious analogous number for black.
I wonder if the resulting weights would be very different.
Thanks for this formula. I have now tested it and have observed the following:
1. It took my local search algorithm around 19 hours to find a local minimum. For some reason the search was "spiraling" in towards the minimum instead of going more directly towards it. The Gauss Newton method could perhaps have avoided this problem.
2. Many evaluation weights were changed. Most by a small amount, but some by a large amount. See this diff
3. The effect on playing strength (-2.2 elo) in a 30000 game match was too small to say for sure which version is best:
Code:
Rank Name Elo + - games score oppo. draws
1 Texel104a4 1.1 3.9 3.7 30222 50.4% -1.1 39.2%
2 TexelHead -1.1 3.7 3.9 30222 49.6% 1.1 39.2%
Tex Tex
Texel104a4 910
TexelHead 89
Code:
Rank Name Elo + - games score oppo. draws
QvsRR : 1 TexelHead 21 12 12 608 55% -21 21%
QvsRM : 1 TexelHead 24 11 10 960 56% -24 14%
QvsRMM : 1 TexelHead 38 20 20 244 60% -38 17%
BvsPPP : 1 TexelHead 10 8 8 1515 53% -10 21%
NvsPPP : 1 TexelHead 12 9 9 1349 53% -12 22%
BvsPPPP: 1 TexelHead 28 15 14 443 58% -28 23%
NvsPPPP: 1 TexelHead 24 16 16 356 57% -24 21%
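For concreteness, the two objective functions being compared here can be written as a short sketch; the sigmoid scaling constant K is a placeholder, not the value Texel actually uses.

```python
import math

K = 1.0 / 400.0   # placeholder scaling constant; engine-specific in practice

def sigmoid(score_cp):
    """Map a centipawn q-search score to an expected result in (0, 1)."""
    return 1.0 / (1.0 + 10.0 ** (-K * score_cp))

def cross_entropy_error(scores, results):
    """Alvaro's objective: E = -1/N * sum(w*log(s) + (1-w)*log(1-s)),
    where w is 1 / 0.5 / 0 for a white win / draw / black win."""
    n = len(scores)
    return -sum(w * math.log(sigmoid(q)) + (1 - w) * math.log(1 - sigmoid(q))
                for q, w in zip(scores, results)) / n

def squared_error(scores, results):
    """The original Texel objective: mean squared prediction error."""
    n = len(scores)
    return sum((w - sigmoid(q)) ** 2 for q, w in zip(scores, results)) / n
```

Both objectives have the same minimizer in the limit of a perfectly calibrated model, which is consistent with the small strength difference measured above.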
Re: The texel evaluation function optimization algorithm
Wow, thanks for testing the alternative formula! It's fun to go through the weights to see what changes quite a bit (e.g., a white bishop on d1 and e1 during the middle game gets much more penalized, if I am reading the tables correctly). I wonder to what extent this is just reflecting the fact that those scores are not very well determined by the data (perhaps they don't happen very often?).
Your 30,000-game test seems to indicate that the exact formula you use doesn't matter a whole lot for playing performance. I wouldn't try to read too much into the performance in small subsets of that test.
Re: The texel evaluation function optimization algorithm
AlvaroBegue wrote: Wow, thanks for testing the alternative formula! It's fun to go through the weights to see what changes quite a bit (e.g., a white bishop on d1 and e1 during the middle game gets much more penalized, if I am reading the tables correctly). I wonder to what extent this is just reflecting the fact that those scores are not very well determined by the data (perhaps they don't happen very often?). Your 30,000-game test seems to indicate that the exact formula you use doesn't matter a whole lot for playing performance. I wouldn't try to read too much into the performance in small subsets of that test.

Actually it is the score for a white bishop on d8/e8 in the middle game that is changed from -5 to -50.
The tables are defined for black pieces (hence the final "b" in the variable names) and the white piece tables are mirror versions implemented by the TableParamMirrored class. All evaluation values are defined so that positive values are good for white (and the score is negated as the last step in the evaluation function in case side to move == black).
All in all, this means that you can think of the tables as valid for white pieces with the usual orientation of the chess board, so that a1 = lower left corner, h1 = lower right corner, a8 = upper left corner.
I have a function that determines how often a particular evaluation weight affects the q-search score. For a white bishop on d8/e8 in the middle games, the result is 0.26% of the positions (from a total of about 8.8 million positions) and 7.2% of the games (from a total of 64000 games).
I don't know how stable the parameter estimates are for the less frequently used evaluation weights. I will readily confess that I am nowhere near finished investigating this tuning algorithm.
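One way to implement such a usage counter (a sketch with hypothetical engine callbacks, not Texel's real code) is to perturb the weight and count the positions whose q-search score changes:

```python
def weight_usage(positions, qscore, set_weight, get_weight, name, delta=1):
    """Fraction of positions whose q-search score changes when the
    evaluation weight `name` is nudged by `delta`. `qscore`,
    `set_weight` and `get_weight` are engine-supplied callbacks."""
    base = get_weight(name)
    affected = 0
    for pos in positions:
        before = qscore(pos)
        set_weight(name, base + delta)
        after = qscore(pos)
        set_weight(name, base)           # restore the original weight
        if after != before:
            affected += 1
    return affected / len(positions)
```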
Re: The texel evaluation function optimization algorithm
Hi Peter,

petero2 wrote: The average error E is now a function of the M parameters. Find parameter values such that E is a local minimum in parameter space. The exact optimization method is not that important. It could for example use local search varying one parameter at a time, the Gauss-Newton method, the conjugate gradient method, or a hybrid approach mixing those methods.
If the evaluation function parameters are all of integer type with finite range (which they typically are in a chess engine), the local search method is guaranteed to eventually terminate since there is only a finite number of elements in the parameter space. Also in this case the conventional gradient isn't mathematically defined, but you can use difference quotients instead of the "real" gradient in the GN and CG methods.
This optimization method sounds really nice, and thanks for the really clear explanation! I'd been wanting to try something like it for a long time.
I've got the engine (Stockfish [1]) all hooked up to expose its internal eval features, and built up the FEN database with game results from recent games. The part I don't seem to be moving very fast on is the actual optimization. My math skills are on the lower side, so I was just trying out the scipy optimization toolkit. It has a conjugate-gradient method built in, but it doesn't seem to be doing a great job so far [2].
Can you be more specific about what you meant by using the difference quotient?
Thanks!
Gary
[1] https://github.com/glinscott/Stockfish/ ... r...tuning
[2] https://gist.github.com/glinscott/8814218
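On the difference-quotient question: it simply replaces the analytic partial derivative dE/dp_i with the forward quotient (E(p + h*e_i) - E(p)) / h, with h = 1 for integer parameters. A minimal sketch of that, together with one pass of the coordinate-wise local search mentioned in the quote (the error function in the test is a stand-in, not a real engine objective):

```python
def finite_diff_gradient(error, params, h=1):
    """Approximate the gradient of `error` at integer-valued `params`
    using forward difference quotients with step `h`."""
    base = error(params)
    grad = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h
        grad.append((error(bumped) - base) / h)
    return grad

def local_search_step(error, params):
    """One pass of coordinate-wise local search: try +/-1 on each
    parameter in turn and keep any change that lowers the error."""
    best = error(params)
    params = list(params)
    for i in range(len(params)):
        for step in (1, -1):
            trial = list(params)
            trial[i] += step
            e = error(trial)
            if e < best:
                best, params = e, trial
                break
    return params, best
```

Repeating `local_search_step` until no parameter change helps is guaranteed to terminate for integer parameters with finite range, as noted in the quoted text.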