How Do You Automatically Tune Your Evaluation Tables

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: The texel evaluation function optimization algorithm

Post by petero2 »

gladius wrote:
petero2 wrote: The average error E is now a function of the M parameters. Find parameter values such that E is a local minimum in parameter space. The exact optimization method is not that important. It could for example use local search varying one parameter at a time, the Gauss-Newton method, the conjugate gradient method, or a hybrid approach mixing those methods.

If the evaluation function parameters are all of integer type with finite range (which they typically are in a chess engine), the local search method is guaranteed to eventually terminate since there is only a finite number of elements in the parameter space. Also in this case the conventional gradient isn't mathematically defined, but you can use difference quotients instead of the "real" gradient in the GN and CG methods.
Hi Peter,

This optimization method sounds really nice, and thanks for the really clear explanation! I'd been wanting to try something like it for a long time.

I've got the engine (Stockfish [1]) all hooked up to expose its internal eval features, and built up the FEN database with game results from recent games. The part I don't seem to be moving very fast on is the actual optimization :). My math skills are on the lower side, so I was just trying out the scipy optimization toolkit. It has a conjugate-gradient method built in, but it doesn't seem to be doing a great job so far [2].

Can you be more specific about what you meant by using the difference quotient?

Thanks!
Gary

[1] https://github.com/glinscott/Stockfish/ ... r...tuning
[2] https://gist.github.com/glinscott/8814218
Hi Gary,

What I meant by difference quotient is indeed what Álvaro already explained. To be more specific, instead of trying to compute dE/dp, compute (E(p+1) - E(p-1))/2, where p is one parameter (a central difference with step size 1, since the parameters are integers). Do this for all parameters to obtain the gradient.
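
As a minimal sketch (not texel's actual code; E is assumed to be a function that computes the average error for a given parameter vector), the difference-quotient gradient could look like this:

Code:

#include <vector>

// Assumed to exist: the average evaluation error for a parameter vector.
double E(const std::vector<int>& params);

// Central-difference approximation of the gradient of E, using a step of 1
// because the parameters are integers.
std::vector<double> numericGradient(const std::vector<int>& params) {
    std::vector<double> grad(params.size());
    for (size_t i = 0; i < params.size(); i++) {
        std::vector<int> p = params;
        p[i] += 1;
        double ePlus = E(p);   // E(p+1)
        p[i] -= 2;
        double eMinus = E(p);  // E(p-1)
        grad[i] = (ePlus - eMinus) / 2.0;
    }
    return grad;
}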

I would like to stress again that so far I have not used the conjugate gradient or the Gauss-Newton methods, so I don't know how well they will work. If E and its derivatives are smooth enough, the methods should work well, but there is no guarantee that E is smooth enough.

What I have been doing is using local search, which basically works like this:

Code:

vector<int> localOptimize(const vector<int>& initialGuess) {
    const int nParams = initialGuess.size();
    double bestE = E(initialGuess);
    vector<int> bestParValues = initialGuess;
    while (true) {
        bool improved = false;
        for (int pi = 0; pi < nParams; pi++) {
            // Try increasing parameter pi by one.
            vector<int> newParValues = bestParValues;
            newParValues[pi] += 1;
            double newE = E(newParValues);
            if (newE < bestE) {
                bestE = newE;
                bestParValues = newParValues;
                improved = true;
            } else {
                // Otherwise try decreasing it by one instead.
                newParValues[pi] -= 2;
                newE = E(newParValues);
                if (newE < bestE) {
                    bestE = newE;
                    bestParValues = newParValues;
                    improved = true;
                }
            }
        }
        // Stop when a full pass over all parameters gives no improvement.
        if (!improved)
            break;
    }
    return bestParValues;
}
In reality my method is a bit more sophisticated though:

1. If I find that increasing a parameter improves E, I keep increasing this parameter until there is no further improvement. Similarly if I find that decreasing a parameter improves E. (See the sketch after this list.)

2. Instead of just looping over all parameters repeatedly in the same order, I keep statistics about how often and by how much changing a parameter has improved E. Parameters with a good history are tried more often than other parameters. (I don't know how much this really helps but it seemed to be a reasonable approach.)
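
A rough sketch of idea 1, under the same assumptions as above (E computes the average error; this is illustrative only, not the exact texel code):

Code:

#include <vector>

double E(const std::vector<int>& params);  // assumed: the average error function

// Starting from 'params', keep moving parameter 'pi' by 'step' (+1 or -1)
// as long as each move lowers the error. 'params' and 'bestE' are updated
// in place.
static void walkDirection(std::vector<int>& params, int pi, int step, double& bestE) {
    while (true) {
        std::vector<int> trial = params;
        trial[pi] += step;
        double e = E(trial);
        if (e < bestE) {
            bestE = e;
            params = trial;
        } else {
            break;  // no further improvement in this direction
        }
    }
}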

In texel I also have information about which parameter values are equal to each other by design. For example in the bishop endgame PST, the a1, h1, a8 and h8 squares have the same value by design, so only one parameter for all these squares is exposed to the optimization code. Reducing the number of parameters means you can get stable results with a lower number of games.
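
As an illustration only (not the actual texel code), such a square symmetry could be expressed by mapping every square to a canonical square before looking up the parameter, so the optimizer only ever sees one parameter per symmetry class:

Code:

// Map a 0-63 square (file = sq % 8, rank = sq / 8) to a canonical square by
// mirroring into the a1-d4 quadrant. With full left/right and top/bottom
// symmetry, a1, h1, a8 and h8 all map to the same index, so they share
// one tunable parameter.
int canonicalSquare(int sq) {
    int file = sq % 8;
    int rank = sq / 8;
    if (file > 3) file = 7 - file;
    if (rank > 3) rank = 7 - rank;
    return rank * 8 + file;
}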

Finally I want to mention that I used a very fast time control (1+0.08) for the games I collected the FEN positions from. Even though this was partly out of convenience, since I already have such games, it is also quite likely that using lower quality games improves the tuning. By using games that contain some tactical mistakes, the evaluation function has a chance to learn how to score unbalanced positions, not just positions that would occur in high quality games. I have heard that this is an important property when tuning Othello evaluation functions, and I guess it is important for chess too.
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: How Do You Automatically Tune Your Evaluation Tables

Post by Sergei S. Markoff »

AlvaroBegue wrote:A different approach is to try to make the evaluation function be a good predictor of game result, without playing games. (Of course I would play many games after tuning to verify the new weights are better than the previous ones.)

The basic mechanism would be something like this:
(1) Get a database of [~1 million?] positions with associated results.
(2) From each position, run quiescence search and extract the position that the evaluation ultimately came from; in the process, write a new database of quiescent positions with associated results.
(3) Define the probability of winning as sigmoid(C*evaluation), where sigmoid(x):=1/(1+exp(-x)) and the constant C is chosen so the evaluation has the usual scale in pawns (I got C=0.58 or something like that, but I am quoting from memory).
(4) Use non-linear regression to estimate the parameters of the evaluation function that maximize the [log-]likelihood of the results.

One needs to do something to handle draws, but probably treating them as half a victory and half a loss would be fine.

Notice that if your evaluation function is a linear combination of terms and you are trying to figure out the coefficients, step (4) is logistic regression.

I have only done small-scale tests with this idea, but the Junior team seems to have used it extensively, as described in this paper: http://www.ratio.huji.ac.il/node/2362 . They seem to handle draws in a complicated way, but other than that I think their ideas are similar to mine (I haven't read the paper in a while).
That's very close to what I've used for SmarThink, but I finally changed to a linear function instead of a sigmoid. Positions with abs(eval) > 350 were excluded, and then I used a simple 0.0026 * eval as the probability.
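
For reference, the sigmoid mapping from step (3) above amounts to something like this (a minimal sketch; the constant 0.58 is just the value Álvaro quotes from memory and would be refitted for another engine):

Code:

#include <cmath>

// Predicted winning probability from an evaluation given in pawns:
// P(win) = 1 / (1 + exp(-C * eval)).
double winProbability(double evalPawns, double C = 0.58) {
    return 1.0 / (1.0 + std::exp(-C * evalPawns));
}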
The Force Be With You!
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: The texel evaluation function optimization algorithm

Post by gladius »

AlvaroBegue wrote:I would try with BFGS or L-BFGS (I think you can find BFGS in scipy, and there is a C library called liblbfgs for the other one; either one should be fine for this situation).
Thanks - I gave that a shot as well, but it didn't seem to be converging; the error pretty much just did a random walk around the original value.

However, I'm trying again after removing the psq table values (two per square, per piece for SF), and it seems to be working better now. Only 520 parameters instead of 1200 now :).
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: The texel evaluation function optimization algorithm

Post by gladius »

petero2 wrote:
gladius wrote:
petero2 wrote: The average error E is now a function of the M parameters. Find parameter values such that E is a local minimum in parameter space. The exact optimization method is not that important. It could for example use local search varying one parameter at a time, the Gauss-Newton method, the conjugate gradient method, or a hybrid approach mixing those methods.

If the evaluation function parameters are all of integer type with finite range (which they typically are in a chess engine), the local search method is guaranteed to eventually terminate since there is only a finite number of elements in the parameter space. Also in this case the conventional gradient isn't mathematically defined, but you can use difference quotients instead of the "real" gradient in the GN and CG methods.
Hi Peter,

This optimization method sounds really nice, and thanks for the really clear explanation! I'd been wanting to try something like it for a long time.

I've got the engine (Stockfish [1]) all hooked up to expose its internal eval features, and built up the FEN database with game results from recent games. The part I don't seem to be moving very fast on is the actual optimization :). My math skills are on the lower side, so I was just trying out the scipy optimization toolkit. It has a conjugate-gradient method built in, but it doesn't seem to be doing a great job so far [2].

Can you be more specific about what you meant by using the difference quotient?

Thanks!
Gary

[1] https://github.com/glinscott/Stockfish/ ... r...tuning
[2] https://gist.github.com/glinscott/8814218
Hi Gary,

What I meant by difference quotient is indeed what Álvaro already explained. To be more specific, instead of trying to compute dE/dp, compute (E(p+1) - E(p-1))/2, where p is one parameter (a central difference with step size 1, since the parameters are integers). Do this for all parameters to obtain the gradient.

I would like to stress again that so far I have not used the conjugate gradient or the Gauss-Newton methods, so I don't know how well they will work. If E and its derivatives are smooth enough, the methods should work well, but there is no guarantee that E is smooth enough.

What I have been doing is using local search, which basically works like this:

Code:

vector<int> localOptimize(const vector<int>& initialGuess) {
    const int nParams = initialGuess.size();
    double bestE = E(initialGuess);
    vector<int> bestParValues = initialGuess;
    while (true) {
        bool improved = false;
        for (int pi = 0; pi < nParams; pi++) {
            vector<int> newParValues = bestParValues;
            newParValues[pi] += 1;
            double newE = E(newParValues);
            if (newE < bestE) {
                bestE = newE;
                bestParValues = newParValues;
                improved = true;
            } else {
                newParValues[pi] -= 2;
                newE = E(newParValues);
                if (newE < bestE) {
                    bestE = newE;
                    bestParValues = newParValues;
                    improved = true;
                }
            }
        }
        if (!improved)
            break;
    }
    return bestParValues;
}
In reality my method is a bit more sophisticated though:

1. If I find that increasing a parameter improves E, I keep increasing this parameter until there is no further improvement. Similarly if I find that decreasing a parameter improves E.

2. Instead of just looping over all parameters repeatedly in the same order, I keep statistics about how often and by how much changing a parameter has improved E. Parameters with a good history are tried more often than other parameters. (I don't know how much this really helps but it seemed to be a reasonable approach.)

In texel I also have information about which parameter values are equal to each other by design. For example in the bishop endgame PST, the a1, h1, a8 and h8 squares have the same value by design, so only one parameter for all these squares is exposed to the optimization code. Reducing the number of parameters means you can get stable results with a lower number of games.

Finally I want to mention that I used a very fast time control (1+0.08) for the games I collected the FEN positions from. Even though this was partly out of convenience, since I already have such games, it is also quite likely that using lower quality games improves the tuning. By using games that contain some tactical mistakes, the evaluation function has a chance to learn how to score unbalanced positions, not just positions that would occur in high quality games. I have heard that this is an important property when tuning Othello evaluation functions, and I guess it is important for chess too.
Great, thanks for the explanation Peter!

I think there were just too many values in the original tuning that didn't have a significant effect (random psq values that only affected one square).

I'm retrying without psq in the mix, and things seem to be going better, we shall see :).
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: The texel evaluation function optimization algorithm

Post by asanjuan »

Peter,
Rhetoric will be grateful forever for your contribution :D
Thanks a lot.
Still learning how to play chess...
knights move in "L" shape ¿right?
petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: The texel evaluation function optimization algorithm

Post by petero2 »

asanjuan wrote:Peter,
Rhetoric will be grateful forever for your contribution :D
Thanks a lot.
Good to hear that. Have you found a way to combine this method with your genetic algorithm approach?
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: The texel evaluation function optimization algorithm

Post by asanjuan »

petero2 wrote:
asanjuan wrote:Peter,
Rhetoric will be grateful forever for your contribution :D
Thanks a lot.
Good to hear that. Have you found a way to combine this method with your genetic algorithm approach?
I must say not too much; I haven't tried yet. But the real gem is that you have provided us with a fitness function. My experimental implementation of your method is simply faster than any GA that I tried before. This is due to two reasons:
- first of all, I didn't know how to produce a reliable fitness function
- a GA never ends; it can keep running forever, and it's up to you when to stop. A local search ends when it finds a combination of parameters that can't improve the fitness function any further.

But I must say that this is not a tuning algorithm. It is really a learning algorithm. It is almost identical to what is used to train neural networks.

My first experimental run of this method produced a version of Rhetoric that played very positionally, with piece sacrifices... I'm really happy with that.
I'm running the tuning again, changing the K constant to scale the evaluations to something less "insane". I was using K=1, now I'm using K=1.5. I don't worry about which constant to use, since my intention is to tune every parameter.

A possible experiment is to run several tuning runs starting from random values, build a population, and then do something like a GA based on real games. I haven't tried it. Note that I implemented this learning algorithm 2 days ago...

Again, thank you for the fitness function. I don't want to forget Álvaro, who was the one who brought up the subject in this topic.

I would like to show a "wild" game played by the experimental version of Rhetoric. This game is very positional. Look at the amazing 14. Ke2!!


[pgn]
[Event "Arena tournament"]
[White "Rhetoric Dev64"]
[Black "SOS-51_Arena"]
[Result "1-0"]
[ECO "A06"]
[Opening "Reti Opening"]
[Variation "Nimzowitsch-Larsen, 2...Nf6 Nf6 3.Bb2 e6 4.e3"]
[TimeControl "40/120:40/120:40/120"]
[PlyCount "76"]

1. Nf3 d5 2. g3 c6 3. Bg2 Nf6 4. d3 Bg4 5. h3 Bh5 6. b3 e6 7. Bb2 Qa5+ 8.
Qd2 Qxd2+ 9. Nbxd2 {(Nbxd2 Bxf3 Nxf3 Bb4+ c3 Bd6 Kd2 Nbd7 a4 a5 c4 0-0 Kc2
h6 Bc3 Rfe8) +0.62/16 3} Bc5 {(Bc5 g4 Bg6 Nh4 0-0 e4 Na6 Nxg6 fxg6 d4 Bb4
exd5 Bxd2+ Kxd2 exd5) -0.47/13 12} 10. g4 {(g4 Bg6 g5 Nh5 Nh4 Na6) +0.59/14
3} Bg6 {(Bg6 Nh4 0-0 e4 Na6) -0.51/10 3} 11. Nh4 {(Nh4 Bd6 Nxg6 hxg6 e3
Nbd7 Ke2 Ke7 g5) +0.96/14 3} Na6 {(Na6 Rc1 0-0 e4 Rae8 Nxg6 fxg6 d4 Bb4 c4
Bxd2+) -0.43/11 4} 12. a3 {(a3 Bd6 Nxg6 hxg6 g5) +1.04/13 3} O-O {(0-0 e4
Rad8 Rd1 Bb6 Nxg6 fxg6 e5 Nd7) -0.45/10 3} 13. e3 {(e3 Nd7 Ke2 Bd6 Nxg6
fxg6 Nf3 e5 Bc3 Nc7 a4 e4 dxe4 dxe4 Ng5) +1.21/15 3} Rad8 {(Rad8 Ndf3 Bd6
Rd1 Nc5 Ne5 Nfd7 Nhxg6 fxg6 c4) -0.38/10 3} 14. Ke2 {(Ke2 Bd6 b4 Nd7 f4 Nb6
Nxg6 fxg6 Nf3 Na4 Bd4 c5 bxc5 N6xc5) +1.39/15 3} Bd6 {(Bd6 f4 Nc5 Rag1 a6
Nxg6 fxg6 d4 Ncd7 g5) -0.33/10 3} 15. f4 {(f4 Rfe8 Nxg6 hxg6 Nf3) +1.42/14
3} Nc5 {(Nc5 Rag1 h6 Nxg6 fxg6 c4 dxc4 dxc4 Be7 b4 Na4) -0.36/11 3} 16.
Nxg6 {(Nxg6 hxg6 Bd4 Rd7 Bf3 b5 g5 Nh5 Rhg1 Re8 Kf2 a6 a4 b4) +1.58/14 3}
fxg6 {(fxg6 Rac1 Rde8 c4 h6 d4 Ncd7 Rhf1) -0.50/11 3} 17. Nf3 {(Nf3 Rf7 Ne5
Re7 Bd4 Nfd7 a4 Bxe5 fxe5 Na6 Bxa7 Nxe5 Bd4 Nf7) +1.69/14 3} Nfd7 {(Nfd7
Raf1 Rde8 Rhg1 a6 b4 Na4 Bd4 c5 bxc5 Ndxc5) -0.51/9 3} 18. h4 {(h4 Na6 g5
c5 Raf1 Rde8 Ne5 Bxe5 fxe5 Nc7 a4 Rf5 Rxf5 gxf5) +1.56/13 3} Nf6 {(Nf6 Bh3
Nfd7 Rag1 a6 Ng5 e5 d4) -0.35/9 3} 19. Bh3 {(Bh3 Rf7 Bd4 Re8 g5 Nh5 Kf2 b5
b4 Na4 Ne5 Bxe5 Bxe5) +1.80/14 3} Nfd7 {(Nfd7 Raf1 Rde8 Rhg1 a6 Bg2 Bc7 b4
Na4 Ne5 Nxe5 Bxe5) -0.44/10 3} 20. h5 {(h5 Rde8 hxg6) +1.97/14 3} gxh5
{(gxh5 gxh5 Rfe8 Rag1 Re7 b4 Na4) -0.50/9 3} 21. gxh5 {(gxh5 Rde8 Rhg1 Nf6
h6 g6 Ng5 Bxf4 exf4 Nh5 Ke3 Nxf4 Bg7 Nxh3 Bxf8 Nxg1 Bxc5) +2.29/14 3} Rde8
{(Rde8 Rag1 Nf6 h6 g6 Bd4 Ncd7 Ng5 e5 Be6+ Rxe6 Nxe6 exd4 Nxf8 Kxf8)
-0.70/11 4} 22. Rhg1 {(Rhg1 Nf6 h6 g6 Bd4) +2.06/13 3} Nf6 {(Nf6 h6 g6 b4
Na4 Be5 Bxe5 Nxe5 Kh8 d4) -0.71/10 4} 23. h6 {(h6 g6 Bd4 Ncd7 Bxa7 e5 fxe5
Nxe5 Ng5 Re7 Be6+ Kh8 d4 Ned7 Rgf1 b5) +2.19/14 3} g6 {(g6 Ng5 Kh8 Bd4 Re7
b4 Ncd7 Nxe6 Rfe8 f5 Be5 Ng7 Bxd4 Nxe8 Bxa1) -1.70/13 8} 24. Bd4 {(Bd4 Ncd7
Bxa7) +2.19/14 3} Ncd7 {(Ncd7 Ng5 e5 Be6+ Rxe6 Nxe6 exd4 Nxf8 Kxf8) -0.10/9
3} 25. Be5 {(Be5 Bxe5 fxe5) +3.36/15 3} Bxe5 {(Bxe5 fxe5 Nh5 Ng5 Nc5 b4 Na4
Nxe6 Rf7 Raf1 Rfe7 c4) -1.29/12 8} 26. fxe5 {(fxe5 Nh5 Bg4 Ndf6 exf6 Nxf6)
+3.50/16 4} Nh5 {(Nh5 Bg4 Rf5 d4 c5 Bxf5 exf5 c4 cxd4 exd4 Nb6 c5 Nd7)
-2.50/14 7} 27. Bg4 {(Bg4 Nhf6 exf6 Nxf6) +3.83/16 3} Rf5 {(Rf5 d4 c5 Bxf5
exf5 c4 cxd4 exd4 Nb6 c5 Nd7) -2.51/11 3} 28. d4 {(d4 c5 c3 Rc8 Rac1 Re8
Ra1) +4.06/14 3} c5 {(c5 Bxf5 exf5 c4 cxd4 exd4 Nb6 c5 Nd7) -2.51/10 2} 29.
c3 {(c3 Nb6 Bxf5 exf5 Rab1 Nd7 Rb2 Nb6 Kd3 c4+ bxc4 dxc4+ Ke2 f4 Kf2 Rd8)
+4.16/13 3} Ref8 {(Ref8 Bxf5 Rxf5 Ng5 Nf8 Raf1 Rxf1 Kxf1 b6) -2.60/9 4} 30.
Bxf5 {(Bxf5 exf5 Nh4 Nb6 dxc5 Nd7 Rad1 Nxe5 Rxd5 Ng4 Rd7 Rf7 Rd8+ Rf8 Rd7)
+4.56/14 3} Rxf5 {(Rxf5 Ng5 Nf8 Raf1 Rxf1 Kxf1 b6 e4 cxd4 cxd4) -2.65/10 2}
31. Ng5 {(Ng5 Nxe5 dxe5 Rxe5 Raf1 b6 Nf7 Rf5 Rxf5 exf5 Nd6 Kf8 Nc8 Nf6 Nxa7
Ne4 Nb5) +5.42/16 3} Nf8 {(Nf8 Raf1 Rxf1 Kxf1 cxd4 cxd4 b6 Ke2 Nd7 Nxe6
Nxe5) -2.94/11 3} 32. Raf1 {(Raf1 Rxf1 Kxf1 b6 Ke2 cxd4 cxd4 a5 e4 dxe4 Ke3
b5 Nxe4 Nd7 Ng5 Nf8 Ne4) +5.98/18 4} Rxf1 {(Rxf1 Kxf1 c4 bxc4 dxc4 a4 b6
Ke2 Kh8 Kf3 Kg8 Nxh7 Nxh7 Rxg6+ Kf7 Rxe6) -3.18/12 3} 33. Kxf1 {(Kxf1 b6
Ke2) +6.02/19 3} cxd4 {(cxd4 exd4 Nf4 Kf2 b6 Ke3 Nh5 c4) -3.48/14 3} 34.
cxd4 {(cxd4 b5 Ke2) +7.41/17 3} a6 {(a6 Ke2 Nd7 Nxe6 Nxe5 dxe5 d4 exd4 Kf7
Rxg6) -3.61/12 3} 35. Kf2 {(Kf2 Ng7 hxg7 Kxg7 Rc1 h6 Rc7+ Kg8 Nf7 h5 Nh6+
Kh8 Rxb7 Nh7 Nf7+ Kg8 Kg3 g5 Nh6+ Kh8 Ra7 g4 Rxa6) +9.12/17 3} b6 {(b6 e4
dxe4 Ke3 b5 Kxe4 Nd7 Nxe6 Nxe5) -3.61/9 1} 36. Rc1 {(Rc1 b5 Rc8 Ng7 Ra8 a5
hxg7 Kxg7 Rxa5 h6 Ra7+ Kg8 Nf7 h5 Nd6 Nh7 Nxb5 Ng5 Nd6 h4) +9.31/16 3} b5
{(b5 Rc8 Ng7 hxg7 Kxg7 Rc6 Kh6 Nxe6 Nxe6 Rxe6) -6.57/10 2} 37. Rc8 {(Rc8
Ng7 Ra8) +9.78/16 3} Ng7 {(Ng7 Ra8 a5 Kf3 b4 hxg7 Kxg7 Rxa5 bxa3 Rxa3)
-6.82/9 1} 38. Ra8 {(Ra8 a5 hxg7) +10.33/16 3} a5 {(a5 hxg7 Kxg7 Ra7+ Kh8
Kf3 Kg8 Rxa5 h6) -6.97/8 1 Arena Adjudication} 1-0
[/pgn]
Still learning how to play chess...
knights move in "L" shape ¿right?
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: The texel evaluation function optimization algorithm

Post by tpetzke »

- a GA never ends; it can keep running forever, and it's up to you when to stop. A local search ends when it finds a combination of parameters that can't improve the fitness function any further.
That is not fully correct and probably depends on the GA you use. I use a form of PBIL, and it would stop when all bits of the genome have fully converged to either 0 or 1. At that point the entropy has dropped to 0. You eventually reach that state if your fitness function is deterministic and the mutation is not too wild.
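
For reference, a bare-bones sketch of a PBIL-style update (illustrative only, not the actual iCE code; the fitness function is assumed to be supplied by the caller):

Code:

#include <cstdlib>
#include <vector>

// One PBIL step: sample 'popSize' bit strings from the probability vector,
// evaluate them with 'fitness', and move each probability a little toward
// the corresponding bit of the best sample. The run can stop once every
// probability is close to 0 or 1 (entropy near 0).
template <typename Fitness>
void pbilStep(std::vector<double>& prob, int popSize, double learnRate,
              Fitness fitness) {
    std::vector<int> best;
    double bestFit = -1e300;
    for (int i = 0; i < popSize; i++) {
        std::vector<int> genome(prob.size());
        for (size_t b = 0; b < prob.size(); b++)
            genome[b] = (std::rand() / (double)RAND_MAX) < prob[b] ? 1 : 0;
        double f = fitness(genome);
        if (f > bestFit) {
            bestFit = f;
            best = genome;
        }
    }
    for (size_t b = 0; b < prob.size(); b++)
        prob[b] = (1.0 - learnRate) * prob[b] + learnRate * best[b];
}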

I usually stop the GA well before that, because up until now I have used game play as the fitness function, which is not deterministic. So parameters of lesser importance tend to fluctuate and would take a very long time to converge as well. Last time I stopped after about 1400 generations.

It took ages to get there (>700k games played) and in the end I gained something like 5 Elo or so. The previously trained genome was obviously not so bad.

I had dropped some parameters and thought that because of that the evaluation was somewhat imbalanced now, but it probably still was ok.

I will try Peter's fitness function too; it will be much faster than the one I have now, and I'm interested to see how well it performs compared with game play.

It's next on my list after some more fun with table base calculations.

Thomas...
Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: The texel evaluation function optimization algorithm

Post by asanjuan »

tpetzke wrote:
- a GA never ends; it can keep running forever, and it's up to you when to stop. A local search ends when it finds a combination of parameters that can't improve the fitness function any further.
That is not fully correct and probably depends on the GA you use. I use a form of PBIL, and it would stop when all bits of the genome have fully converged to either 0 or 1. At that point the entropy has dropped to 0. You eventually reach that state if your fitness function is deterministic and the mutation is not too wild.

I usually stop the GA well before that, because up until now I have used game play as the fitness function, which is not deterministic. So parameters of lesser importance tend to fluctuate and would take a very long time to converge as well. Last time I stopped after about 1400 generations.

It took ages to get there (>700k games played) and in the end I gained something like 5 Elo or so. The previously trained genome was obviously not so bad.

I had dropped some parameters and thought that because of that the evaluation was somewhat imbalanced now, but it probably still was ok.

I will try Peter's fitness function too; it will be much faster than the one I have now, and I'm interested to see how well it performs compared with game play.

It's next on my list after some more fun with table base calculations.

Thomas...
Yes, you are right; what I meant was that it's up to you to decide the stopping criterion. In practice, we don't have enough patience to see the end of the process.
:D
Still learning how to play chess...
knights move in "L" shape ¿right?
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: The texel evaluation function optimization algorithm

Post by mvk »

petero2 wrote: Since about one month ago, I have been using something similar to tune the evaluation function in my chess engine texel. Here is a description of the method.

Method

Take 64000 games played at a fast time control (such as 1s+0.08s/move) between the current and/or previous versions of the engine. Extract all positions from those games, except positions within the opening book and positions where the engine found a mate score during the game. This typically gives about 8.8 million positions which are stored as FEN strings in a text file.

Define the average evaluation error E:

Code:

E = 1/N * sum(i=1,N, (result[i] - Sigmoid(qScore(pos[i])))^2)
This is very close to what I have been doing. The ice cream effect is certainly there, I can recognize it when my program plays a white pawn to h6. But overall it is a big win:

[image]

I don't see how CLOP could ever come close in effectiveness.
Some details here: http://www.talkchess.com/forum/viewtopi ... 930#420930

For search parameters I use a slightly different approach: fewer positions, longer searches, and an oracle function based on long search results instead of game result.
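
For concreteness, the quoted error measure could be computed along these lines (a minimal sketch; Position, qScore and Sigmoid are stand-ins for the engine's own position type, quiescence search score and scaling function, and result[i] is the game result of position i as 1, 0.5 or 0):

Code:

#include <vector>

struct Position { /* engine position representation (assumed) */ };
double qScore(const Position& pos);  // quiescence search score (assumed)
double Sigmoid(double score);        // maps a score to an expected result in [0,1]

// Average evaluation error over N positions:
// E = 1/N * sum_i (result[i] - Sigmoid(qScore(pos[i])))^2
double averageError(const std::vector<Position>& pos,
                    const std::vector<double>& result) {
    double sum = 0.0;
    for (size_t i = 0; i < pos.size(); i++) {
        double diff = result[i] - Sigmoid(qScore(pos[i]));
        sum += diff * diff;
    }
    return sum / pos.size();
}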
[Account deleted]