Why computing K that minimizes the sigmoid func. value?...
Re: Why computing K that minimizes the sigmoid func. value?.
I found another bug and now it's
k = 1.325, e = 0.0812987569 (28.5129%)
So that's better, and I suppose in line with what can be expected for Andscacs.
Now I have started a local search to find the local minimum. I'm tuning 1049 parameters (counting each array entry individually).
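For reference, the quantity being minimized here can be sketched as follows. This is a minimal toy version of the Texel-style tuning error, with made-up (score, result) pairs standing in for real training positions; the function names are mine, not Andscacs code.

```python
# Minimal sketch of the tuning error E and the grid scan for K.
# The (qsearch score in centipawns, game result) pairs below are
# made-up toy data, not real training positions.

def sigmoid(score_cp, k):
    # Predicted winning probability for a centipawn score.
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))

def error(data, k):
    # E = mean squared difference between game result and prediction.
    return sum((result - sigmoid(score, k)) ** 2
               for score, result in data) / len(data)

def best_k(data, lo=0.5, hi=2.5, steps=200):
    # Coarse grid scan over K; a finer local search can refine it.
    grid = (lo + i * (hi - lo) / steps for i in range(steps + 1))
    return min(grid, key=lambda k: error(data, k))

toy = [(120, 1.0), (-80, 0.0), (30, 0.5), (250, 1.0), (-200, 0.0)]
k = best_k(toy)
```

K only rescales centipawns to a probability; once it is fixed, this same error function is what the local search over the evaluation parameters drives down.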
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
cdani wrote: I have done it again with a serious number of games (66,000), and now it's
k = 1.495000, e = 0.1408248361 (37.526%)
mvk wrote: For determining 'k' that is probably good enough, but I fear that 66,000 games is much too low for the tuner to go anywhere if you use only game outcomes as the target function. The limitation is not the number of positions, but the number of independent signals to get out of it (game outcomes).
That's actually a good point. For example, the bias from the test set alone is already on the order of 1/sqrt(66000) = 0.4%.
I haven't really played around with this tuning method because it's a bit of a hassle to process the games and extract suitable positions, but it's on my todo list.
Now: does the quality of the games really matter very much? I ask, because I would like to use this method for chess variants, for which it's rather hard (if not impossible) to find a suitably large set of games; I'd need to generate those first and it would be much easier if I could just use fast games...
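mvk's 0.4% figure is just the usual 1/sqrt(N) sampling-noise estimate; a quick simulation (synthetic 50/50 game results, nothing from real engines) shows the spread of a test-set mean is indeed of that order:

```python
import math, random

random.seed(1)
n_games = 66000

# Back-of-envelope estimate quoted in the thread: fluctuations of a
# mean over N independent samples scale as 1/sqrt(N).
analytic = 1.0 / math.sqrt(n_games)   # roughly 0.0039, i.e. ~0.4%

# Empirical check: 50 synthetic test sets of n_games 50/50 outcomes.
# For win/loss results the exact prefactor is 0.5, so the measured
# spread comes out near analytic / 2 - the same order of magnitude.
means = []
for _ in range(50):
    wins = sum(1 for _ in range(n_games) if random.random() < 0.5)
    means.append(wins / n_games)
mu = sum(means) / len(means)
spread = math.sqrt(sum((m - mu) ** 2 for m in means) / len(means))
```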
Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote: The 10 resulting residuals are
average: 0.32228706 (32.2%)
3*stdev: 0.00051216 (0.05%)
nionita wrote: What is the distance (or spread) in the parameter space of the found solutions? I guess your parameters are integers, but the found solutions are real numbers. There will be a further kind of noise you get when you round the parameters.
mvk wrote: How do you calculate that distance/spread metric and what does it mean? All resulting 10 vectors are somewhat different by manual inspection.
Any distance would be ok. But ok, you answered it: the 10 solutions are really different (and probably far enough from each other).
I was curious because it seems to me that, when optimizing a noisy function, chances are high that you find a different optimum every time, if the function is flat and the noise strong enough. At least this happens to me all the time.
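For concreteness, the average / 3*stdev summary of repeated runs can be computed like this; the ten residuals below are invented, just to show the shape of the computation:

```python
import math

def spread(residuals):
    # Mean and 3 * sample standard deviation, as in mvk's summary.
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((r - mean) ** 2 for r in residuals) / (n - 1)
    return mean, 3.0 * math.sqrt(var)

# Invented residuals from 10 hypothetical optimization runs.
runs = [0.3221, 0.3224, 0.3223, 0.3225, 0.3222,
        0.3223, 0.3224, 0.3222, 0.3223, 0.3224]
mean, three_sd = spread(runs)
```

A tight 3*stdev only says the final E values agree; as nionita points out, it says nothing by itself about how far apart the parameter vectors are.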

Re: Why computing K that minimizes the sigmoid func. value?.
cdani wrote: I found another bug and now it's
k = 1.325, e = 0.0812987569 (28.5129%)
So that's better, and I suppose in line with what can be expected for Andscacs.
Now I have started a local search to find the local minimum. I'm tuning 1049 parameters (counting each array entry individually).
1049 parameters is quite a lot. I have been finding success by using a function with 3 parameters for each array. I use this for adding new evaluation terms to Nirvana. It seems to alleviate some of the overfitting problems that I have seen. For instance, I just added a term that scales the value of bishops based on the number of blocked pawns. With the function I get a smooth range of values, but if I try to tune each value individually I get something like a small penalty when there are 3 blocked pawns, then a huge bonus when there are 4 blocked pawns, and then a huge penalty when there are 5 blocked pawns. Obviously this is nonsense; there were simply a small number of positions with 4 blocked pawns that ended up as wins for the side with one or more bishops.
The equation I mentioned is min + (max - min) * pow(x, bias) / pow(arrSize - 1, bias). I iterate min and max as usual, but for the bias I typically start off using 1 (a linear function) and then use a perturbation of bias / 4 to bias / 16.
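A sketch of that 3-parameter array, directly transcribing the pow formula; the function and variable names are mine, not Nirvana code:

```python
def make_array(lo, hi, bias, size):
    # lo + (hi - lo) * pow(x, bias) / pow(size - 1, bias) for each index x.
    return [lo + (hi - lo) * (x ** bias) / ((size - 1) ** bias)
            for x in range(size)]

# bias = 1 is a straight ramp from lo to hi; bias > 1 keeps early
# entries near lo and lets later ones rise faster, while the endpoints
# stay pinned at lo and hi.
linear = make_array(0, 60, 1.0, 7)   # [0.0, 10.0, ..., 60.0]
convex = make_array(0, 60, 2.0, 7)
```

Only (lo, hi, bias) are tuned, so a 7-entry array costs 3 parameters instead of 7, and the generated values can never show the isolated 4-blocked-pawns spike described above.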
Re: Why computing K that minimizes the sigmoid func. value?.
cetormenter wrote: 1049 parameters is quite a lot. I have been finding success by using a function with 3 parameters for each array. [...] The equation I mentioned is min + (max - min) * pow(x, bias) / pow(arrSize - 1, bias). I iterate min and max as usual, but for the bias I typically start off using 1 (a linear function) and then use a perturbation of bias / 4 to bias / 16.
Thanks!
Yes, I plan to use some type of smoothing too, after some other ideas. I will surely try your formula or a similar one. I cannot invent much here; my maths are absolutely basic.
For the moment I plan to keep snapshots of the parameters at regular points during the optimization, and then select manually, for some parameters, which snapshot I want to try. I will also try to smooth the most obvious overfitting by hand. All combined with re-optimizing specific parts that I think can gain something after the modification of other parameters, ...
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
Everything is working nicely, winning strength after each iteration over all the parameters. Thanks to all!!
I have used multithreading to do the computations, because it was very slow.
Current e = 0.0796898969 (28.2294%)
Has someone computed the "e" value, for example for Stockfish? Does it make any sense to compute it? I really don't understand the formulas well, nor why this works. If someone tries to explain it, please don't use maths
Particularly, what is the error "e"? Where is the "error", the mistake, the thing that is done badly?
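E is a sum of independent per-position terms, so splitting the position list across workers and adding the partial sums is all the multithreading amounts to. A sketch with hypothetical data (in a C or C++ engine the workers would be real threads; in pure CPython you would want processes for an actual speedup, because of the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

K = 1.325  # scaling constant found earlier in the thread

def sigmoid(score_cp):
    return 1.0 / (1.0 + 10.0 ** (-K * score_cp / 400.0))

def partial_error(chunk):
    # Sum of squared prediction errors for one worker's slice.
    return sum((result - sigmoid(score)) ** 2 for score, result in chunk)

def error_parallel(data, workers=4):
    # Split into one contiguous chunk per worker, add the partial sums.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_error, chunks)) / len(data)

# Made-up (score, result) pairs standing in for the training positions.
toy = [(120, 1.0), (-80, 0.0), (30, 0.5), (250, 1.0)] * 1000
```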
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
cdani wrote: Everything is working nicely, winning strength after each iteration over all the parameters. Thanks to all!!
I have used multithreading to do the computations, because it was very slow.
Current e = 0.0796898969 (28.2294%)
Has someone computed the "e" value, for example for Stockfish? Does it make any sense to compute it? I really don't understand the formulas well, nor why this works. If someone tries to explain it, please don't use maths :)
Particularly, what is the error "e"? Where is the "error", the mistake, the thing that is done badly? :)
E is basically a measure of how well the evaluation function is able to predict the game outcome. I don't think the actual value of E is very interesting, because there are at least three factors that contribute to E:
1. Mistakes in the game continuation causing the game outcome to not be equal to the game theoretical value of the evaluated position.
2. Tactical aspects of a position that can not reasonably be modeled by the evaluation function.
3. Bad values of evaluation function parameters.
The goal of the algorithm is to improve 3, without being misled by the "noise" introduced by 1 and 2.
If there are lots of positions in the training data, the effects of 1 and 2 will on average not depend much on the parameter values, so by varying the parameters to reduce E, the parameter values get better. However, the individual contributions from 1, 2 and 3 to E are not known, so you can't say how good the evaluation function is based on the E value alone.
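A toy simulation of that point: below, synthetic game outcomes are drawn from a known "true" evaluation, so the outcome noise plays the role of factors 1 and 2, and E still picks out the correct parameter value. Everything here is invented for illustration.

```python
import random

random.seed(42)

def sigmoid(score):
    return 1.0 / (1.0 + 10.0 ** (-score / 400.0))

# "True" evaluation: a single feature worth 100 cp. Outcomes are drawn
# from the win probability, so each individual label is noisy (factors
# 1 and 2), but the noise is unbiased.
positions = []
for _ in range(20000):
    feature = random.choice([-2, -1, 0, 1, 2])
    outcome = 1.0 if random.random() < sigmoid(100 * feature) else 0.0
    positions.append((feature, outcome))

def E(weight):
    return sum((res - sigmoid(weight * f)) ** 2
               for f, res in positions) / len(positions)

# With enough positions, E is lowest near the true weight of 100 even
# though no single game outcome is reliable on its own.
errs = {w: E(w) for w in (25, 50, 100, 200)}
```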
Re: Why computing K that minimizes the sigmoid func. value?.
petero2 wrote: E is basically a measure of how well the evaluation function is able to predict the game outcome. I don't think the actual value of E is very interesting, because there are at least three factors that contribute to E:
1. Mistakes in the game continuation causing the game outcome to not be equal to the game theoretical value of the evaluated position.
2. Tactical aspects of a position that can not reasonably be modeled by the evaluation function.
3. Bad values of evaluation function parameters.
The goal of the algorithm is to improve 3, without being misled by the "noise" introduced by 1 and 2.
If there are lots of positions in the training data, the effects of 1 and 2 will on average not depend much on the parameter values, so by varying the parameters to reduce E, the parameter values get better. However, the individual contributions from 1, 2 and 3 to E are not known, so you can't say how good the evaluation function is based on the E value alone.
Thanks! Now it's clear. I hope this explanation gets included here:
https://chessprogramming.wikispaces.com ... ing+Method
On that page there is another thing I don't understand. You say:
"The 39.4 improvement came when I changed the criteria for which positions to include in the test set. Initially I removed also all positions where the qsearch score deviated too much from the search score in the actual game (which conveniently is saved by cutechess-cli in PGN comments). I believed that including those positions would just raise the "noise level" of the data and cause a worse solution to be found. Apparently this is not the case. I now believe that even though including these positions causes noise, the qsearch function has to deal with them all the time in real games, so trying to learn how those positions should be evaluated on average is still beneficial."
So, if I understand correctly, you are saying: "I won 39.4 Elo by changing the criteria, but apparently changing the criteria was not a good idea". If it was a bad idea, how was it able to win 39.4 Elo?
Another thing. I'm doing standard self-play tests at short and long time controls after each iteration of the optimizing algorithm. Until now the new parameters have always beaten the previous version. After one iteration the version is quite a bit worse, but the algorithm continues to run (on another computer), so it has not finished optimizing "E". Is this expected behavior? Or is it better to wait for the algorithm to finish, until no further improvement of "E" is possible, i.e. should each new iteration of the parameters be good?
Thanks!
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
Another thing I just thought of. Since I have already won 25 Elo in self-play, I suppose it's better to run a new batch of 60,000 games with the latest best version, as the new parameter values, when tested, will correlate better with the results of the games, so they will be optimized better.
But if this is true, why not use games from a stronger engine like Stockfish? That would be even better.
Probably I did not understand something, if this is not true.
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
cdani wrote: https://chessprogramming.wikispaces.com ... ing+Method
On that page there is another thing I don't understand. You say:
"The 39.4 improvement came when I changed the criteria for which positions to include in the test set. [...] I believed that including those positions would just raise the "noise level" of the data and cause a worse solution to be found. Apparently this is not the case."
So, if I understand correctly, you are saying: "I won 39.4 Elo by changing the criteria, but apparently changing the criteria was not a good idea". If it was a bad idea, how was it able to win 39.4 Elo?
It was a good idea. What I said was that before testing I thought the idea would be bad, but after testing I realized that it was actually good.
cdani wrote: Another thing. I'm doing standard self-play tests at short and long time controls after each iteration of the optimizing algorithm. Until now the new parameters have always beaten the previous version. After one iteration the version is quite a bit worse, but the algorithm continues to run (on another computer), so it has not finished optimizing "E". Is this expected behavior?
My intention was to let the optimization run until it could not improve E any more, and only then use the computed parameters to test whether the playing strength was improved.
If you can afford the CPU time to test playing strength after each iteration, there is however no harm in that, except that you have to spend time performing the tests. It is even possible that some of the intermediate parameter values are stronger than the final values, but I don't know whether that actually happens in practice.