

k = 1.325, e = 0.0812987569 (28.5129%)
So it's better, and I suppose in line with what can be expected for Andscacs.
Now I have started a local search to find the local minimum. I'm tuning 1049 parameters (counting each array element separately).
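For context on what the k and e numbers are: in the Texel tuning method, e is the mean squared difference between the actual game results and a sigmoid of the engine's score, with k controlling the scale of the sigmoid (the percentages in parentheses match the square root of e, i.e. the RMS error). A minimal sketch of that computation, with illustrative names rather than actual Andscacs code:

```cpp
#include <cmath>
#include <vector>

// One training position: the engine's (quiescence) score in centipawns from
// white's point of view, and the game result (1 = white win, 0.5 = draw, 0 = loss).
struct Sample {
    double score;
    double result;
};

// Texel-style error for a given scaling constant k: the mean squared
// difference between the game results and a sigmoid of the score.
double meanSquaredError(const std::vector<Sample>& samples, double k) {
    double sum = 0.0;
    for (const Sample& s : samples) {
        double predicted = 1.0 / (1.0 + std::pow(10.0, -k * s.score / 400.0));
        double diff = s.result - predicted;
        sum += diff * diff;
    }
    return sum / samples.size();
}
```

Finding k then amounts to a one-dimensional minimization of this error with the evaluation held fixed; the evaluation parameters themselves are tuned against the same error afterwards.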
That's actually a good point.

mvk wrote:
> For determining 'k' that is probably good enough, but I fear that 66,000 games is much too low for the tuner to go anywhere if you use only game outcomes as the target function. The limitation is not the number of positions, but the number of independent signals to get out of it (game outcomes).
>
> cdani wrote:
> > I have done it again with a serious number of games (66,000), and now it's
> > k = 1.495000, e = 0.1408248361 (37.526%)
>
> For example, the bias from the test set alone is already in the order of 1/sqrt(66000) = 0.4%.
Every distance would be OK. But OK, you answered it: the 10 solutions are really different (and probably far enough from each other).

mvk wrote:
> How do you calculate that distance/spread metric and what does it mean? All 10 resulting vectors are somewhat different by manual inspection.
>
> nionita wrote:
> > What is the distance (or spread) in the parameter space of the found solutions? I guess your parameters are integers, but the found solutions are real numbers. There will be a further kind of noise you get when you round the parameters.
> >
> > mvk wrote:
> > > The 10 resulting residuals are
> > > average: 0.32228706 (32.2%)
> > > 3*stdev: 0.00051216 (0.05%)
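As for the distance/spread metric being discussed: one simple way to quantify how far the 10 solutions lie from each other in parameter space is to compute, for each parameter, its standard deviation across the runs (pairwise Euclidean distances between the vectors would be another option). A rough sketch with made-up names, not code from either engine:

```cpp
#include <cmath>
#include <vector>

// Per-parameter standard deviation across several optimization runs.
// 'solutions' holds one parameter vector per run, all of the same length.
std::vector<double> perParameterStdDev(const std::vector<std::vector<double>>& solutions) {
    const std::size_t nRuns = solutions.size();
    const std::size_t nParams = solutions[0].size();
    std::vector<double> stdev(nParams, 0.0);
    for (std::size_t p = 0; p < nParams; p++) {
        double mean = 0.0;
        for (std::size_t r = 0; r < nRuns; r++)
            mean += solutions[r][p];
        mean /= nRuns;
        double var = 0.0;
        for (std::size_t r = 0; r < nRuns; r++) {
            double d = solutions[r][p] - mean;
            var += d * d;
        }
        stdev[p] = std::sqrt(var / nRuns);
    }
    return stdev;
}
```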
1049 parameters is quite a lot. I have been finding success by using a function with 3 parameters for each array. I use this for adding new evaluation terms to Nirvana, and it seems to alleviate some overfitting problems that I have seen. For instance, I just added a term that scales the value of bishops based on the number of blocked pawns. With the function I get a smooth range of values, but if I try to tune each value individually I get something like a small penalty when there are 3 blocked pawns, a huge bonus when there are 4 blocked pawns, and then a huge penalty when there are 5 blocked pawns. Obviously this is nonsense; there were simply a small number of positions with 4 blocked pawns that happened to end up as wins for the side with one or more bishops.

cdani wrote:
> I found another bug! And now it's
> k = 1.325, e = 0.0812987569 (28.5129%)
> So it's better, and I suppose in line with what can be expected for Andscacs.
> Now I have started a local search to find the local minimum. I'm tuning 1049 parameters (counting each array element separately).
Thanks!

cetormenter wrote:
> 1049 parameters is quite a lot. I have been finding success by using a function with 3 parameters for each array. [...]
The equation I mentioned is min + (max - min) * pow(x, bias) / pow(arrSize - 1, bias). I iterate min and max as usual, but for the bias I typically start off using 1 (a linear function) and then use a perturbation of bias / 4 to bias / 16.
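For concreteness, here is a minimal sketch of how an evaluation array might be generated from those three parameters; the function name is made up, and a real engine would likely round to integers and pick its own ranges:

```cpp
#include <cmath>
#include <vector>

// Fill an evaluation array of 'arrSize' entries from three tunable parameters:
// value(x) = min + (max - min) * x^bias / (arrSize - 1)^bias.
// bias = 1 gives a straight line between min and max; bias > 1 keeps values
// closer to 'min' for small x, bias < 1 pushes them toward 'max' sooner.
std::vector<double> buildArray(int arrSize, double min, double max, double bias) {
    std::vector<double> values(arrSize);
    for (int x = 0; x < arrSize; x++)
        values[x] = min + (max - min) * std::pow(x, bias) / std::pow(arrSize - 1, bias);
    return values;
}
```

Tuning only (min, max, bias) instead of every array entry is what keeps the curve smooth and avoids the kind of spike at "4 blocked pawns" described above.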
E is basically a measure of how well the evaluation function is able to predict the game outcome. I don't think the actual value of E is very interesting, because there are at least three factors that contribute to E:

1. Mistakes in the game continuation, causing the game outcome to not be equal to the game-theoretical value of the evaluated position.
2. Tactical aspects of a position that cannot reasonably be modeled by the evaluation function.
3. Bad values of the evaluation function parameters.

The goal of the algorithm is to improve 3, without being misled by the "noise" introduced by 1 and 2.

If there are lots of positions in the training data, the effects of 1 and 2 will on average not depend much on the parameter values, so by varying the parameters to reduce E, the parameter values get better. However, the individual contributions from 1, 2 and 3 to E are not known, so you can't say how good the evaluation function is based on the E value alone.

cdani wrote:
> All is working nicely, winning strength after each iteration on all the parameters. Thanks to all!!
> I have used multiple threads to do the computations, because it was very slow.
> Current e = 0.0796898969 (28.2294%)
> Has someone computed the "e" value, for example for Stockfish? Does it make any sense to compute it? I really don't understand the formulas well, nor why this works. If someone tries to explain it, please don't use maths :-)
> In particular, what is the error "e"? Where is the "error", the mistake, the thing that is done badly? :-)
Thanks! Now it is clear. I hope this explanation is included here:

petero2 wrote:
> E is basically a measure of how well the evaluation function is able to predict the game outcome. [...]
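To tie petero2's description of E to the local search cdani mentions earlier: a common form of that optimization is a coordinate-wise search that nudges one parameter at a time and keeps any change that lowers E. A rough sketch with illustrative names; E is assumed to recompute the mean squared error over all training positions for a given parameter vector:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Coordinate-wise local search: try +1 and -1 on each parameter in turn and
// keep any change that lowers the error E. Stop when a full pass over all
// parameters yields no improvement (a local minimum).
std::vector<int> localOptimize(std::vector<int> params,
                               const std::function<double(const std::vector<int>&)>& E) {
    double bestE = E(params);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i < params.size(); i++) {
            const int original = params[i];
            params[i] = original + 1;          // try one step up
            double newE = E(params);
            if (newE < bestE) {
                bestE = newE;
                improved = true;
                continue;
            }
            params[i] = original - 1;          // try one step down
            newE = E(params);
            if (newE < bestE) {
                bestE = newE;
                improved = true;
                continue;
            }
            params[i] = original;              // neither direction helped; restore
        }
    }
    return params;
}
```

Because E is re-evaluated over the whole position set for every candidate step, each pass over 1049 parameters is expensive, which is presumably why the computation above was multithreaded.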
It was a good idea. What I said was that before testing I thought the idea was bad, but after testing I realized that it was actually good.

cdani wrote:
> https://chessprogramming.wikispaces.com ... ing+Method
> On this page there is another thing I don't understand. You say:
>
> "The 39.4 improvement came when I changed the criteria for which positions to include in the test set. Initially I removed also all positions where the q-search score deviated too much from the search score in the actual game (which conveniently is saved by cutechess-cli in PGN comments). I believed that including those positions would just raise the "noise level" of the data and cause a worse solution to be found. Apparently this is not the case. I now believe that even though including these positions causes noise, the q-search function has to deal with them all the time in real games, so trying to learn how those positions should be evaluated on average is still beneficial."
>
> So if I understand correctly, you are saying "I gained 39.4 by changing the criteria, but apparently changing the criteria was not a good idea". If it was a bad idea, how was it able to win 39.4 Elo?
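For reference, the filter described as having been removed could be as simple as comparing the q-search score of the stored position with the search score recorded in the game; the threshold and names below are purely illustrative, not the actual Texel code:

```cpp
#include <cstdlib>

// The position filter that was later dropped: keep a training position only if
// the q-search score agrees reasonably well with the score the engine reported
// during the game (as recorded in the PGN comments). Threshold is illustrative.
bool keepPosition(int qSearchScore, int gameSearchScore) {
    return std::abs(qSearchScore - gameSearchScore) <= 100;
}
```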
My intention was to let the optimization run until it could not improve E any more, and only then use the computed parameters to test whether the playing strength had improved.

cdani wrote:
> Another thing. I'm doing standard self-play tests at short and long time controls after each iteration of the optimizing algorithm. Until now the new parameters have always beaten the previous version. After one iteration the version is quite a bit worse, but the algorithm continues to run (on another computer), so it has not finished optimizing "E". Is this expected behavior? Or is it better to wait for the algorithm to finish, until no further improvement of "E" is possible, i.e. must each new iteration of the parameters be good?