Why computing K that minimizes the sigmoid func. value?...
Re: Why computing K that minimizes the sigmoid func. value?.
I have done it again with a serious number of games (66,000), and now it's:
k = 1.495000, e = 0.1408248361 (37.526%)
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
cdani wrote:I have done it again with a serious number of games (66,000), and now it's
k = 1.495000, e = 0.1408248361 (37.526%)
For determining 'k' that is probably good enough, but I fear that 66,000 games is much too low for the tuner to go anywhere if you use only game outcomes as the target function. The limitation is not the number of positions, but the number of independent signals you can get out of it (game outcomes).
For example, the bias from the test set alone is already on the order of 1/sqrt(66000) = 0.4%.
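The 1/sqrt(N) estimate can be checked with a quick calculation (a minimal sketch; the constant factor in front of 1/sqrt(N) is glossed over here):

```python
import math

def outcome_noise(n_games):
    # Rough relative noise of a score estimate built from n independent
    # game outcomes: it shrinks like 1/sqrt(n), ignoring constant factors.
    return 1.0 / math.sqrt(n_games)

print(f"{outcome_noise(66_000):.2%}")  # about 0.4%, as quoted above
```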
Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote:For determining 'k' that is probably good enough, but I fear that 66,000 games is much too low for the tuner to go anywhere if you use only game outcomes as the target function. The limitation is not the number of positions, but the number of independent signals to get out of it (game outcomes). For example, the bias from the test set alone is already on the order of 1/sqrt(66000) = 0.4%.
Aha. I followed the advice presented here:
https://chessprogramming.wikispaces.com ... ing+Method
It says 64,000 games.
As I follow the procedure presented there, I use the qsearch score of every position to tune, so some millions of positions. For Texel this worked. I will give it a try...
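The K-fitting step of that procedure can be sketched as follows. This is a simplified illustration, not Andscacs's or Texel's actual code: the sigmoid maps a qsearch score in centipawns to an expected score, the game result is 0, 0.5 or 1, and K is found by a coarse-to-fine scan over the mean squared error.

```python
import math

def sigmoid(score, k):
    # Expected score (winning probability) from a centipawn evaluation.
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def error(data, k):
    # Mean squared error between game results and predicted scores.
    # data: list of (qsearch_score_cp, result) with result in {0, 0.5, 1}.
    return sum((r - sigmoid(s, k)) ** 2 for s, r in data) / len(data)

def find_best_k(data, lo=0.5, hi=3.0, rounds=6):
    # Coarse-to-fine scan: evaluate 21 candidates per round, then zoom in
    # around the best one.
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / 20
        candidates = [lo + i * step for i in range(21)]
        best = min(candidates, key=lambda k: error(data, k))
        lo, hi = best - step, best + step
    return best
```

The weights themselves are then tuned with K held fixed, minimizing the same error over the evaluation parameters.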
Daniel José  http://www.andscacs.com
Re: Why computing K that minimizes the sigmoid func. value?.
Selecting the test set size is still a bit of a black art. For my first experiments in 2010 I used 20,000 positions. When I did a retune with the same amount, it lost about 10 elo. When I increased to 150,000 positions I lost 20 elo. But I was using GM games at the time, and the noise from such games is quite high. When I did most of my tuning in 2011 I used 1M positions. I have 3M in Floyd now, sampled from CCRL games only.
I ran a quick test last week using 10 disjoint test sets of 1M positions each. For each test set I let the tuner run for the same number of iterations, starting from the same initial vector of zeroes. The 10 resulting residuals are
average: 0.32228706 (32.2%)
3*stdev: 0.00051216 (0.05%)
This is a much smaller spread than I expected to see. I was expecting ~0.3%.
What I still have to do is run a cutechess tournament with these 10 vectors and see what the spread in elo is. I will start one later this week, as right now I don't have an idle PC available.
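The average and 3*stdev figures above can be reproduced from the ten per-set residuals like this (a sketch; the residual values below are placeholders for illustration, not the actual ones):

```python
import statistics

def summarize(residuals):
    # residuals: one final mean-squared error per disjoint training set.
    # Returns (average, 3 * sample standard deviation).
    return statistics.mean(residuals), 3 * statistics.stdev(residuals)

# Hypothetical residuals, for illustration only:
avg, spread = summarize([0.3221, 0.3224, 0.3222, 0.3225, 0.3223,
                         0.3222, 0.3223, 0.3224, 0.3221, 0.3224])
```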
Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote:I ran a quick test last week using 10 disjoint test sets of 1M positions each.
What criteria do you use to consider that the 10 sets are disjoint? For example, if there is a position in set 1 with a rook on an open file, does this mean the other 9 sets have no position with a rook on an open file?
Re: Why computing K that minimizes the sigmoid func. value?.
Ferdy wrote:What criteria do you use to consider that the 10 sets are disjoint? For example, if there is a position in set 1 with a rook on an open file, does this mean the other 9 sets have no position with a rook on an open file?
To be precise: I sampled 10M positions from CCRL games by dumping all positions and selecting every 23rd or so (some odd number) position from the dump, then shuffling all selected positions and keeping the first 10M of the result. Then I split that into 10 equal parts of 1M each.
The only overlap is due to positions occurring more than once. But then, I also take the elo difference into consideration in my fit, because that is one of the main predictors of the game outcome, and I want to use it in my contempt model later.
The idea is to find out what the impact of position selection bias is.
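In outline, the sampling scheme described above might look like this (a sketch under the stated assumptions; `all_positions` is a hypothetical flat list of positions dumped from the games, and the parameter defaults mirror the numbers in the post):

```python
import random

def make_disjoint_sets(all_positions, stride=23, keep=10_000_000,
                       parts=10, seed=42):
    # Take every stride-th position from the dump, shuffle the selection,
    # keep the first `keep` positions, then split into `parts` equal sets.
    picked = all_positions[::stride]
    random.Random(seed).shuffle(picked)
    picked = picked[:keep]
    size = len(picked) // parts
    return [picked[i * size:(i + 1) * size] for i in range(parts)]
```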
Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote:To be precise: I sampled 10M positions from CCRL games by dumping all positions and selecting every 23rd or so (some odd number) position from the dump, then shuffling all selected positions and keeping the first 10M of the result. Then I split that into 10 equal parts of 1M each.
The only overlap is due to positions occurring more than once.
Nice idea.
I am thinking of controlling the extreme presence of features in the training sets. For example, positions with a white rook on an open file and result 1-0 should not dominate the sets; there must also be some such positions with result 0-1 or a draw.
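One way to check such balance, sketched here with a hypothetical `has_feature` predicate and a hypothetical `result` field on each position:

```python
from collections import Counter

def result_balance(positions, has_feature):
    # Count game results among positions showing the feature, so a set
    # dominated by one result (e.g. all 1-0) can be spotted and rebalanced.
    return Counter(p["result"] for p in positions if has_feature(p))
```

For example, `has_feature` could test for a white rook on an open file; a heavily skewed counter would flag the set as unbalanced for that feature.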
Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote:The 10 resulting residuals are
average: 0.32228706 (32.2%)
3*stdev: 0.00051216 (0.05%)
What is the distance (or spread) in the parameter space of the found solutions? I guess your parameters are integers, but the found solutions are real numbers. There will be a further kind of noise that you get when rounding the parameters.

Re: Why computing K that minimizes the sigmoid func. value?.
mvk wrote:To be precise: I sampled 10M positions from CCRL games by dumping all positions and selecting every 23rd or so (some odd number) position from the dump, then shuffling all selected positions and keeping the first 10M of the result. Then I split it into 10 equal parts of 1M each.
AlvaroBegue (Álvaro Begué, RuyDos) replied: You would get much more independent results if you divided the games into 10 bins instead.
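Splitting at game granularity, as suggested, could be sketched like this (`games` is a hypothetical list where each game is itself a list of its positions):

```python
def split_by_game(games, bins=10):
    # Deal whole games round-robin into bins, so all positions from one
    # game land in the same bin and no game straddles two sets.
    sets = [[] for _ in range(bins)]
    for i, game in enumerate(games):
        sets[i % bins].extend(game)
    return sets
```

This avoids the residual overlap of position-level sampling: correlated positions from the same game can never end up in two different sets.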
Re: Why computing K that minimizes the sigmoid func. value?.
nionita wrote:What is the distance (or spread) in the parameter space of the found solutions? I guess your parameters are integers, but the found solutions are real numbers. There will be a further kind of noise you get when rounding the parameters.
How do you calculate that distance/spread metric, and what does it mean? All 10 resulting vectors are somewhat different on manual inspection.
In Floyd, "K" is effectively 0.1, because I'm aiming at millipawn resolution in the evaluation rather than centipawns. Roughly speaking, a single integer point change in an arbitrary parameter has less than a 1e-7 effect on the overall residual. I believe integer rounding plays no role in the results.
AlvaroBegue wrote:You would get much more independent results if you divided the games into 10 bins instead.
Maybe that is the case, but maybe not: to make the sets truly independent one needs correspondingly many CCRL games, but only about 1.9M are available.
Anyway, here are the results of the round-robin tournament:
Version 0.7 is the version I tuned using 3M positions. Versions .0 through .9 are exactly the same code, except that each was compiled with a vector tuned on one of the 10 disjoint 1M sets.
Code:
110000 game(s) loaded, 0 game(s) with unknown result ignored.
00:00:00,00
Rank Name Elo + - games score oppo. draws
1 floyd0.7 5 4 4 20000 51% 1 23% < 3M (default)
2 floyd.3 3 4 4 20000 50% 0 24% < 1M
3 floyd.4 3 4 4 20000 50% 0 24% < 1M
4 floyd.7 2 4 4 20000 50% 0 24% < 1M
5 floyd.1 2 4 4 20000 50% 0 24% < 1M
6 floyd.6 1 4 4 20000 50% 0 23% < 1M
7 floyd.8 0 4 4 20000 50% 0 24% < 1M
8 floyd.9 1 4 4 20000 50% 0 23% < 1M
9 floyd.5 3 4 4 20000 50% 0 24% < 1M
10 floyd.0 3 4 4 20000 49% 0 23% < 1M
11 floyd.2 7 4 4 20000 49% 1 23% < 1M
LoS matrix:
fl fl fl fl fl fl fl fl fl fl fl
floyd0.7 82 83 89 91 95 96 98 99 99 99
floyd.3 17 51 62 65 77 80 90 98 98 99
floyd.4 16 48 61 64 77 79 90 98 98 99
floyd.7 10 37 38 53 67 70 84 96 96 99
floyd.1 8 34 35 46 64 67 82 95 95 99
floyd.6 4 22 22 32 35 53 71 90 91 99
floyd.8 3 19 20 29 32 46 67 89 89 99
floyd.9 1 9 9 15 17 28 32 77 78 98
floyd.5 0 1 1 3 4 9 10 22 51 91
floyd.0 0 1 1 3 4 8 10 21 48 90
floyd.2 0 0 0 0 0 0 0 1 8 9
1. None of the versions tuned with 1M positions can beat the version tuned with 3M positions. I would not expect this outcome if there were no difference between using 1M and 3M.
2. There is probably no real elo difference among the 10 versions tuned with 1M positions.
3. The differences are really small. In practice it might be OK to use 1M positions for daily work and save some computing time while tuning, then increase the set size prior to a release.