best way to determine elos of a group

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 02, 2019 2:39 am

Laskos wrote: ↑Fri Aug 02, 2019 1:07 am
Laskos wrote: ↑Thu Aug 01, 2019 8:43 pm
Laskos wrote: ↑Tue Jul 30, 2019 4:40 pm
Michel wrote: ↑Tue Jul 30, 2019 1:08 pm To get an elo estimate with fixed variance you should test against a fixed engine (or group of engines). But this is then not good for comparing engines (it is well known that the variance of the elo difference measured against a 3rd engine is 4 times as big as when measured in direct play).

Well, 2 times as big if he starts the procedure from the beginning of his line of nets. The issue might become whether that engine as an opponent is peculiar in some ways, and what we really want to measure ("strength", I suppose, but in relation to "something"). How would the gating look now? For example, would it be an orderly gating to require that each successive net performs better against "fixed opponents" than the previous net?
I am on the phone now, on a vacation. I would propose an ad-hoc approach adjusted to your needs.
First, you have to have a rough estimate of how many nets you will build. Say N=400. The error margins towards the end of the run will explode as N^(1/2) times the error margins of the first net after the anchor net if you play successive nets.
Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, 40, 60, ..., up to ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you hit another "anchor" ID, say ID80, play it against the anchor ID60 with four times more games than for usual nets (thereby establishing the new "anchor" ID80).
This way, the final nets of the run will have error margins larger only by a factor of less than N^(1/4) times the initial error margins, better than previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 or, in your case with 400 nets whole run, only some 2 times larger error margins for absolute Elo (no gating) than the error margins of the first net.

I am not sure if it's close to optimum use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement of precision in absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.
Forgot to mention: all this ad-hoc procedure is for the case where you don't want to or cannot (too large an Elo span) play all the games against the true anchor ID0. Here you have some diversity of opponents, and Elo differences in games shouldn't be too large. The final error margins are square-root tamed compared to the usual successive play. If you want even tamer error margins, a sparser set of, say, N^(1/4) "anchors" can be planned, but then the Elo differences between them may be too large and the number of these "anchors" too small.

The problem with testing against ID-0 is that after a while it becomes meaningless.
e.g.
ID-0 vs ID-600
Score of scorpio-nn1 vs scorpio-nn2: 2 - 63 - 1 [0.038] 66
Elo difference: -561.93 +/- nan
ID-0 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 4 - 77 - 1 [0.055] 82
Elo difference: -494.44 +/- 240.49

The few wins of ID-0 are through sheer luck, where the game finished in 5 moves or so.

But
ID-600 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 10 - 40 - 20 [0.286] 70
Elo difference: -159.18 +/- 74.25

Anchor needs to be moved as you suggested.

Michel · Post by **Michel** » Fri Aug 02, 2019 7:42 am

The problem with testing against ID-0 is that after a while it becomes meaningless.
e.g.
ID-0 vs ID-600
Score of scorpio-nn1 vs scorpio-nn2: 2 - 63 - 1 [0.038] 66
Elo difference: -561.93 +/- nan
ID-0 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 4 - 77 - 1 [0.055] 82
Elo difference: -494.44 +/- 240.49

The few wins of ID-0 are through sheer luck, where the game finished in 5 moves or so.

Yes, for large elo differences the derivative of the function elo -> expected score goes to zero. So you have to measure the score much more accurately to get reasonable error bars on elo, which means many games (if we assume the elo model continues to hold for very large elo differences).
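To make the point about the vanishing derivative concrete, here is a minimal sketch under the usual logistic elo model. The function names are made up for illustration, and the error bar is a simple linearized (delta-method) estimate, so it will not match the asymmetric intervals printed by the match tool. It also fails for scores of exactly 0 or 1.

```python
import math

def expected_score(elo_diff):
    """Logistic elo model: expected score for a given elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    """Inverse of expected_score (score must be strictly between 0 and 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_slope(elo_diff):
    """Derivative of expected score with respect to elo; tends to 0 for large |elo_diff|."""
    s = expected_score(elo_diff)
    return math.log(10.0) / 400.0 * s * (1.0 - s)

def elo_stderr(wins, losses, draws):
    """Linearized standard error of the elo estimate (an approximation)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score from the observed win/draw/loss counts.
    var = (wins * (1.0 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    se_score = math.sqrt(var / n)
    # A small slope (large elo gap) inflates the error on elo.
    return se_score / score_slope(elo_from_score(s))

# The three matches quoted above, as (wins, losses, draws) of the first engine.
for w, l, d in [(2, 63, 1), (4, 77, 1), (10, 40, 20)]:
    s = (w + 0.5 * d) / (w + l + d)
    print(round(elo_from_score(s), 2), round(elo_stderr(w, l, d), 1))
```

The elo values come out as the ones quoted (-561.93, -494.44, -159.18). The standard errors are only indicative, but they show the same pattern: a similar number of games gives a much wider error bar when the score is close to 0.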

So at some point it presumably becomes advantageous to use intermediate engines to measure the elo difference, but of course not too many since that would blow up the variance again. This is something that can be calculated (starting from an elo model). I'll be busy the whole day but tonight I can have a look if someone doesn't beat me to it.

Michel · Post by **Michel** » Fri Aug 02, 2019 9:37 am

Of course for each new engine we can determine the prior engine ID-i it would be best to play against to get the lowest possible variance for the absolute elo (absolute elo=elo difference with ID-0). It is a trade off between the variance of ID-i and the elo difference of the new engine (which can be guessed, it does not have to be accurate) with ID-i. In that way the tournament graph will be a tree.

Ultimately the path to the root will also become long (though slower than in the linear case) and then it should be "firmed up" with more games.
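A rough sketch of how the opponent choice could be made at each stage, assuming a logistic elo model without draws and independent matches. All names here (`match_elo_variance`, `best_opponent`, the `prior` dictionary) are hypothetical and not taken from any existing tool.

```python
import math

C = math.log(10.0) / 400.0  # slope constant of the logistic elo curve

def match_elo_variance(elo_diff, games):
    """Approximate variance of an elo difference measured over `games` games.

    Assumes no draws, so the per-game score variance is s * (1 - s).
    """
    s = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    return 1.0 / (games * C * C * s * (1.0 - s))

def best_opponent(prior, guessed_elo, games):
    """Pick the earlier engine that minimizes the variance of the new absolute elo.

    prior maps engine id -> (absolute elo estimate, variance of that estimate).
    guessed_elo is a rough guess for the new engine; it need not be accurate.
    """
    def total_variance(engine_id):
        elo, var = prior[engine_id]
        return var + match_elo_variance(guessed_elo - elo, games)
    return min(prior, key=total_variance)

# ID-0 is exact by definition; ID-10 is assumed to sit at 600 elo with a 30 elo standard error.
prior = {0: (0.0, 0.0), 10: (600.0, 900.0)}
print(best_opponent(prior, 650.0, 200))  # strong new net: ID-10 is the better opponent
print(best_opponent(prior, 50.0, 200))   # weak new net: ID-0 is the better opponent
```

Applying this choice to every new engine produces the tree described above: each engine hangs off whichever earlier engine gives the smallest accumulated variance.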

Laskos · Post by **Laskos** » Fri Aug 02, 2019 11:12 am

Daniel Shawul wrote: ↑Fri Aug 02, 2019 2:39 am
The problem with testing against ID-0 is that after a while it becomes meaningless.
e.g.
ID-0 vs ID-600
Score of scorpio-nn1 vs scorpio-nn2: 2 - 63 - 1 [0.038] 66
Elo difference: -561.93 +/- nan
ID-0 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 4 - 77 - 1 [0.055] 82
Elo difference: -494.44 +/- 240.49

The few wins of ID-0 are through sheer luck, where the game finished in 5 moves or so.

But
ID-600 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 10 - 40 - 20 [0.286] 70
Elo difference: -159.18 +/- 74.25

Anchor needs to be moved as you suggested.

So, you do have large Elo progress, and choosing the single opponent ID-0 is not only undesirable, but harmful and practically impossible.

As I see it, your test progression is similar to Lc0 runs. As for the Lc0 Elo progression, it looks like the net ID N to some power, N^a, with 0<a<1. I will propose a modified ad-hoc procedure; sorry for my silly "applied science crap". I hope Michel will come up with something more general and meaningful.

Say the Elo progression is proportional to N^0.5. Instead of spacing the "anchors" uniformly in IDs, we will space them uniformly in Elo. In this case, uniform spacing in Elo is achieved by "anchor" IDs of the form k^2. As before, the total number of "anchors" is the square root of the total number of nets, N^(1/2). The "anchor" IDs are:

0 --- true anchor
1
4
9
16
25
....
....
961
1024

This is for a run totaling about 1000 nets. The hope is that these "anchors" are close to being uniformly separated Elo-wise, say a 50 Elo point difference between two successive "anchors". You still have some diversity of "anchors", although towards the end of the run they are sparse. It's important that when fixing an "anchor", say "anchor" ID-25, one plays ID-25 versus "anchor" ID-16 for significantly more games than a usual ID-24 vs "anchor" ID-16 match: at least about 4 times more games for fixing the "anchor" ID-25, maybe even 8 times more if you have a long run numbering thousands of nets. The total effort is not much larger than the net-by-net playing you did before, but you will have much tamer Elo margins towards the end of the run.

You should have a reliable rating tool. Without truly inverting the Hessian, the Elo margins shown by Bayeselo are not the margins of the absolute Elo for a given ID with only ID-0 as a true anchor. I think in Ordo you can set ID-0 as the sole anchor and compute absolute Elo and error margins (via simulations) for all IDs. With your old procedure, by 100-200 nets the margins of the absolute Elo for the later nets should explode, and Ordo should show this for net IDs in the hundreds. I would have checked with Ordo whether it gives the correct error margins, but I am on the phone for some time.
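As an illustration, the k^2 schedule can be written down in a few lines. This is only a sketch: the helper names, the base of 200 games per match and the firm-up factor of 4 are assumptions chosen to mirror the numbers discussed in this thread.

```python
import math

def squared_anchor_ids(total_nets):
    """Anchor IDs 0, 1, 4, 9, ... up to the largest square <= total_nets."""
    return [k * k for k in range(math.isqrt(total_nets) + 1)]

def opponent_for(net_id):
    """Last anchor strictly below net_id (ID-24 and ID-25 both play ID-16)."""
    if net_id < 1:
        raise ValueError("ID-0 is the true anchor and has no opponent")
    root = math.isqrt(net_id - 1)
    return root * root

def games_for(net_id, base_games=200, firm_up=4):
    """Anchors (perfect squares) play firm_up times more games than usual nets."""
    is_anchor = math.isqrt(net_id) ** 2 == net_id
    return base_games * firm_up if is_anchor else base_games

print(squared_anchor_ids(30))             # [0, 1, 4, 9, 16, 25]
print(opponent_for(24), games_for(24))    # 16 200
print(opponent_for(25), games_for(25))    # 16 800
```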

What I propose will be much better than just serially testing successive IDs. But how close it is to the optimal use of resources, I don't know. It depends on the particularities of the task to be accomplished. I hope Michel comes up with something meaningful.

Michel · Post by **Michel** » Sat Aug 03, 2019 10:06 am

What I propose will be much better than just serially testing successive IDs. But how close is it to the optimal use of resources, I don't know. It depends on particularities of the task to be accomplished. Hope Michel comes with something meaningful.

Well if the goal is _really_ to determine for every n the elo-difference of ID-n with ID-0 with the lowest possible variance then I think what I was proposing (a self organizing tournament tree) doesn't look all that unreasonable.

I do not know if it is optimal in any way (this is tricky to answer since one cannot optimize for multiple objectives at once, so one would have to define a loss function with a weighting of the different objectives). However it looks locally optimal at stage n if at that stage one is only allowed to play games of ID-n against ID-k for some k<n.

Laskos · Post by **Laskos** » Sat Aug 03, 2019 6:18 pm

Yes, it's a bit involved for me. How does this account for how many engines have to be "firmed up" with additional games (I used "anchors" inside quotes to separate them from the true anchor ID-0)? I chose the square root of the intended run length pretty arbitrarily, for no very clear reason.
I just came back home and ran Ordo to see how it computes the error margins in these cases, and it is fine. The command line is here:

ordo-win64.exe -a 0 -A "SF0" -W -p Ordo_check_2.pgn -s 4000 -D -J -E -N 2 -F 95.4 -z 200.241 -o rating.txt

I used 26 SF engines, from SF0 (true anchor) to SF25, with "anchors" as in my first proposal: uniformly spaced, and as many as the square root of the number of engines, so 5 "anchors": SF5, 10, 15, 20, 25. Each of them plays 4 times more games against the previous "anchor" than a usual engine does. So it is not adaptive in any way: I fix the "firming up" IDs from the start. But it turns out that, writing on a phone from a sunny beach in Crete, I was not that far off. The error margins for the sequential method used previously by Daniel are proportional to N^(1/2), where N is the number of successive engines. The error margins with my proposal go as about 0.5 * N^(1/4). For 25 engines, this factor is almost 1.
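The two scaling laws can be reproduced with a back-of-the-envelope variance sum. The sketch below assumes independent matches, the same per-game variance in every match (engines not too far apart in Elo), and measures errors relative to one ordinary match against the true anchor. The function names are made up for illustration.

```python
import math

def sequential_rel_error(n):
    """Net n is reached through n ordinary matches, so variances add up to n."""
    return math.sqrt(n)

def anchored_rel_error(n, spacing, firm_up=4):
    """Net n is reached through firmed-up anchor matches plus at most one ordinary match."""
    hops = n // spacing                 # anchor-to-anchor matches on the path to ID 0
    variance = hops / firm_up           # each has firm_up times more games
    if n % spacing != 0:
        variance += 1.0                 # ordinary match against the last anchor
    return math.sqrt(variance)

print(sequential_rel_error(25))         # 5.0
print(anchored_rel_error(25, 5))        # about 1.118, the same as 0.5 * 25 ** 0.25
print(anchored_rel_error(400, 20))      # about 2.236 for a 400-net run with 20 anchors
```

Under these assumptions the last anchor of an N-net run with N^(1/2) uniformly spaced anchors has variance N^(1/2)/4, which is where the 0.5 * N^(1/4) factor comes from.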

Compare the errors of SF25 with those of SF1.

Here are the Ordo results for

1/ Old sequential method:

   # PLAYER    : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF25      : 985.05 237.94     127.5     200    63.8     100    
   2 SF24      : 886.95 231.53     184.5     400    46.1      96    
   3 SF23      : 845.04 225.85     200.5     400    50.1      97    
   4 SF22      : 801.36 221.92     196.5     400    49.1      91    
   5 SF21      : 770.00 216.03     197.5     400    49.4      83    
   6 SF20      : 747.37 209.95     204.5     400    51.1      95    
   7 SF19      : 708.98 205.85     200.0     400    50.0      86    
   8 SF17      : 674.07 196.30     208.5     400    52.1      56    
   9 SF18      : 670.60 200.51     188.0     400    47.0      75    
  10 SF16      : 647.96 189.93     200.0     400    50.0      87    
  11 SF15      : 621.84 183.48     200.0     400    50.0      87    
  12 SF14      : 595.72 178.10     196.5     400    49.1      72    
  13 SF13      : 581.81 170.95     212.0     400    53.0      99    
  14 SF12      : 525.72 164.51     204.0     400    51.0     100    
  15 SF11      : 455.26 156.55     185.5     400    46.4      79    
  16 SF10      : 436.12 148.84     207.5     400    51.9      97    
  17 SF9       : 390.68 141.58     191.0     400    47.8      72    
  18 SF8       : 376.77 133.86     201.5     400    50.4      80    
  19 SF7       : 357.63 125.08     210.5     400    52.6      99    
  20 SF6       : 301.54 115.02     191.5     400    47.9      87    
  21 SF5       : 275.42 104.75     213.0     400    53.3     100    
  22 SF4       : 203.15  93.97     198.0     400    49.5     100    
  23 SF3       : 138.10  82.96     195.5     400    48.9      98    
  24 SF2       :  89.12  67.64     199.5     400    49.9      98    
  25 SF1       :  41.91  47.39     198.5     400    49.6      96    
  26 SF0       :   0.00   ----      88.0     200    44.0     ---

We see that SF25 compared to SF1 has about 25^(1/2) = 5 times larger error margins. This is VERY bad.

2/ My proposal:

   # PLAYER    : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF23      : 164.46  66.01     115.0     200    57.5      60    
   2 SF22      : 155.61  65.71     112.5     200    56.3      66    
   3 SF25      : 144.62  52.43     437.5     800    54.7      60    
   4 SF21      : 138.06  67.15     107.5     200    53.8      62    
   5 SF24      : 127.59  66.74     104.5     200    52.3      58    
   6 SF19      : 120.26  62.42     109.5     200    54.8      52    
   7 SF11      : 118.54  58.57     117.5     200    58.8      59    
   8 SF20      : 111.95  46.50    1151.5    2400    48.0      71    
   9 SF17      :  97.58  62.51     103.0     200    51.5      52    
  10 SF18      :  95.84  62.42     102.5     200    51.3      50    
  11 SF14      :  95.48  57.39     111.0     200    55.5      52    
  12 SF13      :  93.72  58.37     110.5     200    55.3      54    
  13 SF7       :  89.72  52.97     121.0     200    60.5      54    
  14 SF15      :  87.15  41.27    1191.0    2400    49.6      50    
  15 SF16      :  87.15  62.49     100.0     200    50.0      55    
  16 SF12      :  83.21  56.92     107.5     200    53.8      50    
  17 SF1       :  83.21  48.58     123.5     200    61.8      53    
  18 SF9       :  80.68  52.48     118.5     200    59.3      65    
  19 SF3       :  66.83  47.85     119.0     200    59.5      54    
  20 SF4       :  63.23  47.82     118.0     200    59.0      58    
  21 SF10      :  57.10  33.79    1166.5    2400    48.6      57    
  22 SF8       :  52.27  53.25     110.5     200    55.3      50    
  23 SF6       :  52.27  51.67     110.5     200    55.3      65    
  24 SF2       :  38.38  47.22     111.0     200    55.5      81    
  25 SF5       :  15.65  23.92    1110.0    2400    46.3      90    
  26 SF0       :   0.00   ----     710.5    1600    44.4     ---

SF25 compared to SF1 has about 0.5 * 25^(1/4) = 1.1 times larger error margins.

The number of games played in the first case is 5000, in the second 8000. With more engines, say in the hundreds, this difference will diminish to only 10% or so more effort (the additional effort grows as the square root of the number of engines), with a dramatic increase in precision for absolute Elo (the difference to SF0). With hundreds of engines, the margins for absolute Elo will slowly increase with the 1/4 power of the number of engines. But "firming up" can be even stronger for very long runs (say thousands of nets). The extra effort won't be large (square root), and "firming up" by a further factor of 4 for "anchors" will diminish the errors by a factor of 2, so the margins can become 0.25 * N^(1/4) for very long runs.
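The game counts are easy to check. A small sketch, assuming 200 games per ordinary match, N^(1/2) anchors and a firm-up factor of 4 (the helper names are hypothetical):

```python
import math

def sequential_games(n_nets, base_games):
    """Every net plays one ordinary match against its predecessor."""
    return n_nets * base_games

def anchored_games(n_nets, n_anchors, base_games, firm_up=4):
    """Ordinary nets play base_games; anchors play firm_up times as many."""
    return (n_nets - n_anchors) * base_games + n_anchors * firm_up * base_games

def extra_effort(n_nets, firm_up=4):
    """Relative extra effort of the anchored scheme with isqrt(n_nets) anchors."""
    n_anchors = math.isqrt(n_nets)
    return anchored_games(n_nets, n_anchors, 1, firm_up) / sequential_games(n_nets, 1) - 1.0

print(sequential_games(25, 200), anchored_games(25, 5, 200))  # 5000 8000
for n in (25, 100, 400, 900):
    print(n, round(extra_effort(n), 3))
```

With a firm-up factor of 4 the extra effort is 3 / N^(1/2): 60% for 25 engines, falling to 10% at 900 engines.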

I do not know how close this proposal is to a good one.

Daniel Shawul · Post by **Daniel Shawul** » Sat Aug 03, 2019 6:35 pm

Ok, I am starting a new training run, since the previous one had a very low initial learning rate of 0.01 and was not able to capture the value of pieces quickly enough.
Training run-2 now uses an initial lr of 0.15 and does 32k games before a new net is trained (a huge difference compared to the 512 games of run-1).
Also, the sliding window size is increased to 500k games or 15 nets. The first net of the new run resulted in +250 elo, so ID-0 will probably stop being a good anchor soon enough. If I do 16 mill games for this 2x32 net, then I will produce a total of 1000 nets.
So with Kai's method, I need to change anchors roughly every 30 nets.

Idea: Isn't it possible to just play as many match games as are required for the newer nets to bring down their elo deltas to the desired level?
Meaning the number of match games increases with newer nets.

Laskos · Post by **Laskos** » Sat Aug 03, 2019 6:51 pm

Daniel Shawul wrote: ↑Sat Aug 03, 2019 6:35 pm
Idea: Isn't it possible to just do as many match games as are required for the newer nets to bring down their elo delta's to the desired level ?
Meaning the number of match games increases with newer nets.

You mean playing sequentially again? This would increase the effort tremendously, proportionally to N^2 for N total nets (if I am not wrong). You have to have a sparse set of "anchors", or "firmed up" nets as Michel called them. I have chosen the square root, and it looks reasonable. How many "anchors" to use also depends on the Elo differences in the progression and on what diversity you want (again, that peculiarity of certain nets or engines). The increase in effort with sparse "anchors" is moderate, and proportionally decreases with the length of the run.
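One way to read the N^2 estimate, as a sketch under stated assumptions: in a sequential chain the variances of all matches on the path add up, so to keep the last net's absolute error at the level of a single ordinary match, every one of the N matches needs about N times more games. The helper names are made up for illustration, and equal per-game variance in all matches is assumed.

```python
def chain_variance(games_per_match):
    """Variance of the last net's absolute elo, in units where one game has variance 1."""
    return sum(1.0 / g for g in games_per_match)

def games_for_flat_error(n_nets, base_games):
    """Total games if every match in an n_nets chain is scaled up n_nets times."""
    return n_nets * (n_nets * base_games)

n, base = 25, 200
print(chain_variance([base] * n))        # 25 times the single-match variance 1/200
print(chain_variance([n * base] * n))    # back to about 1/200
print(games_for_flat_error(n, base))     # 125000 games instead of 5000
```

Only adding games to the newest matches does not help the early links of the chain, which is why the sparse "firmed up" nets are the cheaper way out.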

Laskos · Post by **Laskos** » Sun Aug 04, 2019 9:41 am

If your first net is already +250 Elo points, I think you have to use my second proposal, as net no. 30 (with anchors uniformly spaced every 30 IDs) will be too far Elo-wise from ID-0 to be a useful anchor. For the same number of games, error margins for, say, an 800 Elo point difference are larger by a large factor than for an Elo difference of 100, as the derivative of the logistic in the 800 Elo region is close to 0. With an initial LR of 0.15 and 32k games before a new net, use k^2 anchor IDs, that is, IDs 1, 4, 9, 16, 25, ..., 961, 1024. The anchors are still globally sparse (square root of N IDs), but clustered towards the larger Elo gains in the beginning. Otherwise, you will get a large variance right at the first anchor, which is bad for the whole run. Don't forget to have each anchor play, say, 4 times more games against the previous anchor, to "firm it up", as Michel put it.

Here is an example with 25 IDs with simulated games having large initial gains. Keep in mind that with hundreds of nets, error margins for absolute Elo will explode as N^(1/2) using the "naive approach" (your old methodology). The true anchor is ID-0, here SF0.

1/ Naive successive net playing for determining the absolute Elo:

   # PLAYER    : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF25      : 1500.97 212.81     110.0     200    55.0      95    
   2 SF24      : 1466.03 207.71     204.5     400    51.1      86    
   3 SF21      : 1427.37 194.40     212.0     400    53.0      66    
   4 SF23      : 1415.17 203.08     186.0     400    46.5      53    
   5 SF22      : 1413.43 199.56     195.5     400    48.9      69    
   6 SF20      : 1399.45 189.38     201.0     400    50.3      94    
   7 SF19      : 1368.02 186.21     191.5     400    47.9      53    
   8 SF18      : 1366.28 182.40     203.0     400    50.8      72    
   9 SF17      : 1354.08 178.54     201.0     400    50.3      78    
  10 SF16      : 1338.40 173.55     204.0     400    51.0      93    
  11 SF15      : 1308.73 168.55     201.0     400    50.3      95    
  12 SF14      : 1275.54 163.98     194.5     400    48.6      75    
  13 SF13      : 1261.60 158.03     212.0     400    53.0     100    
  14 SF12      : 1205.40 152.64     196.5     400    49.1      98    
  15 SF11      : 1161.64 146.54     203.5     400    50.9     100    
  16 SF10      : 1105.43 139.54     199.0     400    49.8      99    
  17 SF9       : 1052.80 133.51     209.5     400    52.4     100    
  18 SF8       : 965.71 125.99     193.0     400    48.3     100    
  19 SF7       : 904.13 118.55     198.0     400    49.5     100    
  20 SF6       : 849.71 110.28     222.5     400    55.6     100    
  21 SF5       : 710.40 100.23     188.0     400    47.0     100    
  22 SF4       : 617.73  91.10     198.5     400    49.6     100    
  23 SF3       : 530.64  81.56     210.5     400    52.6     100    
  24 SF2       : 403.38  69.35     206.5     400    51.6     100    
  25 SF1       : 249.59  51.57     220.0     400    55.0     100    
  26 SF0       :   0.00   ----      38.5     200    19.3     ---

2/ Uniformly spaced anchors (a square root of N of them): no. 5,10,15,20,25

   # PLAYER    : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF25      : 1425.68  88.58     478.0     800    59.8      78    
   2 SF23      : 1407.69  94.48     114.5     200    57.3      52    
   3 SF24      : 1405.91  95.99     114.0     200    57.0      86    
   4 SF22      : 1374.17  93.49     105.0     200    52.5      51    
   5 SF19      : 1373.52  94.70     136.5     200    68.3      76    
   6 SF20      : 1356.69  86.10    1219.0    2400    50.8      60    
   7 SF21      : 1351.45  95.61      98.5     200    49.3      94    
   8 SF16      : 1301.66  93.83     117.5     200    58.8      55    
   9 SF18      : 1298.07  94.41     116.5     200    58.3      77    
  10 SF17      : 1276.72  93.73     110.5     200    55.3      96    
  11 SF15      : 1239.91  84.03    1143.5    2400    47.6      56    
  12 SF14      : 1236.34  91.61     137.5     200    68.8      72    
  13 SF13      : 1218.40  92.66     133.0     200    66.5      94    
  14 SF11      : 1171.31  90.50     120.5     200    60.3      60    
  15 SF12      : 1164.05  91.01     118.5     200    59.3     100    
  16 SF10      : 1098.69  81.19    1254.5    2400    52.3      93    
  17 SF9       : 1049.26  94.48     173.5     200    86.8      90    
  18 SF8       : 998.15  92.58     166.0     200    83.0      93    
  19 SF7       : 944.94  87.63     156.5     200    78.3     100    
  20 SF6       : 794.12  84.60     120.5     200    60.3     100    
  21 SF5       : 721.50  74.38    1053.5    2400    43.9      87    
  22 SF4       : 638.15 126.49     195.0     200    97.5      81    
  23 SF3       : 565.43 101.68     192.5     200    96.3      98    
  24 SF2       : 437.98  71.80     185.0     200    92.5     100    
  25 SF1       : 241.92  50.54     160.0     200    80.0     100    
  26 SF0       :   0.00   ----      80.0    1600     5.0     ---

3/ k^2 anchors in ID (a square root of N of them again): no. 1,4,9,16,25

   # PLAYER    : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 SF25      : 1496.41  65.60     580.5     800    72.6      92    
   2 SF24      : 1460.33  76.10     136.5     200    68.3      52    
   3 SF22      : 1458.33  75.94     136.0     200    68.0      75    
   4 SF23      : 1436.79  75.11     130.5     200    65.3      71    
   5 SF21      : 1419.75  75.07     126.0     200    63.0      52    
   6 SF20      : 1417.89  75.30     125.5     200    62.8      52    
   7 SF19      : 1416.03  75.56     125.0     200    62.5      93    
   8 SF18      : 1372.61  74.94     113.0     200    56.5      98    
   9 SF16      : 1327.07  60.99    1478.0    3200    46.2      80    
  10 SF17      : 1309.64  74.61      95.0     200    47.5      87    
  11 SF15      : 1271.06  71.95     150.5     200    75.3      63    
  12 SF14      : 1259.57  73.04     148.0     200    74.0      81    
  13 SF12      : 1231.25  71.50     141.5     200    70.8      67    
  14 SF13      : 1216.78  71.11     138.0     200    69.0      72    
  15 SF11      : 1198.81  70.14     133.5     200    66.8     100    
  16 SF10      : 1119.45  69.76     112.0     200    56.0      97    
  17 SF9       : 1077.45  55.00    1270.5    2800    45.4      62    
  18 SF8       : 1065.29  85.79     184.0     200    92.0     100    
  19 SF7       : 887.11  67.84     161.0     200    80.5      98    
  20 SF6       : 813.46  64.42     146.0     200    73.0     100    
  21 SF5       : 689.36  60.67     114.0     200    57.0      99    
  22 SF4       : 640.27  42.77     980.0    2400    40.8      99    
  23 SF3       : 561.43  64.74     172.0     200    86.0     100    
  24 SF2       : 399.26  52.47     141.5     200    70.8     100    
  25 SF1       : 245.46  26.18     804.5    2000    40.2     100    
  26 SF0       :   0.00   ----     157.0     800    19.6     ---

=================================

You can see that in your case, it's probably best to adopt the k^2 approach (my second proposal).

Here are the plots for cases 1/, 2/, 3/.

[Attached plots: Elo_Ordo_01.jpg, Elo_Ordo_02.jpg, Elo_Ordo_03.jpg]

Laskos · Post by **Laskos** » Sun Aug 04, 2019 10:12 am

To add:

The effort in the first, "naive" case is 5000 games. In the second, uniformly spaced case it is 8000 games. In the third, k^2 spaced case it is 8000 games again.
The effort with hundreds of nets will be only slightly larger, say 10%, than in the "naive" case, and the extra effort goes into "firming up" the anchors. The more nets there are, the sparser the anchors become (I have chosen the square root sparseness somewhat arbitrarily).
