The problem with testing against ID-0 is that after a while it becomes meaningless.

Michel wrote: Tue Jul 30, 2019 11:08 am
To get an Elo estimate with fixed variance you should test against a fixed engine (or group of engines). But this is then not good for comparing engines (it is well known that the variance of an Elo difference measured against a third engine is 4 times as big as when measured in direct play).

Laskos wrote: Tue Jul 30, 2019 2:40 pm
Well, 2 times as big if he starts the procedure from the beginning of his line of nets. The issue might become whether that engine as an opponent is peculiar in some ways, and what we really want to measure ("strength", I suppose, but in relation to "something"). How would the gating look now? For example, would it be an orderly gating to require that each successive net performs better against the "fixed opponents" than the previous net?

Laskos wrote: Thu Aug 01, 2019 6:43 pm
I am on the phone now, on vacation. I would propose an ad-hoc approach adjusted to your needs.
First, you need a rough estimate of how many nets you will build. Say N=400. If you play successive nets, the error margins towards the end of the run will grow to N^(1/2) times the error margins of the first net after the anchor net.
Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, ID40, ID60, ..., ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you reach another "anchor" ID, say ID80, play it against the anchor ID60 with four times as many games as for the usual nets (thereby setting the new "anchor" ID80).
This way, the final nets of the run will have error margins larger by a factor of less than N^(1/4) compared to the initial error margins, better than the previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 times or, in your case with a 400-net run, only about 2 times the error margins of the first net for absolute Elo (no gating).
I am not sure if it's close to the optimum use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement in the precision of the absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.

Laskos wrote: Thu Aug 01, 2019 11:07 pm
Forgot to mention. All this ad-hoc procedure is for when you don't want to or cannot (too large an Elo span) play all the games against the true anchor ID0. Here you have some diversity of opponents, and the Elo differences in games shouldn't be too large. The final error margins are square-root tamed compared to the usual successive play. If you want even tamer error margins, a sparser set of, say, N^(1/4) "anchors" can be planned, but then the Elo differences between them may be too large and the number of these "anchors" too small.
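To check the arithmetic of the anchor scheme quoted above, here is a rough Python sketch. It assumes that all matches are independent, that a usual match has the same standard error (taken as 1 unit), that four times as many games halve that error, and that Elo differences simply add along the chain back to ID0. The function names and parameters are my own, not from any tool.

```python
import math

def successive_error(net_id):
    """Error of net `net_id` relative to ID0 when each net only plays
    its predecessor: net_id independent links of unit variance."""
    return math.sqrt(net_id)

def anchor_scheme_error(net_id, n_nets=400, anchor_games_factor=4.0):
    """Error of net `net_id` relative to ID0 under the anchor scheme.

    Assumptions (mine, for illustration): anchors are spaced
    sqrt(n_nets) apart, each anchor-to-anchor match uses
    `anchor_games_factor` times the usual games (variance divided by
    that factor), and a non-anchor net adds one usual match against
    the last anchor.
    """
    step = round(math.sqrt(n_nets))              # 20 for N = 400
    full_links, remainder = divmod(net_id, step)
    anchor_var = full_links / anchor_games_factor
    last_var = 1.0 if remainder else 0.0         # usual match vs last anchor
    return math.sqrt(anchor_var + last_var)

if __name__ == "__main__":
    for net in (1, 20, 399, 400):
        print(net, successive_error(net), anchor_scheme_error(net))
```

Under these assumptions ID400 comes out at sqrt(20/4) = sqrt(5), which is the N^(1/4)/2 figure for N=400, against 20 for successive play. The worst non-anchor net (ID399) is slightly above that but still well below N^(1/4).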
ID-0 vs ID-600
Score of scorpio-nn1 vs scorpio-nn2: 2 - 63 - 1 [0.038] 66
Elo difference: -561.93 +/- nan
ID-0 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 4 - 77 - 1 [0.055] 82
Elo difference: -494.44 +/- 240.49
The few wins of ID-0 are through sheer luck, in games that finished in 5 moves or so.
ID-600 vs ID-800
Score of scorpio-nn1 vs scorpio-nn2: 10 - 40 - 20 [0.286] 70
Elo difference: -159.18 +/- 74.25
The anchor needs to be moved, as you suggested.
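For reference, the Elo differences in the match output above follow from the score fraction through the usual logistic formula, Elo = -400 * log10(1/score - 1). The sketch below reproduces them from the win/loss/draw counts. The error margin part is an assumption on my side: a normal approximation on the score, converted to Elo at both ends of a 95% interval. I have not confirmed that this is exactly what the match tool computes, and the function names are my own.

```python
import math

def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.959964):
    """Return (elo_difference, error_margin) from first player's view.

    Assumes at least one non-loss and one non-win, so that the score
    is strictly between 0 and 1. The margin is NaN when the assumed
    95% interval on the score leaves (0, 1).
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    se = math.sqrt(var / n)                      # standard error of the mean
    lo, hi = mu - z * se, mu + z * se
    elo = elo_from_score(mu)
    if lo <= 0.0 or hi >= 1.0:
        return elo, float("nan")                 # interval not convertible to Elo
    return elo, (elo_from_score(hi) - elo_from_score(lo)) / 2.0

if __name__ == "__main__":
    for w, l, d in ((2, 63, 1), (4, 77, 1), (10, 40, 20)):
        print(w, l, d, elo_with_margin(w, l, d))
```

If the assumption about the margin is right, it would also explain the "+/- nan" in the ID-0 vs ID-600 match: with a score of only 2.5/66 the lower end of the interval drops below zero, so it has no Elo equivalent. That is one more sign that matches against ID-0 no longer carry much information.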