best way to determine elos of a group

Michel · Post by **Michel** » Tue Jul 30, 2019 10:25 pm

Daniel Shawul wrote: ↑Tue Jul 30, 2019 8:51 pm There are two more options for computing intervals

jointdist [p] ... compute intervals from joint distribution
exactdist [p] ... compute intervals assuming exact opponent Elos

With exactdist

Code:

Rank Name          Elo    +    - games score oppo. draws 
   1 scorpio-146   415   40   40   200   56%   379   36% 
   2 scorpio-147   383   56   56   100   45%   415   34% 
   3 scorpio-145   374   40   40   200   50%   375   38% 
   4 scorpio-134   347   39   38   200   55%   317   43% 
   5 scorpio-144   335   39   39   200   48%   350   40% 
   6 scorpio-141   329   39   39   200   52%   315   38% 
   7 scorpio-143   326   40   40   200   49%   330   35% 
   8 scorpio-142   325   40   39   200   50%   327   37% 
   9 scorpio-133   319   40   39   200   51%   310   37% 
  10 scorpio-135   315   39   39   200   48%   327   43% 
  11 scorpio-137   311   40   40   200   52%   298   34% 
  12 scorpio-136   308   39   39   200   49%   313   37% 
  13 scorpio-139   308   39   39   200   52%   297   37% 
  14 scorpio-140   306   40   40   200   48%   318   34% 
  15 scorpio-138   288   39   40   200   47%   309   37% 
  16 scorpio-131   283   40   39   200   56%   247   39% 
  17 scorpio-132   273   40   40   200   46%   301   36% 
  18 scorpio-125   243   40   40   200   55%   210   32% 
  19 scorpio-128   224   39   39   200   51%   217   37% 
  20 scorpio-129   222   39   39   200   50%   222   40% 
  21 scorpio-130   221   39   39   200   45%   253   41% 
  22 scorpio-124   218   40   40   200   50%   218   34% 
  23 scorpio-127   212   39   39   200   50%   213   39% 
  24 scorpio-121   203   40   40   200   57%   154   35% 
  25 scorpio-126   201   40   40   200   46%   227   35% 
  26 scorpio-123   193   39   39   200   51%   185   38% 
  27 scorpio-118   172   39   39   200   54%   149   41% 
  28 scorpio-120   156   39   39   200   47%   176   38% 
  29 scorpio-122   152   39   40   200   43%   198   38% 
  30 scorpio-117   150   54   54   101   47%   170   42% 
  31 scorpio-119   149   39   39   200   48%   164   42% 
  32 scorpio-17     62   38   38   200   56%    24   50% 
  33 scorpio-16     38   38   38   200   50%    40   50% 
  34 scorpio-14     35   39   39   200   55%     5   41% 
  35 scorpio-15     18   38   38   200   47%    36   47% 
  36 scorpio-18     10   38   38   200   47%    27   50% 
  37 scorpio-115    10   39   39   200   53%   -11   38% 
  38 scorpio-101     8   38   38   200   56%   -25   48% 
  39 scorpio-116     8   55   55   101   50%    11   40% 
  40 scorpio-11      6   28   28   400   52%    -8   42% 
  41 scorpio-21      6   38   38   200   53%   -11   49% 
  42 scorpio-10      3   27   27   400   51%    -2   44% 
  43 scorpio-99      0   39   38   200   55%   -28   43% 
  44 scorpio-20     -6   38   38   200   49%     0   50% 
  45 scorpio-19     -7   37   37   200   49%     2   52% 
  46 scorpio-8      -7   28   28   400   52%   -17   41% 
  47 scorpio-13     -8   38   38   200   48%     8   46% 
  48 scorpio-86    -11   39   39   200   53%   -30   41% 
  49 scorpio-9     -11   27   27   400   49%    -2   43% 
  50 scorpio-22    -17   37   37   200   50%   -16   55% 
  51 scorpio-12    -20   31   31   300   47%     2   45% 
  52 scorpio-109   -21   39   39   200   55%   -53   39% 
  53 scorpio-104   -23   39   39   200   55%   -52   39% 
  54 scorpio-7     -24   27   27   400   51%   -28   45% 
  55 scorpio-112   -24   39   38   200   52%   -37   43% 
  56 scorpio-102   -24   38   38   200   49%   -18   48% 
  57 scorpio-100   -25   38   38   200   45%     4   48% 
  58 scorpio-87    -27   38   38   200   49%   -21   46% 
  59 scorpio-97    -27   39   39   200   53%   -45   42% 
  60 scorpio-114   -30   39   40   200   48%   -15   36% 
  61 scorpio-98    -30   39   39   200   48%   -14   42% 
  62 scorpio-88    -30   38   38   200   51%   -34   50% 
  63 scorpio-55    -32   38   38   200   53%   -50   48% 
  64 scorpio-85    -33   39   39   200   52%   -43   42% 
  65 scorpio-56    -35   38   38   200   51%   -41   48% 
  66 scorpio-111   -35   39   39   200   51%   -37   40% 
  67 scorpio-107   -35   39   39   200   54%   -60   43% 
  68 scorpio-37    -36   37   37   200   51%   -44   60% 
  69 scorpio-24    -37   38   37   200   54%   -61   51% 
  70 scorpio-47    -38   38   37   200   54%   -63   52% 
  71 scorpio-23    -39   37   37   200   48%   -27   53% 
  72 scorpio-113   -39   39   39   200   48%   -27   42% 
  73 scorpio-90    -40   39   39   200   51%   -43   43% 
  74 scorpio-33    -40   38   38   200   54%   -64   48% 
  75 scorpio-89    -42   38   38   200   49%   -35   47% 
  76 scorpio-38    -43   36   36   200   50%   -42   62% 
  77 scorpio-103   -44   39   39   200   47%   -23   42% 
  78 scorpio-36    -44   37   37   200   51%   -48   56% 
  79 scorpio-91    -44   38   38   200   51%   -47   45% 
  80 scorpio-39    -47   37   37   200   50%   -47   59% 
  81 scorpio-6     -48   27   27   400   49%   -45   46% 
  82 scorpio-48    -48   38   38   200   52%   -60   46% 
  83 scorpio-40    -50   37   37   200   53%   -67   54% 
  84 scorpio-43    -50   37   37   200   53%   -66   55% 
  85 scorpio-57    -50   38   38   200   52%   -60   51% 
  86 scorpio-110   -51   39   40   200   47%   -28   37% 
  87 scorpio-34    -51   37   37   200   50%   -50   59% 
  88 scorpio-1     -53   29   29   400   56%   -96   30% 
  89 scorpio-108   -54   38   38   200   46%   -28   48% 
  90 scorpio-92    -55   38   38   200   50%   -54   47% 
  91 scorpio-35    -60   37   37   200   48%   -48   60% 
  92 scorpio-105   -60   38   38   200   47%   -44   45% 
  93 scorpio-95    -61   38   38   200   51%   -66   50% 
  94 scorpio-96    -61   38   39   200   48%   -44   43% 
  95 scorpio-93    -63   39   39   200   50%   -62   40% 
  96 scorpio-44    -64   37   37   200   50%   -62   54% 
  97 scorpio-4     -64   28   28   400   50%   -66   39% 
  98 scorpio-54    -65   37   37   200   49%   -58   54% 
  99 scorpio-106   -65   39   39   200   48%   -48   39% 
 100 scorpio-5     -66   28   28   400   49%   -56   42% 
 101 scorpio-3     -66   28   28   400   51%   -72   39% 
 102 scorpio-42    -68   37   37   200   50%   -69   57% 
 103 scorpio-94    -70   38   38   200   49%   -62   45% 
 104 scorpio-31    -73   37   37   200   52%   -84   57% 
 105 scorpio-59    -74   38   38   200   56%  -112   46% 
 106 scorpio-45    -74   37   37   200   50%   -71   57% 
 107 scorpio-84    -75   38   38   200   49%   -70   44% 
 108 scorpio-32    -77   38   38   200   47%   -57   49% 
 109 scorpio-46    -78   37   37   200   46%   -56   57% 
 110 scorpio-2     -80   28   28   400   47%   -59   38% 
 111 scorpio-49    -83   38   38   200   51%   -87   49% 
 112 scorpio-53    -83   38   38   200   50%   -86   49% 
 113 scorpio-25    -84   38   38   200   47%   -65   50% 
 114 scorpio-58    -86   38   38   200   46%   -62   46% 
 115 scorpio-41    -87   37   37   200   45%   -59   54% 
 116 scorpio-62    -90   39   39   200   58%  -138   39% 
 117 scorpio-30    -91   37   37   200   49%   -85   55% 
 118 scorpio-26    -93   38   38   200   52%  -106   50% 
 119 scorpio-29    -96   38   38   200   54%  -118   47% 
 120 scorpio-52   -107   38   38   200   50%  -104   51% 
 121 scorpio-83   -107   38   38   200   53%  -123   47% 
 122 scorpio-0    -111   41   42   200   42%   -53   24% 
 123 scorpio-51   -125   37   37   200   49%  -116   55% 
 124 scorpio-50   -125   37   37   200   46%  -104   54% 
 125 scorpio-27   -128   38   38   200   48%  -119   50% 
 126 scorpio-63   -134   39   39   200   49%  -131   41% 
 127 scorpio-60   -137   38   38   200   45%  -108   50% 
 128 scorpio-61   -141   38   39   200   46%  -114   44% 
 129 scorpio-71   -141   39   39   200   53%  -157   38% 
 130 scorpio-28   -144   38   38   200   45%  -112   47% 
 131 scorpio-66   -150   38   38   200   52%  -162   44% 
 132 scorpio-72   -155   39   39   200   52%  -164   41% 
 133 scorpio-65   -156   38   38   200   51%  -161   48% 
 134 scorpio-70   -159   38   38   200   51%  -165   48% 
 135 scorpio-67   -168   39   39   200   49%  -160   40% 
 136 scorpio-68   -171   39   39   200   51%  -178   42% 
 137 scorpio-82   -171   38   39   200   46%  -146   45% 
 138 scorpio-64   -172   38   39   200   46%  -145   43% 
 139 scorpio-76   -179   39   39   200   54%  -206   39% 
 140 scorpio-81   -185   39   39   200   53%  -205   42% 
 141 scorpio-73   -186   38   38   200   48%  -176   46% 
 142 scorpio-69   -188   38   38   200   46%  -165   47% 
 143 scorpio-74   -196   39   39   200   50%  -196   38% 
 144 scorpio-79   -203   37   37   200   55%  -233   52% 
 145 scorpio-75   -205   39   39   200   47%  -188   39% 
 146 scorpio-77   -207   40   40   200   50%  -203   35% 
 147 scorpio-78   -228   38   39   200   46%  -205   44% 
 148 scorpio-80   -238   38   38   200   43%  -194   47%

jointdist takes a lot of time. Will post again if it finishes.

I don't actually know what jointdist and exactdist stand for (given that we already have the two covariance options).

Michel · Post by **Michel** » Tue Jul 30, 2019 11:28 pm

Maybe these terms refer to the exact posterior distribution suitably discretized? But how would one avoid dimensional explosion? I looked in the source of BayesElo but I could not understand it.

I am not an expert in Bayesian statistics, but I thought the mathematically exact methods use Monte Carlo sampling from the posterior, e.g. MCMC (Markov Chain Monte Carlo). Such methods scale well with the dimension.

Michel · Post by **Michel** » Tue Jul 30, 2019 11:50 pm

Ok I am guessing that exactdist uses the (1-dimensional) posterior for one elo assuming the other elos are exact.

And jointdist uses the true posterior. It seems to me that for more than a few players the naive (non-Monte Carlo) implementation would take a lot of memory and would be slow.
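To put rough numbers on that concern, here is a back-of-the-envelope sketch of how a naively tabulated joint posterior grows with the number of players. The grid resolution (`points_per_axis`), the 8 bytes per cell, and the idea of pinning one player are all assumptions made for illustration; none of this describes how BayesElo actually discretizes anything.

```python
def joint_grid_cells(n_players: int, points_per_axis: int) -> int:
    """Number of cells in a full grid over the joint posterior.

    Assumption: one player is pinned (ratings are only defined up to a
    shift), so the grid has n_players - 1 free dimensions.
    """
    if n_players < 2:
        raise ValueError("need at least two players")
    return points_per_axis ** (n_players - 1)


def joint_grid_bytes(n_players: int, points_per_axis: int,
                     bytes_per_cell: int = 8) -> int:
    """Memory needed to store one number per grid cell."""
    return joint_grid_cells(n_players, points_per_axis) * bytes_per_cell


if __name__ == "__main__":
    # Hypothetical resolution of 1000 points per rating axis.
    for n in (2, 3, 4, 10):
        cells = joint_grid_cells(n, points_per_axis=1000)
        print(f"{n:2d} players: {cells:.3e} cells")
```

The cell count is exponential in the number of players, which is why a Monte Carlo method (whose cost per sample grows only polynomially) looks more attractive here.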

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 31, 2019 5:49 am

Michel wrote: ↑Tue Jul 30, 2019 11:50 pm Ok I am guessing that exactdist uses the (1-dimensional) posterior for one elo assuming the other elos are exact.

And jointdist uses the true posterior. It seems to me that for more than a few players the naive (non-Monte Carlo) implementation would take a lot of memory and would be slow.

Yes, the exactdist method is the same as elostat's assumption that the opponents' Elos are their true Elos. Maybe Remi has exactdist as a benchmark to evaluate
the other methods. The results from my quick test also seem to confirm that.

The jointdist run was still going after 10 minutes when I had to stop it. I will try again later to see what kind of error bounds it produces compared to
the full Hessian inverse method, which seems to be the better approach so far IMO.

Michel · Post by **Michel** » Wed Jul 31, 2019 8:32 am

Daniel Shawul wrote: ↑Wed Jul 31, 2019 5:49 am
The jointdist run was still going after 10 minutes when I had to stop it. I will try again later to see what kind of error bounds it produces compared to
the full Hessian inverse method, which seems to be the better approach so far IMO.

Well, near its maximum the posterior is approximately multivariate Gaussian, but the true posterior (if derived from a logistic function) has fatter tails (they decay like e^{-ax} instead of e^{-ax^2}). So it is not inconceivable that in degenerate situations (i.e. a poorly connected tournament graph and few games) the true Bayesian credibility intervals would differ from those estimated with a multivariate Gaussian (I am not saying they will, just that it seems possible).
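A minimal one-dimensional sketch of the effect, under strong simplifying assumptions: a single player against one opponent whose Elo is fixed at 0, no draws, and a flat prior on a bounded grid. This is not BayesElo's model (which has a draw parameter and its own prior); the function names are made up for the illustration. It compares the interval from the discretized logistic posterior with the one from the Gaussian (curvature at the maximum) approximation.

```python
import math

SCALE = 400.0 / math.log(10.0)  # Elo points per unit of log-odds


def log_posterior(elo: float, wins: int, losses: int) -> float:
    """Log posterior (flat prior) of one player's Elo vs. an opponent fixed at 0."""
    x = elo / SCALE
    # log sigma(x) = -log(1 + e^-x) and log(1 - sigma(x)) = -log(1 + e^x)
    return -wins * math.log1p(math.exp(-x)) - losses * math.log1p(math.exp(x))


def exact_interval(wins, losses, level=0.95, lo=-2000.0, hi=2000.0, step=1.0):
    """Equal-tailed credible interval from the posterior discretized on a grid."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n)]
    logp = [log_posterior(g, wins, losses) for g in grid]
    peak = max(logp)
    weights = [math.exp(v - peak) for v in logp]
    total = sum(weights)
    tail = (1.0 - level) / 2.0
    lower = upper = None
    cum = 0.0
    for g, w in zip(grid, weights):
        cum += w / total
        if lower is None and cum >= tail:
            lower = g
        if upper is None and cum >= 1.0 - tail:
            upper = g
            break
    return lower, upper


def gaussian_interval(wins, losses, z=1.96):
    """Interval from the Gaussian approximation at the maximum (needs wins, losses > 0)."""
    n = wins + losses
    p = wins / n
    mode = SCALE * math.log(p / (1.0 - p))
    sd = SCALE / math.sqrt(n * p * (1.0 - p))  # from the second derivative at the mode
    return mode - z * sd, mode + z * sd


if __name__ == "__main__":
    for w, l in ((3, 1), (300, 100)):
        print(w, l, exact_interval(w, l), gaussian_interval(w, l))
```

With only 3 wins and 1 loss the discretized interval is wider than the Gaussian one, mostly on the upper side where the e^{-ax} tail sits; with 300 wins and 100 losses the two nearly coincide. So in this toy case the difference only matters when there are very few games.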

As usual I am interested in the mathematical side of this. Standard Bayesian practice to obtain point estimates and credibility intervals is to sample from the posterior. By accident I happen to have some experience with MCMC. It is a wonderful method that you can throw at anything but it needs fine tuning for the particular problem at hand (one issue is that it is not so easy to see when it has converged). Since in this case we already have a good approximation of the posterior (a Gaussian) perhaps there are better methods to sample from the posterior.

Daniel Shawul · Post by **Daniel Shawul** » Wed Jul 31, 2019 5:24 pm

I ran it overnight and it still didn't finish! Note that I did not use the poorly connected graph we were discussing, but something else with just 10 players. It looks like even a 3-dimensional problem takes too long with the joint probability distribution method. The only case I managed to get a result for is 2 players, and the results are similar to the rest. Both inverse diagonal and full inverse are also indistinguishable with only two players.
In any case, this method is impractical in its current non-Monte-Carlo implementation. Since you have MCMC experience, maybe you can implement something for comparing it to the full Hessian inverse method?

Michel · Post by **Michel** » Wed Jul 31, 2019 6:48 pm

It is an enticing idea but I am rather busy professionally right now. I'll see what I can do.

In any case Gibbs sampling https://en.wikipedia.org/wiki/Gibbs_sampling appears to be well adapted to
elo estimation. It reduces the problem to sampling from the posterior for a single player against n-1 other
players with given fixed elo (and then one cycles through the players each time updating the elo with the latest sample).

The case of a single player is maybe easiest to do by discretization of the posterior.
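A rough sketch of what such a Gibbs sampler could look like, with each single-player conditional sampled from a discretized posterior. Everything specific here is an assumption for illustration: draws are ignored, the prior is flat on a bounded grid, player 0 is pinned at 0 Elo to fix the scale, and `gibbs_elo` and the `wins[i][j]` layout are hypothetical names, not anything from BayesElo. There is also no convergence diagnostic.

```python
import math
import random

SCALE = 400.0 / math.log(10.0)  # Elo points per unit of log-odds


def log_expected_score(diff: float) -> float:
    """Log of the logistic win probability for a rating difference `diff`."""
    return -math.log1p(math.exp(-diff / SCALE))


def gibbs_elo(wins, sweeps=300, burn_in=50, grid=None, seed=0):
    """Gibbs sampler for Elo ratings; wins[i][j] = games i won against j.

    Player 0 stays at 0 Elo. Every other player is resampled in turn from
    its 1-D conditional posterior, discretized on `grid` with a flat prior.
    Returns one list of post-burn-in samples per player.
    """
    rng = random.Random(seed)
    if grid is None:
        grid = [float(g) for g in range(-1000, 1001, 10)]
    n = len(wins)
    elo = [0.0] * n
    samples = [[] for _ in range(n)]
    for sweep in range(sweeps):
        for i in range(1, n):
            # Conditional log posterior of player i with all other ratings fixed.
            logp = []
            for g in grid:
                total = 0.0
                for j in range(n):
                    if j != i:
                        total += wins[i][j] * log_expected_score(g - elo[j])
                        total += wins[j][i] * log_expected_score(elo[j] - g)
                logp.append(total)
            peak = max(logp)
            weights = [math.exp(v - peak) for v in logp]
            # Draw the new rating of player i and use it immediately.
            elo[i] = rng.choices(grid, weights=weights)[0]
        if sweep >= burn_in:
            for i in range(n):
                samples[i].append(elo[i])
    return samples


if __name__ == "__main__":
    # Made-up results: player 1 scores 30-10 against both others.
    wins = [[0, 10, 20], [30, 0, 30], [20, 10, 0]]
    for i, s in enumerate(gibbs_elo(wins)):
        s = sorted(s)
        print(i, s[len(s) // 40], s[len(s) // 2], s[-(len(s) // 40) - 1])
```

The cost per sweep is (players) x (grid points) x (opponents), so it grows polynomially with the number of players instead of exponentially. The credibility intervals are then just quantiles of the per-player samples.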

Laskos · Post by **Laskos** » Thu Aug 01, 2019 8:43 pm

Laskos wrote: ↑Tue Jul 30, 2019 4:40 pm
Michel wrote: ↑Tue Jul 30, 2019 1:08 pm To get an elo estimate with fixed variance you should test against a fixed engine (or group of engines). But this is then not good for comparing engines (it is well known that the variance of an elo difference measured against
a third engine is 4 times as big as when measured in direct play).

Well, 2 times as big if he starts the procedure from the beginning of his line of nets. The issue might become whether that engine as an opponent is peculiar in some way, and what we really want to measure ("strength", I suppose, but in relation to "something"). How would the gating look now? For example, would it be an orderly gating to require that each successive net performs better against "fixed opponents" than the previous net?

I am on the phone now, on vacation. I would propose an ad-hoc approach adjusted to your needs.
First, you need a rough estimate of how many nets you will build. Say N=400. If you only play successive nets, the error margins towards the end of the run will blow up to N^(1/2) times the error margin of the first net after the anchor net.
Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, 40, 60, ..., up to ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you hit another "anchor" ID, say ID80, play it against the anchor ID60 with four times as many games as for usual nets (thereby establishing ID80 as the new "anchor").
This way, the final nets of the run will have error margins larger by a factor of less than N^(1/4) times the initial error margins, better than the previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 or, in your case with a 400-net run, only about 2 times larger error margins for absolute Elo (no gating) than the error margins of the first net.

I am not sure if it's close to optimal use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement in the precision of the absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.
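The arithmetic behind these factors can be checked with a small sketch. It assumes that the errors of different matches are independent, that the variance of a match result is inversely proportional to the number of games, and that every ordinary match has the same error margin regardless of the Elo gap. Margins are expressed in units of the margin of one ordinary match; the function names are made up for the illustration.

```python
import math


def chain_margin(net_id: int) -> float:
    """Relative error margin of net `net_id` when each net only plays its predecessor."""
    # net_id independent links back to ID0, each with unit variance.
    return math.sqrt(net_id)


def anchored_margin(net_id: int, total_nets: int,
                    anchor_game_factor: float = 4.0) -> float:
    """Relative error margin of net `net_id` under the anchor scheme.

    Anchors sit every floor(sqrt(total_nets)) IDs. Each anchor-to-anchor
    match uses `anchor_game_factor` times the usual games, so its variance
    is 1 / anchor_game_factor; an ordinary net adds one unit-variance
    match against the last anchor.
    """
    spacing = math.isqrt(total_nets)
    anchor_links, offset = divmod(net_id, spacing)
    variance = anchor_links / anchor_game_factor
    if offset:  # not an anchor itself: one ordinary match on top
        variance += 1.0
    return math.sqrt(variance)


if __name__ == "__main__":
    n = 400
    print("chain,    ID400:", chain_margin(n))
    print("anchored, ID400:", anchored_margin(n, n))
    print("anchored, worst:", max(anchored_margin(k, n) for k in range(1, n + 1)))
```

For N=400, ID400 is reached through 20 anchor matches of variance 1/4 each, giving sqrt(5), which equals N^(1/4)/2; the worst ordinary net stays below N^(1/4), against 20 for the plain chain.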

Laskos · Post by **Laskos** » Fri Aug 02, 2019 1:07 am

Forgot to mention: this whole ad-hoc procedure is for when you don't want to, or cannot (too large an Elo span), play all the games against the true anchor ID0. Here you have some diversity of opponents, and the Elo differences in games shouldn't be too large. The final error margins are square-root tamed compared to the usual successive play. If you want even tamer error margins, a sparser set of, say, N^(1/4) "anchors" can be planned, but then the Elo differences between them may be too large and the number of "anchors" too small.

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 02, 2019 2:35 am

Ok Kai, I will try your suggestion once the current run ends -- or maybe I will start doing it from where it is now. I have given up on the idea of matching ID-x and x+1 once we figured out that bayeselo gives really high variance.

From my quick tests, it looks like I am gaining 150 Elo every 100 nets so far (i.e. each net with just 512 games).
