  Daniel Shawul wrote: ↑Tue Jul 30, 2019 8:51 pm
  There are two more options for computing intervals
  Code:
  jointdist [p] ... compute intervals from joint distribution
  exactdist [p] ... compute intervals assuming exact opponent Elos
  With exactdist
  Code:
  Rank Name Elo + - games score oppo. draws
  1 scorpio-146 415 40 40 200 56% 379 36%
  2 scorpio-147 383 56 56 100 45% 415 34%
  3 scorpio-145 374 40 40 200 50% 375 38%
  4 scorpio-134 347 39 38 200 55% 317 43%
  5 scorpio-144 335 39 39 200 48% 350 40%
  6 scorpio-141 329 39 39 200 52% 315 38%
  7 scorpio-143 326 40 40 200 49% 330 35%
  8 scorpio-142 325 40 39 200 50% 327 37%
  9 scorpio-133 319 40 39 200 51% 310 37%
  10 scorpio-135 315 39 39 200 48% 327 43%
  11 scorpio-137 311 40 40 200 52% 298 34%
  12 scorpio-136 308 39 39 200 49% 313 37%
  13 scorpio-139 308 39 39 200 52% 297 37%
  14 scorpio-140 306 40 40 200 48% 318 34%
  15 scorpio-138 288 39 40 200 47% 309 37%
  16 scorpio-131 283 40 39 200 56% 247 39%
  17 scorpio-132 273 40 40 200 46% 301 36%
  18 scorpio-125 243 40 40 200 55% 210 32%
  19 scorpio-128 224 39 39 200 51% 217 37%
  20 scorpio-129 222 39 39 200 50% 222 40%
  21 scorpio-130 221 39 39 200 45% 253 41%
  22 scorpio-124 218 40 40 200 50% 218 34%
  23 scorpio-127 212 39 39 200 50% 213 39%
  24 scorpio-121 203 40 40 200 57% 154 35%
  25 scorpio-126 201 40 40 200 46% 227 35%
  26 scorpio-123 193 39 39 200 51% 185 38%
  27 scorpio-118 172 39 39 200 54% 149 41%
  28 scorpio-120 156 39 39 200 47% 176 38%
  29 scorpio-122 152 39 40 200 43% 198 38%
  30 scorpio-117 150 54 54 101 47% 170 42%
  31 scorpio-119 149 39 39 200 48% 164 42%
  32 scorpio-17 62 38 38 200 56% 24 50%
  33 scorpio-16 38 38 38 200 50% 40 50%
  34 scorpio-14 35 39 39 200 55% 5 41%
  35 scorpio-15 18 38 38 200 47% 36 47%
  36 scorpio-18 10 38 38 200 47% 27 50%
  37 scorpio-115 10 39 39 200 53% -11 38%
  38 scorpio-101 8 38 38 200 56% -25 48%
  39 scorpio-116 8 55 55 101 50% 11 40%
  40 scorpio-11 6 28 28 400 52% -8 42%
  41 scorpio-21 6 38 38 200 53% -11 49%
  42 scorpio-10 3 27 27 400 51% -2 44%
  43 scorpio-99 0 39 38 200 55% -28 43%
  44 scorpio-20 -6 38 38 200 49% 0 50%
  45 scorpio-19 -7 37 37 200 49% 2 52%
  46 scorpio-8 -7 28 28 400 52% -17 41%
  47 scorpio-13 -8 38 38 200 48% 8 46%
  48 scorpio-86 -11 39 39 200 53% -30 41%
  49 scorpio-9 -11 27 27 400 49% -2 43%
  50 scorpio-22 -17 37 37 200 50% -16 55%
  51 scorpio-12 -20 31 31 300 47% 2 45%
  52 scorpio-109 -21 39 39 200 55% -53 39%
  53 scorpio-104 -23 39 39 200 55% -52 39%
  54 scorpio-7 -24 27 27 400 51% -28 45%
  55 scorpio-112 -24 39 38 200 52% -37 43%
  56 scorpio-102 -24 38 38 200 49% -18 48%
  57 scorpio-100 -25 38 38 200 45% 4 48%
  58 scorpio-87 -27 38 38 200 49% -21 46%
  59 scorpio-97 -27 39 39 200 53% -45 42%
  60 scorpio-114 -30 39 40 200 48% -15 36%
  61 scorpio-98 -30 39 39 200 48% -14 42%
  62 scorpio-88 -30 38 38 200 51% -34 50%
  63 scorpio-55 -32 38 38 200 53% -50 48%
  64 scorpio-85 -33 39 39 200 52% -43 42%
  65 scorpio-56 -35 38 38 200 51% -41 48%
  66 scorpio-111 -35 39 39 200 51% -37 40%
  67 scorpio-107 -35 39 39 200 54% -60 43%
  68 scorpio-37 -36 37 37 200 51% -44 60%
  69 scorpio-24 -37 38 37 200 54% -61 51%
  70 scorpio-47 -38 38 37 200 54% -63 52%
  71 scorpio-23 -39 37 37 200 48% -27 53%
  72 scorpio-113 -39 39 39 200 48% -27 42%
  73 scorpio-90 -40 39 39 200 51% -43 43%
  74 scorpio-33 -40 38 38 200 54% -64 48%
  75 scorpio-89 -42 38 38 200 49% -35 47%
  76 scorpio-38 -43 36 36 200 50% -42 62%
  77 scorpio-103 -44 39 39 200 47% -23 42%
  78 scorpio-36 -44 37 37 200 51% -48 56%
  79 scorpio-91 -44 38 38 200 51% -47 45%
  80 scorpio-39 -47 37 37 200 50% -47 59%
  81 scorpio-6 -48 27 27 400 49% -45 46%
  82 scorpio-48 -48 38 38 200 52% -60 46%
  83 scorpio-40 -50 37 37 200 53% -67 54%
  84 scorpio-43 -50 37 37 200 53% -66 55%
  85 scorpio-57 -50 38 38 200 52% -60 51%
  86 scorpio-110 -51 39 40 200 47% -28 37%
  87 scorpio-34 -51 37 37 200 50% -50 59%
  88 scorpio-1 -53 29 29 400 56% -96 30%
  89 scorpio-108 -54 38 38 200 46% -28 48%
  90 scorpio-92 -55 38 38 200 50% -54 47%
  91 scorpio-35 -60 37 37 200 48% -48 60%
  92 scorpio-105 -60 38 38 200 47% -44 45%
  93 scorpio-95 -61 38 38 200 51% -66 50%
  94 scorpio-96 -61 38 39 200 48% -44 43%
  95 scorpio-93 -63 39 39 200 50% -62 40%
  96 scorpio-44 -64 37 37 200 50% -62 54%
  97 scorpio-4 -64 28 28 400 50% -66 39%
  98 scorpio-54 -65 37 37 200 49% -58 54%
  99 scorpio-106 -65 39 39 200 48% -48 39%
  100 scorpio-5 -66 28 28 400 49% -56 42%
  101 scorpio-3 -66 28 28 400 51% -72 39%
  102 scorpio-42 -68 37 37 200 50% -69 57%
  103 scorpio-94 -70 38 38 200 49% -62 45%
  104 scorpio-31 -73 37 37 200 52% -84 57%
  105 scorpio-59 -74 38 38 200 56% -112 46%
  106 scorpio-45 -74 37 37 200 50% -71 57%
  107 scorpio-84 -75 38 38 200 49% -70 44%
  108 scorpio-32 -77 38 38 200 47% -57 49%
  109 scorpio-46 -78 37 37 200 46% -56 57%
  110 scorpio-2 -80 28 28 400 47% -59 38%
  111 scorpio-49 -83 38 38 200 51% -87 49%
  112 scorpio-53 -83 38 38 200 50% -86 49%
  113 scorpio-25 -84 38 38 200 47% -65 50%
  114 scorpio-58 -86 38 38 200 46% -62 46%
  115 scorpio-41 -87 37 37 200 45% -59 54%
  116 scorpio-62 -90 39 39 200 58% -138 39%
  117 scorpio-30 -91 37 37 200 49% -85 55%
  118 scorpio-26 -93 38 38 200 52% -106 50%
  119 scorpio-29 -96 38 38 200 54% -118 47%
  120 scorpio-52 -107 38 38 200 50% -104 51%
  121 scorpio-83 -107 38 38 200 53% -123 47%
  122 scorpio-0 -111 41 42 200 42% -53 24%
  123 scorpio-51 -125 37 37 200 49% -116 55%
  124 scorpio-50 -125 37 37 200 46% -104 54%
  125 scorpio-27 -128 38 38 200 48% -119 50%
  126 scorpio-63 -134 39 39 200 49% -131 41%
  127 scorpio-60 -137 38 38 200 45% -108 50%
  128 scorpio-61 -141 38 39 200 46% -114 44%
  129 scorpio-71 -141 39 39 200 53% -157 38%
  130 scorpio-28 -144 38 38 200 45% -112 47%
  131 scorpio-66 -150 38 38 200 52% -162 44%
  132 scorpio-72 -155 39 39 200 52% -164 41%
  133 scorpio-65 -156 38 38 200 51% -161 48%
  134 scorpio-70 -159 38 38 200 51% -165 48%
  135 scorpio-67 -168 39 39 200 49% -160 40%
  136 scorpio-68 -171 39 39 200 51% -178 42%
  137 scorpio-82 -171 38 39 200 46% -146 45%
  138 scorpio-64 -172 38 39 200 46% -145 43%
  139 scorpio-76 -179 39 39 200 54% -206 39%
  140 scorpio-81 -185 39 39 200 53% -205 42%
  141 scorpio-73 -186 38 38 200 48% -176 46%
  142 scorpio-69 -188 38 38 200 46% -165 47%
  143 scorpio-74 -196 39 39 200 50% -196 38%
  144 scorpio-79 -203 37 37 200 55% -233 52%
  145 scorpio-75 -205 39 39 200 47% -188 39%
  146 scorpio-77 -207 40 40 200 50% -203 35%
  147 scorpio-78 -228 38 39 200 46% -205 44%
  148 scorpio-80 -238 38 38 200 43% -194 47%
  jointdist takes a lot of time. Will post again if it finishes.

I don't actually know what jointdist and exactdist stand for (given that we already have the two covariance options).
best way to determine elos of a group
Moderators: hgm, Rebel, chrisw
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: best way to determine elos of a group
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: best way to determine elos of a group
Maybe these terms refer to the exact posterior distribution suitably discretized? But how would one avoid dimensional explosion? I looked in the source of BayesElo but I could not understand it.
I am not an expert in Bayesian statistics, but I thought the mathematically exact methods use Monte Carlo sampling from the posterior e.g. using MCMC (Markov Chain Monte Carlo). Such methods scale well with the dimension.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: best way to determine elos of a group
Ok I am guessing that exactdist uses the (1-dimensional) posterior for one elo assuming the other elos are exact.
And jointdist uses the true posterior. It seems to me that for more than a few players the naive (non-Monte Carlo) implementation would take a lot of memory and would be slow.
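A rough sense of why a naive implementation would blow up, assuming (this is a guess, not something read out of the BayesElo source) that the joint posterior is tabulated on a regular grid with the same number of points on every rating axis. The helper name `grid_cells` and the choice to pin one rating to remove the arbitrary Elo offset are illustrative assumptions.

```python
def grid_cells(n_players, points_per_axis, pin_one_player=True):
    """Number of cells in a full grid over the joint rating posterior.

    Ratings are only defined up to a common offset, so one player can be
    pinned, leaving n_players - 1 free dimensions (assumption of this sketch).
    """
    free_dims = n_players - 1 if pin_one_player else n_players
    return points_per_axis ** free_dims


# 200 grid points per axis is an arbitrary illustrative resolution.
for n in (2, 3, 4, 10):
    print(n, "players:", grid_cells(n, 200), "cells")
```

With 200 points per axis, 3 players already need 40,000 cells and 10 players need 200^9 (about 5 x 10^20), which is why sampling methods are the usual route in higher dimensions.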
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: best way to determine elos of a group
  Michel wrote: ↑Tue Jul 30, 2019 11:50 pm
  Ok I am guessing that exactdist uses the (1-dimensional) posterior for one elo assuming the other elos are exact.
  And jointdist uses the true posterior. It seems to me that for more than a few players the naive (non-Monte Carlo) implementation would take a lot of memory and would be slow.

Yes, the exactdist method is the same as elostat's assumption that the opponents' elos are their true elos. Maybe Remi has exactdist as a benchmark to evaluate the other methods. The results from my quick test also seem to confirm that.
jointdist was still running after 10 minutes when I had to stop it. I will try again later to see what kind of error bounds it produces compared to the full Hessian inverse method, which seems to be the better approach so far IMO.
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: best way to determine elos of a group
  jointdist was still running after 10 minutes when I had to stop it. I will try again later to see what kind of error bounds it produces compared to the full Hessian inverse method, which seems to be the better approach so far IMO.

Well, near its maximum the posterior is approximately multivariate Gaussian, but the posterior (if derived from a logistic function) has fatter tails (they are e^{-ax} instead of e^{-ax^2}), so it is not inconceivable that in degenerate situations (i.e. a poorly connected tournament graph and few games) the true Bayesian credibility intervals would be different from those estimated with a multivariate Gaussian (I am not saying they will be, just that it seems possible).
As usual I am interested in the mathematical side of this. Standard Bayesian practice to obtain point estimates and credibility intervals is to sample from the posterior. By accident I happen to have some experience with MCMC. It is a wonderful method that you can throw at anything, but it needs fine tuning for the particular problem at hand (one issue is that it is not so easy to see when it has converged). Since in this case we already have a good approximation of the posterior (a Gaussian), perhaps there are better methods to sample from the posterior.
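The tail claim can be checked in the simplest setting. The sketch below assumes two players, a flat prior, no draws, $w$ wins and $\ell$ losses for the first player, and writes $d$ for the Elo difference; these symbols and simplifications are assumptions made for illustration, not BayesElo's actual model (which also handles draws and uses a prior).

```latex
\begin{align*}
p(d) &= \frac{1}{1 + 10^{-d/400}} = \frac{1}{1 + e^{-cd}},
  \qquad c = \frac{\ln 10}{400}, \\
\mathcal{L}(d) &= p(d)^{w}\,\bigl(1 - p(d)\bigr)^{\ell}, \\
\mathcal{L}(d) &\approx e^{-c\,\ell\,d} \quad (d \to +\infty),
  \qquad
\mathcal{L}(d) \approx e^{-c\,w\,|d|} \quad (d \to -\infty), \\
\mathcal{L}(d) &\approx \mathcal{L}(\hat d)\,
  \exp\!\Bigl(-\frac{(d - \hat d)^2}{2\sigma^2}\Bigr)
  \quad (d \text{ near } \hat d),
  \qquad
\sigma^{-2} = c^{2}\,(w + \ell)\,\hat p\,(1 - \hat p).
\end{align*}
```

So the exponent is linear in $|d|$ far from the maximum (with $a = c\,\ell$ or $a = c\,w$) and quadratic only near $\hat d$, where $\hat p = p(\hat d) = w/(w+\ell)$.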
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: best way to determine elos of a group
  Michel wrote: ↑Wed Jul 31, 2019 8:32 am
  Well, near its maximum the posterior is approximately multivariate Gaussian, but the posterior (if derived from a logistic function) has fatter tails (they are e^{-ax} instead of e^{-ax^2}), so it is not inconceivable that in degenerate situations (i.e. a poorly connected tournament graph and few games) the true Bayesian credibility intervals would be different from those estimated with a multivariate Gaussian (I am not saying they will be, just that it seems possible).
  As usual I am interested in the mathematical side of this. Standard Bayesian practice to obtain point estimates and credibility intervals is to sample from the posterior. By accident I happen to have some experience with MCMC. It is a wonderful method that you can throw at anything, but it needs fine tuning for the particular problem at hand (one issue is that it is not so easy to see when it has converged). Since in this case we already have a good approximation of the posterior (a Gaussian), perhaps there are better methods to sample from the posterior.

I ran it overnight and it still didn't finish! Note that I did not use the poorly connected graph we were discussing, but something else with just 10 players. It looks like even a 3-dimensional problem takes too long with the joint probability distribution method. The only case I managed to get a result for is 2 players, and the results are similar to the rest. The inverse diagonal and the full inverse are also indistinguishable with only two players.
In any case, this method is impractical in its current non-Monte Carlo implementation. Since you have MCMC experience, maybe you can implement something to compare with the full Hessian inverse method?
-
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: best way to determine elos of a group
  Daniel Shawul wrote: ↑Wed Jul 31, 2019 5:24 pm
  I ran it overnight and it still didn't finish! Note that I did not use the poorly connected graph we were discussing, but something else with just 10 players. It looks like even a 3-dimensional problem takes too long with the joint probability distribution method. The only case I managed to get a result for is 2 players, and the results are similar to the rest. The inverse diagonal and the full inverse are also indistinguishable with only two players.
  In any case, this method is impractical in its current non-Monte Carlo implementation. Since you have MCMC experience, maybe you can implement something to compare with the full Hessian inverse method?

It is an enticing idea but I am rather busy professionally right now. I'll see what I can do.
In any case Gibbs sampling https://en.wikipedia.org/wiki/Gibbs_sampling appears to be well adapted to elo estimation. It reduces the problem to sampling from the posterior for a single player against n-1 other players with given fixed elo (one then cycles through the players, each time updating the elo with the latest sample).
The case of a single player is maybe easiest to do by discretization of the posterior.
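A minimal sketch of that idea, under several assumptions that are not from the thread: draws are ignored, a Gaussian prior centred at 0 pins the overall Elo offset, and each single-player conditional is discretized on a fixed grid from -1000 to +1000 in steps of 10. The names `gibbs_elo`, `conditional_weights` and `credible_interval` are hypothetical, and this is not how BayesElo computes its intervals.

```python
import math
import random

ELO_SCALE = math.log(10.0) / 400.0  # converts an Elo difference to logistic units


def log_expected_score(diff):
    """Numerically stable log of 1 / (1 + 10^(-diff/400))."""
    x = ELO_SCALE * diff
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def conditional_weights(i, ratings, wins, grid, prior_sigma):
    """Unnormalised posterior of player i on the grid, other ratings held fixed."""
    log_post = []
    for r in grid:
        lp = -0.5 * (r / prior_sigma) ** 2  # assumed Gaussian prior
        for j, r_j in enumerate(ratings):
            if j != i:
                lp += wins[i][j] * log_expected_score(r - r_j)
                lp += wins[j][i] * log_expected_score(r_j - r)
        log_post.append(lp)
    top = max(log_post)  # subtract the maximum to avoid underflow
    return [math.exp(lp - top) for lp in log_post]


def gibbs_elo(wins, sweeps=600, burn_in=100, prior_sigma=400.0, seed=0):
    """Cycle through the players, resampling each rating from its conditional."""
    rng = random.Random(seed)
    n = len(wins)
    grid = [float(g) for g in range(-1000, 1001, 10)]
    ratings = [0.0] * n
    samples = []
    for sweep in range(sweeps):
        for i in range(n):
            weights = conditional_weights(i, ratings, wins, grid, prior_sigma)
            ratings[i] = rng.choices(grid, weights=weights, k=1)[0]
        if sweep >= burn_in:
            samples.append(list(ratings))
    return samples


def credible_interval(samples, i, j, level=0.95):
    """Central interval for the Elo difference between players i and j."""
    diffs = sorted(s[i] - s[j] for s in samples)
    lo = diffs[int((1.0 - level) / 2.0 * len(diffs))]
    hi = diffs[min(len(diffs) - 1, int((1.0 + level) / 2.0 * len(diffs)))]
    return lo, hi
```

Here `wins[i][j]` is the number of games player i won against player j. Intervals are taken on rating differences rather than absolute ratings, since only differences are determined by the games. Convergence still has to be checked by hand, for example by comparing runs with different seeds.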
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: best way to determine elos of a group
  Laskos wrote: ↑Tue Jul 30, 2019 4:40 pm
    Michel wrote: ↑Tue Jul 30, 2019 1:08 pm
    To get an elo estimate with fixed variance you should test against a fixed engine (or group of engines). But this is then not good for comparing engines (it is well known that the variance of an elo difference measured against a 3rd engine is 4 times as big as when measured in direct play).
  Well, 2 times as big if he starts the procedure from the beginning of his line of nets. The issue might become whether that engine as an opponent is not peculiar in some ways and what we really want to measure ("strength", I suppose, but in relation to "something"). How would the gating look now? For example, would it be an orderly gating to require that each successive net performs better against "fixed opponents" than the previous net?

I am on the phone now, on vacation. I would propose an ad-hoc approach adjusted to your needs.
First, you need a rough estimate of how many nets you will build. Say N=400. If you play successive nets against each other, the error margins towards the end of the run will blow up to N^(1/2) times the error margins of the first net after the anchor net.
Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, 40, 60, ..., up to ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you hit another "anchor" ID, say ID80, play it against the anchor ID60 with four times as many games as for the usual nets (this establishes ID80 as the new "anchor").
This way, the final nets of the run will have error margins larger by a factor of less than N^(1/4) compared to the initial error margins, better than the previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 or, in your case with a run of 400 nets, only some 2 times larger error margins for absolute Elo (no gating) than the first net.
I am not sure if it's close to the optimum use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement of precision in the absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.
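The arithmetic behind these factors can be sketched as follows, assuming that match errors are independent, that a match with the usual number of games has standard error `sigma` on the Elo difference, and that the error shrinks as one over the square root of the number of games. The function names are illustrative only.

```python
import math


def chained_error(net_id, sigma=1.0):
    """Error of net_id relative to ID0 when each net only plays its predecessor."""
    return sigma * math.sqrt(net_id)


def anchored_error(net_id, spacing, sigma=1.0, anchor_game_factor=4.0):
    """Error of net_id relative to ID0 with an anchor every `spacing` nets.

    Anchor-to-anchor matches use anchor_game_factor times as many games,
    so each hop contributes sigma / sqrt(anchor_game_factor).
    """
    hops, offset = divmod(net_id, spacing)
    anchor_sigma = sigma / math.sqrt(anchor_game_factor)
    variance = hops * anchor_sigma ** 2
    if offset:  # a non-anchor net adds one ordinary match against the last anchor
        variance += sigma ** 2
    return math.sqrt(variance)


print(chained_error(400))       # 20.0, i.e. N^(1/2)
print(anchored_error(400, 20))  # sqrt(5), about 2.24, i.e. N^(1/4) / 2
print(anchored_error(399, 20))  # about 2.40 for the last non-anchor net
```

Under these assumptions ID400 ends up with about 2.24 times the single-match error instead of 20 times, matching the N^(1/4)/2 estimate.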
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: best way to determine elos of a group
  Laskos wrote: ↑Thu Aug 01, 2019 8:43 pm
  I am on the phone now, on vacation. I would propose an ad-hoc approach adjusted to your needs.
  First, you need a rough estimate of how many nets you will build. Say N=400. If you play successive nets against each other, the error margins towards the end of the run will blow up to N^(1/2) times the error margins of the first net after the anchor net.
  Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, 40, 60, ..., up to ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you hit another "anchor" ID, say ID80, play it against the anchor ID60 with four times as many games as for the usual nets (this establishes ID80 as the new "anchor").
  This way, the final nets of the run will have error margins larger by a factor of less than N^(1/4) compared to the initial error margins, better than the previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 or, in your case with a run of 400 nets, only some 2 times larger error margins for absolute Elo (no gating) than the first net.
  I am not sure if it's close to the optimum use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement of precision in the absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.

Forgot to mention: this whole ad-hoc procedure is for when you don't want to, or cannot (too large an Elo span), play all the games against the true anchor ID0. Here you have some diversity of opponents, and the Elo differences within games shouldn't be too large. The final error margins are square-root tamed compared to the usual successive play. If you want even tamer error margins, a sparser set of, say, N^(1/4) "anchors" can be planned, but then the Elo differences between them may be too large and the number of these "anchors" too small.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: best way to determine elos of a group
  Laskos wrote: ↑Thu Aug 01, 2019 8:43 pm
  I am on the phone now, on vacation. I would propose an ad-hoc approach adjusted to your needs.
  First, you need a rough estimate of how many nets you will build. Say N=400. If you play successive nets against each other, the error margins towards the end of the run will blow up to N^(1/2) times the error margins of the first net after the anchor net.
  Set N^(1/2) "anchor" nets, in your case 400^(1/2) = 20 nets: ID20, 40, 60, ..., up to ID400. The true anchor is ID0. For ratings, play each net against the last "anchor". When you hit another "anchor" ID, say ID80, play it against the anchor ID60 with four times as many games as for the usual nets (this establishes ID80 as the new "anchor").
  This way, the final nets of the run will have error margins larger by a factor of less than N^(1/4) compared to the initial error margins, better than the previous N^(1/2). ID400 will have, if I am not completely wrong, N^(1/4)/2 or, in your case with a run of 400 nets, only some 2 times larger error margins for absolute Elo (no gating) than the first net.
  I am not sure if it's close to the optimum use of resources (the effort is only slightly larger than playing successive nets, with a significant improvement of precision in the absolute Elo measurement). The optimal use of resources will surely depend on the task you have to accomplish.

Ok Kai, I will try your suggestion once the current run ends, or maybe I will start doing it from where the run stands now. I have given up on the idea of matching ID-x against ID-(x+1) since we figured out that bayeselo gives really high variance for that.
From my quick tests, it looks like I am gaining 150 elo per 100 nets so far (each net with just 512 games).