Derivation of bayeselo formula

Rémi Coulom · Post by **Rémi Coulom** » Tue Aug 07, 2012 6:59 pm

Hi,

I have been asked for a derivation of the MM formulas used in bayeselo. I uploaded a scan of some handwritten notes I made when reading Hunter's papers. I started to make a clean LaTeX version of those notes, but gave up because it is a bit complicated. There are 12 pages in total:

page 1: notes taken while reading the MM paper. Hunter goes fast, so I had to work through some simple calculations to follow the paper.
pages 2-5: derivation of the MM formula for first-move advantage and draws at the same time.
pages 6-10: Hessian matrix and LOS.
pages 11-12: notes about multi-dimensional Elo (never really used).

http://remi.coulom.free.fr/Bayesian-Elo/MMNotes.pdf

Rémi

ZirconiumX · Post by **ZirconiumX** » Tue Aug 07, 2012 7:11 pm

Your handwriting is very neat, much neater than mine!

Matthew:out

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 07, 2012 7:18 pm

Thanks a lot! That was exactly what I needed: a white paper.
I wish everyone were as generous as you in sharing their work.
I am beginning to see all the things I missed, starting from the very first page. Needless to say, I wouldn't have finished it anyway. The only helpful explanation I found was in your other paper on Go patterns. The other sources were either too cryptic for me or only covered the case without draws and without home advantage.
Again, it is very much appreciated!
Daniel

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 14, 2012 5:32 am

Hi Remi
I derived new model equations using the Davidson draw model, following your example in "monkey see, monkey do" style. There is one difference, where the geometric mean of the gammas is minorized with an arithmetic mean, but other than that it is straightforward. I don't know if you have already tested that model in bayeselo, but anyway I implemented it. The result is bad, _most probably_ because of a mistake in my derivation. In the tests I did with it, it converges quickly and the Elos seem to be highly inflated, but the ranking order is still sensible. The modified update equations are given below. If this is correct, I will finish the derivation of the covariance matrix and complete the implementation. If (big if) I did not make a mistake, it may require big scaling factors. You can replace the relevant part of CBradleyTerry.cpp with the code below.

Code: Select all

/////////////////////////////////////////////////////////////////////////////
//
// Rémi Coulom
//
// December, 2004
//
/////////////////////////////////////////////////////////////////////////////
#include "CBradleyTerry.h"
#include "CCondensedResults.h"
#include "CResultSet.h"
#include "CCDistribution.h"
#include "CMatrix.h"
#include "CLUDecomposition.h"
#include "random.h"

#include <cmath>

#include <iostream>
#include "CMatrixIO.h"

#define TEST

/////////////////////////////////////////////////////////////////////////////
// One iteration of the MM algorithm on gammas
/////////////////////////////////////////////////////////////////////////////
void CBradleyTerry::UpdateGammas()
{
 //
 // Loop over players
 //
 for (int Player = crs.GetPlayers(); --Player >= 0;)
 {
  double A = 0;
  double B = 0;
#ifdef TEST
  double D1 = 0,D2 = 0;
#endif
  for (int j = crs.GetOpponents(Player); --j >= 0;)
  {
   const CCondensedResult &cr = crs.GetCondensedResult(Player, j);

   double OpponentGamma;
   if (cr.Opponent > Player)
    OpponentGamma = pNextGamma[cr.Opponent];
   else
    OpponentGamma = pGamma[cr.Opponent];

#ifndef TEST
   // Original model: each draw counts as a full win
   A += cr.w_ij + cr.d_ij + cr.l_ji + cr.d_ji;
   B += (cr.d_ij + cr.w_ij) * ThetaW /
        (ThetaW * pGamma[Player] + ThetaD * OpponentGamma) +
        (cr.d_ij + cr.l_ij) * ThetaD * ThetaW /
        (ThetaD * ThetaW * pGamma[Player] + OpponentGamma) +
        (cr.d_ji + cr.w_ji) * ThetaD /
        (ThetaW * OpponentGamma + ThetaD * pGamma[Player]) +
        (cr.d_ji + cr.l_ji) /
        (ThetaD * ThetaW * OpponentGamma + pGamma[Player]);
#else
   // Davidson draw model (experimental): D2 is the draw weight,
   // D1 is the derivative of D2 with respect to pGamma[Player]
   D1 = ThetaD * std::sqrt(ThetaW * OpponentGamma / pGamma[Player]) / 2;
   D2 = ThetaD * std::sqrt(ThetaW * OpponentGamma * pGamma[Player]);

   // Each draw counts as half a win
   A += cr.w_ij + cr.d_ij / 2 + cr.l_ji + cr.d_ji / 2;
   B += (ThetaW + D1) * (cr.w_ij + cr.d_ij + cr.l_ij) /
        (ThetaW * pGamma[Player] + OpponentGamma + D2) +
        (1      + D1) * (cr.w_ji + cr.d_ji + cr.l_ji) /
        (ThetaW * OpponentGamma + pGamma[Player] + D2);
#endif
  }
  pNextGamma[Player] = A / B;
 }

 //
 // Swap buffers to prepare next iteration
 //
 {
  double *pTemp = pGamma;
  pGamma = pNextGamma;
  pNextGamma = pTemp;
 }
}

/////////////////////////////////////////////////////////////////////////////
// MM on ThetaW
/////////////////////////////////////////////////////////////////////////////
double CBradleyTerry::UpdateThetaW()
{
#ifndef TEST
 double Numerator = 0;
 double Denominator = 0;
#else
 double C = 0,D = 0,E = 0,D2;
#endif

 for (int Player = crs.GetPlayers(); --Player >= 0;)
  for (int j = crs.GetOpponents(Player); --j >= 0;)
  {
   const CCondensedResult &cr = crs.GetCondensedResult(Player, j);
   double OpponentGamma = pGamma[cr.Opponent];

#ifndef TEST
   Numerator += cr.w_ij + cr.d_ij;
   Denominator += (cr.d_ij + cr.w_ij) * pGamma[Player] /
                  (ThetaW * pGamma[Player] + ThetaD * OpponentGamma) +
                  (cr.d_ij + cr.l_ij) * ThetaD * pGamma[Player] /
                  (ThetaD * ThetaW * pGamma[Player] + OpponentGamma);
#else
   D2 = ThetaD * std::sqrt(ThetaW * OpponentGamma * pGamma[Player]);

   C += cr.w_ij + cr.d_ij / 2;
   D += (cr.l_ij + cr.d_ij + cr.w_ij) * pGamma[Player] /
        (ThetaW * pGamma[Player] + OpponentGamma + D2);
   E += (cr.l_ij + cr.d_ij + cr.w_ij) * ThetaD * std::sqrt(OpponentGamma * pGamma[Player]) /
        (ThetaW * pGamma[Player] + OpponentGamma + D2);
#endif
  }
#ifndef TEST
 return Numerator / Denominator;
#else
 // Positive root of D*x^2 + (E/2)*x - C = 0; the root is then squared
 C = ((-E / 2) + std::sqrt((E * E) / 4 + 4 * C * D)) / (2 * D);
 return C * C;
#endif
}

/////////////////////////////////////////////////////////////////////////////
// MM on ThetaD
/////////////////////////////////////////////////////////////////////////////
double CBradleyTerry::UpdateThetaD()
{
 double Numerator = 0;
 double Denominator = 0;
#ifdef TEST
 double D2 = 0;
#endif

 for (int Player = crs.GetPlayers(); --Player >= 0;)
  for (int j = crs.GetOpponents(Player); --j >= 0;)
  {
   const CCondensedResult &cr = crs.GetCondensedResult(Player, j);
   double OpponentGamma = pGamma[cr.Opponent];
#ifndef TEST
   Numerator += cr.d_ij;
   Denominator += (cr.d_ij + cr.w_ij) * OpponentGamma /
                  (ThetaW * pGamma[Player] + ThetaD * OpponentGamma) +
                  (cr.d_ij + cr.l_ij) * ThetaW * pGamma[Player] /
                  (ThetaD * ThetaW * pGamma[Player] + OpponentGamma);
#else
   D2 = ThetaD * std::sqrt(ThetaW * OpponentGamma * pGamma[Player]);

   Numerator += cr.d_ij;
   Denominator += (cr.l_ij + cr.d_ij + cr.w_ij) * (D2 / ThetaD) /
                  (ThetaW * pGamma[Player] + OpponentGamma + D2);
#endif
  }

 double C = Numerator / Denominator;

#ifndef TEST
 return C + std::sqrt(C * C + 1);
#else
 return C;
#endif
}
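
For readers following the two branches above, here is a minimal sketch of the per-game outcome probabilities that the denominators in the code appear to correspond to. It is reconstructed from the code only, not taken from bayeselo itself; the names `Outcome`, `DavidsonOutcome` and `RaoKupperOutcome` are hypothetical, and `gi`, `gj` stand for the gammas of the first player and the opponent.

```cpp
#include <cassert>
#include <cmath>

// Outcome probabilities of one game, seen from the first player (hypothetical type).
struct Outcome { double win, draw, loss; };

// Sketch of the model behind the TEST branch: the three weights share the
// denominator thetaW*gi + gj + thetaD*sqrt(thetaW*gi*gj) used in the code.
Outcome DavidsonOutcome(double gi, double gj, double thetaW, double thetaD)
{
    assert(gi > 0 && gj > 0 && thetaW > 0 && thetaD >= 0);
    const double w = thetaW * gi;                          // win weight
    const double l = gj;                                   // loss weight
    const double d = thetaD * std::sqrt(thetaW * gi * gj); // draw weight (D2 in the code)
    const double sum = w + l + d;
    return { w / sum, d / sum, l / sum };
}

// Sketch of the model behind the non-TEST branch: win and loss each have
// their own denominator, and the draw probability is what remains.
Outcome RaoKupperOutcome(double gi, double gj, double thetaW, double thetaD)
{
    assert(gi > 0 && gj > 0 && thetaW > 0 && thetaD >= 1);
    const double win = thetaW * gi / (thetaW * gi + thetaD * gj);
    const double loss = gj / (thetaD * thetaW * gi + gj);
    return { win, 1.0 - win - loss, loss };
}
```

With equal gammas and no first-move advantage, `DavidsonOutcome(1, 1, 1, 1)` and `RaoKupperOutcome(1, 1, 1, 2)` both split the probability evenly between win, draw and loss, which shows that the two draw parameters are not on the same scale.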

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 14, 2012 8:47 am

Well, the inflation problem is due to the assumption that 2 draws = 1 win + 1 loss. Most of the time this doubles the range of Elos compared to the old assumption that 1 draw = 1 win + 1 loss. Out of curiosity, I applied a factor of 2 to most of the draw terms in the Davidson model, and the Elo range became very close to that of the Rao-Kupper model. Unless I misunderstood how the draws are counted, the models give very different results by default. Anyway, I will finish the covariance matrix so that LOS can be calculated.
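
The difference in how draws are counted can be read off the `A` terms of the gamma update posted earlier. The sketch below isolates just that counting; the type `Tally` and the function names are hypothetical, and the `drawFactor` parameter is only one guess at what "a factor of 2 on the draw terms" means for this particular term.

```cpp
#include <cassert>

// Results of one player against one opponent (hypothetical helper type).
struct Tally { double wins, draws, losses; };

// Non-TEST branch: A += w + d, so every draw is counted like a full win.
double EffectiveWinsOld(const Tally &t)
{
    assert(t.wins >= 0 && t.draws >= 0);
    return t.wins + t.draws;
}

// TEST branch: A += w + d / 2, so every draw is counted like half a win.
// drawFactor = 1 is the default; drawFactor = 2 reproduces the old count.
double EffectiveWinsDavidson(const Tally &t, double drawFactor = 1.0)
{
    assert(t.wins >= 0 && t.draws >= 0 && drawFactor > 0);
    return t.wins + drawFactor * t.draws / 2;
}
```

For a record of 10 wins, 8 draws and 4 losses, the old count is 18, the default Davidson count is 14, and the Davidson count with `drawFactor = 2` is 18 again.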

Rémi Coulom · Post by **Rémi Coulom** » Tue Aug 14, 2012 9:45 am

Very interesting. I am curious to see the results.

I had started to implement alternative models in bayeselo at the time I wrote the unfinished paper I posted here earlier. But I did not try to MM them. My plan was to use Newton's method or Conjugate Gradient. I don't expect it will be possible to apply MM to Glenn-David.

I recommend normalizing elo scales by having the same derivative at zero of the expected gain (p(win)+p(draw)/2). That's how I did for the original bayeselo.
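
As an illustration of this normalization, here is a rough sketch that compares the slope at zero of the expected gain of a Davidson-type model with that of the plain logistic curve. It is not the bayeselo implementation: the function names are hypothetical, first-move advantage is ignored, gamma is assumed to be 10^(Elo/400), and the direction of the ratio returned by `EloScale` is a convention chosen for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Expected gain p(win) + p(draw)/2 of a Davidson-type model, as a function
// of the rating difference (no first-move advantage in this sketch).
double ExpectedScoreDavidson(double eloDiff, double thetaD)
{
    assert(thetaD >= 0);
    const double g = std::pow(10.0, eloDiff / 400.0); // gamma ratio gi / gj
    const double draw = thetaD * std::sqrt(g);        // draw weight
    return (g + draw / 2) / (g + 1.0 + draw);
}

// Expected gain of the plain logistic Elo curve, used as the reference.
double ExpectedScoreLogistic(double eloDiff)
{
    return 1.0 / (1.0 + std::pow(10.0, -eloDiff / 400.0));
}

// Central-difference estimate of the slope of f at zero.
template <class F>
double SlopeAtZero(F f, double h = 1e-3)
{
    return (f(h) - f(-h)) / (2 * h);
}

// Factor c such that reported = c * model rating gives the model the same
// slope at zero as the logistic reference.
double EloScale(double thetaD)
{
    const double reference = SlopeAtZero(ExpectedScoreLogistic);
    const double model = SlopeAtZero(
        [thetaD](double d) { return ExpectedScoreDavidson(d, thetaD); });
    return model / reference;
}
```

Under these assumptions the slope of the Davidson expected gain at zero works out to (ln 10 / 400) / (2 (2 + thetaD)), so `EloScale(0.0)` is 1 and `EloScale(2.0)` is 0.5.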

Rémi

diep · Post by **diep** » Tue Aug 14, 2012 10:37 am

Rémi Coulom wrote:Very interesting. I am curious to see the results.

I had started to implement alternative models in bayeselo at the time I wrote the unfinished paper I posted here earlier. But I did not try to MM them. My plan was to use Newton's method or Conjugate Gradient. I don't expect it will be possible to apply MM to Glenn-David.

I recommend normalizing elo scales by having the same derivative at zero of the expected gain (p(win)+p(draw)/2). That's how I did for the original bayeselo.

Rémi

hi Remi,

What seems very popular nowadays is that all sorts of engines do learning in a rather hard manner. By hard I mean: difficult to turn off.

Basically, most follow roughly this pattern: if you lose a line, or even draw it, pick another line. If you win a line, repeat it.

A small difference in objective Elo strength can then completely dominate the outcome of the match and enlarge the difference.

This still assumes the same book for both engines, of course, which isn't realistic either.

One of the reasons for the huge difference in outcome is simply that most books have a very thin 'tournament book line'. Only a few moves are in it.

If an opponent engine happens to be stronger in one of those tournament lines, then all lines around it probably have a similar outcome as well (assumption). That suddenly makes the tournament book less useful. You can of course then switch to an entirely different opening, which is what most book learners have been doing for 15+ years.

You then quite often end up in old sidelines.

A problem with old sidelines is that there is usually a refutation, or some line that gives high practical chances of beating that old line.

In short, winning the first few games of a match in different openings is really important.

For computer chess it would be important to model this. How would you do that?

This matters a lot: if you play a 3000-game match like Ernst A. Heinz did years ago, in a GUI where you cannot turn off learning (in Fritz you could turn it off for one game, but from the second game on it would be turned on again), the 3000-game match gets heavily influenced by learning. In short, a 3000-game match is in practice not even close to a set of statistically independent measurements.

Kind Regards,
Vincent

P.S. Several engines still learn even when you turn off learning in their UCI settings. Note that book learning overlaps here with position learning.

On this CCC I have seen several people post engine outputs for positions where they always fall for the learning trick.

It's really effective at fooling even the most advanced users.

Even in random-book matches people get fooled.
Think of this: you have a score for a position P, and for the positions after moves P+1, P+2, etc. You lose that position with White because of response X from the opponent.

In the next game you have reversed colors (matches are played like that a lot nowadays), and the score now simply gets used from your own engine's hash table.

It might not help much, but sometimes it does; engines are world champions at making similar mistakes, especially with today's nearly identical evaluation functions in a lot of engines.

So even position learning influences the outcome. It's not just plain book learning. It's a whole range of tricks.

How do you model that?

Rémi Coulom · Post by **Rémi Coulom** » Tue Aug 14, 2012 11:33 am

diep wrote:What seems very popular nowadays is that all sorts of engines do learning in a rather hard manner. Hard i mean: difficult to turn off.

Hi Vincent,

Bayeselo assumes players of constant strength. Measuring changes in strength caused by learning is much more difficult. It may be possible to adapt WHR to do it:
http://remi.coulom.free.fr/WHR/
But if you want to measure accurately the change in strength caused by a change in your algorithm, it is considerably more efficient to use bayeselo with opponents that don't learn.

Rémi

Daniel Shawul · Post by **Daniel Shawul** » Tue Aug 14, 2012 11:48 am

Rémi Coulom wrote:Very interesting. I am curious to see the results.

I had started to implement alternative models in bayeselo at the time I wrote the unfinished paper I posted here earlier. But I did not try to MM them. My plan was to use Newton's method or Conjugate Gradient. I don't expect it will be possible to apply MM to Glenn-David.

Yes, the Glenn-David model is Gaussian, so it cannot be represented by the Bradley-Terry strength ratio. I plan to implement a TrueSkill-type approach, i.e. incremental updates of sigma and mu, to compare it with these methods. But I need to study factor graphs and other machine learning topics a bit before doing that.

I recommend normalizing elo scales by having the same derivative at zero of the expected gain (p(win)+p(draw)/2). That's how I did for the original bayeselo.

Rémi

The inflation remains after using the scale parameter. For the logistic function, df(x)/dx = f(x) (1 - f(x)), from which you derived your equation to match a slope of 1/4. But this will not cure it, because in this case the draws are almost halved. The scaling I was talking about is to be applied to the draws independently, and I only get close results when I multiply the draws by 2. Anyway, I have now implemented the variance and LOS, so here are some results. For the new models, the 'scale' parameter after MM is close to 1.0.
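
For reference, the slope of 1/4 mentioned here follows directly from the logistic identity. The second line restates it in Elo units, assuming the usual base-10, 400-point convention for the expected score E(D); that convention is an assumption of this note, not something stated in the post.

```latex
\begin{align*}
f(x) &= \frac{1}{1 + e^{-x}}, &
f'(x) &= f(x)\,\bigl(1 - f(x)\bigr), &
f'(0) &= \tfrac12 \cdot \tfrac12 = \tfrac14, \\
E(D) &= \frac{1}{1 + 10^{-D/400}} = f\!\left(\frac{D \ln 10}{400}\right), &
E'(0) &= \frac{\ln 10}{400} \cdot \frac14 = \frac{\ln 10}{1600} \approx 0.00144 .
\end{align*}
```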

Old Model

Code: Select all

ResultSet-EloRating>ratings
Rank Name                      Elo    +    - games score oppo. draws
   1 Bobcat 3.25-x64-2cpu      225   60   60    92   80%   -10   26%
   2 Tornado 4.88-x64-2cpu     204   57   57    92   78%    -9   30%
   3 Delfi 5.4-2cpu             73   53   53    92   61%    -3   37%
   4 ChessTiger 2007.1          68   52   52    92   61%    -3   46%
   5 Movei 0.08.438             42   51   51    92   57%    -2   51%
   6 Deuterium 12.01.30.15      28   52   52    92   55%    -1   45%
   7 Hamsters 0.7.1-2cpu        20   52   52    92   53%    -1   48%
   8 WildCat 8.0                14   52   52    92   52%    -1   48%
   9 Arasan 13.4-x64-2cpu       12   54   54    92   51%    -1   30%
  10 Ruffian 2.1.0               7   52   52    92   51%     0   40%
  11 Philou 3.70-x64             1   53   53    92   50%     0   39%
  12 Brutus 8.05-x64-JA          0   53   53    92   49%     0   42%
  13 Pro Deo 1.74               -4   53   53    92   49%     0   41%
  14 Zarkov 6.44                -5   52   52    92   49%     0   47%
  15 Alaric 707                 -8   54   54    92   49%     0   33%
  16 SlowChess blitz WV2.1     -44   52   52    92   43%     2   43%
  17 Garbochesss 3.00-x64-JA   -45   54   54    92   44%     2   32%
  18 E.T.Chess 130108          -58   52   52    92   42%     3   42%
  19 DanaSah 4.88              -60   52   52    92   41%     3   41%
  20 Amyan 1.7.2               -65   55   55    92   40%     3   33%
  21 Cheng3 1.07a-x64          -69   54   54    92   40%     3   37%
  22 Cyrano 0.6b17-x64         -86   55   55    92   38%     4   29%
  23 Hermann 2.8-x64-2cpu     -113   57   57    92   35%     5   24%
  24 Pseudo 0.7c              -138   55   55    92   30%     6   33%
ResultSet-EloRating>los
                         Bo To De Ch Mo De Ha Wi Ar Ru Ph Br Pr Za Al Sl Ga E. Da Am Ch Cy He Ps
Bobcat 3.25-x64-2cpu        69 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99100
Tornado 4.88-x64-2cpu    30    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99100
Delfi 5.4-2cpu            0  0    54 79 89 92 94 94 96 97 97 98 98 98 99 99 99 99 99 99 99 99 99
ChessTiger 2007.1         0  0 45    76 86 90 93 93 95 96 96 97 97 97 99 99 99 99 99 99 99 99 99
Movei 0.08.438            0  0 20 23    65 72 78 79 83 86 87 89 90 91 99 99 99 99 99 99 99 99 99
Deuterium 12.01.30.15     0  0 10 13 34    57 64 66 70 75 77 80 81 82 97 97 99 99 99 99 99 99 99
Hamsters 0.7.1-2cpu       0  0  7  9 27 42    56 59 63 69 71 74 75 77 96 95 98 98 98 99 99 99 99
WildCat 8.0               0  0  5  6 21 35 43    52 57 63 65 68 69 72 94 94 97 97 98 98 99 99 99
Arasan 13.4-x64-2cpu      0  0  5  6 20 33 40 47    54 60 62 66 67 69 93 93 96 97 97 98 99 99 99
Ruffian 2.1.0             0  0  3  4 16 29 36 42 45    56 58 61 62 65 92 91 96 96 97 97 99 99 99
Philou 3.70-x64           0  0  2  3 13 24 30 36 39 43    51 55 56 59 89 88 94 95 96 96 98 99 99
Brutus 8.05-x64-JA        0  0  2  3 12 22 28 34 37 41 48    53 55 57 88 88 94 94 95 96 98 99 99
Pro Deo 1.74              0  0  1  2 10 19 25 31 33 38 44 46    51 54 86 86 92 93 94 95 98 99 99
Zarkov 6.44               0  0  1  2  9 18 24 30 32 37 43 44 48    52 85 85 92 93 94 95 98 99 99
Alaric 707                0  0  1  2  8 17 22 27 30 34 40 42 45 47    83 83 91 91 93 94 97 99 99
SlowChess blitz WV2.1     0  0  0  0  0  2  3  5  6  7 10 11 13 14 16    50 64 65 70 74 86 96 99
Garbochesss 3.00-x64-JA   0  0  0  0  0  2  4  5  6  8 11 11 13 14 16 49    63 65 70 73 85 95 99
E.T.Chess 130108          0  0  0  0  0  0  1  2  3  3  5  5  7  7  8 35 36    51 57 61 76 92 98
DanaSah 4.88              0  0  0  0  0  0  1  2  2  3  4  5  6  6  8 34 34 48    55 59 75 91 98
Amyan 1.7.2               0  0  0  0  0  0  1  1  2  2  3  4  5  5  6 29 29 42 44    53 69 88 96
Cheng3 1.07a-x64          0  0  0  0  0  0  0  1  1  2  3  3  4  4  5 25 26 38 40 46    67 86 96
Cyrano 0.6b17-x64         0  0  0  0  0  0  0  0  0  0  1  1  1  1  2 13 14 23 24 30 32    75 90
Hermann 2.8-x64-2cpu      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  4  7  8 11 13 24    73
Pseudo 0.7c               0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  3  3  9 26

New model default (factor = 1 for draws)

Code: Select all


ResultSet-EloRating>ratings
Rank Name                      Elo    +    - games score oppo. draws
   1 Bobcat 3.25-x64-2cpu      412  115  115    92   80%   -18   26%
   2 Tornado 4.88-x64-2cpu     374  111  111    92   78%   -16   30%
   3 Delfi 5.4-2cpu            128   95   95    92   61%    -6   37%
   4 ChessTiger 2007.1         128   95   95    92   61%    -6   46%
   5 Movei 0.08.438             81   94   94    92   57%    -4   51%
   6 Deuterium 12.01.30.15      55   93   93    92   55%    -2   45%
   7 Hamsters 0.7.1-2cpu        35   93   93    92   53%    -2   48%
   8 WildCat 8.0                23   93   93    92   52%    -1   48%
   9 Arasan 13.4-x64-2cpu       10   93   93    92   51%     0   30%
  10 Ruffian 2.1.0               3   93   93    92   51%     0   40%
  11 Philou 3.70-x64            -3   93   93    92   50%     0   39%
  12 Zarkov 6.44               -10   93   93    92   49%     0   47%
  13 Brutus 8.05-x64-JA        -10   93   93    92   49%     0   42%
  14 Alaric 707                -16   93   93    92   49%     1   33%
  15 Pro Deo 1.74              -16   93   93    92   49%     1   41%
  16 Garbochesss 3.00-x64-JA   -75   93   93    92   44%     3   32%
  17 SlowChess blitz WV2.1     -81   93   93    92   43%     4   43%
  18 E.T.Chess 130108         -101   94   94    92   42%     4   42%
  19 DanaSah 4.88             -107   94   94    92   41%     5   41%
  20 Cheng3 1.07a-x64         -121   94   94    92   40%     5   37%
  21 Amyan 1.7.2              -121   94   94    92   40%     5   33%
  22 Cyrano 0.6b17-x64        -154   95   95    92   38%     7   29%
  23 Hermann 2.8-x64-2cpu     -189   96   96    92   35%     8   24%
  24 Pseudo 0.7c              -246   99   99    92   30%    11   33%
ResultSet-EloRating>los
                         Bo To De Ch Mo De Ha Wi Ar Ru Ph Za Br Al Pr Ga Sl E. Da Ch Am Cy He Ps
Bobcat 3.25-x64-2cpu        67 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99100 99 99 99100
Tornado 4.88-x64-2cpu    32    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99100 99 99 99 99
Delfi 5.4-2cpu            0  0    50 74 85 90 93 95 96 97 97 97 98 98 99 99 99 99 99 99 99 99 99
ChessTiger 2007.1         0  0 49    74 85 90 93 95 96 97 97 97 98 98 99 99 99 99 99 99 99 99 99
Movei 0.08.438            0  0 25 25    64 74 80 85 87 89 90 90 92 92 98 99 99 99 99 99 99 99 99
Deuterium 12.01.30.15     0  0 14 14 35    61 68 74 77 80 82 82 85 85 97 97 98 99 99 99 99 99 99
Hamsters 0.7.1-2cpu       0  0  9  9 25 38    57 64 68 71 74 74 77 77 94 95 97 98 99 98 99 99 99
WildCat 8.0               0  0  6  6 19 31 42    57 61 64 68 68 71 71 92 93 96 97 99 98 99 99 99
Arasan 13.4-x64-2cpu      0  0  4  4 14 25 35 42    53 57 61 61 64 64 89 90 94 95 99 97 99 99 99
Ruffian 2.1.0             0  0  3  3 12 22 31 38 46    53 57 57 61 61 87 89 93 94 99 96 98 99 99
Philou 3.70-x64           0  0  2  2 10 19 28 35 42 46    53 53 57 57 85 87 92 93 99 95 98 99 99
Zarkov 6.44               0  0  2  2  9 17 25 31 38 42 46    50 53 53 82 85 90 92 98 94 98 99 99
Brutus 8.05-x64-JA        0  0  2  2  9 17 25 31 38 42 46 49    53 53 82 85 90 92 98 94 98 99 99
Alaric 707                0  0  1  1  7 14 22 28 35 38 42 46 46    50 80 82 89 90 98 93 97 99 99
Pro Deo 1.74              0  0  1  1  7 14 22 28 35 38 42 46 46 49    80 82 89 90 98 93 97 99 99
Garbochesss 3.00-x64-JA   0  0  0  0  1  2  5  7 10 12 14 17 17 19 19    53 64 68 83 74 87 94 99
SlowChess blitz WV2.1     0  0  0  0  0  2  4  6  9 10 12 14 14 17 17 46    61 64 79 71 85 93 98
E.T.Chess 130108          0  0  0  0  0  1  2  3  5  6  7  9  9 10 10 35 38    53 65 61 77 89 97
DanaSah 4.88              0  0  0  0  0  0  1  2  4  5  6  7  7  9  9 31 35 46    60 57 74 87 97
Cheng3 1.07a-x64          0  0  0  0  0  0  0  0  0  0  0  1  1  1  1 16 20 34 39    50 75 91 99
Amyan 1.7.2               0  0  0  0  0  0  1  1  2  3  4  5  5  6  6 25 28 38 42 49    68 83 96
Cyrano 0.6b17-x64         0  0  0  0  0  0  0  0  0  1  1  1  1  2  2 12 14 22 25 24 31    68 89
Hermann 2.8-x64-2cpu      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  6 10 12  8 16 31    78
Pseudo 0.7c               0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  2  0  3 10 21

New model (factor = 2 for draws, but same equation as Davidson's)

Code: Select all

ResultSet-EloRating>ratings
Rank Name                      Elo    +    - games score oppo. draws
   1 Bobcat 3.25-x64-2cpu      250   86   86    92   80%   -11   26%
   2 Tornado 4.88-x64-2cpu     225   82   82    92   78%   -10   30%
   3 Delfi 5.4-2cpu             75   73   73    92   61%    -3   37%
   4 ChessTiger 2007.1          71   71   71    92   61%    -3   46%
   5 Movei 0.08.438             42   69   69    92   57%    -2   51%
   6 Deuterium 12.01.30.15      26   70   70    92   55%    -1   45%
   7 Hamsters 0.7.1-2cpu        19   70   70    92   53%    -1   48%
   8 WildCat 8.0                14   70   70    92   52%    -1   48%
   9 Arasan 13.4-x64-2cpu       10   74   74    92   51%     0   30%
  10 Ruffian 2.1.0               8   71   71    92   51%     0   40%
  11 Brutus 8.05-x64-JA          0   71   71    92   49%     0   42%
  12 Philou 3.70-x64            -1   71   71    92   50%     0   39%
  13 Pro Deo 1.74               -5   71   71    92   49%     0   41%
  14 Zarkov 6.44                -6   70   70    92   49%     0   47%
  15 Alaric 707                 -9   73   73    92   49%     0   33%
  16 SlowChess blitz WV2.1     -48   71   71    92   43%     2   43%
  17 Garbochesss 3.00-x64-JA   -49   74   74    92   44%     2   32%
  18 E.T.Chess 130108          -63   71   71    92   42%     3   42%
  19 DanaSah 4.88              -64   71   71    92   41%     3   41%
  20 Amyan 1.7.2               -69   74   74    92   40%     3   33%
  21 Cheng3 1.07a-x64          -72   73   73    92   40%     3   37%
  22 Cyrano 0.6b17-x64         -90   75   75    92   38%     4   29%
  23 Hermann 2.8-x64-2cpu     -118   78   78    92   35%     5   24%
  24 Pseudo 0.7c              -148   76   76    92   30%     6   33%
ResultSet-EloRating>los
                         Bo To De Ch Mo De Ha Wi Ar Ru Br Ph Pr Za Al Sl Ga E. Da Am Ch Cy He Ps
Bobcat 3.25-x64-2cpu        65 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
Tornado 4.88-x64-2cpu    34    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
Delfi 5.4-2cpu            0  0    52 73 82 85 87 88 89 91 92 93 93 94 98 98 99 99 99 99 99 99 99
ChessTiger 2007.1         0  0 47    71 80 84 86 87 88 91 91 92 92 93 98 98 99 99 99 99 99 99 99
Movei 0.08.438            0  0 26 28    62 66 70 72 73 78 79 81 81 83 95 95 97 97 98 99 99 99 99
Deuterium 12.01.30.15     0  0 17 19 37    54 58 61 62 68 69 71 72 74 92 91 95 95 96 99 98 99 99
Hamsters 0.7.1-2cpu       0  0 14 15 33 45    53 56 58 64 64 67 68 70 90 90 94 94 95 99 97 99 99
WildCat 8.0               0  0 12 13 29 41 46    52 54 60 61 64 65 66 88 88 92 93 94 99 97 99 99
Arasan 13.4-x64-2cpu      0  0 11 12 27 38 43 47    51 57 58 61 61 63 86 86 91 91 92 98 96 98 99
Ruffian 2.1.0             0  0 10 11 26 37 41 45 48    55 56 59 60 62 85 85 91 91 92 98 96 98 99
Brutus 8.05-x64-JA        0  0  8  8 21 31 35 39 42 44    50 53 54 57 81 82 88 89 90 97 95 98 99
Philou 3.70-x64           0  0  7  8 20 30 35 38 41 43 49    52 53 56 81 81 88 88 89 97 94 98 99
Pro Deo 1.74              0  0  6  7 18 28 32 35 38 40 46 47    50 53 79 79 86 87 88 96 94 98 99
Zarkov 6.44               0  0  6  7 18 27 31 34 38 39 45 46 49    52 79 79 86 86 88 96 94 98 99
Alaric 707                0  0  5  6 16 25 29 33 36 37 42 43 46 47    76 76 84 84 86 95 92 97 99
SlowChess blitz WV2.1     0  0  1  1  4  7  9 11 13 14 18 18 20 20 23    50 61 62 65 74 77 89 96
Garbochesss 3.00-x64-JA   0  0  1  1  4  8  9 11 13 14 17 18 20 20 23 49    60 61 64 72 77 89 96
E.T.Chess 130108          0  0  0  0  2  4  5  7  8  8 11 11 13 13 15 38 39    51 54 59 68 84 94
DanaSah 4.88              0  0  0  0  2  4  5  6  8  8 10 11 12 13 15 37 38 48    53 58 68 83 93
Amyan 1.7.2               0  0  0  0  1  3  4  5  7  7  9 10 11 11 13 34 35 45 46    53 64 80 92
Cheng3 1.07a-x64          0  0  0  0  0  0  0  0  1  1  2  2  3  3  4 25 27 40 41 46    67 87 97
Cyrano 0.6b17-x64         0  0  0  0  0  1  2  2  3  3  4  5  5  5  7 22 22 31 31 35 32    69 85
Hermann 2.8-x64-2cpu      0  0  0  0  0  0  0  0  1  1  1  1  1  1  2 10 10 15 16 19 12 30    70
Pseudo 0.7c               0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  3  5  6  7  2 14 29
ResultSet-EloRating>

Kempelen · Post by **Kempelen** » Tue Aug 14, 2012 11:55 am

Hello Remi

Last week, looking at the bayeselo pages, I read that bayeselo has the advantage of providing much better accuracy in Elo estimation than elostat, because 1+/1- is less significant than 10+/10-, so you adjust for this in the final result.

I may be wrong, but to me this seems inaccurate. In fact, elostat takes this into account, because the error bar is a function of the number of games played. Possibly both are equivalent: a correct Elo with a big margin versus an adjusted Elo with a smaller margin. But to me this is conceptually unsound: an estimated Elo should not depend on the number of games played. Suppose I have played 100 games; does the fact that the result comes from 100 games and not 90 mean that I am better? No, what should vary is the error bar.

This also has an adverse side effect: can I compare a tournament of 4000 games with another of 5000 games? I suspect I can, keeping in mind that I must compare Elo plus error and not only Elo, but there could be a tendency to look only at the Elo, and that has consequences, as it is a function of the number of games played.

This raises a question for me, which is the main reason for writing this post: can I compare Elo plus error between tournaments with different numbers of games? Possibly you are going to say yes, but what effect does the number of games have on the estimated Elo? Could it be noise?
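
To make the question concrete, here is a small sketch of one way a point estimate can depend on the number of games: a prior expressed as a few virtual draws. This is a simplified illustration using the plain logistic curve, not the bayeselo computation; the function names and the choice of 2 virtual draws in the example are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a score fraction under the plain logistic curve.
double EloFromScore(double score)
{
    assert(score > 0.0 && score < 1.0);
    return 400.0 * std::log10(score / (1.0 - score));
}

// Point estimate after adding `virtualDraws` drawn games as a prior.
// Each draw, real or virtual, is counted as half a point.
double EloWithDrawPrior(double wins, double draws, double losses,
                        double virtualDraws)
{
    const double games = wins + draws + losses + virtualDraws;
    const double points = wins + (draws + virtualDraws) / 2;
    return EloFromScore(points / games);
}
```

With 2 virtual draws, a 3-1 result and a 30-10 result are both 75% scores but give different estimates: about 120 Elo for 3-1 and about 180 Elo for 30-10, while without the prior both give about 191 Elo. The more games are played, the less the prior pulls the estimate towards zero.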

I hope I have explained myself well; my English is not good sometimes, and this is a difficult matter for me to explain.

Regards
Fermin
