about error margins?

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: about error margins?

Post by Daniel Shawul »

Laskos wrote:
Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:How? Note that I am calculating the sd of the winning percentage. When you go to the Elo calculation with Bayeselo, there are of course elodraw and eloadvantage. So a DDDDDD is more variable when you consider those things... Anyway, I was just trying to demonstrate why one can't tell the margin of error of Elo from the number of games and the winning percentage alone.
That thing you wrote is the error (SD) in the normal approximation of the trinomial. You can still use it for a match of 10 draws, but keep in mind that after that match the probability of W and of L is 1/13 each and the probability of D is 11/13; input these into your formula.

Kai
The probability of a W or L would surely be lowered a lot after that odd observation. That formula is for calculating the standard deviation of a given sample and does not assume probabilities for WDL. The rewards are fixed at 0, 0.5, 1 of course. This was just a quick example, but I know it does not directly translate to Elo, because there you have a mix of players of different strength, white Elo advantage, etc.
I think the formula you wrote was derived using the normal approximation and probabilities, which were translated into games by multiplying by n. The formula gives the SD as a percentage, which you can then convert to Elo by using the logistic. That 1/13 was derived from (w+1)/(n+3), for w=0, n=10 (10 draws, n=10, d=10).

Kai
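Kai's (w+1)/(n+3) rule can be tried on the 10-draw match with a few lines of Python. This is only a sketch of one reading of "input these into your formula": the function names are made up for illustration, and the smoothed probabilities are assumed to replace the observed frequencies W/n, D/n, L/n in the variance term.

```python
from fractions import Fraction
import math

def smoothed_probs(wins, draws, losses):
    """Add-one smoothing, (k + 1) / (n + 3), for the three outcomes."""
    n = wins + draws + losses
    return tuple(Fraction(k + 1, n + 3) for k in (wins, draws, losses))

def score_sd(p_win, p_draw, p_loss):
    """Per-game SD of the score with rewards 1, 0.5, 0 (assumed reading)."""
    m = p_win + p_draw / 2
    var = (p_win * (1 - m) ** 2
           + p_draw * (Fraction(1, 2) - m) ** 2
           + p_loss * m ** 2)
    return math.sqrt(var)

p_win, p_draw, p_loss = smoothed_probs(wins=0, draws=10, losses=0)
print(p_win, p_draw, p_loss)            # the 1/13, 11/13, 1/13 from the post
print(score_sd(p_win, p_draw, p_loss))  # no longer zero, unlike the raw sample sd
```

With the raw counts (0 wins, 10 draws, 0 losses) the sample formula gives exactly zero, which is the odd case the two posters are arguing about.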
The formula is the regular standard deviation formula. It may be a bit confusing since I grouped the wins into one term by multiplying by the number of wins. Anyway, apparently Elostat uses it according to this post. I will quote the steps here. Bayeselo's improvements arose from the discussions in that thread.

Code: Select all

1) Use number of wins, losses, and draws
   W = number of wins, L = number of losses, D = number of draws
   n = number of games (W + L + D)
   m = mean value (mean score per game, (W + 0.5*D)/n)

2) Apply the following formulas to compute s 
  ( SQRT: square root of. )

   x = W*(1-m)*(1-m) + D*(0.5-m)*(0.5-m) + L*(0-m)*(0-m)   
   s = SQRT( x/(n-1) )

3) Compute error margin A (use 1.96  for 95% confidence)
   
   A = 1.96 * s / SQRT(n)

4) State with 95% confidence:
   The 'real' result should be somewhere in between m-A to m+A

5) Look up the Elo figures for the win% values m-A and m+A to get the lower and upper values of the error margin.
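The five steps quoted above can be sketched in Python. This is a minimal sketch, not Elostat's actual code: the names `elostat_margin` and `score_to_elo` are made up for illustration, m is assumed to be the mean score (W + 0.5*D)/n, and step 5 is assumed to use the usual logistic Elo curve.

```python
import math

def elostat_margin(wins, draws, losses, z=1.96):
    """Steps 1-3: return (m, A), the mean score and its error margin."""
    n = wins + draws + losses
    if n < 2:
        raise ValueError("need at least two games")
    m = (wins + 0.5 * draws) / n  # assumed definition of the 'mean value'
    x = wins * (1 - m) ** 2 + draws * (0.5 - m) ** 2 + losses * (0 - m) ** 2
    s = math.sqrt(x / (n - 1))
    a = z * s / math.sqrt(n)
    return m, a

def score_to_elo(p):
    """Step 5 (assumed logistic model): Elo difference for a score 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)

m, a = elostat_margin(wins=40, draws=30, losses=30)
# Step 4 and 5: the interval m-A .. m+A is mapped to Elo at both ends.
# This only works while both ends stay strictly between 0 and 1.
low, high = score_to_elo(m - a), score_to_elo(m + a)
print(f"score = {m:.3f} +/- {a:.3f}, Elo between {low:.1f} and {high:.1f}")
```

Because the logistic curve is not linear, the Elo interval obtained this way is not symmetric around the Elo of m, even though the score interval is.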
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: about error margins?

Post by Laskos »

This formula is an approximation using normal distributions. There is no closed-form exact formula for the error margins, either for the binomial or for the trinomial.

Kai
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: about error margins?

Post by Daniel Shawul »

I don't understand what you are saying. The Elo and its error margin are calculated only in step 5. The formula I gave does not use any approximations (normal or otherwise). It would not make sense if it did, because the standard deviation is one of the statistical parameters used to describe a distribution.

Code: Select all

sd = sqrt(sum((x - xbar)^2) / (N - 1))
Binomial, multinomial, and normal distributions all have their mean and sd values calculated this way, not the other way round as you seem to imply.
Actually Elostat calculates two standard deviations, as Dieter pointed out there, so it is not just m - A and m + A, but m - A and m + B. Of course in step 5 you assume some kind of distribution for the Elo model, which I have repeatedly said I did not want to include, since it clouds the point I am trying to make. Is that what you mean by saying the formula is for the normal?
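Daniel's point that the grouped W/D/L expression is just the ordinary sample standard deviation can be checked directly. The sketch below is an illustration only (the helper name `grouped_sd` and the example counts are made up); it expands the match into one score per game and compares against `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics

def grouped_sd(wins, draws, losses):
    """Sample sd of the per-game scores, computed from the grouped counts."""
    n = wins + draws + losses
    m = (wins + 0.5 * draws) / n
    x = wins * (1 - m) ** 2 + draws * (0.5 - m) ** 2 + losses * m ** 2
    return math.sqrt(x / (n - 1))

wins, draws, losses = 40, 30, 30
# One entry per game: 1 for a win, 0.5 for a draw, 0 for a loss.
scores = [1.0] * wins + [0.5] * draws + [0.0] * losses

print(grouped_sd(wins, draws, losses))
print(statistics.stdev(scores))  # same value up to rounding
```

The sd itself needs no distributional assumption; the normal approximation only enters once a multiplier such as 1.96 is used to turn it into a confidence interval.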
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: about error margins?

Post by abulmo »

Daniel Shawul wrote:
Laskos wrote:
Daniel Shawul wrote:

Code: Select all

3) Compute error margin A (use 1.96  for 95% confidence)
   
   A = 1.96 * s / SQRT(n)

4) State with 95% confidence:
   The 'real' result should be somewhere in between m-A to m+A


This formula is an approximation using normal distributions. There is no closed-form exact formula for the error margins, either for the binomial or for the trinomial.

Kai
I don't understand what you are saying. The Elo and its error margin are calculated only in step 5. The formula I gave does not use any approximations (normal or otherwise).
The "1.96" in your 95% confidence error margin assumes a normal distribution.
Richard
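Where the 1.96 comes from can be shown with the standard library: it is the two-sided 95% quantile of the standard normal distribution. The helper name `z_for_confidence` below is made up for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value of the standard normal for a confidence level."""
    # A two-sided interval leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

Any other confidence level simply swaps in a different multiplier in step 3; the sd from step 2 stays the same.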
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: about error margins?

Post by Daniel Shawul »

The "1.96" in your 95% confidence error margin assumes a normal distribution.
What a weird comment. Are you sure you read the discussion?
Of course that distribution is normal, and it is not an assumption: it is due to the central limit theorem. Anyway, what has that got to do with the way the standard deviation is calculated? The discussion was about the formula I used to calculate the sd. It doesn't make sense to say that the formula is for the normal or anything else.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: about error margins?

Post by Laskos »

Yes, but you can only talk about confidence intervals here for the normal. Could you show me the derivation of the variance for the trinomial distribution? I am unable to do it quickly.

Kai
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: about error margins?

Post by Daniel Shawul »

It was just a simple misunderstanding that didn't need to go this far. I think you will find what you are looking for here: multinomial
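For the variance Kai asks about, a short derivation for the per-game score may help. It is a sketch under the thread's own assumptions: rewards fixed at 1, 0.5, 0 and independent games. The symbols p_W, p_D, p_L for the outcome probabilities are notation introduced here, and the block needs amsmath.

```latex
\begin{align*}
X &\in \{1, \tfrac12, 0\} \text{ with probabilities } p_W,\ p_D,\ p_L,
   \qquad p_W + p_D + p_L = 1 \\
\mu = E[X] &= p_W + \tfrac12 p_D \\
E[X^2] &= p_W + \tfrac14 p_D \\
\operatorname{Var}(X) &= E[X^2] - \mu^2
   = p_W + \tfrac14 p_D - \left(p_W + \tfrac12 p_D\right)^2 \\
\operatorname{Var}(\bar X_n) &= \frac{\operatorname{Var}(X)}{n}
   \qquad \text{(mean score over $n$ independent games)}
\end{align*}
```

In the balanced case p_W = p_L the mean is 1/2 and the per-game variance reduces to (1 - p_D)/4, so a higher draw ratio means a smaller variance of the score.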
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: About error margins?

Post by Ajedrecista »

Hello Kai:
Here is a link to a post by Ernest Bonnem:

http://www.talkchess.com/forum/viewtopi ... 66&t=44100

According to him, if the results between two engines are close to 50%-50%, then a good approximation of the standard deviation is sqrt(wins + losses)/[2·(wins + draws + losses)], so the variance is (wins + losses)/[4·(wins + draws + losses)²]. I do not know how he reached this formula, but it seems a good approximation. Of course, it does not work if wins = losses = 0 (like the other models shown in this thread), but reasonable tests with lots of games do not end with 100% draws. I hope that this info will be useful for you!

Regards from Spain.

Ajedrecista.
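The approximation attributed to Ernest Bonnem can be compared numerically with the Elostat-style standard error quoted earlier in the thread. This is an illustrative sketch: the function names and the example results are made up, and "standard error" here means s / sqrt(n) on the score scale. At exactly 50% the sum x equals (wins + losses)/4, which is presumably where the formula comes from.

```python
import math

def se_elostat(wins, draws, losses):
    """Standard error of the mean score, s / sqrt(n), with s using n - 1."""
    n = wins + draws + losses
    m = (wins + 0.5 * draws) / n
    x = wins * (1 - m) ** 2 + draws * (0.5 - m) ** 2 + losses * m ** 2
    return math.sqrt(x / (n - 1)) / math.sqrt(n)

def se_near_even(wins, draws, losses):
    """Approximation for results close to 50%: sqrt(W + L) / (2 n)."""
    n = wins + draws + losses
    return math.sqrt(wins + losses) / (2.0 * n)

# Hypothetical results: balanced, slightly uneven, and lopsided.
for w, d, l in [(350, 300, 350), (400, 300, 300), (600, 300, 100)]:
    print((w, d, l), round(se_elostat(w, d, l), 5), round(se_near_even(w, d, l), 5))
```

The two values nearly coincide for the balanced match and drift apart as the score moves away from 50%, where the approximation overstates the error.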
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: About error margins?

Post by Ajedrecista »

Hello Fermín:
Kempelen wrote:Could someone explain to me where the current formulas for calculating error margins from the number of games come from (e.g. ±10 Elo points for 4000 games)? I mean, why those numbers and no others? What observations have been made to arrive at those conclusions?
Thanks, Fermin
I am not very gifted in Statistics, but I managed to reach some plausible results regarding error bars. Here are some links to my posts:

http://talkchess.com/forum/viewtopic.ph ... 69&t=41773

http://talkchess.com/forum/viewtopic.ph ... 43&t=41773

http://www.talkchess.com/forum/viewtopi ... 65&t=44100

I use a model with the means and standard deviations of a normal distribution. It will not work with a draw ratio of 100% or near it, but it is fine for more or less balanced matches. I see that you put ± 10 Elo for 4000 games: it depends on the confidence interval you want. I uploaded three Fortran programmes (written by myself) around three weeks ago here. As they are open source, you can take a look at the source code to see my approach to error bars, confidence intervals, likelihood of superiority, etc. These programmes are only useful for matches between two engines.

As a side note, my programme LOS_and_Elo_uncertainties_calculator was improved in terms of calculation time for large values of (wins + losses), for example (wins + losses) > 10,000 (on my PC, from more than 1.1 seconds to around 0.49 seconds), but strangely it is slower for small values of (wins + losses), for example (wins + losses) ~ 100, 200 or 300 (on my PC, from ~ 55 ms to ~ 90 ms). I do not know why. This new version is unreleased because I do not consider it a major change with respect to the last public version.

Please ask if you have more questions. I will try to answer as best I can.

Regards from Spain.

Ajedrecista.
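To see where a figure like ±10 Elo for 4000 games can come from, here is a small sketch. It is not taken from Ajedrecista's Fortran programmes: it assumes a balanced match (wins equal to losses), the normal approximation with multiplier z, and the logistic Elo curve, and the function names are made up for illustration.

```python
import math

def score_to_elo(p):
    """Logistic Elo difference for a score 0 < p < 1 (assumed model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin_balanced(n_games, draw_ratio, z=1.96):
    """Elo half-width of the interval for a balanced match of n_games."""
    # With equal win and loss probabilities the per-game score variance
    # is (1 - draw_ratio) / 4, so the standard error of the mean score is:
    se = math.sqrt((1.0 - draw_ratio) / (4.0 * n_games))
    return score_to_elo(0.5 + z * se)

for draw_ratio in (0.0, 0.3, 0.5):
    margin = elo_margin_balanced(4000, draw_ratio)
    print(f"draw ratio {draw_ratio:.1f}: +/- {margin:.1f} Elo at 95%")
```

Under these assumptions the 95% margin for 4000 games comes out at roughly 8 to 11 Elo depending on the draw ratio, so the quoted ±10 depends both on the confidence level chosen and on how many games are drawn.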