Don't forget the simulation I ran, which makes this completely clear. So you are thinking about this wrong. I was able to predict which program was superior about 87.3 percent of the time when I played simulated head-to-head matches of 20,000 games each (the superiority was 2 Elo).

Daniel Shawul wrote:
Yes, but you too are forgetting covariance.

Logically, if I play A vs B and use 20,000 games, both players have played 20,000 games. If I play A vs C and then play a second match B vs C, I have to play at least twice as many games: 20,000 for the first match and 20,000 for the second. In other words, I have wasted a lot of testing resources involving a third party. So I think it's pretty obvious that the answer is at least 2x.
To reproduce the 87% figure when each program played a foreign program instead, I had to run each of the 2 foreign matches to 40,000 games: 80,000 games in total, in other words 4x the effort.
Here is the source code to this simulation in C. You will have to provide a function that returns a random value between 0 and 1 (I'm using the MT RNG.) I also simulate 50% draws since that is about what we get when we test Komodo vs Komodo.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <stdint.h>
#include "mt.h"   // random number generator

#define DRAW 0.50   // fraction of games that end in a draw

int good = 0;
int bad  = 0;

// Elo expected score of "me" against "you"
double expectation( double me, double you )
{
    double x = (you - me) / 400.0;
    double d = 1.0 + pow(10.0, x);
    return 1.0 / d;
}

// Play one game; returns 1.0 (win), 0.5 (draw) or 0.0 (loss).
// wELO and bELO are unused; only the expected score ex matters.
double genGame( double wELO, double bELO, double ex )
{
    // rescale ex to a win probability among decisive games,
    // so that the overall expected score stays equal to ex
    ex = ex - 0.5;
    ex = ex * (1.0 / (1.0 - DRAW));
    ex += 0.5;

    /* generates a random number on [0,1) with 53-bit resolution */
    if (genrand_res53() < DRAW) return 0.5;
    if (genrand_res53() < ex) return 1.0;
    return 0.0;
}

// ----------------------------------------------------
// evaluate player a with match, return result of match
// ----------------------------------------------------
double eval(double a, double b, int matchsize)
{
    int    i;
    double sc = 0.0;
    double ex = expectation(a, b);

    for (i = 0; i < matchsize; i++) {
        sc += genGame( a, b, ex );
    }

    sc = sc / (double) matchsize;
    return sc;
}

// A plays B directly; A is judged better if it scores above 50%.
// An exactly even match counts as incorrect.
int simHeadToHead(int matchsize)
{
    double a = 1502.0;
    double b = 1500.0;
    int    i;

    good = 0;
    bad  = 0;

    for (i = 0; i < 100000; i++) {
        double e = eval(a, b, matchsize);
        if (e > 0.5) good++; else bad++;
    }

    printf("    (Heads up) correct: %9d incorrect: %9d %10.4f\n", good, bad, 100.0 * good / (double)(good + bad));
    return 0;
}

// A and B each play a separate match against foreign program F;
// A is judged better if its score against F exceeds B's score against F.
int simHeadForeign(int matchsize)
{
    double a = 1502.0;
    double b = 1500.0;
    double f = 1501.0;
    int    i;

    good = 0;
    bad  = 0;

    for (i = 0; i < 100000; i++) {
        double e0 = eval(a, f, matchsize);
        double f0 = eval(b, f, matchsize);
        if (e0 > f0) good++; else bad++;
    }

    printf("(Foreign test) correct: %9d incorrect: %9d %10.4f\n", good, bad, 100.0 * good / (double)(good + bad));
    return 0;
}

int main(int argc, char **argv)
{
    init_genrand((unsigned long) time(NULL));
    simHeadToHead(20000);
    simHeadForeign(20000);
    return 0;
}
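The listing above needs an "mt.h" that supplies `init_genrand` and `genrand_res53`. If the Mersenne Twister source is not at hand, a stand-in along these lines might be enough to get it compiling. Note that this is not the Mersenne Twister: it is a SplitMix64-style generator that merely reuses the same two function names, and the assumption that its statistical quality is adequate for this experiment is untested here.

```c
/* Hypothetical stand-in for mt.h; NOT the Mersenne Twister. */
#include <stdint.h>

static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

/* Seed the generator (same name as the MT reference code). */
void init_genrand(unsigned long seed)
{
    rng_state = (uint64_t) seed;
}

/* One SplitMix64 step: returns 64 pseudo-random bits. */
static uint64_t splitmix64_next(void)
{
    uint64_t z = (rng_state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Random double on [0,1) with 53-bit resolution. */
double genrand_res53(void)
{
    /* keep the top 53 bits and scale by 2^-53 */
    return (double)(splitmix64_next() >> 11) * (1.0 / 9007199254740992.0);
}
```

Seeding with the same value gives the same sequence, which is handy for repeating a run exactly.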
Remember, Remi warned that my formula could be wrong since there is usually covariance. Please look at my calculation and see how significantly the covariance affects A-B. It is equal in magnitude to the variance, so when you match A with B, you have 4 times as big a variance. Look at my example and tell me where I made a mistake.
Code:
For A vs B:
var(A) = var(B)
cov(A,B) = -sqrt(var(A)var(B)) = -var(A)
So var(A-B) = var(A) + var(A) - 2(-var(A)) = 4var(A)
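The identity above can be checked on a single game without any simulation. The sketch below assumes that within one head-to-head game B's score is exactly 1 minus A's score, and computes var(A), cov(A,B) and var(A-B) directly from the win/draw/loss probabilities. The `GameDist` type and the function names are invented for this illustration.

```c
/* Outcome probabilities of one game, seen from A's side
   (hypothetical helper type; the three values should sum to 1). */
typedef struct {
    double p_win;
    double p_draw;
    double p_loss;
} GameDist;

/* var(A): A scores 1 for a win, 0.5 for a draw, 0 for a loss. */
double var_a(GameDist g)
{
    double mean   = g.p_win + 0.5 * g.p_draw;
    double second = g.p_win + 0.25 * g.p_draw;   /* E[A^2] */
    return second - mean * mean;
}

/* cov(A,B), where B = 1 - A in the same game. */
double cov_ab(GameDist g)
{
    double mean_a  = g.p_win  + 0.5 * g.p_draw;
    double mean_b  = g.p_loss + 0.5 * g.p_draw;
    double mean_ab = 0.25 * g.p_draw;   /* A*B is nonzero only for a draw */
    return mean_ab - mean_a * mean_b;
}

/* var(A-B), from the distribution of A-B: +1 (win), 0 (draw), -1 (loss). */
double var_a_minus_b(GameDist g)
{
    double mean   = g.p_win - g.p_loss;
    double second = g.p_win + g.p_loss;          /* E[(A-B)^2] */
    return second - mean * mean;
}
```

For an even game with 50% draws, `GameDist g = {0.25, 0.5, 0.25}`, this gives var(A) = 0.125, cov(A,B) = -0.125 and var(A-B) = 0.5, that is, 4var(A).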