Observator bias or...

Alessandro Scotti

Re: Observator bias or...

Post by Alessandro Scotti » Wed May 30, 2007 8:31 pm

I remember from testing Kiwi that results based on 100 games are very unreliable. It sometimes happens that a version gets off to a bad start but improves by the end of a long test. On the other hand, I had a version reach 64% after 100 games and finish with a disappointing 50% after 720 games... I will now increase the number of games to 800 and see if that brings some benefit (though I don't expect much).

ed

Re: Observator bias or...

Post by ed » Wed May 30, 2007 11:50 pm

I once wrote a small utility that emulates a match between two engines of equal strength. Here is the code; run some tests and shiver.

Ed

-----------------------------------------------------------

Code:

#include <stdio.h>
#include <stdlib.h>

void main()            // emulate matches

{       int r,x,max,c; float win,loss,draw,f1,f2,f3,f4; char w[200]; int rnd,d,e;

        srand(rnd);

again:  printf("Number of Games "); gets(w); max=atoi(w);

loop:   x=0; win=0; loss=0; draw=0; printf("\n");

next:   if (x==max) goto einde;

        r=rand(); r=r&3; if (r==0) goto next;
        if (r==1) win++;
        if (r==2) loss++;
        if (r==3) draw++;
        x++; if (x==(max/4)) goto disp;
             if (x==(max/2)) goto disp;
             if (x==(max/4)+(max/2)) goto disp;
             if (x==max) goto disp;
        goto next;

disp:   f1=win+(draw/2); f2=loss+(draw/2); f4=x; f3=(f1*100)/f4; d=f1; e=f2;
        printf("%d-%d (%.1f%%)    ",d,e,f3);
        goto next;

einde:  c=getch(); if (c=='q') return;
        if (c=='a') { printf("\n\n"); goto again; }
        goto loop;

}

Dann Corbit
Posts: 10206
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Observator bias or...

Post by Dann Corbit » Thu May 31, 2007 12:19 am

Slightly cleaned up version of the same thing. The original exhibits undefined behavior because of access to an uninitialized variable.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void            terminate_chance(void)
{
    int             c;
    puts("enter 'q' to quit, anything else to continue");
    c = getch();
    if (c == 'q')
        exit(EXIT_SUCCESS);
    puts("\n");
}

int             main(void)
{                               // emulate matches

    int             r,
                    x,
                    max,
                    c;
    double          win,
                    loss,
                    draw,
                    f1,
                    f2,
                    f3,
                    f4;
    char            w[200];
    int             rnd,
                    d,
                    e;
    int             keep_looping;

    srand((unsigned) time(NULL));
    for (;;) {
        keep_looping = 1;
        printf("Number of Games ");
        fflush(stdout);
        fgets(w, sizeof w, stdin);
        max = atoi(w);

        for (; keep_looping;) {
            x = 0;
            win = 0;
            loss = 0;
            draw = 0;
            printf("\n");

            for (; keep_looping;) {
                for (; keep_looping;) {
                    do {
                        if (x == max) {
                            terminate_chance();
                            keep_looping = 0;
                        }
                        r = rand();
                        r &= 3;
                    }
                    while (r == 0);

                    if (r == 1)
                        win++;
                    if (r == 2)
                        loss++;
                    if (r == 3)
                        draw++;
                    x++;
                    if (x == (max / 4))
                        break;
                    if (x == (max / 2))
                        break;
                    if (x == (max / 4) + (max / 2))
                        break;
                    if (x == max)
                        break;
                }

                f1 = win + (draw / 2);
                f2 = loss + (draw / 2);
                f4 = x;
                f3 = (f1 * 100) / f4;
                d = f1;
                e = f2;
                printf("%d-%d (%.1f%%)    ", d, e, f3);
            }

        }
    }
}


Dann Corbit

Re: Observator bias or...

Post by Dann Corbit » Thu May 31, 2007 12:33 am

New version that should also compile on non-windows platforms.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void            terminate_chance(void)
{
    int             c;
    puts("enter 'q' to quit, anything else to continue");
    c = getchar();
    if (c == 'q')
        exit(EXIT_SUCCESS);
    puts("\n");
}

int             main(void)
{                               // emulate matches

    int             r,
                    x,
                    max;
    double          win,
                    loss,
                    draw,
                    f1,
                    f2,
                    f3,
                    f4;
    char            w[200];
    int             d,
                    e;
    int             keep_looping;

    srand((unsigned) time(NULL));
    for (;;) {
        keep_looping = 1;
        printf("Number of Games:");
        fflush(stdout);
        fgets(w, sizeof w, stdin);
        max = atoi(w);

        for (; keep_looping;) {
            x = 0;
            win = 0;
            loss = 0;
            draw = 0;
            printf("\n");

            for (; keep_looping;) {
                for (; keep_looping;) {
                    do {
                        if (x == max) {
                            terminate_chance();
                            keep_looping = 0;
                        }
                        r = rand();
                        r &= 3;
                    }
                    while (r == 0);

                    if (r == 1)
                        win++;
                    if (r == 2)
                        loss++;
                    if (r == 3)
                        draw++;
                    x++;
                    if (x == (max / 4))
                        break;
                    if (x == (max / 2))
                        break;
                    if (x == (max / 4) + (max / 2))
                        break;
                    if (x == max)
                        break;
                }

                f1 = win + (draw / 2);
                f2 = loss + (draw / 2);
                f4 = x;
                f3 = (f1 * 100) / f4;
                d = (int) (f1 + 0.5);
                e = (int) (f2 + 0.5);
                printf("%d-%d (%.1f%%)    ", d, e, f3);
            }
        }
    }
}


Dann Corbit

Re: Observator bias or...

Post by Dann Corbit » Thu May 31, 2007 12:40 am

This version cures the extraneous printing and is formatted better.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void            terminate_chance(char *string, size_t length)
{
    int             c;
    puts("\nEnter 'q' and <Enter> to quit, anything else to continue");
    // use the caller-supplied length: sizeof string is only the size of a pointer
    fgets(string, (int) length, stdin);
    c = string[0];
    if (c == 'q')
        exit(EXIT_SUCCESS);
    puts("\n");
}

int             main(void)
{                               // emulate matches

    int             r,
                    x,
                    max;
    double          win,
                    loss,
                    draw,
                    f1,
                    f2,
                    f3,
                    f4;
    char            w[200];
    int             d,
                    e;
    int             keep_looping;

    srand((unsigned) time(NULL));
    for (;;) {
        keep_looping = 1;
        printf("Number of Games:");
        fflush(stdout);
        fgets(w, sizeof w, stdin);
        max = atoi(w);

        for (; keep_looping;) {
            x = 0;
            win = 0;
            loss = 0;
            draw = 0;
            printf("\n");

            for (; keep_looping;) {
                for (; keep_looping;) {
                    do {
                        if (x == max) {
                            terminate_chance(w, sizeof w);
                            keep_looping = 0;
                        }
                        r = rand();
                        r &= 3;
                    }
                    while (r == 0);

                    if (r == 1)
                        win++;
                    if (r == 2)
                        loss++;
                    if (r == 3)
                        draw++;
                    x++;
                    if (x == (max / 4))
                        break;
                    if (x == (max / 2))
                        break;
                    if (x == (max / 4) + (max / 2))
                        break;
                    if (x == max)
                        break;
                }
                if (keep_looping) {
                    f1 = win + (draw / 2);
                    f2 = loss + (draw / 2);
                    f4 = x;
                    f3 = (f1 * 100) / f4;
                    d = (int) (f1 + 0.5);
                    e = (int) (f2 + 0.5);
                    printf("%d-%d (%.1f%%)    ", d, e, f3);
                }
            }
        }
    }
}

ed

Re: Observator bias or...

Post by ed » Thu May 31, 2007 9:40 am

Thx Dann for the C programming course once again. :wink: :wink:

Ed

hgm
Posts: 23790
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Observator bias or...

Post by hgm » Thu May 31, 2007 11:02 am

Alessandro Scotti wrote: I remember from testing Kiwi that results based on 100 games are very unreliable. It sometimes happens that a version gets off to a bad start but improves by the end of a long test. On the other hand, I had a version reach 64% after 100 games and finish with a disappointing 50% after 720 games... I will now increase the number of games to 800 and see if that brings some benefit (though I don't expect much).
64% after 100 games between approximately equal engines is extreme: taking 0.4 as the per-game standard deviation of the score, the standard error over 100 games should be 0.4/sqrt(100) = 4%, so a 14% deviation represents 3.5 sigma. This should happen on average only once in ~4000 tries.

I noted a very strange effect when I was testing uMax in self-play. The standard error over 100 games should be 4%, but when I played 1000 games between the same versions and looked at the scores of the ten individual 100-game runs, these results deviated on average much more from each other (and from the final average result) than you would expect from the calculated standard error. This can only happen if the games are not independent! I indeed cannot rule this out, as all the games were played in a single run, each starting from the random seed the previous game ended with. So with a bad randomizer, if a single game repeats because its starting seed is equal or very close to an earlier one, the following game might repeat as well, destroying the independence of the games.

Whatever the cause, the effect was that the error in the win percentage was always a lot larger than you would expect based on the number of games.

Tony

Re: Observator bias or...

Post by Tony » Thu May 31, 2007 12:04 pm

I think the math only works if P(win)=P(loss)=P(draw)=1/3 (which I doubt is the case).

Ed's code even assumes P(win,white)==P(win,black), which I doubt as well.

Tony

Uri Blass
Posts: 8611
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Observator bias or...

Post by Uri Blass » Thu May 31, 2007 12:51 pm

With a bigger win probability for White, the variance is even smaller, so a result of 64% after 100 games is even less expected.

Uri

Tony

Re: Observator bias or...

Post by Tony » Thu May 31, 2007 12:55 pm

Not if P(draw)<1/3

Tony
