Discussion of chess software programming and technical issues.
Moderators: hgm, Harvey Williamson, bob
-
Edmund
- Posts: 668
- Joined: Mon Dec 03, 2007 2:01 pm
- Location: Barcelona, Spain
-
Contact:
Post
by Edmund » Thu Jan 14, 2010 7:39 am
Whenever I do engine testing I usually refer to the following table computed by Joseph Ciarrochi:
http://www.husvankempen.de/nunn/rating/tablejoseph.htm
It is a great resource for me, but one of the author's side notes surprises me:
# It is critical that you choose your sample size ahead of time, and do not make any conclusions until you have run the full tournament. It is incorrect, statistically, to watch the running of the tournament, wait until an engine reaches a cut-off, and then stop the tournament.
Does this make sense? Why would it make a difference if I stopped the tournament in the middle (e.g. to save time) when one player is obviously stronger? After all, all I want to know is whether A > B, not an exact rating.
-
Gian-Carlo Pascutto
- Posts: 1063
- Joined: Sat Dec 13, 2008 6:00 pm
-
Contact:
Post
by Gian-Carlo Pascutto » Thu Jan 14, 2010 9:17 am
Because the statistics (confidence margins) are calculated on the basis that you do not do that.
If you do, the actual confidence will be significantly less than what is in the tables.
-
MattieShoes
- Posts: 718
- Joined: Fri Mar 20, 2009 7:59 pm
Post
by MattieShoes » Thu Jan 14, 2010 9:56 am
Stopping early because you're out of time should be the same as a shorter tournament.
Stopping early because of some sort of rating or result cutoff would screw up the confidence margins.
Take coin flipping.
Setting out for 1000 flips and stopping after 500 because you're bored is fine.
Setting out for 1000 flips and stopping once you have 20 more heads than tails is bad.
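The coin-flip example above can be checked directly with a quick simulation. This is just a sketch; the 20-flip cutoff, 1000-flip budget and trial count are the arbitrary numbers from the example, and the coin is fair by construction, so any "bias" either procedure finds is a false positive:

```python
import random

def fixed_sample(n_flips=1000, cutoff=20):
    """Flip n_flips times, then check the cutoff once at the end."""
    diff = sum(1 if random.random() < 0.5 else -1 for _ in range(n_flips))
    return abs(diff) >= cutoff

def optional_stopping(n_flips=1000, cutoff=20):
    """Check the cutoff after every flip and stop as soon as it is hit."""
    diff = 0
    for _ in range(n_flips):
        diff += 1 if random.random() < 0.5 else -1
        if abs(diff) >= cutoff:
            return True
    return False

random.seed(1)
trials = 2000
fp_fixed = sum(fixed_sample() for _ in range(trials)) / trials
fp_stop = sum(optional_stopping() for _ in range(trials)) / trials
print(f"false positives, fixed sample:      {fp_fixed:.3f}")
print(f"false positives, optional stopping: {fp_stop:.3f}")
```

The optional-stopping rate comes out clearly higher: peeking after every flip gives the random walk many chances to wander across the cutoff, which is exactly why the fixed-sample confidence tables stop applying.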
-
Edmund
- Posts: 668
- Joined: Mon Dec 03, 2007 2:01 pm
- Location: Barcelona, Spain
-
Contact:
Post
by Edmund » Thu Jan 14, 2010 10:55 am
Thanks for the kind answers. This makes sense.
So no way to trick statistics then ..

-
John Major
- Posts: 27
- Joined: Fri Dec 11, 2009 9:23 pm
Post
by John Major » Thu Jan 14, 2010 11:36 am
Comparing continuously till a cutoff is called 'sequential analysis'. It requires a different table that takes into account the larger likelihood of error, because you make many comparisons.
It was developed during WW2 and deemed so important that it was classified till '45.
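For reference, the classic sequential procedure from that era is Wald's sequential probability ratio test (SPRT). A minimal sketch applied to a stream of game results; the error rates (alpha, beta) and the two hypothesised score rates p0/p1 are made-up example values, and draws are ignored for simplicity:

```python
import math
import random

def sprt_bounds(alpha=0.05, beta=0.05):
    """Stopping thresholds on the log-likelihood ratio."""
    upper = math.log((1 - beta) / alpha)   # cross this -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross this -> accept H0
    return lower, upper

def sprt_step(llr, win, p0=0.50, p1=0.55):
    """Update the running log-likelihood ratio after one decisive game."""
    return llr + (math.log(p1 / p0) if win else math.log((1 - p1) / (1 - p0)))

random.seed(0)
lower, upper = sprt_bounds()
llr, games = 0.0, 0
# Simulate results from an engine that really does score 55%.
while lower < llr < upper:
    games += 1
    llr = sprt_step(llr, random.random() < 0.55)

verdict = "stronger (H1)" if llr >= upper else "not stronger (H0)"
print(f"stopped after {games} games: {verdict}")
```

The point of the different table mentioned above is visible in the bounds: because you test after every game, the thresholds are set from alpha and beta jointly, so the overall error rates stay controlled even though you look continuously.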
-
Edmund
- Posts: 668
- Joined: Mon Dec 03, 2007 2:01 pm
- Location: Barcelona, Spain
-
Contact:
Post
by Edmund » Thu Jan 14, 2010 12:02 pm
John Major wrote:Comparing continuously till a cutoff is called 'sequential analysis'. It requires a different table that takes into account the larger likelihood of error, because you make many comparisons.
It was developed during WW2 and deemed so important that it was classified till '45.
That's interesting ...
How could this idea be ported to chess engine testing? I think a lot of effort (time, and computer resources that could be used for different tests) could be saved if we didn't have to run through the whole tournament in case of obvious differences.
Is there a way to determine the cutoff win-% after n played games, in case we want an error bar of, say, 1%?
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 2:27 pm
Post
by Don » Thu Jan 14, 2010 11:01 pm
Edmund wrote:John Major wrote:Comparing continuously till a cutoff is called 'sequential analysis'. It requires a different table that takes into account the larger likelihood of error, because you make many comparisons.
It was developed during WW2 and deemed so important that it was classified till '45.
That's interesting ...
How could this idea be ported to chess engine testing? I think a lot of effort (time, and computer resources that could be used for different tests) could be saved if we didn't have to run through the whole tournament in case of obvious differences.
Is there a way to determine the cutoff win-% after n played games, in case we want an error bar of, say, 1%?
One thing you could do is play until one side is ahead by N points and this will automatically adjust resource usage based on how much confidence you need. But I don't know how to make the calculation to determine what N should be for a given confidence.
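Lacking a closed-form answer, N can at least be calibrated by simulation: play out many random walks between two equal engines and see how often each candidate lead appears purely by chance within the game budget. A rough sketch; the budget, trial count and candidate leads are made-up example values, and draws are ignored:

```python
import random

def spurious_winner(n_games, lead):
    """True if two EQUAL engines ever reach a `lead`-point gap."""
    diff = 0
    for _ in range(n_games):
        diff += 1 if random.random() < 0.5 else -1
        if abs(diff) >= lead:
            return True
    return False

random.seed(2)
trials = 1000
rates = {}
for lead in (10, 20, 30, 40):
    rates[lead] = sum(spurious_winner(2000, lead) for _ in range(trials)) / trials
    print(f"lead {lead:2d}: false-positive rate ~ {rates[lead]:.3f}")
```

Picking the smallest lead whose false-positive rate is below your target confidence gives a usable N for that budget; the rate falls as the required lead grows.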
-
Edmund
- Posts: 668
- Joined: Mon Dec 03, 2007 2:01 pm
- Location: Barcelona, Spain
-
Contact:
Post
by Edmund » Fri Jan 15, 2010 8:51 am
Don wrote:Edmund wrote:John Major wrote:Comparing continuously till a cutoff is called 'sequential analysis'. It requires a different table that takes into account the larger likelihood of error, because you make many comparisons.
It was developed during WW2 and deemed so important that it was classified till '45.
That's interesting ...
How could this idea be ported to chess engine testing? I think a lot of effort (time, and computer resources that could be used for different tests) could be saved if we didn't have to run through the whole tournament in case of obvious differences.
Is there a way to determine the cutoff win-% after n played games, in case we want an error bar of, say, 1%?
One thing you could do is play until one side is ahead by N points and this will automatically adjust resource usage based on how much confidence you need. But I don't know how to make the calculation to determine what N should be for a given confidence.
I would imagine your N also depends on the number of games played so far.
-
Edmund
- Posts: 668
- Joined: Mon Dec 03, 2007 2:01 pm
- Location: Barcelona, Spain
-
Contact:
Post
by Edmund » Fri Jan 15, 2010 9:02 am
Gian-Carlo Pascutto wrote:I think this has exactly the same error as the original proposition.
So the margins have to be doubled then? I.e. use an alpha of 0.5% instead of 1%?