Comparing two versions of the same engine

Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 9:44 am
Location: Madrid - Spain

Comparing two versions of the same engine

Post by Kempelen » Sun Oct 26, 2008 6:01 pm

Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx

Karmazen & Oliver
Posts: 374
Joined: Fri Mar 09, 2007 11:34 pm

Re: Comparing two versions of the same engine

Post by Karmazen & Oliver » Sun Oct 26, 2008 7:33 pm

Kempelen wrote: Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
If it is the same engine, play a match at identical ply depth; that can tell you which version has more chess knowledge...

and see how much time each one needs for that, to tell whether the code is faster...

swami
Posts: 6546
Joined: Thu Mar 09, 2006 3:21 am

Re: Comparing two versions of the same engine

Post by swami » Sun Oct 26, 2008 7:48 pm

Kempelen wrote: How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
For a start, play 100 games against a range of different engines: a blitz 1/1 gauntlet using the Nunn or Noomen test set, to keep track of your progress.

bob
Posts: 20914
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Comparing two versions of the same engine

Post by bob » Sun Oct 26, 2008 8:35 pm

Kempelen wrote: Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.

Karmazen & Oliver
Posts: 374
Joined: Fri Mar 09, 2007 11:34 pm

Re: Comparing two versions of the same engine

Post by Karmazen & Oliver » Mon Oct 27, 2008 12:30 am

bob wrote:
Kempelen wrote: Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
There is no way to determine which is better???

It's simple: ENGINE A versus ENGINE B.

One match at equal ply depth.
Another match at equal time.

If A is better than B, A wins.

geots
Posts: 4790
Joined: Fri Mar 10, 2006 11:42 pm

Re: Comparing two versions of the same engine

Post by geots » Mon Oct 27, 2008 1:14 am

Karmazen & Oliver wrote:
bob wrote:
Kempelen wrote: Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
There is no way to determine which is better???

It's simple: ENGINE A versus ENGINE B.

One match at equal ply depth.
Another match at equal time.

If A is better than B, A wins.

Bob is dead-on right on this one. There is no argument to make. Case closed.


Best,

Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 9:44 am
Location: Madrid - Spain

Re: Comparing two versions of the same engine

Post by Kempelen » Mon Oct 27, 2008 7:58 am

bob wrote:If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
Then suppose a gauntlet tournament of engine A versus, for example, 12 engines, and another tournament of engine B versus those same 12 engines. How many games are needed, what is an appropriate time control, and what % score difference is enough to say that A is better than B?

krazyken

Re: Comparing two versions of the same engine

Post by krazyken » Mon Oct 27, 2008 8:28 am

Kempelen wrote:
bob wrote:If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
Then suppose a gauntlet tournament of engine A versus, for example, 12 engines, and another tournament of engine B versus those same 12 engines. How many games are needed, what is an appropriate time control, and what % score difference is enough to say that A is better than B?
There are a few variables to consider, such as how large an Elo difference counts as actually better. The smaller the difference you want to detect, the more games are needed. The quick and easy way is to run some games and put them all in one PGN file (make sure the two versions have different names). Then use BayesELO on that PGN file to get relative ratings, where you can see the +/- margin of error. You can also use the LOS (likelihood of superiority) function in BayesELO to fill out the picture. If the margins aren't small enough, run some more games. An important factor is to avoid repeating games, so use a different starting position for each game.

I'm sure someone will come along with the exact math for you. My guess is that with 12 opponents you could start with 8 games against each (a 96-game run). That should give you a decent starting point to decide if you want more games.
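
For a quick feel for how wide the margin is after a short run, here is a minimal Python sketch. It is not BayesELO's algorithm: it assumes the plain logistic Elo formula and a normal approximation for the error bar, and the function names and the 30/36/30 result are made up for illustration.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction, using the logistic Elo formula."""
    # Clamp so that 0% or 100% scores do not hit log(0) or a division by zero.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high): point estimate and an approximate 95% interval.

    Assumption: games are independent and the mean score is roughly normal,
    which is only a rough approximation for short runs.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    std_error = math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - z * std_error),
            elo_from_score(score + z * std_error))


if __name__ == "__main__":
    # Hypothetical 96-game gauntlet result: 30 wins, 36 draws, 30 losses.
    elo, low, high = elo_with_margin(30, 36, 30)
    print(f"Elo {elo:+.0f}, approx. 95% interval {low:+.0f} to {high:+.0f}")
```

On that made-up even result the interval comes out at roughly plus or minus 55 Elo, so a run of this size can only separate versions that differ by a lot.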

bob
Posts: 20914
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Comparing two versions of the same engine

Post by bob » Mon Oct 27, 2008 3:47 pm

Karmazen & Oliver wrote:
bob wrote:
Kempelen wrote: Hi

I released Rodin v1.14 a few months ago. Now I am writing improvements for a new version, and I have doubts about running a match between the two versions for testing purposes.

How many games, and what result, would a match between the two engines need before I can say that the new one is stronger?

Thx
If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
There is no way to determine which is better???

It's simple: ENGINE A versus ENGINE B.

One match at equal ply depth.
Another match at equal time.

If A is better than B, A wins.
It isn't quite that simple. Your new change might have a side-effect of weakening some other part of your game, but your program doesn't understand (say) the finer points of king-side attack, so you won't notice that this new change has actually made your program worse, because the only opponent you test against can't exploit the weakness...

This is why "inbreeding" is bad for biological reproduction.

bob
Posts: 20914
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Comparing two versions of the same engine

Post by bob » Mon Oct 27, 2008 3:49 pm

Kempelen wrote:
bob wrote:If you are playing version A against version B, there is no way to determine which is better. You need to play both versions against a group of common opponents instead.
Then suppose a gauntlet tournament of engine A versus, for example, 12 engines, and another tournament of engine B versus those same 12 engines. How many games are needed, what is an appropriate time control, and what % score difference is enough to say that A is better than B?
It depends on how much better. If the new version is 200 Elo better, you can figure that out in 50 games. If it is 2 Elo better, you will need almost 100,000 games...
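
A back-of-the-envelope way to see why the game count blows up, sketched in Python. This is not necessarily how the figures above were arrived at: it assumes the logistic Elo formula, a per-game standard deviation of 0.4 (a guess that depends on the draw rate), a normal approximation, and an exactly known baseline. `games_needed` is a made-up helper name.

```python
import math


def expected_score(elo_diff):
    """Expected score for a player rated elo_diff above the opponent (logistic Elo)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, sigma=0.4, z=1.96):
    """Games until the z-sigma error bar on the mean score shrinks to the score edge.

    elo_diff must be positive. sigma is the assumed per-game standard deviation
    of the result (0.5 with no draws, lower when many games are drawn).
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * sigma / edge) ** 2)


if __name__ == "__main__":
    for diff in (200, 50, 20, 5, 2):
        print(f"{diff:>3} Elo: about {games_needed(diff)} games")
```

The count grows roughly with the inverse square of the difference, so halving the Elo gap you want to detect about quadruples the games. Under these assumptions a 2 Elo gap already needs tens of thousands of games, and comparing two versions that each carry their own error bar pushes the number higher still.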
