On engine testing again!

Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

On engine testing again!

Post by Edsel Apostol »

Let's say that due to the limited resources one can only play 1200 games per engine version/settings.

Which testing method is better and why?

A. 120 games against each of the 10 opponents
B. 240 games against each of the 5 opponents
C. 300 games against each of the 4 opponents
D. 400 games against each of the 3 opponents
E. 600 games against each of the 2 opponents
F. 1200 games against a single opponent
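
For a rough sense of what 1200 games can resolve, here is a minimal sketch of the statistical margin on a measured Elo difference. It assumes independent games and the usual logistic Elo curve; the win/draw/loss counts are made up for illustration and are not results from any real test.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(wins, draws, losses, z=1.96):
    """Return (elo_difference, approximate 95% margin in Elo).

    Assumes games are independent; z=1.96 is the normal-approximation
    multiplier for a 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points).
    var = (wins * 1.0 + draws * 0.25) / n - score ** 2
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0

# Hypothetical even results: all 1200 games pooled vs. one 120-game match.
print(elo_margin(400, 400, 400))
print(elo_margin(40, 40, 40))
```

Under these assumptions the pooled 1200 games give a margin of roughly 16 Elo, while a single 120-game match gives roughly 50 Elo, so results against an individual opponent are much noisier than the overall figure.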
Graham Banks
Posts: 44611
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: On engine testing again!

Post by Graham Banks »

I'd go for:

A. 120 games against each of the 10 opponents

I think that 120 games is a reasonable number against a given opponent, if you have the engines play each opening line as both White and Black.
A good range of opponents is better than a limited range.
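
As a sketch of how such a schedule adds up, the snippet below pairs every opening with both colours against every opponent. The opponent and opening names are placeholders, and the 60-line suite size is an assumption chosen so that each opponent gets 120 games.

```python
from itertools import product

def build_schedule(opponents, openings):
    """Return one (opponent, opening, our_colour) tuple per game.

    Every opening is played twice against each opponent, once with
    our engine as White and once as Black.
    """
    return list(product(opponents, openings, ("white", "black")))

# Placeholder names, not real engines or a real opening suite.
opponents = [f"opponent_{i}" for i in range(1, 11)]
openings = [f"opening_{i}" for i in range(1, 61)]

schedule = build_schedule(opponents, openings)
print(len(schedule))  # 10 opponents * 60 openings * 2 colours = 1200
```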

The only other option I'd consider would be:

B. 240 games against each of the 5 opponents
gbanksnz at gmail.com
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: On engine testing again!

Post by Dann Corbit »

Edsel Apostol wrote:Let's say that due to the limited resources one can only play 1200 games per engine version/settings.

Which testing method is better and why?

A. 120 games against each of the 10 opponents
B. 240 games against each of the 5 opponents
C. 300 games against each of the 4 opponents
D. 400 games against each of the 3 opponents
E. 600 games against each of the 2 opponents
F. 1200 games against a single opponent
I suggest method A.
If there is a flaw in the search or evaluation, approach A is about ten times as likely to uncover it as the single-opponent approach, assuming any given opponent has only a small probability of implementing the flawed feature correctly.

IOW, suppose that you chose the single-opponent method and that opponent does not understand open files and diagonals. By coincidence, the handling of open files and diagonals is also a serious flaw in the program being tested. The flaw will go unrevealed. But if you chose ten distinct programs, the odds that all ten lack the feature are far lower than the odds that a single program lacks it.
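
The argument above can be put into numbers with a small sketch. It assumes each opponent independently implements the feature correctly with probability p_correct, and that an opponent that does so will expose the flaw; both are simplifying assumptions, and the probabilities used are illustrative only.

```python
def p_flaw_exposed(p_correct, n_opponents):
    """Chance that at least one of n opponents implements the feature correctly.

    Assumes the opponents are independent of one another.
    """
    return 1.0 - (1.0 - p_correct) ** n_opponents

for p in (0.01, 0.05, 0.5):
    one = p_flaw_exposed(p, 1)
    ten = p_flaw_exposed(p, 10)
    print(f"p={p}: one opponent {one:.3f}, ten opponents {ten:.3f}, ratio {ten / one:.1f}")
```

The ratio only approaches ten when p_correct is small; if most engines already handle the feature well, a single opponent is likely to expose the flaw anyway.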

If you see that program N destroys your program (and it should not according to Elo estimates) then you can examine the games afterwards and see what defect is present in your games.
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: On engine testing again!

Post by swami »

I completely agree with Dann. His reasoning is pretty good.

A is the best choice if you like variety. If you want more games, you could reduce the time control instead of reducing the number of participants. If the time control is already so short that you can't reduce it any further, then option B is OK, provided you've chosen 5 engines with completely different styles and want to see more games.
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: On engine testing again!

Post by Edsel Apostol »

Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Using thematic openings that cater to the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question, what should be the best range of elos of the opponents if for example the engine has 2850 elo and why:

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)
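
To see what these ranges mean in terms of results, here is a short sketch using the standard logistic Elo expectation formula. The ratings are just the endpoints listed above; no real engines are implied.

```python
def expected_score(own_elo, opp_elo):
    """Expected score (0..1) for own_elo against opp_elo under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo - own_elo) / 400.0))

own = 2850
for opp in (2650, 2750, 2850, 2950, 3050):
    print(opp, round(expected_score(own, opp), 3))
```

Against opponents 200 Elo away the expected score is about 24% or 76%, and at 100 Elo it is about 36% or 64%. The more lopsided the expected score, the fewer games are likely to hinge on the change being tested.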
Kurt Utzinger
Posts: 169
Joined: Sun May 11, 2008 10:31 pm
Location: Switzerland

Re: On engine testing again!

Post by Kurt Utzinger »

In my opinion the best method is
A. 120 games against each of the 10 opponents
Kurt
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: On engine testing again!

Post by Dirt »

Edsel Apostol wrote:Another question, what should be the best range of elos of the opponents if for example the engine has 2850 elo and why:
Uri had the interesting suggestion to play against the strongest engine you can find, but give your engine time odds so that it plays at a roughly equal strength. This can somewhat speed up testing. It might not be best to choose all your opponents this way.
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: On engine testing again!

Post by swami »

Edsel Apostol wrote:Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Using thematic openings that cater to the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question, what should be the best range of elos of the opponents if for example the engine has 2850 elo and why:

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)
I'd prefer A, but not Noomen's older 2006 test suite.

Instead:

Gambit Suite 2008 (50 Gambit Opening positions)

Noomen Test Suite 2008 (50 Balanced opening positions)

These can be downloaded here:
http://www.rybkachess.com/index.php?auswahl=Downloads

The results might give you an idea of whether your engine does better with gambit-style or balanced openings.

As for your second query:

B. 2750 - 2950 (engines weaker or stronger within 100 elo)
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: On engine testing again!

Post by swami »

Dirt wrote:
Edsel Apostol wrote:Another question, what should be the best range of elos of the opponents if for example the engine has 2850 elo and why:
Uri had the interesting suggestion to play against the strongest engine you can find, but give your engine time odds so that it plays at a roughly equal strength. This can somewhat speed up testing. It might not be best to choose all your opponents this way.
There's no established rating for a top engine in a time-odds match. One needs to know the opponent's rating to see whether the changes made to the engine resulted in progress.
Graham Banks
Posts: 44611
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: On engine testing again!

Post by Graham Banks »

Edsel Apostol wrote:Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Using thematic openings that cater to the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question, what should be the best range of elos of the opponents if for example the engine has 2850 elo and why:

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)
I don't use opening suites, so can't comment on them.
I'd select opponents within 100 Elo either side.

Cheers,
Graham.
gbanksnz at gmail.com