On engine testing again!

Edsel Apostol · Post by **Edsel Apostol** » Fri Jan 01, 2010 7:41 am

Let's say that, due to limited resources, one can only play 1200 games per engine version/settings.

Which testing method is better and why?

A. 120 games against each of the 10 opponents
B. 240 games against each of the 5 opponents
C. 300 games against each of the 4 opponents
D. 400 games against each of the 3 opponents
E. 600 games against each of the 2 opponents
F. 1200 games against a single opponent

Graham Banks · Post by **Graham Banks** » Fri Jan 01, 2010 7:51 am

I'd go for:

A. 120 games against each of the 10 opponents

I think that 120 games is a reasonable number against a given opponent, if you're having each play from the same opening line as both White and Black.
A good range of opponents is better than a limited range.

The only other option I'd consider would be:

B. 240 games against each of the 5 opponents
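A minimal sketch of the pairing described above, where each opening line is played twice against each opponent with colours reversed. The engine name, the opponent names and the count of 60 opening lines are all placeholders; 60 lines is assumed here only because it yields 120 games per opponent.

```python
from itertools import product

def build_schedule(engine, opponents, openings):
    """Return (white, black, opening) tuples: every opening is played twice
    against every opponent, once with each colour."""
    games = []
    for opponent, opening in product(opponents, openings):
        games.append((engine, opponent, opening))  # tested engine has White
        games.append((opponent, engine, opening))  # same line, colours reversed
    return games

# Hypothetical names, for illustration only.
opponents = [f"Opponent{i}" for i in range(1, 11)]
openings = [f"line{i:02d}" for i in range(1, 61)]

schedule = build_schedule("MyEngine", opponents, openings)
print(len(schedule))  # 10 opponents * 60 lines * 2 colours
```

With these assumed numbers, option A uses up the whole 1200-game budget and the tested engine gets White in exactly half of the games.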

Dann Corbit · Post by **Dann Corbit** » Fri Jan 01, 2010 8:11 am

Edsel Apostol wrote: Let's say that, due to limited resources, one can only play 1200 games per engine version/settings.

Which testing method is better and why?

A. 120 games against each of the 10 opponents
B. 240 games against each of the 5 opponents
C. 300 games against each of the 4 opponents
D. 400 games against each of the 3 opponents
E. 600 games against each of the 2 opponents
F. 1200 games against a single opponent

I suggest method A.
If there is a flaw in the search or evaluation, approach A is roughly ten times as likely to uncover it as the single-opponent approach (the last option), provided that any given opponent has only a small probability of implementing the flawed feature correctly.

In other words, suppose you choose the single-opponent method and that opponent does not understand open files and diagonals. By some coincidence, the handling of open files and diagonals is also a serious flaw in the program being tested. The flaw will go unrevealed. But if you choose ten distinct programs, the odds that all ten lack the feature are far lower than the odds that a single program lacks it.

If you see that program N destroys your program (and it should not according to Elo estimates) then you can examine the games afterwards and see what defect is present in your games.
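The "ten times as likely" estimate can be checked with a small sketch. It assumes that each opponent independently has the same probability `q` of exposing the flaw; both the independence and the sample values of `q` are assumptions made for illustration, not measurements.

```python
def detection_probability(q, n_opponents):
    """Chance that at least one of n independent opponents exposes the flaw,
    where q is the chance that any single opponent exposes it."""
    return 1.0 - (1.0 - q) ** n_opponents

for q in (0.01, 0.05, 0.20):
    one = detection_probability(q, 1)
    ten = detection_probability(q, 10)
    print(f"q={q:.2f}: 1 opponent {one:.3f}, 10 opponents {ten:.3f}, "
          f"ratio {ten / one:.1f}")
```

Under these assumptions the ratio is close to 10 only when `q` is small, and it shrinks as `q` grows, because several opponents start exposing the same flaw.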

swami · Post by **swami** » Fri Jan 01, 2010 8:51 am

I completely agree with Dann. His reasoning is pretty good.

A is the best choice if you like variety. If you want more games, you could shorten the time control instead of reducing the number of opponents. If the time control is already so short that you can't reduce it any further, then option B is OK, provided you've chosen 5 engines with completely different styles and want to see more games against each.

Edsel Apostol · Post by **Edsel Apostol** » Fri Jan 01, 2010 9:20 am

Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Thematic openings that exercise the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question: what is the best Elo range for the opponents if, for example, the engine is rated 2850 Elo, and why?

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)

Kurt Utzinger · Post by **Kurt Utzinger** » Fri Jan 01, 2010 9:56 am

In my opinion the best method is
A. 120 games against each of the 10 opponents
Kurt

Dirt · Post by **Dirt** » Fri Jan 01, 2010 9:58 am

Edsel Apostol wrote: Another question: what is the best Elo range for the opponents if, for example, the engine is rated 2850 Elo, and why?

Uri had the interesting suggestion to play against the strongest engine you can find, but give your engine time odds so that it plays at a roughly equal strength. This can somewhat speed up testing. It might not be best to choose all your opponents this way.

swami · Post by **swami** » Fri Jan 01, 2010 10:05 am

Edsel Apostol wrote: Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Thematic openings that exercise the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question: what is the best Elo range for the opponents if, for example, the engine is rated 2850 Elo, and why?

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)

I'd prefer A, but not Noomen's older test suite from 2006.

Instead:

Gambit Suite 2008 (50 Gambit Opening positions)

Noomen Test Suite 2008 (50 Balanced opening positions)

These can be downloaded here:
http://www.rybkachess.com/index.php?auswahl=Downloads

The results might give you an idea of whether your engine does better in gambit-style or balanced positions.

As for your second query:

B. 2750 - 2950 (engines weaker or stronger within 100 elo)

swami · Post by **swami** » Fri Jan 01, 2010 10:10 am

Dirt wrote:
Edsel Apostol wrote: Another question: what is the best Elo range for the opponents if, for example, the engine is rated 2850 Elo, and why?
Uri had the interesting suggestion to play against the strongest engine you can find, but give your engine time odds so that it plays at a roughly equal strength. This can somewhat speed up testing. It might not be best to choose all your opponents this way.

There is no established rating for a top engine playing at time odds. One needs to know the opponent's rating to see whether the changes made to the engine resulted in progress.
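To illustrate why a reference rating matters, here is a rough sketch of how a performance rating is usually derived from a match result. It assumes the standard logistic Elo model and a normal approximation for the error margin; the opponent rating of 2850 and the 40/40/40 result are made-up inputs.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (clamped away from 0 and 1)."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def performance(opponent_rating, wins, draws, losses, z=1.96):
    """Return (estimate, low, high): performance rating and an approximate
    95% interval, using a normal approximation of the mean score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half = z * math.sqrt(var / n)
    return (opponent_rating + elo_diff(score),
            opponent_rating + elo_diff(score - half),
            opponent_rating + elo_diff(score + half))

# Made-up example: 120 games against an opponent assumed to be rated 2850.
est, low, high = performance(2850, wins=40, draws=40, losses=40)
print(f"{est:.0f} (95% interval {low:.0f} to {high:.0f})")
```

The estimate is only as good as `opponent_rating`: if that number is unknown, as for an engine at time odds, the result gives a difference relative to that opponent but no absolute rating. In this made-up case the interval is roughly ±50 Elo, which also shows how little 120 games against one opponent can resolve.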

Graham Banks · Post by **Graham Banks** » Fri Jan 01, 2010 10:16 am

Edsel Apostol wrote: Thanks Graham, Dann and Swami. I'm already doing A.

What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I liked about it is that it is thematic. It has themes for pawn storms, king attacks, passed pawns, etc.

Which is better:

A. Thematic openings that exercise the important parts of an engine's evaluation (like Noomen's Test Suite)?

or

B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?

Another question: what is the best Elo range for the opponents if, for example, the engine is rated 2850 Elo, and why?

A. 2850 - 3050 (engines stronger within +200 elo)
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
C. 2650 - 2850 (engines weaker within 200 elo)

I don't use opening suites, so can't comment on them.
I'd select opponents within 100 Elo either side.

Cheers,
Graham.
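For a feel of what the three rating ranges mean in practice, the sketch below computes the expected score of a 2850-rated engine against the weakest and strongest opponent in each range. It assumes the standard logistic Elo formula; real rating lists may use a slightly different model.

```python
def expected_score(rating_diff):
    """Expected score for a player rated rating_diff Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

engine = 2850
ranges = {"A": (2850, 3050), "B": (2750, 2950), "C": (2650, 2850)}

for label, (lowest, highest) in ranges.items():
    worst = expected_score(engine - highest)  # against the strongest opponent
    best = expected_score(engine - lowest)    # against the weakest opponent
    print(f"{label}: expected score between {worst:.2f} and {best:.2f}")
```

Under this model, option B keeps the expected score between about 0.36 and 0.64, while options A and C push it towards 0.24 and 0.76 at the far end of the range.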
