Let's say that, due to limited resources, one can only play 1200 games per engine version/setting.
Which testing method is better and why?
A. 120 games against each of the 10 opponents
B. 240 games against each of the 5 opponents
C. 300 games against each of the 4 opponents
D. 400 games against each of the 3 opponents
E. 600 games against each of the 2 opponents
F. 1200 games against a single opponent
On engine testing again!
-
- Full name: Edsel Apostol
-
- Location: Auckland, NZ
Re: On engine testing again!
I'd go for:
A. 120 games against each of the 10 opponents
I think 120 games is a reasonable number against a given opponent if each opening line is played twice, once as White and once as Black.
A good range of opponents is better than a limited range.
The only other option I'd consider would be:
B. 240 games against each of the 5 opponents
gbanksnz at gmail.com
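A minimal sketch of the schedule Graham describes, assuming every opponent meets the same set of opening positions and each position is played twice with colours reversed. With a 60-position suite (the size of the combined Noomen suites Edsel mentions later in this thread), that works out to exactly 120 games per opponent and 1200 games for 10 opponents. The function `build_schedule` and the engine, opponent and position names are hypothetical placeholders.

```python
from itertools import product

def build_schedule(engine, opponents, openings):
    """Return (white, black, opening) tuples: each opening is played twice
    against each opponent, once with `engine` as White and once as Black."""
    schedule = []
    for opponent, opening in product(opponents, openings):
        schedule.append((engine, opponent, opening))  # our engine has White
        schedule.append((opponent, engine, opening))  # colours reversed
    return schedule

# Hypothetical names: 10 opponents and a 60-position opening suite.
opponents = [f"Opponent{i}" for i in range(1, 11)]
openings = [f"pos{i:02d}" for i in range(1, 61)]
games = build_schedule("MyEngine", opponents, openings)
print(len(games))                    # 1200 in total
print(len(games) // len(opponents))  # 120 per opponent
```

Playing each position from both sides means an unbalanced opening favours both engines equally often, which is why the games come in colour-reversed pairs.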
-
- Location: Redmond, WA USA
Re: On engine testing again!
I suggest method A.
If there is some flaw in the search or evaluation, approach A is roughly ten times as likely to uncover it as the last option (1200 games against a single opponent), provided the flawed feature has only a small probability of being implemented correctly in any given opponent.
In other words, suppose you choose the single-opponent method and that opponent does not understand open files and diagonals. If, by coincidence, open files and diagonals are also a serious weakness of the program being tested, the flaw will go unrevealed. But if you choose ten distinct programs, the odds that all ten lack the feature are far lower than the odds that a single program lacks it.
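The "ten times as likely" figure can be made concrete under a simplifying assumption: each opponent independently has the same small probability p of handling the flawed feature correctly, and the flaw shows up only if at least one opponent does. The chance of exposing it with k opponents is then 1 - (1 - p)^k, which is close to k·p when p is small. The function name `p_exposed` and the value of p below are illustrative assumptions, not measured numbers.

```python
def p_exposed(p, k):
    """Probability that at least one of k independent opponents handles the
    feature correctly (and so can expose the flaw), given per-opponent probability p."""
    return 1.0 - (1.0 - p) ** k

p = 0.02  # illustrative only, not a measured value
ratio = p_exposed(p, 10) / p_exposed(p, 1)
print(f"10 opponents: {p_exposed(p, 10):.4f}, "
      f"1 opponent: {p_exposed(p, 1):.4f}, ratio: {ratio:.2f}")
```

Since 1 - (1 - p)^k never exceeds k·p, "ten times" is an upper bound: it holds approximately for rare features, and the ratio shrinks as p grows.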
If you see that program N destroys your program when, according to the Elo estimates, it should not, you can examine those games afterwards to see what defect they reveal in your program.
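One rough way to spot the "program N destroys your program" case automatically is to compare each opponent's actual score with the score expected from the rating difference, using the standard logistic Elo formula E = 1 / (1 + 10^(-d/400)), and flag large shortfalls. This is only a sketch: it treats games as independent, approximates the per-game variance as E(1 - E) (which overstates it when draws are common), and the 2.5 standard-error threshold, function names and sample results are all illustrative assumptions.

```python
import math

def expected_score(elo_diff):
    """Expected score for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def z_score(points, games, elo_diff):
    """How many standard errors the actual score lies from the Elo expectation."""
    e = expected_score(elo_diff)
    se = math.sqrt(e * (1.0 - e) / games)  # upper bound on spread; draws shrink it
    return (points / games - e) / se

def suspicious(results, threshold=2.5):
    """results maps opponent name -> (points, games, our_elo_minus_theirs).
    Return the opponents against whom we scored far below expectation."""
    return [name for name, (pts, n, diff) in results.items()
            if z_score(pts, n, diff) < -threshold]

# Illustrative numbers only.
results = {
    "OpponentA": (62.0, 120, 0),   # roughly as expected
    "OpponentB": (35.0, 120, 50),  # far worse than the rating gap suggests
}
print(suspicious(results))  # ['OpponentB']
```

The flagged opponents are the ones whose games are worth reviewing by hand for a shared weakness.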
Re: On engine testing again!
I completely agree with Dann. His reasoning is pretty good.
A is the best choice if you like variety. If you want more games, you could shorten the time control instead of reducing the number of opponents. If the time control is already so short that you can't reduce it any further, then option B is OK, provided you've chosen 5 engines with completely different styles and want to see more games against each one.
-
- Full name: Edsel Apostol
Re: On engine testing again!
Thanks Graham, Dann and Swami. I'm already doing A.
What about the best opening suite to use for testing? I'm currently using the combined Noomen Test Suite 2006 and 2008, a total of 60 positions. What I like about it is that it is thematic: it has themes for pawn storms, king attacks, passed pawns, etc.
Which is better:
A. Thematic openings that target the important evaluation components of an engine (like Noomen's Test Suite)?
or
B. A collection of the most common openings found in real games (like MLmfl, Sedat's top 50)?
Another question: if, for example, the engine is rated 2850, what is the best Elo range for the opponents, and why?
A. 2850-3050 (engines up to 200 Elo stronger)
B. 2750-2950 (engines up to 100 Elo weaker or stronger)
C. 2650-2850 (engines up to 200 Elo weaker)
Edsel Apostol
https://github.com/ed-apostol/InvictusChess
-
- Location: Switzerland
Re: On engine testing again!
In my opinion the best method is
A. 120 games against each of the 10 opponents
Kurt
-
- Location: Irvine, CA, USA
Re: On engine testing again!
Uri had an interesting suggestion: play against the strongest engine you can find, but give your engine time odds so that the two play at roughly equal strength. This can speed up testing somewhat. It might not be best to choose all your opponents this way.
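A minimal sketch of how one might size the time odds for Uri's suggestion, assuming that each doubling of thinking time is worth a fixed number of Elo points. That gain depends on the engine and the time control, so the 70 Elo per doubling and the 200 Elo gap below are hypothetical placeholders to be calibrated for your own setup; `time_odds_factor` is an illustrative name.

```python
def time_odds_factor(elo_gap, elo_per_doubling):
    """Factor by which to multiply your engine's thinking time to close an
    Elo gap, assuming each doubling of time is worth elo_per_doubling Elo."""
    if elo_per_doubling <= 0:
        raise ValueError("elo_per_doubling must be positive")
    return 2.0 ** (elo_gap / elo_per_doubling)

# Hypothetical: opponent about 200 Elo stronger, about 70 Elo per doubling assumed.
factor = time_odds_factor(200, 70)
print(f"give your engine about {factor:.1f}x the opponent's time")
```

Because the per-doubling gain is only an estimate, the resulting pairing is only roughly balanced.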
Re: On engine testing again!
I'd prefer A, but not the older 2006 Noomen Test Suite.
Instead:
Gambit Suite 2008 (50 Gambit Opening positions)
Noomen Test Suite 2008 (50 Balanced opening positions)
These can be downloaded here:
http://www.rybkachess.com/index.php?auswahl=Downloads
From the results, you might get an idea of whether your engine does better in gambit-style or balanced positions.
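To judge whether a score difference between the two suites is more than noise, one option (an assumption here, not something the suites prescribe) is a pooled two-proportion z-test on the fractional scores. It treats games as independent and ignores the variance reduction from draws; the totals below are made up, assuming 50 positions per suite, both colours, and the same 10 opponents (1000 games per suite). `compare_suites` is a hypothetical helper name.

```python
import math

def compare_suites(points_a, games_a, points_b, games_b):
    """Return (score difference, z) for suite A versus suite B, using a pooled
    two-proportion z-test on the fractional scores."""
    sa, sb = points_a / games_a, points_b / games_b
    pooled = (points_a + points_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    return sa - sb, (sa - sb) / se

# Made-up totals: gambit suite versus balanced suite, 1000 games each.
diff, z = compare_suites(540.0, 1000, 505.0, 1000)
print(f"gambit minus balanced: {diff:+.3f}, z = {z:.2f}")
```

Here z comes out at roughly 1.6, so even a 3.5-point percentage gap over 1000 games per suite would still be within the noise by the usual |z| > 2 rule of thumb.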
As for your second query:
B. 2750 - 2950 (engines weaker or stronger within 100 elo)
Re: On engine testing again!
Dirt wrote: Uri had the interesting suggestion to play against the strongest engine you can find, but give your engine time odds so that it plays at a roughly equal strength.
There's no established rating for a top engine playing at time odds. One needs to know the opponent's rating to see whether the changes made to the engine resulted in progress.
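Swami's point follows from how a match score is turned into a rating: the usual logistic performance formula needs the opponents' ratings as an anchor. A minimal sketch, assuming the standard Elo logistic curve and using the average opponent rating (a common simplification); `performance_rating`, the ratings and the score below are illustrative.

```python
import math

def performance_rating(points, games, opponent_ratings):
    """Estimate a rating from a match score against opponents of known rating."""
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("performance is unbounded for a 0% or 100% score")
    avg_opponent = sum(opponent_ratings) / len(opponent_ratings)
    # Invert E = 1 / (1 + 10^(-d/400)) to recover the rating difference d.
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))

# Illustrative: 120 games against each of 10 opponents averaging 2850, scoring 55%.
ratings = [2800, 2820, 2830, 2840, 2850, 2850, 2860, 2870, 2880, 2900]
print(round(performance_rating(660.0, 1200, ratings)))  # 2885
```

At time odds the opponent's effective rating is unknown, so the anchor this formula relies on is missing.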
-
- Location: Auckland, NZ
Re: On engine testing again!
I don't use opening suites, so I can't comment on them.
I'd select opponents within 100 Elo either side.
Cheers,
Graham.
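The preference for opponents within about 100 Elo has a statistical side that can be sketched (this argument is an addition, not one made in the thread): for a fixed number of games, the Elo estimate is most precise when the expected score is near 50%. Under the logistic model and ignoring draws, the standard error in Elo is roughly 400 / (ln 10 · sqrt(N · E(1 - E))), which grows as the expected score E moves away from 0.5. `elo_standard_error` is an illustrative name.

```python
import math

def elo_standard_error(games, elo_diff):
    """Approximate standard error (in Elo) of a rating estimate from `games`
    games against an opponent elo_diff points weaker (negative = stronger).
    Logistic Elo model with draws ignored; draws would make it somewhat smaller."""
    e = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
    return 400.0 / (math.log(10.0) * math.sqrt(games * e * (1.0 - e)))

for diff in (-200, -100, 0, 100, 200):
    print(f"{diff:+5d} Elo: +/- {elo_standard_error(120, diff):.1f} Elo (1 SE, 120 games)")
```

In this model the error at a 200 Elo gap is roughly 17% larger than against an equal opponent, so the effect favours close opponents without being dramatic.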