I think your suggestion is very sensible.
Kempelen wrote: Should it work?
I have been thinking about ways to get better results from testing time. People usually run tournaments from the starting position, tournaments with a very limited set of positions (e.g. 32), or tournaments with a lot of random positions. I assume everyone is doing this with a minimum of 1000 to 4000 games.
.... but ....
What about repeating the same tournament, with the same opponents and the same positions per opponent? This assumes the set of positions is very large....
Example:
Game 1, against Crafty, black, position from FEN file 'myfenpositions.epd', position number 540
Game 2, against Critter, white, position from FEN file 'myfenpositions.epd', position number 3251
....
etc
The idea is that the position numbers would always be the same and not chosen randomly, without repeating any FEN, but varied enough.
The tournament file from the tournament manager would always be the same, without the need to recreate the tournament. The test would always repeat exactly.
Would results between tests be more accurate than when the starting positions are chosen randomly?
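A fixed schedule like the one described above could be generated once and then reused for every test run. The following is only a rough sketch under assumptions: the function name `build_schedule`, the seed value, and the alternating color rule are all hypothetical, and the position numbers are simply 1-based indexes into a file such as 'myfenpositions.epd'.

```python
import random

def build_schedule(opponents, num_positions, games_per_opponent, seed=12345):
    """Return a fixed list of (game_no, opponent, color, position_number).

    Hypothetical helper: the seed and the color rule are assumptions.
    """
    total = len(opponents) * games_per_opponent
    if total > num_positions:
        raise ValueError("not enough positions to avoid repeating a FEN")
    rng = random.Random(seed)  # fixed seed, so the same picks on every call
    # sample() draws without replacement, so no position number repeats
    picks = rng.sample(range(1, num_positions + 1), total)
    schedule = []
    for i, position in enumerate(picks):
        opponent = opponents[i % len(opponents)]
        # alternate colors each time the opponent list has been cycled once
        color = "white" if (i // len(opponents)) % 2 == 0 else "black"
        schedule.append((i + 1, opponent, color, position))
    return schedule

if __name__ == "__main__":
    for game in build_schedule(["Crafty", "Critter"], 5000, 3):
        print("Game %d, against %s, %s, position number %d" % game)
```

Saving the generated schedule to a file and replaying that file is probably safer than regenerating it for each test, since the numbers drawn for a given seed are not guaranteed to stay the same across Python versions.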
With Komodo we do all sorts of exploratory testing, but our primary test to verify that a version is good is a gauntlet style test where the candidate version of Komodo plays at least 2 foreign programs for about 20,000 games.
The opening set is very shallow and very large - we don't tune Komodo to any particular book because we want it to play all positions well. It will be the future job of a book maker to make an optimized book. Our book is not deep because we want Komodo to also be able to play the opening reasonably well. It is 5 moves per side so that the programs are "on their own" as soon as possible while still providing lots of variety.
We actually have something like 35,000 openings, but they are chosen in a way that we don't just repeat the first 10,000 over and over again (for a 20,000 game match). However, they are played in pairs so that each pair of programs always plays both sides of the same opening. A hash of the programs determines which subset (or more accurately which order) to play from the 35,000 opening set.
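To illustrate the idea of a hash choosing the opening order, here is a minimal sketch. It is not Komodo's actual scheme: the use of SHA-256, the sorting of the program names, and the function names `opening_order` and `paired_games` are all assumptions made for the example.

```python
import hashlib
import random

def opening_order(engine_a, engine_b, num_openings):
    """Return a shuffled list of opening indexes that depends only on the pairing."""
    # Assumption: sort the names so (A, B) and (B, A) get the same order.
    key = "|".join(sorted([engine_a, engine_b])).encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    order = list(range(num_openings))
    random.Random(seed).shuffle(order)  # same pairing, same shuffle
    return order

def paired_games(engine_a, engine_b, num_openings, num_games):
    """Return (opening, white, black) tuples, two games per opening."""
    if num_games // 2 > num_openings:
        raise ValueError("not enough openings for this many games")
    games = []
    for opening in opening_order(engine_a, engine_b, num_openings)[: num_games // 2]:
        games.append((opening, engine_a, engine_b))  # first game of the pair
        games.append((opening, engine_b, engine_a))  # same opening, colors swapped
    return games

if __name__ == "__main__":
    for game in paired_games("Komodo", "Crafty", 35000, 6):
        print(game)
```

With 35,000 openings and a 20,000 game match, each pairing uses 10,000 openings, and a different opponent name gives a different order, so the same first 10,000 are not replayed against everyone.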
We have some evidence that indicates playing foreign opponents is superior to self-play.
Don