1) My definition of "obviously good ideas" is probably different from yours.
bob wrote:
Uri Blass wrote:
I disagree.
bob wrote:
Sorry, but I can't give you an answer. To get reliable/stable results, you need thousands of games, not tens or hundreds. My goal is to be able to say whether A or A' is better with a very high level of accuracy. I tried to use one PC for years and never found something workable...
Jan Brouwer wrote:
Hi Bob,
I understand that you have considerable hardware resources available for testing.
Can you give any general advice on how you would test on a single PC, let's say a quad-core processor with a time limit of about 20 hours (to allow for daily iterations)?
What time-control (maybe several different ones?), how many different opponents, etc.
So far I have done most testing at 20 seconds + 1 second/move against about 6 opponents using Nunn starting positions, just to get a reasonable number of games in a few hours.
There are things that you clearly can do:
1) You can use common sense to decide if a change is good.
If the program has some weakness, you see that a change fixes that weakness, and the result in games is also good, then the change probably works.
Sorry, but that's no good. I can't count the number of "obviously good ideas" we have implemented this past year, but testing shows they are worse than the original. If you rely on intuition, you are going to make a _lot_ of wrong steps. Objective measurement is the key...
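To put rough numbers on "thousands of games, not tens or hundreds", here is a minimal sketch of a 95% error margin on a match result, converted to Elo. It assumes the usual logistic Elo formula and a normal approximation; the 35% win / 30% draw / 35% loss split is a made-up example, not data from anyone's testing, and the function names are hypothetical.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval_95(wins, draws, losses):
    """Return (elo, low, high) for a match, using a normal approximation."""
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the score, with draws counted as 0.5.
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(mu),
            elo_from_score(mu - 1.96 * se),
            elo_from_score(mu + 1.96 * se))

for n in (100, 1000, 10000):
    # Hypothetical even match: 35% wins, 30% draws, 35% losses.
    w = l = n * 35 // 100
    d = n - w - l
    elo, low, high = elo_interval_95(w, d, l)
    print(f"{n:6d} games: {elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions the interval is several tens of Elo wide at 100 games and only shrinks to single digits around 10,000 games, so a change worth a few Elo cannot be distinguished from noise in a short match.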
Again, wrong in my opinion. To solve test suites faster, just increase your check extensions, etc. But that won't make your program play better in real games. It will likely slow it down enough that it will actually play significantly worse. Chest is a good example. Optimized for finding mates. Would make a horrible game player...
2) You can use test suites when you make changes in your search.
If you make your program faster with the same output, you can be practically sure that you made an improvement; you may need games only to verify that there are no bugs that appear only when the program makes more than one search.
Possibly. But "super-fast" games make tactical programs look better than they actually are, or they make positional programs look worse. Because the relative difference in the average search depth increases as the games speed up. Again, you can draw the wrong conclusion.
The change in the search is not always exactly a speed improvement, but even in that case you can use test suites.
Note that I allow checks in the first 2 plies of the qsearch in Movei, but I have no special move generator that generates only captures and checks; I simply generate all moves and later pick the checks out of them.
I am thinking of adding a special move generator that generates only captures and checks.
This generator will not give me a pure speed improvement, because the order of the generated moves may differ from that of the normal move generator, but I think that test suites will show whether there is an improvement, and I would play games only to verify that there is no serious bug.
3) You may play games at a super-fast time control for some of the changes that you try.
If you only do an eval change, you can often get by with fast games. But if you don't run long games occasionally, you get surprised...
As far as I know, part of the testing of Rybka is done simply with very fast games (a game in less than 1 second).
Uri
I do not see it as relevant for the middle game.
I think that I can say it mainly about knowledge for specific endgames that Movei still does not have (like knowing that some endgames are drawn or won), and in that case the only thing that I am afraid of is bugs.
One example:
Movei still gives a big advantage score for positions like KRB vs KRP, and I am sure that it should be smaller and closer to a draw.
Movei is already a slow searcher, so I am not afraid of a possible small loss in speed from adding knowledge.
2) I agree that there are changes that are productive in test suites but counterproductive in games. I am not saying that I can use test suites for every search change; whether to use test suites depends on the change that you make.
If the change is close to being equivalent to a speed improvement, you can use test suites.
It does not have to be a direct speed improvement, because the order of the moves that you generate may be different, but I am talking about changes that get you to the same depth faster without a change in the search algorithm.
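As a rough illustration of that kind of check, the sketch below compares two test-suite runs searched to the same depth and reports the overall speedup plus any positions where the output changed. The record layout (best move, depth, seconds) and the function name are assumptions for the example, not the format of any particular engine or suite.

```python
def compare_suite_runs(old, new):
    """Compare two suite runs.

    old/new map a position id to (best_move, depth, seconds).
    Returns (positions whose best move or depth differ, old_time / new_time).
    Assumes both runs cover the same positions.
    """
    mismatches = [pid for pid in old if old[pid][:2] != new[pid][:2]]
    old_time = sum(record[2] for record in old.values())
    new_time = sum(record[2] for record in new.values())
    return mismatches, old_time / new_time

# Hypothetical results for two positions, both searched to depth 10.
old_run = {"pos1": ("e2e4", 10, 2.0), "pos2": ("g1f3", 10, 4.0)}
new_run = {"pos1": ("e2e4", 10, 1.0), "pos2": ("d2d4", 10, 2.0)}

changed, speedup = compare_suite_runs(old_run, new_run)
print("changed output:", changed)
print("speedup: %.2fx" % speedup)
```

A changed best move is not necessarily a bug when the move order differs, so the flagged positions are candidates for a closer look rather than failures.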
3) I plan to test the program against itself at 1000 nodes per second, which means less than 1 second per game, and I believe that this can be productive for detecting bugs when adding endgame knowledge (if the new program does not win a match that it is supposed to win).
In this case, most of the 2-game matches from the first opening position are supposed to end 1:1, and I can look at the games that finish with a different result to see what is wrong, or whether the result other than 1:1 is thanks to correct knowledge.
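The filtering step could look something like the sketch below: given the two game scores of each 2-game match (from the new version's point of view), it keeps only the matches that did not end 1:1 so that those games can be inspected by hand. The tuple layout and the position names are hypothetical.

```python
def flag_unbalanced_matches(matches):
    """Return (position_id, total_score) for 2-game matches not ending 1:1.

    matches: list of (position_id, score_game1, score_game2), where each
    score is 1, 0.5 or 0 from the new version's point of view.
    """
    flagged = []
    for position_id, score1, score2 in matches:
        total = score1 + score2
        if total != 1:
            flagged.append((position_id, total))
    return flagged

# Hypothetical results: only the last two matches deviate from 1:1.
results = [
    ("opening_01", 1, 0),
    ("opening_02", 0.5, 0.5),
    ("opening_03", 1, 0.5),
    ("opening_04", 0, 0),
]
print(flag_unbalanced_matches(results))
```

A total above 1 suggests the new knowledge helped in that match, while a total below 1 is where a bug is most likely to show up.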
Uri