Long game vs short game testing

Mincho Georgiev
Posts: 454
Joined: Sat Apr 04, 2009 6:44 pm
Location: Bulgaria

Re: Long game vs short game testing

Post by Mincho Georgiev

bob wrote:
vladstamate wrote:
Hi all,

I have a seemingly simple question: If I run n games at 20sec per game and I also run n games at 1min per game, should I trust the results of the second test more?

Intuitively I would say yes, since the second test allows more "game time", so the chance of exposing weaknesses is increased. Also, the engines being tested can reach a larger depth, so some techniques might fire more often (those that have if(depth>value) in them).

Can this be generalized so that we always trust longer time testing?

Here is a more interesting question: Would you trust a test of 3000 games @ 20sec per game or a test of 1000 games @ 1min per game? On average they both should take roughly the same time (about 90000sec or 1500min or 25 hours).

Am I correct in assuming the above?

Regards,
Vlad.
Two answers, based on tens of millions of games.

1. For eval changes, and most search changes, short or long time controls won't matter. If you are better at one, you are better at the other. I've verified this by playing games from 10sec + 0.1sec increment to 60min + 60sec increment.

2. For timing changes, and some search changes that greatly alter the shape of the tree, this can change. A change to time usage will be influenced by the time control used, and really needs to be tuned at the time control you plan to play most of the time. Some search changes can cause a tree explosion at deeper depths if extensions can "cascade" down through the tree. As you go deeper, you extend more and before you know it, you are hung at some depth with no hope of getting to the next ply in finite time.
I would like to add the reverse case too. If a modification (call it A') is supposed to perform better at every time control but only performs better at 'some' TCs, then you have a contradiction. Most likely the change gives either a very small gain or none at all, and that has to be proven with a significant number of games whenever it arises.
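
To put a number on "a significant number of games", here is a rough Python sketch that turns a win/draw/loss count into an Elo estimate with an approximate 95% interval. It assumes independent games and a normal approximation, the function names are my own, and the win/draw/loss counts are made up purely for illustration (same proportions at 1000 and 3000 games). It is not taken from any particular testing tool.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_summary(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, using a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)  # half-width of the interval on the score
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

# Hypothetical results: identical proportions, different sample sizes.
for w, d, l in [(350, 340, 310), (1050, 1020, 930)]:
    elo, low, high = match_summary(w, d, l)
    print(f"{w + d + l} games: {elo:+.1f} Elo, "
          f"95% interval [{low:+.1f}, {high:+.1f}]")
```

With these made-up counts the 1000-game interval still includes zero, while the 3000-game interval does not, so a small gain that looks real at 'some' TC may simply be inside the noise. The sketch says nothing about whether the chosen time control is representative; it only shows how the error bar shrinks with the number of games.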