TalkChess.com

Posted: **Sun Aug 30, 2015 12:00 am**

I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.

Posted: **Sun Aug 30, 2015 1:21 am**

Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.

What do you base that on? The error bar is not affected by the time control used, only the number of games played and the draw rate. In general, longer games will require MORE games since there will be more draws in the mix.

So no, doubling the time control just checks different things such as search extensions and reductions which might have a greater effect on longer games than on short games...
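A minimal sketch of the point above, that the error bar is computed from the game results alone. It assumes a normal approximation on the per-game score and the usual logistic Elo curve; the function name `elo_with_margin` and the sample record are illustrative and not taken from the thread.

```python
import math

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo_diff, margin) from a W/D/L record.

    z=1.96 corresponds to roughly 95% confidence. Time control is not an
    input: only the counts of wins, draws and losses matter.
    Assumes the score stays strictly between 0 and 1.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(s):
        # Logistic Elo curve: expected score s -> rating difference.
        return -400.0 * math.log10(1.0 / s - 1.0)

    margin = (to_elo(score + z * se) - to_elo(score - z * se)) / 2.0
    return to_elo(score), margin

# Hypothetical balanced record: 30,000 games with 40% draws.
elo, margin = elo_with_margin(9000, 12000, 9000)
print(f"{elo:+.1f} Elo, +/- {margin:.2f}")
```

With this made-up record the margin comes out at roughly ±3 Elo, whatever time control the games were played at.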

Posted: **Sun Aug 30, 2015 2:29 am**

bob wrote:
Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.
What do you base that on? The error bar is not affected by the time control used, only the number of games played and the draw rate. In general, longer games will require MORE games since there will be more draws in the mix.

Do you have hard numbers?

So no, doubling the time control just checks different things such as search extensions and reductions which might have a greater effect on longer games than on short games...

That's a given, yes.

Posted: **Sun Aug 30, 2015 2:43 am**

Rebel wrote:
bob wrote:
Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.
What do you base that on? The error bar is not affected by the time control used, only the number of games played and the draw rate. In general, longer games will require MORE games since there will be more draws in the mix.
Do you have hard numbers?

So no, doubling the time control just checks different things such as search extensions and reductions which might have a greater effect on longer games than on short games...
That's a given, yes.

Hard numbers? Look up "Elo". The formula has NO reference to time control. Just number of wins, losses and draws.

Think about it. If it worked as you (we) would like, one could play one LONG game and get the same result. Except we know what one game would be, random noise...

I do lots of testing at different time controls, but the number of games to get +/-3 Elo remains at 30K or so.

I've even tested at 40moves/2 hours a couple of times, running over 700 games at a time. Takes forever, 30K still +/- 3.
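As a rough check on the "30K games for +/-3 Elo" figure, the sketch below inverts the error-bar formula for two equally strong engines. The 40% draw rate and the 95% confidence level are assumptions chosen for illustration, and `games_needed` is a hypothetical helper.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)

def games_needed(margin_elo, draw_rate, z=1.96):
    """Games required for a +/- margin_elo error bar between equal engines.

    Assumes a balanced result, so the per-game score variance is
    (1 - draw_rate) / 4, and a linearised Elo curve around 50%.
    """
    per_game_var = (1.0 - draw_rate) / 4.0
    se_target = margin_elo / (z * ELO_PER_SCORE)  # allowed standard error of the score
    return math.ceil(per_game_var / se_target ** 2)

# Assumed draw rate of 40%; the time control does not appear anywhere.
print(games_needed(3.0, 0.40))
```

Under these assumptions the answer lands close to 31,000 games, in line with the figure quoted above.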

Posted: **Sun Aug 30, 2015 9:04 am**

bob wrote: The formula has NO reference to time control.

Yep, that's why I started this topic. And it makes me wonder if that is correct, not saying it (the elo bar) is not correct.

Think about it. If it worked as you (we) would like, one could play one LONG game and get the same result. Except we know what one game would be, random noise...

Sure.

But I do think playing 1000 (40/240s) games is far more reliable than 1000 (40/15s) bullet games: far fewer horizon effects, and much less negative influence from clock granularity (the 1/18th-of-a-second tick, CLOCKS_PER_SEC), which is pretty dominant in bullet games.

I do lots of testing at different time controls, but the number of games to get +/-3 Elo remains at 30K or so.

I've even tested at 40moves/2 hours a couple of times, running over 700 games at a time. Takes forever, 30K still +/- 3.

Doing the math, provided my interpretation of the above is correct: 3 games a day on 1 PC means 10,000 days to finish 30,000 games, which divided by 700 cores is roughly 2 weeks.
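The estimate can be written out as a tiny calculation. It assumes each core plays one game at a time at 3 games per day, which is the reading used in the post; `wall_clock_days` is a hypothetical helper.

```python
def wall_clock_days(total_games, games_per_day_per_core, cores):
    """Days needed when every core plays games_per_day_per_core games per day."""
    return total_games / (games_per_day_per_core * cores)

# 30,000 games at 40 moves / 2 hours, 3 games per day per core, 700 cores.
single_pc_days = wall_clock_days(30_000, 3, 1)
cluster_days = wall_clock_days(30_000, 3, 700)
print(single_pc_days, round(cluster_days, 1))
```

That gives 10,000 days on a single PC and a little over 14 days on 700 cores.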

Posted: **Sun Aug 30, 2015 9:19 am**

Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.

Bob is correct. Many factors, positional ones for example, scale differently with time control, so one goes to longer time controls only out of need, to check at time controls closer to real game play. If all factors scaled equally, it would be preferable to test at the shortest possible time control, limited only by the clock tick and such.

Posted: **Sun Aug 30, 2015 5:26 pm**

Laskos wrote:
Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.
Bob is correct. Many factors, positional ones for example, scale differently with time control, so one goes to longer time controls only out of need, to check at time controls closer to real game play. If all factors scaled equally, it would be preferable to test at the shortest possible time control, limited only by the clock tick and such.

Thanks Kai for moving in.

Are you saying that playing 10,000 (40/240s) games is as reliable as 10,000 (40/15s) bullet games?

This probably is true for positional changes, but for search changes?

Posted: **Sun Aug 30, 2015 6:20 pm**

Testing at a bullet time control is not as reliable as testing at a blitz time control when comparing the results from a fixed number of games. As you stated, there are sources of error that are not accounted for when estimating Elo difference that have a larger effect on bullet games. So, 1000 40/240s games are more reliable than 1000 40/15s games (assuming that other conditions such as the openings are the same).

However, I think that source of estimation error is dominated by the reduction of error due to the increased number of games that can be played in a fixed time period when using bullet time controls. Approximately 16,000 40/15s games can be played in the same amount of time as 1000 40/240s games (assuming an average of 80 moves per game). Ignoring the draw rate (I can not recall the error bar formula that includes draws at the moment), that would result in calculated error bars that are 1/4 of those for 1000 games.
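The "1/4" figure follows from the error bar shrinking with the square root of the number of games. A small sketch, assuming identical draw rates at both time controls and that the number of games is inversely proportional to the time per game (`relative_error_bar` is a hypothetical helper):

```python
import math

def relative_error_bar(seconds_per_40_fast, seconds_per_40_slow):
    """Error bar of the fast time control relative to the slow one,
    given the same total testing time.

    Assumes the margin is proportional to 1/sqrt(games) and that the
    draw rate is the same at both time controls.
    """
    games_ratio = seconds_per_40_slow / seconds_per_40_fast  # 240 / 15 = 16
    return 1.0 / math.sqrt(games_ratio)

print(relative_error_bar(15, 240))
```

For 40/15s against 40/240s the ratio of games is 16, so the calculated error bar is one quarter of the one for the slower time control.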

Posted: **Sun Aug 30, 2015 8:45 pm**

Adam Hair wrote:Testing at a bullet time control is not as reliable as testing at a blitz time control when comparing the results from a fixed number of games. As you stated, there are sources of error that are not accounted for when estimating Elo difference that have a larger effect on bullet games. So, 1000 40/240s games are more reliable than 1000 40/15s games (assuming that other conditions such as the openings are the same).

I don't quite understand what you mean. If all factors contributing to engine strength behave the same at every time control, then the only effect is an extremely mild one through the draw rate: the error bars are slightly larger at a lower draw rate, thus at a shorter time control (at a fixed number of games with a balanced result). Is that what you mean? This effect is extremely mild, and it is easily offset by the higher number of games played in the same amount of time. The end result: if scaling is identical for all factors, the shortest time control is the best way to get the smallest error margins in the same amount of time. The problem arises when the scaling of some factor contributing to strength is different, for example one that scales well at longer time controls and so needs longer games, ideally going into the territory of rating-list game play. This problem is unavoidable, so one has to be careful when using bullet time controls, but use them whenever it is safe.
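To put rough numbers on how mild the draw-rate effect is, the sketch below compares error bars for a balanced match. The draw rates (30% for bullet, 50% for the longer time control) are invented for illustration and are not measurements from the thread; `margin_elo` is a hypothetical helper.

```python
import math

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)

def margin_elo(games, draw_rate, z=1.96):
    """Approximate 95% error bar in Elo for a balanced match."""
    return z * ELO_PER_SCORE * math.sqrt((1.0 - draw_rate) / (4.0 * games))

# Assumed draw rates: 30% at 40/15s, 50% at 40/240s.
same_games_bullet = margin_elo(1000, 0.30)   # fixed number of games
same_games_long = margin_elo(1000, 0.50)
same_time_bullet = margin_elo(16000, 0.30)   # 16x more games in the same time

print(round(same_games_bullet, 1), round(same_games_long, 1), round(same_time_bullet, 1))
```

With these assumed draw rates, the bullet error bar is under 20% wider at an equal number of games, but several times narrower once the extra games that fit into the same testing time are counted.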

However, I think that source of estimation error is dominated by the reduction of error due to the increased number of games that can be played in a fixed time period when using bullet time controls. Approximately 16,000 40/15s games can be played in the same amount of time as 1000 40/240s games (assuming an average of 80 moves per game). Ignoring the draw rate (I can not recall the error bar formula that includes draws at the moment), that would result in calculated error bars that are 1/4 of those for 1000 games.

Posted: **Sun Aug 30, 2015 8:57 pm**

Rebel wrote:
Laskos wrote:
Rebel wrote:I have good reasons to believe that the drawback of playing bullet games (say 40/15s) to test a change is that you have to play (a lot) more games than testing at CCRL/CEGT (40/240s).

Leaving out my considerations for the above statement for the moment I want to ask if research has been done to investigate that and maybe there is a formula (or factor) that you can use as base to lower the number of games when you (for instance) double the time control.

Say, you are playing 15,000 (40/15) bullet games. When you decide to double the time control, will 12,000 games (or so) give an equivalent result?

In general the rating lists give a pretty good indication of the strength of programs and they are not playing 10,000 games.
Bob is correct. Many factors, positional ones for example, scale differently with time control, so one goes to longer time controls only out of need, to check at time controls closer to real game play. If all factors scaled equally, it would be preferable to test at the shortest possible time control, limited only by the clock tick and such.
Thanks Kai for moving in.

Are you saying that playing 10,000 (40/240s) games is as reliable as 10,000 (40/15s) bullet games?

This probably is true for positional changes, but for search changes?

If the rating lists at 40/15s are identical to the rating lists at 40/240s, and all changes to the engine scale the same with time, then yes. As we know, they are similar, but not identical. One has to work out which factors contribute almost identically and which very differently. Those suspected, by "intuition" or whatever, of contributing to differences are better tested at longer time controls too. It is a bit of an art; there are no clear statistical rules to determine which changes need to be tested how.

Bullet vs regular time control, say 40/4m CCRL/CEGT
