Stockfish scaling

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish scaling

Post by Laskos

lucasart wrote:
Uri Blass wrote: I guess that 40+0.4 is going to give more draws, and my opinion is that it is better to use 10+0.1 for stage 1 and 40+0.4 for stage 2 instead of 15+0.05 for stage 1 and 60+0.05 for stage 2.
That is also my guess. But instead of guessing, we need to measure.

Unfortunately the framework cannot be used for that, for two reasons:
1/ its broken logic of not rescaling the increment along with the base time
2/ the adjudication rules in the framework are far too aggressive, and in this case we ideally want no adjudication at all, to avoid polluting the measurement.
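Point 1/ can be made concrete with a small sketch. The `TimeControl` class and both helpers are hypothetical illustrations, not code from the testing framework: one scales base time and increment together (keeping their ratio), the other scales only the base, which is the behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeControl:
    base: float  # seconds on the clock at the start of the game
    inc: float   # seconds added after each move

    def __str__(self) -> str:
        return f"{self.base:g}+{self.inc:g}"


def scale_tc(tc: TimeControl, factor: float) -> TimeControl:
    """Scale base time and increment together, preserving their ratio."""
    return TimeControl(tc.base * factor, tc.inc * factor)


def scale_base_only(tc: TimeControl, factor: float) -> TimeControl:
    """Scale only the base time; the increment stays as it was."""
    return TimeControl(tc.base * factor, tc.inc)


# Uri's proposal: stage 2 is stage 1 scaled by 4, increment included.
print(scale_tc(TimeControl(10, 0.1), 4))
# The current scheme: base scaled by 4, increment left at 0.05.
print(scale_base_only(TimeControl(15, 0.05), 4))
```

The first call gives 40+0.4 and the second 60+0.05, which are exactly the two stage-2 controls being compared in the quote.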

So it has to be done locally. I can't do it now, because my CPU resources are running full blast on CLOP-ing DiscoCheck all over the place.
Just 1,000 games per run between two identical recent Stockfish builds, but the results look opposite to what you guessed:
No adjudications.

1,000 games
15'' + 0.5''
88s average game length.
62.9% draws

1,000 games
42'' + 0.05''
76s average game length
65.6% draws
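A rough way to compare the two time controls is the nominal clock budget per side, base + increment × moves. The 60-move game length below is an assumption, not a figure from the post, and games that end early never use their full budget. The helper name is hypothetical.

```python
def clock_budget(base: float, inc: float, moves: int) -> float:
    """Total thinking time one side has over `moves` moves,
    assuming it receives the increment after every move."""
    return base + inc * moves


MOVES = 60  # assumed moves per side; not measured in the post

# (base, increment, observed average game length in seconds)
for base, inc, observed in [(15, 0.5, 88), (42, 0.05, 76)]:
    per_side = clock_budget(base, inc, MOVES)
    print(f"{base}+{inc}: {per_side:.1f}s per side, "
          f"{2 * per_side:.1f}s for both sides, observed {observed}s")
```

At 60 moves per side both controls give 45 s per side (90 s for the pair); solving 15 + 0.5n = 42 + 0.05n gives exactly n = 60, so the two controls only coincide at that game length.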

Not only are the games a bit shorter than expected with a very small increment, they also seem to be of higher quality. The middlegame, and time management in general, seem to count for more, and the Stockfish framework TC seems better adapted. Maybe not enough games, though.
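Whether 62.9% vs 65.6% draws over 1,000 games each is a real difference can be checked with a standard two-proportion z-test. This sketch treats every game as an independent draw/no-draw trial, which ignores any correlation from shared openings or paired colours; the function name is hypothetical.

```python
import math


def draw_rate_z(d1: float, n1: int, d2: float, n2: int) -> float:
    """Two-proportion z statistic for the difference in draw rates d2 - d1."""
    pooled = (d1 * n1 + d2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (d2 - d1) / se


z = draw_rate_z(0.629, 1000, 0.656, 1000)
print(f"z = {z:.2f}")
```

The statistic comes out around 1.26, below the usual 1.96 threshold for 5% two-sided significance, which is consistent with the caveat that there may not be enough games.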
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish scaling

Post by Uri Blass

Laskos wrote:
Just 1,000 games per run between two identical recent Stockfish builds, but the results look opposite to what you guessed:
No adjudications.

1,000 games
15'' + 0.5''
88s average game length.
62.9% draws

1,000 games
42'' + 0.05''
76s average game length
65.6% draws

Not only are the games a bit shorter than expected with a very small increment, they also seem to be of higher quality. The middlegame, and time management in general, seem to count for more, and the Stockfish framework TC seems better adapted. Maybe not enough games, though.
Note that I did not suggest 15+0.5; I think that is too much increment.

It may be interesting to compare 15''+0.05'' with 10''+0.1'', or, at a longer time control, 60''+0.05'' with 40''+0.4''.
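For the pairs proposed here, the move count at which two controls give the same clock budget shows which one grants more time in longer games. The helper name is hypothetical; it simply solves base_a + inc_a·n = base_b + inc_b·n for n.

```python
from typing import Optional, Tuple

TC = Tuple[float, float]  # (base seconds, increment seconds)


def break_even_moves(tc_a: TC, tc_b: TC) -> Optional[float]:
    """Moves per side at which both controls give the same clock budget."""
    (base_a, inc_a), (base_b, inc_b) = tc_a, tc_b
    if inc_a == inc_b:
        return None  # equal increments: budgets never cross
    return (base_b - base_a) / (inc_a - inc_b)


print(break_even_moves((15, 0.05), (10, 0.1)))
print(break_even_moves((60, 0.05), (40, 0.4)))
```

15+0.05 and 10+0.1 break even at 100 moves per side, so 15+0.05 gives more time in any shorter game; 60+0.05 and 40+0.4 break even near 57 moves, beyond which 40+0.4 gives more.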
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Stockfish scaling

Post by lucasart

lucasart wrote:
Uri Blass wrote: I guess that 40+0.4 is going to give more draws, and my opinion is that it is better to use 10+0.1 for stage 1 and 40+0.4 for stage 2 instead of 15+0.05 for stage 1 and 60+0.05 for stage 2.
That is also my guess. But instead of guessing, we need to measure.
I ran a test: DiscoCheck playing against itself, 6"+0.05" against 9"+0". There were no time losses (set Time Buffer = 10ms in DC), and the 6+0.05 version was convincingly stronger.

So it confirms my intuition: using a realistic time/increment ratio (here 120) is preferable to zero increment.
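The ratio of 120 is simply 6 / 0.05. A sketch of that arithmetic, plus the per-side move count at which the 9"+0" control has the same clock budget as 6"+0.05". The post does not say how 9" was chosen, so the second figure is only an interpretation, and both function names are hypothetical.

```python
def time_inc_ratio(base: float, inc: float) -> float:
    """Base time divided by increment (both in seconds)."""
    return base / inc


def moves_to_match(base: float, inc: float, sudden_death_base: float) -> float:
    """Moves per side after which base + inc * n equals a no-increment base."""
    return (sudden_death_base - base) / inc


print(f"ratio: {time_inc_ratio(6, 0.05):.0f}")
print(f"9+0 matches 6+0.05 after {moves_to_match(6, 0.05, 9):.0f} moves per side")
```

With 60 moves per side the two controls carry the same total budget, but 6"+0.05" keeps 0.05" of fresh time per move in reserve, which is the practical benefit of a nonzero increment.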
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Stockfish scaling

Post by Laskos

lucasart wrote:
I ran a test: DiscoCheck playing against itself, 6"+0.05" against 9"+0". There were no time losses (set Time Buffer = 10ms in DC), and the 6+0.05 version was convincingly stronger.

So it confirms my intuition: using a realistic time/increment ratio (here 120) is preferable to zero increment.
I didn't check those TCs with DC, but Stockfish games at 9''+0'' average 14.3s, while those at 6''+0.05'' average 16.9s.
A 20% difference at this TC can be worth 50 Elo points, so you have to make sure that the TCs are equivalent in terms of game length.
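The numbers in this reply can be checked with a few lines. The 50 Elo figure comes from the post; the model that Elo grows linearly with log2(thinking time), and the assumption that game length scales linearly with base time, are rules of thumb used here for illustration only. Both function names are hypothetical.

```python
import math


def implied_elo_per_doubling(elo_gain: float, time_ratio: float) -> float:
    """Elo per doubling of time, assuming Elo is linear in log2(time)."""
    return elo_gain / math.log2(time_ratio)


def matched_base(base: float, observed_len: float, target_len: float) -> float:
    """Rescale a no-increment base so its average game length hits the
    target, assuming game length scales linearly with base time."""
    return base * target_len / observed_len


ratio = 16.9 / 14.3  # observed game-length ratio, 6+0.05 vs 9+0
print(f"length ratio: {ratio:.3f}")
print(f"Elo per doubling implied by 50 Elo at 1.2x: "
      f"{implied_elo_per_doubling(50, 1.2):.0f}")
print(f"9''+0'' rescaled to last 16.9s: {matched_base(9, 14.3, 16.9):.1f}''")
```

The observed ratio is about 1.18, and 50 Elo for a 20% time difference would imply roughly 190 Elo per doubling under the log model; a length-matched sudden-death control would need about 10.6'' under the linear-length assumption.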