Scaling from FGRL results with top 3 engines

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Scaling from FGRL results with top 3 engines

Post by Laskos »

Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04
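
The post does not spell out how Normalized ELO is computed. The sketch below assumes the usual definition, (mean score - 0.5) divided by the per-game standard deviation of the score, and assumes the quoted error is a 95% half-width of roughly 1.96/sqrt(games). Under those assumptions it reproduces the six values above from the win/draw/loss counts; the function name `normalized_elo` is only illustrative.

```python
import math

def normalized_elo(wins, draws, losses):
    """Return (normalized Elo, approximate 95% half-width) for a W/D/L record.

    Assumed definition: (mean score - 0.5) / per-game standard deviation,
    scoring a win as 1, a draw as 0.5 and a loss as 0.
    """
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Second moment of the per-game score: 1^2 for a win, 0.5^2 for a draw.
    mean_sq = (wins + 0.25 * draws) / games
    sigma = math.sqrt(mean_sq - mean ** 2)
    # Assumption: the standard error of the normalized value is about 1/sqrt(games).
    half_width = 1.96 / math.sqrt(games)
    return (mean - 0.5) / sigma, half_width

# W/D/L counts copied from the two FGRL tables above.
results = {
    "Houdini 6   @ 60s+0.6s": (1416, 741, 93),
    "Stockfish 8 @ 60s+0.6s": (1294, 842, 114),
    "Komodo 11.2 @ 60s+0.6s": (1278, 781, 191),
    "Houdini 6   @ 600s+6s": (1346, 1284, 70),
    "Stockfish 8 @ 600s+6s": (1144, 1464, 92),
    "Komodo 11.2 @ 600s+6s": (1265, 1292, 143),
}

for name, wdl in results.items():
    value, err = normalized_elo(*wdl)
    print(f"{name}: {value:.2f} +/- {err:.2f}")
```

Because the denominator shrinks as the draw rate rises, the same score percentage is worth more normalized Elo in a drawish match, which is why this measure is used to compare across time controls.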

Scaling from 60s + 0.6s to 600s + 6s:

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.
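
As a rough check of the "significantly better" claim, the scaling values are just the 600s figure minus the 60s figure for each engine. The sketch below recomputes them from the published, rounded Normalized ELO values and then compares engines pairwise. Combining the two +/- 0.05 errors in quadrature is an assumption made here (it treats the errors as independent, which is only approximate for engines playing in the same tournament); the helper `gap` is illustrative and not from the post.

```python
import math

# Published Normalized ELO values (60s + 0.6s, 600s + 6s), copied from the post.
published = {
    "Houdini 6": (1.03, 0.86),
    "Stockfish 8": (0.89, 0.70),
    "Komodo 11.2": (0.75, 0.70),
}
SCALING_ERR = 0.05  # +/- quoted in the post for each scaling value

# Scaling = normalized Elo at the long TC minus normalized Elo at the short TC.
scaling = {name: ltc - stc for name, (stc, ltc) in published.items()}

def gap(a, b):
    """Difference in scaling between engines a and b, with the two errors
    combined in quadrature (assumes they are independent)."""
    return scaling[a] - scaling[b], math.hypot(SCALING_ERR, SCALING_ERR)

for name, value in scaling.items():
    print(f"{name}: {value:+.2f} +/- {SCALING_ERR:.2f}")

for other in ("Houdini 6", "Stockfish 8"):
    diff, err = gap("Komodo 11.2", other)
    print(f"Komodo 11.2 vs {other}: {diff:+.2f} +/- {err:.2f}")
```

Under these assumptions Komodo's scaling is ahead of Houdini's by about 0.12 and of Stockfish's by about 0.14, against a combined error of about 0.07.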
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov »

Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.
so, Houdini not clearly better, even than Komodo.

the main reason Komodo is scaling better than other top engines with longer TC is the bigger extent of deficiency of Komodo king safety, which makes the engine lose many more games at shorter TCs, where it could miss attacking tactics, while at longer TC such enemy attacks are not missed.

other factors might weigh in too, like for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, like for example imbalance evaluation, but I guess the primary reason is king safety.

of course, Komodo authors might not agree with this.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov »

indeed, look at the stats, I guess my hypothesis is right: at longer TC,
Komodo wins about as many games (1265 vs. 1278), but loses about 50 fewer (143 vs. 191).

so, it did not exhibit any extraordinary skills at the longer TC; rather, it filled some holes in defence.

at the same time, H and SF each lose only about 20 fewer games at longer TC (70 vs. 93 and 92 vs. 114), but win significantly fewer.
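
One caveat when comparing raw counts: the 60s list has 2250 games per engine and the 600s list has 2700, so per-game rates are a fairer basis. The sketch below converts the W/D/L counts from the first post into win and loss percentages; the helper name `rates` is illustrative only.

```python
# (wins, draws, losses) copied from the FGRL tables in the first post.
records = {
    "Houdini 6": {"60s": (1416, 741, 93), "600s": (1346, 1284, 70)},
    "Stockfish 8": {"60s": (1294, 842, 114), "600s": (1144, 1464, 92)},
    "Komodo 11.2": {"60s": (1278, 781, 191), "600s": (1265, 1292, 143)},
}

def rates(wins, draws, losses):
    """Return (win %, loss %) per game for a W/D/L record."""
    games = wins + draws + losses
    return 100.0 * wins / games, 100.0 * losses / games

for engine, by_tc in records.items():
    win_stc, loss_stc = rates(*by_tc["60s"])
    win_ltc, loss_ltc = rates(*by_tc["600s"])
    print(f"{engine}: wins {win_stc:.1f}% -> {win_ltc:.1f}%, "
          f"losses {loss_stc:.1f}% -> {loss_ltc:.1f}%")
```

On this basis Komodo's loss rate falls from about 8.5% to 5.3%, against 4.1% to 2.6% for Houdini and 5.1% to 3.4% for Stockfish, while the win rate drops for all three engines (Komodo 56.8% to 46.9%, Houdini 62.9% to 49.9%, Stockfish 57.5% to 42.4%).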
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov »

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following extreme number of games at STC between the tops.

also, see Kai's statistics, seems to be pointing at exactly my hypothesis.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

Lyudmil Tsvetkov wrote:
Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following extreme number of games at STC between the tops.

also, see Kai's statistics, seems to be pointing at exactly my hypothesis.
>>
2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.
<<
You're a funny man.
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Scaling from FGRL results with top 3 engines

Post by mjlef »

Lyudmil Tsvetkov wrote:
Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.
so, Houdini not clearly better, even than Komodo.

the main reason Komodo is scaling better than other top engines with longer TC is the bigger extent of deficiency of Komodo king safety, which makes the engine lose many more games at shorter TCs, where it could miss attacking tactics, while at longer TC such enemy attacks are not missed.

other factors might weigh in too, like for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, like for example imbalance evaluation, but I guess the primary reason is king safety.

of course, Komodo authors might not agree with this.
We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov »

Dann Corbit wrote:
Lyudmil Tsvetkov wrote:
Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following extreme number of games at STC between the tops.

also, see Kai's statistics, seems to be pointing at exactly my hypothesis.
>>
2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.
<<
You're a funny man.
you are even funnier.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov »

mjlef wrote:
Lyudmil Tsvetkov wrote:
Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.
so, Houdini not clearly better, even than Komodo.

the main reason Komodo is scaling better than other top engines with longer TC is the bigger extent of deficiency of Komodo king safety, which makes the engine lose many more games at shorter TCs, where it could miss attacking tactics, while at longer TC such enemy attacks are not missed.

other factors might weigh in too, like for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, like for example imbalance evaluation, but I guess the primary reason is king safety.

of course, Komodo authors might not agree with this.
We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.
so this should only support my hypothesis.

I guess tuning an engine is a real pain.
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Scaling from FGRL results with top 3 engines

Post by Uri Blass »

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
I think that better search does not mean lower branching factor

It is easy to get lower branching factor by dubious pruning.

I think that evaluation is important and I expect top engines not to scale well if you change their evaluation to simple piece square table evaluation.