Scaling from FGRL results with top 3 engines

Laskos
Posts: 8965
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Scaling from FGRL results with top 3 engines

Post by Laskos » Mon Sep 25, 2017 9:49 pm

The excellent FGRL rating list by Andreas Strangmüller was updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Code: Select all

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Code: Select all

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04
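
For readers who want to check these figures, here is a minimal sketch of one way to compute them. It rests on assumptions that the post does not state: that Normalized ELO is the score above 50% divided by the per-game standard deviation of the result, that the +/- is roughly 1.96/sqrt(games), and that each table row reads rating, games, (+wins, =draws, -losses), score. The function names are made up for illustration. Under these assumptions the results come out close to the values posted above.

```python
import math

def normalized_elo(wins, draws, losses):
    """Score above 50% in units of the per-game standard deviation (assumed definition)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, counting a win as 1, a draw as 0.5, a loss as 0.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    return (score - 0.5) / math.sqrt(variance)

def error_95(games):
    """Approximate 95% half-width of the normalized value (assumed: 1.96 / sqrt(games))."""
    return 1.96 / math.sqrt(games)

# (wins, draws, losses) copied from the two tables above.
results = {
    "60s + 0.6s": {
        "Houdini 6": (1416, 741, 93),
        "Stockfish 8": (1294, 842, 114),
        "Komodo 11.2": (1278, 781, 191),
    },
    "600s + 6s": {
        "Houdini 6": (1346, 1284, 70),
        "Stockfish 8": (1144, 1464, 92),
        "Komodo 11.2": (1265, 1292, 143),
    },
}

for tc, engines in results.items():
    for name, wdl in engines.items():
        value = normalized_elo(*wdl)
        print(f"{tc:11s} {name:12s} {value:.2f} +/- {error_95(sum(wdl)):.2f}")
```

Because the value is measured in units of the per-game standard deviation, a higher draw rate at the longer time control does not by itself shrink it the way it compresses ordinary Elo differences.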

Scaling from 60s + 0.6s to 600s + 6s:

Code: Select all

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60 min LTC one.
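
The scaling figures are the differences between the two Normalized ELO values of each engine. A small sketch of that subtraction, with the error of the difference taken as the quadrature sum of the two individual errors. That error rule is an assumption (it treats the two lists as independent measurements); the post does not say how its +/- 0.05 was obtained, and the quadrature sum of two +/- 0.04 errors is about 0.057, slightly wider.

```python
import math

# Normalized ELO values as posted: (60s + 0.6s, 600s + 6s).
posted = {
    "Houdini 6": (1.03, 0.86),
    "Stockfish 8": (0.89, 0.70),
    "Komodo 11.2": (0.75, 0.70),
}
ERR = 0.04  # posted error of each individual value

def scaling(short_tc, long_tc):
    """Change in Normalized ELO when going from the short to the long time control."""
    return long_tc - short_tc

def combined_error(err_a, err_b):
    """Error of a difference of two independent measurements (quadrature sum)."""
    return math.hypot(err_a, err_b)

for name, (short_tc, long_tc) in posted.items():
    diff = scaling(short_tc, long_tc)
    print(f"{name:12s} {diff:+.2f} +/- {combined_error(ERR, ERR):.2f}")
```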

Lyudmil Tsvetkov
Posts: 6037
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov » Tue Sep 26, 2017 11:14 pm

Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Code: Select all

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s

Code: Select all

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Code: Select all

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05
Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo significantly better than both. The most interesting FGRL list will be 60min LTC one.
So Houdini is not clearly better, even than Komodo.

The main reason Komodo scales better than the other top engines at longer TC is that its king safety is more deficient: it loses many more games at shorter TCs, where it can miss attacking tactics, while at longer TC such enemy attacks are not missed.

Other factors might weigh in too, for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, such as imbalance evaluation, but I guess the primary reason is king safety.

Of course, the Komodo authors might not agree with this.

Dann Corbit
Posts: 9505
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit » Tue Sep 26, 2017 11:19 pm

If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
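
The branching-factor argument can be made concrete with a little arithmetic. A rough sketch, assuming an idealized search whose node count grows like EBF^depth at constant speed; the EBF values below are illustrative and not measured from any engine. Under that assumption, multiplying the thinking time by a factor t buys about log(t)/log(EBF) extra plies.

```python
import math

def extra_plies(time_factor, ebf):
    """Extra search depth gained from time_factor times more time,
    assuming nodes grow as ebf ** depth at constant nodes per second."""
    return math.log(time_factor) / math.log(ebf)

# Hypothetical effective branching factors, for illustration only.
for ebf in (1.6, 1.8, 2.0):
    gain = extra_plies(10, ebf)
    print(f"EBF {ebf:.1f}: about {gain:.1f} extra plies for 10x the time")
```

The sketch only covers depth; it says nothing about how many Elo each extra ply is worth, which depends on what the search prunes and on the evaluation.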
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Lyudmil Tsvetkov
Posts: 6037
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov » Tue Sep 26, 2017 11:22 pm

Indeed, look at the stats; I guess my hypothesis is right: at longer TC, Komodo wins only 10 games more, but loses 50 games fewer.

So it has not exhibited any extraordinary skills at the longer TC, but rather filled some holes in defence.

At the same time, H and SF each lose only 20 games fewer at longer TC, but win significantly less.
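
Since the two lists contain different numbers of games (2250 at 60s + 0.6s, 2700 at 600s + 6s), raw win and loss counts are not directly comparable between them. A small sketch that converts the posted counts to per-game rates; the helper name `rates` is made up for illustration.

```python
# (games, wins, draws, losses) copied from the tables in the first post.
tables = {
    "Houdini 6": {
        "60s + 0.6s": (2250, 1416, 741, 93),
        "600s + 6s": (2700, 1346, 1284, 70),
    },
    "Stockfish 8": {
        "60s + 0.6s": (2250, 1294, 842, 114),
        "600s + 6s": (2700, 1144, 1464, 92),
    },
    "Komodo 11.2": {
        "60s + 0.6s": (2250, 1278, 781, 191),
        "600s + 6s": (2700, 1265, 1292, 143),
    },
}

def rates(games, wins, draws, losses):
    """Return (win rate, draw rate, loss rate) as fractions of all games."""
    return wins / games, draws / games, losses / games

for name, by_tc in tables.items():
    for tc, row in by_tc.items():
        win, draw, loss = rates(*row)
        print(f"{name:12s} {tc:11s} win {win:6.1%}  draw {draw:6.1%}  loss {loss:6.1%}")
```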

Lyudmil Tsvetkov
Posts: 6037
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov » Tue Sep 26, 2017 11:30 pm

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following an extreme number of STC games between the top engines.

Also, see Kai's statistics, which seem to point at exactly my hypothesis.

Dann Corbit
Posts: 9505
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit » Tue Sep 26, 2017 11:48 pm

Lyudmil Tsvetkov wrote:
Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following extreme number of games at STC between the tops.

also, see Kai's statistics, seems to be pointing at exactly my hypothesis.
>>
2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.
<<
You're a funny man.

mjlef
Posts: 1387
Joined: Thu Mar 30, 2006 12:08 pm

Re: Scaling from FGRL results with top 3 engines

Post by mjlef » Wed Sep 27, 2017 3:27 am

Lyudmil Tsvetkov wrote:
Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Code: Select all

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s

Code: Select all

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Code: Select all

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05
Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo significantly better than both. The most interesting FGRL list will be 60min LTC one.
so, Houdini not clearly better, even than Komodo.

the main reason Komodo is scaling better than other top engines with longer TC is the bigger extent of deficiency of Komodo king safety, which makes the engine lose many more games at shorter TCs, where it could miss attacking tactics, while at longer TC such enemy attacks are not missed.

other factors might weigh in too, like for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, like for example imbalance evaluation, but I guess the primary reason is king safety.

of course, Komodo authors might not agree with this.
We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.

Lyudmil Tsvetkov
Posts: 6037
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov » Wed Sep 27, 2017 5:22 pm

Dann Corbit wrote:
Lyudmil Tsvetkov wrote:
Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are conducted with single thread

2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.

3) evaluation will not affect scaling much, hmm, I said 'king safety' and not 'evaluation', king safety might be related to both evaluation and search(and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following extreme number of games at STC between the tops.

also, see Kai's statistics, seems to be pointing at exactly my hypothesis.
>>
2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.
<<
You're a funny man.
you are even funnier.

Lyudmil Tsvetkov
Posts: 6037
Joined: Tue Jun 12, 2012 10:41 am

Re: Scaling from FGRL results with top 3 engines

Post by Lyudmil Tsvetkov » Wed Sep 27, 2017 5:26 pm

mjlef wrote:
Lyudmil Tsvetkov wrote:
Laskos wrote:Excellent FGRL rating list of Andreas Strangmüller is updated today with Houdini 6 at two time controls:
http://www.fastgm.de

The first scaling results on single core can be obtained. To take into account the increase in draw rates due to longer time controls, one has to use Normalized ELO to derive the scaling with time control.

60s + 0.6s:

Code: Select all

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s

Code: Select all

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Code: Select all

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish x64:         -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05
Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo significantly better than both. The most interesting FGRL list will be 60min LTC one.
so, Houdini not clearly better, even than Komodo.

the main reason Komodo is scaling better than other top engines with longer TC is the bigger extent of deficiency of Komodo king safety, which makes the engine lose many more games at shorter TCs, where it could miss attacking tactics, while at longer TC such enemy attacks are not missed.

other factors might weigh in too, like for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC, like for example imbalance evaluation, but I guess the primary reason is king safety.

of course, Komodo authors might not agree with this.
We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.
so this should only support my hypothesis.

I guess tuning an engine is a real pain.

Uri Blass
Posts: 8367
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Scaling from FGRL results with top 3 engines

Post by Uri Blass » Wed Sep 27, 2017 6:15 pm

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.
I think that better search does not mean a lower branching factor.

It is easy to get a lower branching factor by dubious pruning.

I think that evaluation is important, and I expect top engines not to scale well if you replace their evaluation with a simple piece-square table evaluation.
