Scaling from FGRL results with top 3 engines

Laskos · Post by **Laskos** » Mon Sep 25, 2017 11:49 pm

The excellent FGRL rating list by Andreas Strangmüller was updated today with Houdini 6 at two time controls:
http://www.fastgm.de

With it, the first single-core scaling results can be obtained. To account for the higher draw rates at longer time controls, one has to use Normalized ELO to derive the scaling with time control.
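The post does not spell out how Normalized ELO is computed. We assume the following definition because it reproduces all six posted values to two decimals: the mean score's distance from 50%, divided by the per-game standard deviation of the score (win = 1, draw = 1/2, loss = 0). More draws shrink σ, so the same score gap gives a larger normalized value. That is why it can be compared across time controls with different draw rates. The worked line uses Houdini 6 at 60s + 0.6s.

```latex
\mu = \frac{W + D/2}{N}, \qquad
\sigma^2 = \frac{W + D/4}{N} - \mu^2, \qquad
\mathrm{nElo} = \frac{\mu - 1/2}{\sigma}
% Houdini 6, 60s + 0.6s: W = 1416, D = 741, L = 93, N = 2250
\mu = \frac{1786.5}{2250} = 0.794, \qquad
\sigma^2 = \frac{1601.25}{2250} - 0.794^2 \approx 0.0812, \qquad
\mathrm{nElo} \approx \frac{0.294}{0.285} \approx 1.03
```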

60s + 0.6s:

Houdini 6 x64         : 3239 2250 (+1416,=741,- 93), 79.4 %
Normalized ELO: 1.03 +/- 0.04

Stockfish 8 64        : 3208 2250 (+1294,=842,-114), 76.2 %
Normalized ELO: 0.89 +/- 0.04

Komodo 11.2 64-bit    : 3189 2250 (+1278,=781,-191), 74.2 %
Normalized ELO: 0.75 +/- 0.04

600s + 6s:

Houdini 6         : 3173 2700 (+1346,=1284,- 70), 73.6 %
Normalized ELO: 0.86 +/- 0.04

Stockfish 8       : 3140 2700 (+1144,=1464,- 92), 69.5 %
Normalized ELO: 0.70 +/- 0.04

Komodo 11.2       : 3150 2700 (+1265,=1292,-143), 70.8 %
Normalized ELO: 0.70 +/- 0.04

Scaling from 60s + 0.6s to 600s + 6s:

Houdini 6 x64:         -0.17 +/- 0.05
Stockfish 8 x64:       -0.19 +/- 0.05
Komodo 11.2 64-bit:    -0.05 +/- 0.05

Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.
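The scaling figures are the differences between each engine's two normalized Elo values. Below is a minimal Python sketch that recomputes both. It uses only the W/D/L counts posted above and an assumed definition of normalized Elo: mean score minus 0.5, divided by the per-game standard deviation of the score. That definition is not stated in the post, but it matches the posted numbers. The dictionary and function names are illustrative and do not come from any existing tool.

```python
import math

# Posted FGRL results: (wins, draws, losses) against the rest of the field.
RESULTS = {
    "60s+0.6s": {
        "Houdini 6": (1416, 741, 93),
        "Stockfish 8": (1294, 842, 114),
        "Komodo 11.2": (1278, 781, 191),
    },
    "600s+6s": {
        "Houdini 6": (1346, 1284, 70),
        "Stockfish 8": (1144, 1464, 92),
        "Komodo 11.2": (1265, 1292, 143),
    },
}


def normalized_elo(wins: int, draws: int, losses: int) -> float:
    """(mean score - 0.5) / per-game standard deviation of the score.

    Assumed definition; it reproduces the posted values to two decimals.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # E[x^2] with x in {1, 0.5, 0}: a draw contributes 0.5**2 = 0.25.
    var = (wins + 0.25 * draws) / n - mean * mean
    return (mean - 0.5) / math.sqrt(var)


def scaling(engine: str, stc: str = "60s+0.6s", ltc: str = "600s+6s") -> float:
    """Change in normalized Elo when going from the short to the long TC."""
    return normalized_elo(*RESULTS[ltc][engine]) - normalized_elo(*RESULTS[stc][engine])


if __name__ == "__main__":
    for engine in RESULTS["60s+0.6s"]:
        print(f"{engine:12s} {scaling(engine):+.2f}")
```

Run as a script, it prints about -0.17, -0.18 and -0.04 from the unrounded values. The posted -0.17, -0.19 and -0.05 equal the differences of the rounded two-decimal values (for example 0.70 - 0.89 for Stockfish). Either way, the gap is well inside the quoted ±0.05.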

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Sep 27, 2017 1:14 am

Laskos wrote: [...]
Therefore, from 60s to 600s, Houdini and Stockfish scale very similarly, but Komodo scales significantly better than both. The most interesting FGRL list will be the 60-minute LTC one.

So Houdini is not clearly better, even compared with Komodo.

The main reason Komodo scales better than the other top engines at longer TC is that Komodo's king safety is more deficient than theirs. This makes it lose many more games at shorter TCs, where it can miss the opponent's attacking tactics, while at longer TC such attacks are no longer missed.

Other factors might weigh in too, for example Komodo having more reasonable evaluation term values, or deeper knowledge that scales better at LTC (imbalance evaluation, for instance), but I guess the primary reason is king safety.

Of course, the Komodo authors might not agree with this.

Dann Corbit · Post by **Dann Corbit** » Wed Sep 27, 2017 1:19 am

If an engine scales better, it is most likely because its search is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except through improvements in move ordering.
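The branching-factor argument can be made concrete with a toy model. Assume the time to complete depth d grows like b^d for a constant effective branching factor b. Multiplying the thinking time by a factor T then buys log T / log b extra plies. The sketch below is only that model. The function names are hypothetical, and the branching factors in the example are illustrative, not measured for these engines. The model counts depth only; it says nothing about what the pruning that lowers b throws away.

```python
import math


def extra_plies(time_factor: float, ebf: float) -> float:
    """Extra search depth bought by multiplying the thinking time.

    Toy model: time to finish depth d grows like ebf**d, so
    t2 / t1 = ebf**(d2 - d1)  =>  d2 - d1 = log(t2 / t1) / log(ebf).
    """
    if ebf <= 1.0:
        raise ValueError("effective branching factor must be > 1")
    return math.log(time_factor) / math.log(ebf)


def ebf_from_nodes(nodes_by_depth: list[int]) -> float:
    """Geometric-mean growth of node counts between successive iterations."""
    first, last = nodes_by_depth[0], nodes_by_depth[-1]
    steps = len(nodes_by_depth) - 1
    return (last / first) ** (1.0 / steps)


if __name__ == "__main__":
    # Tenfold time, as in the 60s -> 600s step, for two illustrative values of b.
    for b in (2.0, 1.6):
        print(f"b = {b}: about {extra_plies(10, b):.1f} extra plies")
```

For the tenfold step from 60s to 600s, this gives about 3.3 extra plies at b = 2.0 and about 4.9 at b = 1.6.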

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Sep 27, 2017 1:22 am

Indeed, look at the stats; I guess my hypothesis is right. At longer TC, Komodo wins only about 10 games fewer, but loses about 50 games fewer.

So it does not seem to have shown any extraordinary skill at the longer TC; rather, it filled some holes in its defence.

At the same time, Houdini and Stockfish each lose only about 20 games fewer at longer TC, but win significantly fewer.
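These comparisons use raw game counts, but the two runs have different lengths: 2250 games at 60s + 0.6s and 2700 at 600s + 6s. A small Python sketch, working only from the posted W/D/L figures, prints both the raw changes and the change in per-game win and loss rates. The names are illustrative.

```python
# Posted FGRL results: (wins, draws, losses) against the rest of the field.
STC = {  # 60s + 0.6s, 2250 games each
    "Houdini 6": (1416, 741, 93),
    "Stockfish 8": (1294, 842, 114),
    "Komodo 11.2": (1278, 781, 191),
}
LTC = {  # 600s + 6s, 2700 games each
    "Houdini 6": (1346, 1284, 70),
    "Stockfish 8": (1144, 1464, 92),
    "Komodo 11.2": (1265, 1292, 143),
}


def changes(engine: str) -> tuple[tuple[int, int], tuple[float, float]]:
    """((raw win change, raw loss change), (win-rate change, loss-rate change) in % points)."""
    w1, d1, l1 = STC[engine]
    w2, d2, l2 = LTC[engine]
    n1, n2 = w1 + d1 + l1, w2 + d2 + l2
    raw = (w2 - w1, l2 - l1)
    # Per-game rates make the 2250- and 2700-game runs comparable.
    rates = (100 * (w2 / n2 - w1 / n1), 100 * (l2 / n2 - l1 / n1))
    return raw, rates


if __name__ == "__main__":
    for engine in STC:
        (dw, dl), (dwr, dlr) = changes(engine)
        print(f"{engine:12s} wins {dw:+5d} losses {dl:+4d} | "
              f"win rate {dwr:+6.2f} pts, loss rate {dlr:+5.2f} pts")
```

In raw counts, Komodo wins 13 fewer and loses 48 fewer games. Houdini wins 70 fewer and loses 23 fewer; Stockfish wins 150 fewer and loses 22 fewer. Per game, Komodo's loss rate drops by about 3.2 percentage points, versus roughly 1.5 to 1.7 for the other two.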

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Sep 27, 2017 1:30 am

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.

Sorry Dann, but that is all BS.

1) SMP is irrelevant, as Andreas' tests are run with a single thread.

2) BF has nothing to do with Elo, are you aware of that? An engine with a higher BF might have a higher Elo.

3) "Evaluation will not affect scaling much": hmm, I said "king safety", not "evaluation". King safety might be related to both evaluation and search (and move ordering too, for that matter).

I have not based my observations on pure speculation, but on following an extreme number of STC games between the top engines.

Also, see Kai's statistics, which seem to point exactly at my hypothesis.

Dann Corbit · Post by **Dann Corbit** » Wed Sep 27, 2017 1:48 am

Lyudmil Tsvetkov wrote: [...]

>>
2) BF has nothing to do with elo, are you aware of that? an engine with higher BF might have higher elo.
<<
You're a funny man.

mjlef · Post by **mjlef** » Wed Sep 27, 2017 5:27 am

Lyudmil Tsvetkov wrote: [...]
The main reason Komodo scales better than the other top engines at longer TC is that Komodo's king safety is more deficient than theirs. This makes it lose many more games at shorter TCs, where it can miss the opponent's attacking tactics, while at longer TC such attacks are no longer missed. [...]

We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Sep 27, 2017 7:22 pm

Dann Corbit wrote: [...]
You're a funny man.

You are even funnier.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Sep 27, 2017 7:26 pm

mjlef wrote: [...]
We have some evidence of the best values for King Safety varying by search depth, but when we did this a couple of years ago, if I recall correctly, smaller King Safety values worked better in faster games/less search depth. Of course King Safety keeps changing in Komodo and it is worth testing some more.

So this should only support my hypothesis.

I guess tuning an engine is a real pain.

Uri Blass · Post by **Uri Blass** » Wed Sep 27, 2017 8:15 pm

Dann Corbit wrote:If an engine scales better, it is most likely search that is better (lower branching factor).

The second most likely thing would be the SMP implementation.

The evaluation will not affect scaling much, except for improvement in the move ordering.

I think that better search does not mean a lower branching factor.

It is easy to get a lower branching factor through dubious pruning.

I think that evaluation is important, and I expect top engines not to scale well if you replace their evaluation with a simple piece-square-table evaluation.
