Komodo 1973.00 64-bit - Stockfish 261217 64, match1000

Hugo · Post by **Hugo** » Thu Jan 04, 2018 5:53 pm

hi, wish you all a happy new year 2018

Inspired by the tests of Andreas Strangmueller
http://www.talkchess.com/forum/viewtopi ... 18&t=66180

I started a test of 1000 games, using the HERT test set.
Time control was 10 min + 10 sec, which averages about 38 min per game.
Here is the result with 1 cpu per engine

And now, instead of doubling the time, I am raising the number of threads per engine to 4 cpu and will run the test with HERT set again.

regards, Clemens Keck

marcd · Post by **marcd** » Thu Jan 04, 2018 6:47 pm

can you do a live broadcast?

PCM72 · Post by **PCM72** » Thu Jan 04, 2018 8:33 pm

Perhaps it's too early to think about the test after the next one, but I think that doubling again at least a couple of times, up to 16 cpu (T4?), is worth a 15-day test, to allow some extrapolation for correspondence players.
Happy 2018!

fauzi · Post by **fauzi** » Fri Jan 05, 2018 11:57 am

Hi Clemens,

thanks for the tests, can you please share the pgn?

Krzysztof Grzelak · Post by **Krzysztof Grzelak** » Fri Jan 05, 2018 1:55 pm

Hugo wrote:hi, wish you all a happy new year 2018

Inspired by the tests of Andreas Strangmueller
http://www.talkchess.com/forum/viewtopi ... 18&t=66180

I started a test of 1000 games, using the HERT test set.
Time control was 10 min + 10 sec, which averages about 38 min per game.
Here is the result with 1 cpu per engine

And now, instead of doubling the time, I am raising the number of threads per engine to 4 cpu and will run the test with HERT set again.

regards, Clemens Keck

I see a very serious mistake in this match. You're using an opening book.

Hugo · Post by **Hugo** » Fri Jan 05, 2018 6:56 pm

Krzysztof Grzelak wrote:
I see a very serious mistake in this match. You're using an opening book.

I am ok with the HERT set
The 500 Openings are played with reversed colors. ECO spread seems to be not bad at all.

Games        :   1000 (finished)

White Wins   :    174 (17.4 %)
Black Wins   :     68 ( 6.8 %)
Draws        :    758 (75.8 %)
Unfinished   :      0

White Perf.  : 55.3 %
Black Perf.  : 44.7 %

ECO A =     79 Games ( 7.9 %)
ECO B =    283 Games (28.3 %)
ECO C =    305 Games (30.5 %)
ECO D =    194 Games (19.4 %)
ECO E =    139 Games (13.9 %)

Currently, after 262 games on 4 cpu, Stockfish is in the lead by 33 Elo.

Hugo · Post by **Hugo** » Tue Jan 09, 2018 8:07 pm

standings after 820 games

Hugo · Post by **Hugo** » Thu Jan 11, 2018 5:58 am

Hi All

The second test run (4 cpu per engine) is finished.

How should the result be interpreted when comparing T1 and T4?
It looks like Komodo, with parallel search, reduced the distance to Stockfish by 13 Elo.

Regards, C.K.

Games        :   1000 (finished)

White Wins   :    178 (17.8 %)
Black Wins   :     50 ( 5.0 %)
Draws        :    772 (77.2 %)
Unfinished   :      0

White Perf.  : 56.4 %
Black Perf.  : 43.6 %

ECO A =     82 Games ( 8.2 %)
ECO B =    283 Games (28.3 %)
ECO C =    304 Games (30.4 %)
ECO D =    192 Games (19.2 %)
ECO E =    139 Games (13.9 %)

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Stockfish 261217 64 POPCNT T4  : 3017   10  10  1000    54.8 %   2983   77.2 %
  2 Komodo 1973.00 64-bit T4       : 2983   10  10  1000    45.2 %   3017   77.2 %
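
As a rough cross-check of the table above, the 34 Elo gap (3017 vs 2983) can be approximated from the 54.8 % score alone. This is only a sketch assuming the standard logistic Elo formula; the rating tool that produced the table may use a slightly different model, and the function name `elo_diff` is just an illustrative choice.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

# Stockfish T4 score from the table: 54.8 %
sf_lead_t4 = elo_diff(0.548)
print(f"Stockfish lead at T4: about {sf_lead_t4:.1f} Elo")
```

Under that assumption the result lands within about one Elo point of the difference shown in the table.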

PCM72 · Post by **PCM72** » Thu Jan 11, 2018 7:11 pm

Hugo wrote:How should the result be interpreted when comparing T1 and T4?
It looks like Komodo, with parallel search, reduced the distance to Stockfish by 13 Elo.

Such a "reduction of distance" looks just "a bit beyond the expected": it should be about 10, interpolating your conditions from the "normal reductions" in CEGT and CCRL. (It is quite normal for the Elo gain to diminish as the TC or the number of CPUs increases, both for engineA vs engineB and for versionA1 vs versionA2.)

CEGT 40/20
Stockfish 8 vs Stockfish 7 gains. 1CPU: 92(=3332-3240). 4CPU: 76(=3415-3339) -> reduction=16
Komodo 11.2 vs Komodo 10.4 gains. 1CPU: 13(=3328-3315). 4CPU: 24(=3422-3398) -> reduction=-11
Stockfish 8 vs Komodo 11.2 gains. 1CPU: 4(=3332-3328). 4CPU: -7(=3415-3422) -> reduction=11

CEGT 40/4
Stockfish 8 vs Stockfish 7 gains. 1CPU: 82(=3331-3249). 4CPU: 49(=3418-3369) -> reduction=33
Komodo 11.2 vs Komodo 10.4 gains. 1CPU: 26(=3330-3304). 4CPU: 7(=3404*-3397) -> reduction=19
Stockfish 8 vs Komodo 11.2 gains. 1CPU: 1(=3331-3330). 4CPU: 14(=3418-3404*) -> reduction=-13
*=Komodo 11.01

CCRL 40/4
Stockfish 8 vs Stockfish 7 gains. 1CPU: 67(=3423-3356). 4CPU: 73(=3495-3422) -> reduction=-6
Komodo 11.2 vs Komodo 10.4 gains. 1CPU: 58(=3426-3368). 4CPU: 43(=3514-3471) -> reduction=15
Stockfish 8 vs Komodo 11.2 gains. 1CPU: -3(=3423-3426). 4CPU: -19(=3495-3514) -> reduction=16
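
The arithmetic behind these lines is: gain = rating of the first engine minus rating of the second, and reduction = 1CPU gain minus 4CPU gain. A minimal sketch reproducing the CEGT 40/20 rows from the ratings quoted above (the helper names `gain` and `reduction` are only illustrative):

```python
def gain(rating_a: int, rating_b: int) -> int:
    """Elo gain of engine A over engine B on one list."""
    return rating_a - rating_b

def reduction(one_cpu: tuple, four_cpu: tuple) -> int:
    """How much the gain shrinks going from 1 CPU to 4 CPU (negative = it grows)."""
    return gain(*one_cpu) - gain(*four_cpu)

# CEGT 40/20 ratings as quoted in the post: (1CPU pair, 4CPU pair)
cegt_40_20 = {
    "Stockfish 8 vs Stockfish 7": ((3332, 3240), (3415, 3339)),
    "Komodo 11.2 vs Komodo 10.4": ((3328, 3315), (3422, 3398)),
    "Stockfish 8 vs Komodo 11.2": ((3332, 3328), (3415, 3422)),
}

reductions = {name: reduction(*pairs) for name, pairs in cegt_40_20.items()}
for name, value in reductions.items():
    print(f"{name}: reduction={value}")
```

For the engine-vs-engine rows, a positive reduction means Stockfish's lead over Komodo shrinks with more CPUs.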

Also, http://chess.ultimaiq.net/scalability.htm suggests an average reduction of ~7 Elo (or at any rate <10), despite the abruptly low 1.7 Elo reduction between the last two "doubling TC Elo gains".

All these interpolations are necessarily rough, not least because a few thousand games are not enough for them. But if this "bit beyond the expected" were confirmed by other similar tests (T16?), we would have a more definite clue that Komodo works better than Stockfish at correspondence time controls (or at least that it "gains" more at TCEC time controls).
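
To give a feel for why a single 1000-game match is a thin basis, here is a rough sketch of a 95 % interval on the T4 Elo difference. It assumes a normal approximation on the per-game score and the logistic Elo formula. The 162 wins / 66 losses for Stockfish are not stated in the thread; they are derived from the 54.8 % score and 772 draws in Hugo's table. The name `elo_interval` is illustrative.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """(low, estimate, high) Elo difference, normal approximation on the score."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)
    return elo_diff(s - z * se), elo_diff(s), elo_diff(s + z * se)

# Derived T4 result from Stockfish's side (assumption, see above)
low, mid, high = elo_interval(wins=162, draws=772, losses=66)
print(f"Stockfish lead: {mid:.1f} Elo, 95% interval {low:.1f} to {high:.1f}")
```

Under these assumptions the interval spans roughly ten Elo either side of the estimate, which is about the same size as the 13 Elo shift being discussed.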
