In response to the KID thread.

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: In response to the KID thread.

Post by Lyudmil Tsvetkov »

Andscacs coming in second.

Congrats, Daniel!

Apart from SF, we should lean on Andscacs for the KID.

But I would say those are ratings of third-hand players of the KID.
Master Om
Posts: 450
Joined: Wed Nov 24, 2010 10:57 am
Location: INDIA

Re: In response to the KID thread.

Post by Master Om »

Laskos wrote:
yanquis1972 wrote:thanks for the clarification & re-test kai! very simple mistake to make & the fact no one thought to bring it up given your test parameters is proof of how easy it is to overlook.
I performed extensive tests with correct settings (12600 games each), and here are the WILO ratings:

                                     8moves_GM    KID_722     Performance
      # PLAYER                       : RATING     RATING       KID vs GM
      
      1 Stockfish 260517 64 BMI2     : 3135.4     3163.1         +27.7
      2 Houdini 5.01 Pro x64-popc    : 3061.0     3069.7          +8.7
      3 Komodo 11.01 64-bit          : 3057.7     3046.0         -11.7
      4 Deep Shredder 13 x64         : 2972.0     2958.9         -13.1
      5 Gull 3 x64                   : 2938.2     2936.5          -1.7
      6 Fritz 15                     : 2923.7     2895.9         -27.8
      7 Andscacs 0.91b               : 2912.0     2929.9         +17.9
Stockfish is not only by far the strongest engine in KID, it also overperforms significantly in KID compared to general GM openings. If anything, Komodo underperforms a bit in KID.
Thanks Kai, I was waiting for such tests from you. This result is in accordance with my observations. Komodo is poor in the KID and only plays well in Slav-type positions. Stockfish, on the other hand, easily understands the h4 attacking lines and the pawn storm against the king, but has fail highs and fail lows, as pointed out by Lyudmil.
The only KID line that engines understand better is the Bayonet Attack. That's why they all tend to play it.
Thanks again.
Always Expect the Unexpected
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to the KID thread.

Post by Laskos »

Om, I also ran some longer time control tests overnight, Stockfish and Komodo head to head with normal adjudication (not my weird ones), from general GM openings and from KID openings. The time control is 10 min + 6 sec increment. The results:

General GM openings:

Games Completed = 100 of 100 (Avg game length = 1798.151 sec)
Settings = RR/64MB/600000ms+6000ms/M 600cp for 3 moves, D 120 moves/PGN:C:\LittleBlitzer\8moves_GM.pgn(32000)
Time = 23127 sec elapsed, 0 sec remaining
 1.  Komodo 11.01 64-bit      	43.0/100	8-22-70  	(L: m=0 t=0 i=0 a=22)	(D: r=53 i=7 f=0 s=0 a=10)	(tpm=14293.9 d=27.81 nps=1435520)
 2.  Stockfish 260517 64 BMI2 	57.0/100	22-8-70  	(L: m=0 t=0 i=0 a=8)	(D: r=53 i=7 f=0 s=0 a=10)	(tpm=13683.7 d=31.29 nps=1642082)
KID openings:

Games Completed = 100 of 100 (Avg game length = 1875.278 sec)
Settings = RR/64MB/600000ms+6000ms/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\KID_ECO_E60_E99.epd(722)
Time = 24339 sec elapsed, 0 sec remaining
 1.  Komodo 11.01 64-bit      	38.0/100	6-30-64  	(L: m=0 t=0 i=0 a=30)	(D: r=45 i=5 f=4 s=1 a=9)	(tpm=14091.6 d=28.00 nps=1405241)
 2.  Stockfish 260517 64 BMI2 	62.0/100	30-6-64  	(L: m=0 t=0 i=0 a=6)	(D: r=45 i=5 f=4 s=1 a=9)	(tpm=13593.8 d=30.53 nps=1590753)
Although not very many games were played (100 each test), it is clear that Stockfish overperforms in KID.

Advantage of Stockfish dev compared to Komodo 11.01:

                         General Openings             KID Openings  | Overperformance of Stockfish dev in KID
====================================================================|========================================
ELO              :            49                          85        |         +36
WILO             :           176                         280        |        +104
Normalized ELO   :         0.264                       0.436        |      +0.172 
====================================================================|========================================
It seems Komodo has a hard time beating Stockfish in the KID (Stockfish won 30 games to Komodo's 6). It seems to me that even longer time controls will exacerbate this.
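
For anyone who wants to reproduce the three rows of the table from the win-loss-draw counts in the LittleBlitzer output, here is a minimal sketch. It assumes the usual logistic Elo formula, takes WILO to be the same formula applied to wins and losses only (draws ignored), and takes normalized Elo to be the mean score advantage divided by the per-game standard deviation of the score. These definitions are inferred from the numbers in the table, and the function names are made up for illustration.

```python
import math

def elo(wins, losses, draws):
    """Logistic Elo difference implied by the overall score fraction."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

def wilo(wins, losses):
    """Same logistic formula, but on decisive games only (draws ignored)."""
    return 400.0 * math.log10(wins / losses)

def normalized_elo(wins, losses, draws):
    """Mean score advantage over 0.5, in units of the per-game standard deviation."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    variance = (wins * (1.0 - score) ** 2
                + losses * (0.0 - score) ** 2
                + draws * (0.5 - score) ** 2) / games
    return (score - 0.5) / math.sqrt(variance)

# Stockfish's (wins, losses, draws) against Komodo, read off the match output.
matches = {
    "General GM openings": (22, 8, 70),
    "KID openings": (30, 6, 64),
}

for name, (w, l, d) in matches.items():
    print(f"{name}: ELO {elo(w, l, d):.0f}, WILO {wilo(w, l):.0f}, "
          f"normalized ELO {normalized_elo(w, l, d):.3f}")
```

The last column of the table is then just the KID value minus the general-openings value in each row.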
Master Om
Posts: 450
Joined: Wed Nov 24, 2010 10:57 am
Location: INDIA

Re: In response to the KID thread.

Post by Master Om »

Thanks Kai, for the tests.
Can you test a few things if you have time?
Add Critter 1.6 with king safety evaluation set to 100.
Use Komodo with increased Dynamism. Just make it double the default value.
Try Komodo with an increased value for the King Safety parameter.
Try Stockfish 7 with Aggressiveness 200 and Cowardice 0.
Please try only closed positions, as in the classical KID with e5, the Makogonov, the Sämisch with e5....
You can use the Jeroen Noomen KID test suite, though.
Regards
Om
Always Expect the Unexpected
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to the KID thread.

Post by Laskos »

Om, I played a gauntlet of fast games against Stockfish dev from the Noomen KID suite; all the non-default settings perform worse:

   # PLAYER                                 : RATING  ERROR    POINTS  PLAYED     (%)   CFS(next)
   1 Stockfish 260517 64 BMI2               : 3133.6    9.8    4037.0    5800    69.6     100   
 
   2 Komodo default (Contempt=0)            : 3036.8   25.1     331.0     852    38.8     100    
   3 Stockfish 2014 default                 : 2980.4   25.3     285.0     979    29.1      56    
   4 Komodo King Safety=150 (Contempt=0)    : 2977.3   23.9     297.0     970    30.6      59    
   5 Stockfish 2014 Agg=200 Cow=0           : 2972.7   24.8     307.0    1005    30.5      76    
   6 Critter 1.6                            : 2958.6   26.2     246.0     843    29.2      82    
   7 Komodo Dynamism=240 (Contempt=0)       : 2940.6   23.6     297.0    1151    25.8     ---    
Stockfish dev seems way too strong in KID.
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: In response to the KID thread.

Post by peter »

Hi!
Stockfish dev is way too strong in all positions near the starting position of chess that are "balanced", Kai, which to me here means "without any clear positional advantage" for either side.

The further advanced into the middlegame and the less balanced the opening positions you use for test matches are, the less "way too strong" it becomes, or, the other way round, even more "way too strong", as far as Elo from engine-engine matches goes.

And then, especially in the "KID", which is one of the most inhomogeneous opening systems in all of theory (it contains closed games as well as semi-open ones), there are many positions especially good for SF, or especially good for certain (but really fine-tuned) settings of SF, and many especially good for Komodo (which is even more tunable but of course even harder to fine-tune).

Simply forget about being able to discriminate between such only very slightly different engines and their settings by Elo and engine-engine games.
(If you mean the so-called illusion (elosion :)) of "Overall Playing Strength", that is.
Of course you can measure the playing strength, strictly bound to one single position, of certain engines from a certain test pool in a certain special thematic tournament, and express that performance, strictly bound to the one and only test position, in Elo too, but that's not the common meaning of Elo, is it?)

You won't get enough statistically significant data in any reasonable time, however short you might make the TC; all you raise is statistical noise, the more so the shorter your TC gets and the more inhomogeneous your test set and your engine pool get.
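
To put a rough number on this, here is a minimal sketch of how many games a match needs before a given Elo difference stands out from the noise. It assumes independent games, a normal approximation, and a per-game standard deviation of the score supplied by the caller (the default of 0.27 is an assumption, roughly what the 100-game Stockfish vs. Komodo matches earlier in this thread imply); the function names are illustrative only.

```python
import math

def expected_score(elo_diff):
    """Expected score fraction for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, sigma_per_game=0.27, z=1.96):
    """Games needed so that z standard errors of the mean score fit inside the score edge.

    sigma_per_game: assumed standard deviation of a single game's score
                    (it shrinks as the draw rate rises).
    z:              1.96 corresponds to a two-sided 95% interval.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * sigma_per_game / edge) ** 2)

for diff in (5, 10, 20, 50):
    print(f"{diff:>3} Elo: about {games_needed(diff)} games")
```

The required number of games grows roughly with the inverse square of the Elo difference, so halving the difference you want to resolve roughly quadruples the games you have to play.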

The only way to discriminate between engines and their settings with regard to real position-dependent strength is pure positional testing, engine by engine, output by output, with empty hash and with hash filled by forward-backward analysis (of well-evaluated, interactively analysed test lines from theory), single position by single position.
SCNR.
:)
Peter.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to the KID thread.

Post by Laskos »

Frankly, I am pretty sick of patzers not much better than me at chess hyper-analyzing particular positions with engines, often irrelevant positions, often emitting speculations way beyond their abilities. This forum abounds with such silly threads. At least invite some GMs here to give their opinions. For me personally, my tests showing Stockfish significantly overperforming in the KID are much more relevant than the KID thread, with air-venting Lyudmil as the expert and some engine runs on irrelevant positions for hours. It is my taste, deal with it. That's why I posted the reply to that thread.
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: In response to the KID thread.

Post by peter »

Laskos wrote:It is my taste, deal with it.
That's what I did, or why should I have written pages otherwise?

I guess what you really meant was that you'd rather I didn't deal with your taste, but that's OK with me too.
Peter.
Master Om
Posts: 450
Joined: Wed Nov 24, 2010 10:57 am
Location: INDIA

Re: In response to the KID thread.

Post by Master Om »

Thank you very much Kai.
Thanks for the test. My purpose was not to see if Stockfish won. My purpose was to see where Stockfish lost.
Can you please post the games that Stockfish lost?
Regards
Om
Always Expect the Unexpected
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: In response to the KID thread.

Post by Uri Blass »

yanquis1972 wrote:thanks for the clarification & re-test kai! very simple mistake to make & the fact no one thought to bring it up given your test parameters is proof of how easy it is to overlook.
I did not read the exact conditions in the first post earlier.

I think that adjudication by engines after 20 moves is not a serious way to evaluate the opening performance of engines, regardless of contempt.

The question should be whether engines play the right moves, not whether they evaluate the position correctly.

The best KID engine is the engine that plays better moves, and the evaluation of engines may be misleading even without contempt.

The correct adjudication after 20 moves is by the 32-piece tablebases, and if you do not have these tablebases you can try to estimate their result simply by playing games between the best engines that we have at a long time control (a longer time control is going to give a better estimate, but testing is of course going to take more time).