In response to the KID thread.
-
- Posts: 6052
- Joined: Tue Jun 12, 2012 12:41 pm
Re: In response to the KID thread.
Andscacs coming in second.
congrats, Daniel!
apart from SF, we should lean on Andscacs for the KID.
but I would say, those are ratings of third-hand players of the KID.
-
- Posts: 450
- Joined: Wed Nov 24, 2010 10:57 am
- Location: INDIA
Re: In response to the KID thread.
Laskos wrote:
yanquis1972 wrote: thanks for the clarification & re-test kai! very simple mistake to make & the fact no one thought to bring it up given your test parameters is proof of how easy it is to overlook.
I performed extensive tests with correct settings (12600 games each), and here are the WILO ratings:
Code: Select all
                                   8moves_GM    KID_722   Performance
   # PLAYER                      :    RATING     RATING    KID vs GM
   1 Stockfish 260517 64 BMI2    :    3135.4     3163.1      +27.7
   2 Houdini 5.01 Pro x64-popc   :    3061.0     3069.7       +8.7
   3 Komodo 11.01 64-bit         :    3057.7     3046.0      -11.7
   4 Deep Shredder 13 x64        :    2972.0     2958.9      -13.1
   5 Gull 3 x64                  :    2938.2     2936.5       -1.7
   6 Fritz 15                    :    2923.7     2895.9      -27.8
   7 Andscacs 0.91b              :    2912.0     2929.9      +17.9
Stockfish is not only by far the strongest engine in KID, it also overperforms significantly in KID compared to general GM openings. If anything, Komodo underperforms a bit in KID.

Thanks Kai, I was waiting for such tests by you. This result is in accordance with my observation. Komodo is poor in the KID and only plays well in Slav-type positions. Stockfish, on the other hand, easily understands the h4 line attack and the pawn storm against the king, but has fail highs and fail lows, as pointed out by Lyudmil.
The only KID line that engines understand better is the Bayonet Attack. That's why they all tend to play it.
Thanks again.
Always Expect the Unexpected
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: In response to the KID thread.
Om, I also left some longer time control tests of Stockfish and Komodo running overnight, head to head with normal adjudication (not my weird ones), from general GM openings and from KID openings. Time control is 10 min + 6 sec increment. The results:
General GM openings:
Code: Select all
Games Completed = 100 of 100 (Avg game length = 1798.151 sec)
Settings = RR/64MB/600000ms+6000ms/M 600cp for 3 moves, D 120 moves/PGN:C:\LittleBlitzer\8moves_GM.pgn(32000)
Time = 23127 sec elapsed, 0 sec remaining
1. Komodo 11.01 64-bit 43.0/100 8-22-70 (L: m=0 t=0 i=0 a=22) (D: r=53 i=7 f=0 s=0 a=10) (tpm=14293.9 d=27.81 nps=1435520)
2. Stockfish 260517 64 BMI2 57.0/100 22-8-70 (L: m=0 t=0 i=0 a=8) (D: r=53 i=7 f=0 s=0 a=10) (tpm=13683.7 d=31.29 nps=1642082)
KID openings:
Code: Select all
Games Completed = 100 of 100 (Avg game length = 1875.278 sec)
Settings = RR/64MB/600000ms+6000ms/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\KID_ECO_E60_E99.epd(722)
Time = 24339 sec elapsed, 0 sec remaining
1. Komodo 11.01 64-bit 38.0/100 6-30-64 (L: m=0 t=0 i=0 a=30) (D: r=45 i=5 f=4 s=1 a=9) (tpm=14091.6 d=28.00 nps=1405241)
2. Stockfish 260517 64 BMI2 62.0/100 30-6-64 (L: m=0 t=0 i=0 a=6) (D: r=45 i=5 f=4 s=1 a=9) (tpm=13593.8 d=30.53 nps=1590753)
Advantage of Stockfish dev compared to Komodo 11.01:
Code: Select all
                      General Openings    KID Openings    |    Overperformance of Stockfish dev in KID
==========================================================|============================================
ELO            :             49                85         |                    +36
WILO           :            176               280         |                   +104
Normalized ELO :          0.264             0.436         |                 +0.172
==========================================================|============================================
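For anyone wanting to check figures like these, here is a minimal sketch (not the tooling actually used for the table) that derives them from Stockfish's W-L-D counts in the two LittleBlitzer runs. It assumes the standard logistic Elo formula, WILO as the same formula applied to wins and losses only (draws ignored), and normalized Elo as the score offset from 50% divided by the per-game standard deviation. The function names are made up for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_stats(wins: int, losses: int, draws: int) -> dict:
    """Elo, WILO and normalized Elo from one side's W-L-D record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Assumed WILO definition: Elo-style ratio using decisive games only.
    wilo = 400.0 * math.log10(wins / losses)
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins + 0.25 * draws) / games - score ** 2
    normalized = (score - 0.5) / math.sqrt(variance)
    return {"elo": elo_from_score(score), "wilo": wilo, "normalized_elo": normalized}

# Stockfish's records from the two 100-game runs (wins, losses, draws).
general = match_stats(22, 8, 70)
kid = match_stats(30, 6, 64)

for key in ("elo", "wilo", "normalized_elo"):
    print(f"{key:15s} general={general[key]:8.3f}  kid={kid[key]:8.3f}  "
          f"diff={kid[key] - general[key]:+8.3f}")
```

Rounded, these inputs give the same values as the table, which suggests the assumed definitions match the ones behind it.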
-
- Posts: 450
- Joined: Wed Nov 24, 2010 10:57 am
- Location: INDIA
Re: In response to the KID thread.
Laskos wrote:
Although not very many games were played (100 each test), it is clear that Stockfish overperforms in KID.
It seems Komodo has a hard time beating Stockfish in KID (30 to 6 score). It seems to me even longer time controls will exacerbate this.

Thanks, Kai, for the tests.
Can you test a few things if you have time?
Add Critter 1.6 with king safety evaluation set to 100.
Use Komodo with increased dynamism. Just make it double the default value.
Try Komodo with an increased value for the king safety parameter.
Try Stockfish 7 with aggressiveness 200 and cowardice 0.
Please try only closed positions, as in the classical KID with e5, the Makogonov, the Sämisch with e5....
You can use the Jeroen Noomen KID test suite, though.
Regards
Om
Always Expect the Unexpected
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: In response to the KID thread.
Om, I played a gauntlet of fast games against Stockfish dev from the Noomen KID suite; all the non-default settings perform worse:
Code: Select all
   # PLAYER                                 :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(next)
   1 Stockfish 260517 64 BMI2               :  3133.6    9.8  4037.0    5800  69.6     100
   2 Komodo default (Contempt=0)            :  3036.8   25.1   331.0     852  38.8     100
   3 Stockfish 2014 default                 :  2980.4   25.3   285.0     979  29.1      56
   4 Komodo King Safety=150 (Contempt=0)    :  2977.3   23.9   297.0     970  30.6      59
   5 Stockfish 2014 Agg=200 Cow=0           :  2972.7   24.8   307.0    1005  30.5      76
   6 Critter 1.6                            :  2958.6   26.2   246.0     843  29.2      82
   7 Komodo Dynamism=240 (Contempt=0)       :  2940.6   23.6   297.0    1151  25.8     ---
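A quick way to read a gauntlet table like this one: every game involves Stockfish dev, so its games played should equal the sum of the other engines' games, and its points should be whatever the others did not score. The sketch below only re-checks the posted numbers; the variable and function names are made up for illustration, and it says nothing about how the RATING, ERROR or CFS columns were computed.

```python
# (name, points, games played) copied from the gauntlet table.
HERO = ("Stockfish 260517 64 BMI2", 4037.0, 5800)
OPPONENTS = [
    ("Komodo default (Contempt=0)", 331.0, 852),
    ("Stockfish 2014 default", 285.0, 979),
    ("Komodo King Safety=150 (Contempt=0)", 297.0, 970),
    ("Stockfish 2014 Agg=200 Cow=0", 307.0, 1005),
    ("Critter 1.6", 246.0, 843),
    ("Komodo Dynamism=240 (Contempt=0)", 297.0, 1151),
]

def score_percent(points: float, played: int) -> float:
    """Score as a percentage of games played."""
    return 100.0 * points / played

def gauntlet_is_consistent(hero, opponents) -> bool:
    """True if the hero's totals match the opponents' combined totals."""
    _, hero_points, hero_played = hero
    total_played = sum(played for _, _, played in opponents)
    total_points = sum(points for _, points, _ in opponents)
    return hero_played == total_played and hero_points == total_played - total_points

print("consistent:", gauntlet_is_consistent(HERO, OPPONENTS))
for name, points, played in [HERO] + OPPONENTS:
    print(f"{name:38s} {score_percent(points, played):5.1f}%")
```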
-
- Posts: 3186
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: In response to the KID thread.
Hi!
Laskos wrote: Stockfish dev seems way too strong in KID.

Stockfish dev is way too strong in all positions near the start position of chess that are "balanced", Kai, which to me here means "without any clear positional advantage" for either side.
The further advanced into the middlegame and the less balanced the opening positions you use for test matches are, the less "way too strong" it is, or, the other way round, the more "way too strong", as far as Elo from engine-engine matches goes.
And then, especially in the "KID", which is one of the most inhomogeneous opening systems in theory (it contains closed games as well as semi-open ones), there are many positions especially good for SF, or especially good for certain (but really fine-tuned) settings of SF, and many especially good for Komodo (which is even more tunable, but of course even harder to fine-tune).
Simply forget about being able to discriminate between such only very slightly different engines and their settings by Elo from engine-engine games.
(That is, if you mean the so-called illusion ("elosion") of "Overall Playing Strength".
Of course you can measure the playing strength of certain engines of a certain test pool, strictly bound to one single position, in a special thematic tournament, and express that performance, strictly bound to that one test position, in Elo too, but that's not the common sense of Elo, is it?)
You won't get enough statistically significant data in any reasonable time, however far you turn down the TC; all you raise is statistical noise, the shorter your TC gets and the more inhomogeneous your test set and your engine pool get.
The only way to discriminate between engines and their settings as to real position-dependent strength is pure positional testing, engine by engine, output by output, with empty hash and with hash filled by Forward-Backward analysis (of well interactively evaluated test lines from theory), position by position.
SCNR.
Peter.
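To put a rough number on the statistical noise argument, here is a minimal sketch of an error bar for a match Elo, using the two 100-game Stockfish vs Komodo results posted earlier in the thread (22-8-70 and 30-6-64 from Stockfish's side). It is only an approximation under stated assumptions: a normal approximation to the mean score, games treated as independent (which paired openings do not strictly satisfy), and a 95% level via z = 1.96. The function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score, logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (low, point, high) Elo for one side's W-L-D record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins + 0.25 * draws) / games - score ** 2
    std_error = math.sqrt(variance / games)
    # Clamp so the logarithm stays defined for very lopsided records.
    eps = 1e-9
    low = min(max(score - z * std_error, eps), 1.0 - eps)
    high = min(max(score + z * std_error, eps), 1.0 - eps)
    return elo_from_score(low), elo_from_score(score), elo_from_score(high)

for label, record in (("General GM openings", (22, 8, 70)),
                      ("KID openings", (30, 6, 64))):
    low, point, high = elo_interval(*record)
    print(f"{label}: {point:+.0f} Elo, roughly {low:+.0f} to {high:+.0f}")
```

With these inputs the two ranges overlap, which illustrates how wide the error bars are at 100 games; that is a property of this rough approximation, not a verdict on either engine.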
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: In response to the KID thread.
Frankly, I am pretty sick of patzers not much better than me in chess, hyper-analyzing with engines particular positions, often irrelevant positions, often emitting speculations way beyond their abilities. This forum is abundant with such silly threads. At least invite here some GMs to emit opinions. For me personally, my tests showing Stockfish significantly overperforming in KID are much more relevant than the KID thread with air venting Lyudmil as expert and some engine runs on irrelevant positions for hours. It is my taste, deal with it. That's why I posted the reply to that thread.
-
- Posts: 3186
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: In response to the KID thread.
Laskos wrote: It is my taste, deal with it.

That's what I did, or why else would I have written pages?
I guess what you really meant was that you'd rather I not deal with your taste, but that's OK for me too.
Peter.
-
- Posts: 450
- Joined: Wed Nov 24, 2010 10:57 am
- Location: INDIA
Re: In response to the KID thread.
Thank you very much, Kai.
Thanks for the test. My purpose was not to see whether Stockfish won; it was to see where Stockfish lost.
Can you please post the games that Stockfish lost?
Regards
Om
Always Expect the Unexpected
-
- Posts: 10297
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: In response to the KID thread.
yanquis1972 wrote: thanks for the clarification & re-test kai! very simple mistake to make & the fact no one thought to bring it up given your test parameters is proof of how easy it is to overlook.

I did not read the exact conditions in the first post earlier.
I think that adjudication by the engines after 20 moves is not a serious way to evaluate the opening performance of engines, regardless of contempt.
The question should be whether engines play the right moves, not whether they evaluate the position correctly.
The best KID engine is the engine that plays better moves, and engine evaluations may be misleading even without contempt.
The correct adjudication after 20 moves is by the 32-piece tablebases, and if you do not have these tablebases you can try to estimate their result simply by playing games between the best engines we have at a long time control (a longer time control is going to give a better estimate, but testing is of course going to take more time).