Re: Komodo 2.01 is out!
Posted: Sat Jun 11, 2011 4:04 pm
Do you have good control of non-chess losses (on time, illegal moves, etc.)? I am using LittleBlitzer, which reports the reasons for losses very well. Besides that, in ultra-short games, do you see how much time the engines are generally using? For example, I cannot go much lower than a 100 ms increment (with a 1000 ms base); below that, the time used by the engines becomes erratic, even if they are not losing on time.
Don wrote:
We believe it is over 100 ELO, but it does not always come out the way we think.
Laskos wrote:
Wow, seems a serious improvement. After 400 games at 1s + 0.1s:
Program Score % Elo + - Draws
1 Komodo64 2.01 64 bit : 262.5/400 65.6 3256 29 29 31.2 %
2 Komodo64 1.3 JA : 137.5/400 34.4 3144 29 29 31.2 %
112 +/- 29 Elo points (95% confidence) improvement in self-play, probably a little less in a gauntlet, but the new Reptilian seems to be at the level of SF 2.01. I will leave it running for more games, then run a gauntlet.
Kai
Komodo 1.3 was supposed to be about 50 ELO stronger in our private testing, but when it came out the gain was much smaller.
I think the Stockfish team also expected a lot more from version 2.1. It could be because we are all forced to test at pretty fast time controls to get statistical confidence in any change.
We now test more against foreign opponents than we used to (although we have always done some of that), and we have gradually increased our time control to give us a better picture of how we will do at "real" time controls.
Because Komodo is now so much stronger, we can no longer handicap the programs we play against, which also means testing takes a lot longer. But I think increasing the time control and using foreign opponents more has been a benefit and gives us a better picture of the actual ELO gain, even though it increases our resource burden.
Some things we observed: Critter is exceptionally strong at time controls of less than 1 minute per game, and so is Robbolito. Relative to these programs, Stockfish does not look so good until the time control increases; then it looks better. Critter appears to be the least scalable of the programs we test against, but it's difficult to say that for sure, since we are extrapolating from our own tests and from results we see on rating lists. We believe our program scales very well, and we also think Stockfish is exceptional in this area.
We would like to test against Houdini too, and it would be the last program we could reasonably handicap to advantage. However, we develop and test on Linux because it behaves much better for massive testing, and there is no Linux version of Houdini. We can run the 32-bit version using wine, but that throws away a lot of the benefit (although the 32-bit version is still stronger than Komodo and Stockfish, at least for now). Even then, we get crashes and time losses with Houdini.
We have observed that as the programs get stronger, fast time control tests are becoming less reliable predictors of strength. The correlation is still quite high most of the time, but not always. There are some things that clearly test well at game in 3 seconds but test poorly at game in 1 minute, and it never used to be that way for us.
Don
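Don's remark that everyone is forced to test fast to reach statistical confidence can be put into numbers. A minimal sketch, assuming the logistic Elo model and a normal approximation to the match score (not necessarily what any particular rating tool uses); `games_needed` and `elo_slope` are illustrative names, and the 30% draw rate is a hypothetical value.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def elo_slope(score: float) -> float:
    """dElo/dscore of the logistic Elo curve at a given score fraction."""
    return 400.0 / (math.log(10) * score * (1.0 - score))


def games_needed(elo_margin: float, draw_ratio: float, score: float = 0.5) -> int:
    """Games required for a 95% interval of +/- elo_margin (normal approximation).

    Each game scores 1, 0.5 or 0; with win rate W = score - draw_ratio/2,
    the per-game variance is W + draw_ratio/4 - score**2.
    """
    wins = score - draw_ratio / 2.0
    variance = wins + draw_ratio / 4.0 - score ** 2
    score_margin = elo_margin / elo_slope(score)  # Elo margin converted to score units
    return math.ceil(variance * (Z95 / score_margin) ** 2)


if __name__ == "__main__":
    # Evenly matched engines with a hypothetical 30% draw rate.
    for margin in (20, 10, 5):
        print(f"+/-{margin} Elo: {games_needed(margin, 0.30)} games")
```

Under these assumptions the loop gives roughly 800, 3,200 and 13,000 games: halving the error bar quadruples the number of games, and every game costs the full time control. With the figures from Kai's self-play run (score 0.6575, 34.1% draws), the same function gives close to 1,000 games for +/- 18 Elo.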
Anyway, here are my latest results at 1s + 0.1s (average game length ~15 s):
Self-play finished
Program Score % Av.Op. Elo + - Draws
1 Komodo64 2.01 64 bit : 657.5/1000 65.8 3143 3257 18 18 34.1 %
2 Komodo64 1.3 JA : 342.5/1000 34.2 3257 3143 18 18 34.1 %
+114 +/- 18 Elo points (95% confidence) improvement.
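The +114 +/- 18 figure can be roughly reproduced from the score and the draw rate. A minimal sketch, assuming the logistic Elo model and a normal (delta-method) approximation for the error bar; the rating tool that produced these tables may use a different model, which would explain a one-point difference. `elo_diff` and `elo_margin_95` are illustrative names.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(score: float, draw_ratio: float, games: int) -> float:
    """Approximate 95% half-width of the Elo estimate (normal approximation)."""
    wins = score - draw_ratio / 2.0
    variance = wins + draw_ratio / 4.0 - score ** 2  # per-game variance of 1 / 0.5 / 0
    score_se = math.sqrt(variance / games)  # standard error of the score fraction
    slope = 400.0 / (math.log(10) * score * (1.0 - score))  # dElo/dscore
    return 1.96 * score_se * slope


if __name__ == "__main__":
    s = 657.5 / 1000  # Komodo 2.01 vs 1.3, 34.1% draws
    print(f"{elo_diff(s):+.0f} +/- {elo_margin_95(s, 0.341, 1000):.0f} Elo")  # +113 +/- 18 Elo
```

Applied to the earlier 400-game run (262.5/400, 31.2% draws), the same functions give about +112 +/- 29. The error bar shrinks only with the square root of the number of games: going from 400 to 1000 games cut it from 29 to 18.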
Gauntlets in progress
Program Score % Av.Op. Elo + - Draws
Komodo64 2.01 64 bit : 1213.5/3100 39.1 3200 3123 10 10 28.7 %
1 Houdini 1.5a x64 : 463.0/620 74.7 3123 3309 27 26 22.3 %
2 Deep Rybka 4.1 x64 : 397.5/620 64.1 3123 3222 23 23 32.1 %
3 Ivanhoe B47cBx64-1 : 382.0/620 61.6 3123 3204 23 23 30.6 %
4 Stockfish 2.1 JA 64bit : 342.5/620 55.2 3123 3158 23 23 28.2 %
5 Critter 1.01 64-bit : 301.5/620 48.6 3123 3112 23 23 30.2 %
Program Score % Av.Op. Elo + - Draws
Komodo64 1.3 JA : 377.0/1400 26.9 3200 3026 17 17 26.9 %
1 Houdini 1.5a x64 : 228.0/280 81.4 3026 3281 44 43 19.3 %
2 Deep Rybka 4.1 x64 : 213.0/280 76.1 3026 3225 38 38 27.1 %
3 Ivanhoe B47cBx64-1 : 211.0/280 75.4 3026 3219 38 37 27.9 %
4 Stockfish 2.1 JA 64bit : 191.5/280 68.4 3026 3158 35 34 33.2 %
5 Critter 1.01 64-bit : 179.5/280 64.1 3026 3125 36 36 26.8 %
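Each Komodo version's Elo in the gauntlet tables follows from its score and the average opponent rating (Av.Op.). A minimal sketch, assuming the standard logistic performance formula; `performance` is a hypothetical helper, not part of any rating tool.

```python
import math


def performance(avg_opponent: float, points: float, games: int) -> float:
    """Performance rating: average opponent rating plus the logistic Elo
    difference implied by the score fraction points/games."""
    score = points / games
    return avg_opponent - 400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    new = performance(3200, 1213.5, 3100)  # Komodo 2.01 gauntlet (Av.Op. 3200)
    old = performance(3200, 377.0, 1400)   # Komodo 1.3 gauntlet (Av.Op. 3200)
    print(f"2.01: {new:.0f}  1.3: {old:.0f}  gap: {new - old:.0f}")
```

This reproduces the listed 3123 for 2.01 and lands within a point of the listed 3026 for 1.3, a gap of roughly 97 Elo. Since each table was rated in a separate run (the opponents' ratings differ between them, e.g. Houdini at 3309 vs 3281), that gap is only a rough comparison with the +114 seen in self-play.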
What I observed: the new Reptile is 15-20% slower than 1.3, but searches a little deeper. It is by far the slowest of all the engines tested (including Rybka times 14). I will put it through the Sim03 test if I can find my old data file, as it seems to have gained a lot in both eval and search.
Thanks, and congratulations on this huge improvement.
Kai