Komodo 3 almost here - windows 64 now available

Don · Post by **Don** » Sat Aug 20, 2011 3:27 pm

Laskos wrote:
Don wrote:
I'm looking at the Clemens site and the IPON site. It's showing 44 ELO on the Clemens 5 minute + 3 second test after about 900 games.

On the IPON list it's showing 21 ELO after 900 games. So it appears that 20 ELO is going to be approximately correct.

Note that these are played on different platforms, which can affect things somewhat, and of course the error margins are very high.
I am using an AMD CPU without SSE4.2. Here Komodo 3 is ~2.4% slower than Komodo 2.03 JA, both 64-bit on Win 7. The time control is 1.5s + 0.05s, maybe useless, but often useful to me. Here is the self-test between the two Komodo brothers:
    Program                            Score      %      Elo    +   -    Draws

  1 Komodo64 3                     : 1103.5/2200  50.2   3001   12  12   32.6 %
  2 Komodo64 2.03 JA               : 1096.5/2200  49.8   2999   12  12   32.6 %
It seems, in my testing, that version 3 is not likely to be more than 14 Elo points stronger than 2.03, but don't trust my testing at these time controls; it's just for fun.

Kai

I would not trust the "speed" to tell you anything. Komodo is slower than many of the programs that it can beat, and the speed varies from version to version depending on many factors that we tinker with, such as extensions, LMR or other pruning.

In a previous post I mentioned that we have found fast time controls too unreliable for determining the true improvement. We have experimented with game in 1 second + 0.01 seconds, for example. It's good for getting huge samples quickly, but it's not very reliable: it shows huge improvements where there are none and regressions where there are improvements.

A program is not fully exercised until it is searching well past 9 or 10 ply. Some evaluation issues don't even register at shallow depths, and some pruning ideas based on sub-searches (even things like IID and singular extensions) need a few ply to start working smoothly and consistently.

Another factor in your test: what is the move overhead set to? At those time controls you are taking a big hit, so check Komodo 2.03 and Komodo 3 and set them to be the same. I suggest just setting it to 1.

Don
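To see why the overhead setting matters so much at very fast controls, consider a deliberately simple time manager. The sketch below is hypothetical: it is not Komodo's actual time management, which this thread does not describe. It gives each move an even share of the remaining clock plus the increment, assumes 30 moves to go, and subtracts the move overhead as a safety margin.

```python
def move_budget_ms(remaining_ms: float, increment_ms: float,
                   moves_to_go: int = 30, overhead_ms: float = 0.0) -> float:
    """Hypothetical per-move thinking time in milliseconds.

    An even share of the remaining clock plus the increment, minus the
    overhead reserved for GUI/OS communication lag (never below zero).
    """
    share = remaining_ms / moves_to_go + increment_ms
    return max(0.0, share - overhead_ms)


# 1.5 s + 0.05 s at the start of the game, comparing 20 ms and 1 ms overhead.
for overhead in (20.0, 1.0):
    budget = move_budget_ms(1500.0, 50.0, overhead_ms=overhead)
    print(f"overhead {overhead:4.0f} ms -> {budget:5.1f} ms per move")
```

Under these assumptions a 20 ms overhead removes a fifth of the thinking time on the first move, and an even larger share later in the game as the remaining clock shrinks toward the 50 ms increment. At longer controls the same 20 ms would be negligible.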

IWB · Post by **IWB** » Sat Aug 20, 2011 3:44 pm

Hello Eelco,

Eelco de Groot wrote: Direct link: http://www.inwoba.de/match.html

I hope Ingo does not mind that I crosspost the current status of the Komodo 3.0 gauntlet from his site:

I don't mind as long as the source is mentioned. So thanks for the advertising... if I got money for page hits... I should add some Google ads.

Btw: if the interest stays that high, today might become a new record day for hits on IPON.

Keep on clicking ... !

Bye
Ingo

Laskos · Post by **Laskos** » Sat Aug 20, 2011 4:54 pm

Don wrote:
Another factor in your test: what is the move overhead set to? At those time controls you are taking a big hit, so check Komodo 2.03 and Komodo 3 and set them to be the same. I suggest just setting it to 1.

Don

Move overhead was set to 20 ms for each. I will set it to 1 just to see what happens (maybe some losses on time; so far I have had none). I agree with what you said. I gave you the relative speed just to check whether there is something wrong with my setup (maybe you knew for sure that K3 should be faster than K2.03, for example). The average depth was about 9.0; maybe I will raise the TC by a factor of 2, which should give at least one additional ply at this control.

Kai
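Kai's estimate that doubling the time control buys at least one more ply can be made explicit with the effective branching factor (EBF), the average factor by which the search tree grows per extra ply. The minimal sketch below assumes a constant EBF, which real engines follow only roughly; the function names and the EBF values are illustrative, not taken from the thread.

```python
import math


def extra_plies(time_factor: float, ebf: float) -> float:
    """Additional search depth from multiplying the thinking time by
    time_factor, assuming each extra ply costs ebf times the previous one."""
    return math.log(time_factor) / math.log(ebf)


def implied_ebf(time_factor: float, plies_gained: float) -> float:
    """Inverse: the constant EBF that a given depth gain would imply."""
    return time_factor ** (1.0 / plies_gained)


print(extra_plies(2.0, 2.0))   # depth gain from 2x time if the tree doubles per ply
print(implied_ebf(2.0, 2.0))   # EBF implied by gaining 2 plies from 2x time
```

With an EBF of 2, doubling the time gives exactly one ply; lower EBFs, which heavy pruning such as LMR tends to produce, give more. Taken at face value, the average depths Kai reports later in the thread (about 9.0 at 1.5s + 0.05s, about 11.0 at 3s + 0.1s) would imply an EBF near 1.4, though average-depth figures from a GUI are a loose measure.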

Eelco de Groot · Post by **Eelco de Groot** » Sat Aug 20, 2011 8:59 pm

Thanks for doing the tests, Ingo. I found it a bit shocking to see how a program like Crafty 23.3, which, like Komodo, undergoes a lot of testing itself, can still get almost totally obliterated, and by a program with a positional style, while Crafty is always in the top range of raw nodes-per-second speed. It can't just be a lack of search depth or something. I know this is probably just a case of bad luck: when the expected score is not so high, just a few of those games that Crafty should have won but didn't can make the percentage look even worse. At one point Komodo was scoring a TPR of over 3100 against Crafty:

Komodo64 3 SSE42 - Crafty 23.3 JA (2599) 57.5 - 3.5 94.26% Perf=3085

Eelco
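The Perf figure is a tournament performance rating (TPR): the opponent's rating plus the Elo difference implied by the score. A minimal sketch under the standard logistic rating model (an assumption about how the list computes it; the function name is hypothetical) reproduces the figure shown above:

```python
import math


def performance_rating(opponent_rating: float, points: float, games: int) -> float:
    """Opponent rating plus the logistic Elo difference implied by the score."""
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("performance is unbounded at a 0% or 100% score")
    return opponent_rating + 400.0 * math.log10(p / (1.0 - p))


print(round(performance_rating(2599, 57.5, 61)))  # 3085
```

The curve is very steep near 100%, which is why a handful of games swings a TPR in such a lopsided match: turning a single win into a draw (57.0/61) already lowers it by about 25 points.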

nionita · Post by **nionita** » Sat Aug 20, 2011 10:27 pm

On my PC it does not run, neither the SSE version nor the normal one. Win 7, AMD Phenom II X6.

Laskos · Post by **Laskos** » Sun Aug 21, 2011 1:19 pm

Don wrote:
Another factor in your test: what is the move overhead set to? At those time controls you are taking a big hit, so check Komodo 2.03 and Komodo 3 and set them to be the same. I suggest just setting it to 1.

Don

I increased the TC by a factor of 2 (to 3s + 0.1s), setting move overhead to 1. My result on AMD is almost identical to the IPON result:

    Program                            Score       %     Elo    +   -    Draws

  1 Komodo64 3                     : 2880.0/5570  51.7   3006    7   7   36.0 %
  2 Komodo64 2.03 JA               : 2690.0/5570  48.3   2994    7   7   36.0 %

12 +/- 7 Elo points of improvement at 95% confidence. The average depth was ~11.0. It still seems to me that fast controls could do the job.

Kai
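The "12 +/- 7 Elo at 95% confidence" figure can be roughly reproduced from the score line alone. Below is a minimal Python sketch (function names are hypothetical) that converts a match score into an Elo difference with an approximate 95% interval under the usual logistic rating model. It assumes independent games, recovers the win rate from the reported draw ratio, and applies a plain normal approximation to the score, which is simpler than what dedicated rating tools do, so its margins need not match their output exactly.

```python
import math


def elo_from_score(p: float) -> float:
    """Logistic Elo difference implied by an expected score p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


def elo_interval(points: float, games: int, draw_ratio: float, z: float = 1.96):
    """Return (low, estimate, high) Elo difference for a match result.

    The win rate is recovered as score - draw_ratio / 2; each game scores
    1, 0.5 or 0, so the per-game variance is E[x^2] - p^2.
    """
    p = points / games
    win = p - draw_ratio / 2.0
    variance = win + 0.25 * draw_ratio - p * p
    stderr = math.sqrt(variance / games)
    return (elo_from_score(p - z * stderr),
            elo_from_score(p),
            elo_from_score(p + z * stderr))


low, est, high = elo_interval(2880.0, 5570, 0.36)
print(f"{est:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

Under these assumptions this gives about +12 Elo with bounds near +5 and +19. Fed the earlier 1.5s + 0.05s result (1103.5/2200, 32.6% draws), it gives about +1 Elo with an upper bound near +13, which fits Kai's earlier reading that version 3 was unlikely to be more than 14 Elo stronger at that control.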

Don · Post by **Don** » Sun Aug 21, 2011 3:55 pm

Laskos wrote:
I increased the TC by a factor of 2 (to 3s + 0.1s), setting move overhead to 1. My result on AMD is almost identical to the IPON result:
    Program                            Score       %     Elo    +   -    Draws

  1 Komodo64 3                     : 2880.0/5570  51.7   3006    7   7   36.0 %
  2 Komodo64 2.03 JA               : 2690.0/5570  48.3   2994    7   7   36.0 %
12 +/- 7 Elo points of improvement at 95% confidence. The average depth was ~11.0. It still seems to me that fast controls could do the job.

Kai

But your fast result showed only a 2 ELO gain, so I don't understand why you think there is no difference between fast and slower testing. I think you just proved that testing too fast is unreliable, something we proved to ourselves a long time ago.

When I say it's unreliable, I don't mean that it never correlates; I mean that you can't trust it. Sometimes it does and sometimes it doesn't, and that probably has something to do with the nature of the change we are testing.

I'll tell you what can happen if you rely on 3 second testing, for example: you make 10 changes and find that you have "improved" the program by 20 ELO. Then you do a long test and find out that you have regressed! We have been there and done that! So then you end up re-testing all the changes you made to see which one(s) hurt you, and that takes longer than if you had done it right in the first place.

So in general Larry and I DO run tests very fast as a pre-test, but we don't use those tests to determine which changes to keep, only whether a change is interesting enough to continue with a longer test. When we do this we tend to use either a fixed depth or something in the 5 second range, as we like Fischer time controls. We also bias our decisions based on the nature of the change: for evaluation improvements you can get away with much faster tests, and an improvement almost always is an improvement at all depths. If it's a search-related change, all bets are off. Even ideas that only affect nodes near the leaves are not safe.
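The workflow described above, where a fast pre-test can only promote a change to a longer test and never decides on its own that a change is kept, can be written down as a small triage rule. The sketch below is hypothetical: the threshold, the labels and the function name are illustrative, not Larry's and Don's actual procedure.

```python
from __future__ import annotations

from enum import Enum


class NextStep(Enum):
    DROP = "drop the change"
    LONGER_TEST = "run a longer test"


def triage_after_pretest(fast_elo: float, search_related: bool,
                         interest_threshold: float = 0.0) -> tuple[NextStep, str]:
    """Hypothetical triage of a fast (~5 s or fixed-depth) pre-test result.

    The fast result alone never keeps a change: at best it earns a longer
    test. The attached note reflects how far the fast result can be trusted
    for that kind of change.
    """
    if fast_elo <= interest_threshold:
        return NextStep.DROP, "not interesting enough for long games"
    if search_related:
        return NextStep.LONGER_TEST, "search change: fast result says little, test long"
    return NextStep.LONGER_TEST, "eval change: fast result more likely to hold at depth"


print(triage_after_pretest(4.0, search_related=True))
```

Note that there is deliberately no "keep" outcome: the fast test only filters cheaply, and the decision to keep a change comes from the longer test.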

Although we do some self-testing, we primarily like to test against 3 other programs that are in the same general strength range as Komodo (we used to have to handicap the foreign programs, but we are now at a point where we might have to handicap Komodo). When you show a 5 ELO improvement with this kind of test, it is more likely to be an actual 5 ELO improvement at longer time controls. If you show 5 ELO in self-testing, it tends to be a smaller improvement against other programs.

The scaling properties of programs vary enormously, however. Critter starts out like gangbusters, so if you do a really fast test with 3 or 4 different programs, Critter is going to crush them, even if they are roughly the same strength at long time controls. Try it and see what I mean.

PawnStormZ · Post by **PawnStormZ** » Sun Aug 21, 2011 7:14 pm

Thanks Don and Larry for Komodo 3!

It runs fine on my Core i7 with Win 7 64-bit. I ran a 100-game match (50 positions played with colors reversed) between Komodo 3 and Houdini 1.5. The time control was game in 1 minute plus 1/10 second per move. Since each was using only 1 core, I decided to use ponder on; that should give the strongest performance, correct?

The results were:

Engine          Pts     W     D     L
Houdini 1.5    59.0   +39   =40   -21
Komodo 3       41.0   +21   =40   -39

Games are here... http://www.datafilehost.com/download-1619a8b6.html

Eelco de Groot · Post by **Eelco de Groot** » Sun Aug 21, 2011 11:19 pm

nionita wrote: On my PC it does not run, neither the SSE version nor the normal one. Win 7, AMD Phenom II X6.

I don't have that kind of computer, but when you say it does not run, do you mean, for instance, that there is no response if you try to run it from the command line? Does Windows 7 show some kind of error message because it thinks it is not a regular Windows application, or does execution start and then just stop? So far this seems to be the only report here of the regular version of Komodo (not the i7 version) not working.

Eelco

JManion · Post by **JManion** » Mon Aug 22, 2011 1:32 am

I ran a 400-game test match. Each engine used only 1 core.

Houdini 1.5a x 64 +159/=136/-105
Komodo3 x 64 +105/=136/-159

A very solid result for Komodo; I can't wait to see what happens when it goes MP.

Excellent job, Don and Larry.
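A quick way to summarize head-to-head results like the two Houdini vs. Komodo matches above is the likelihood of superiority (LOS), a statistic commonly used in engine testing, though not one quoted in this thread. The sketch below uses the usual normal approximation, which ignores draws and looks only at how the decisive games split. It assumes independent games and says nothing about how large the strength gap is; the function name is hypothetical.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the side with `wins` is truly stronger.

    Draws carry no information in this approximation, so only the decisive
    games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Houdini's wins and losses in the two matches reported above.
for label, w, l in (("100 games", 39, 21), ("400 games", 159, 105)):
    print(f"{label}: LOS = {likelihood_of_superiority(w, l):.3f}")
```

For Houdini this comes out near 99% in the 100-game match and above 99.9% in the 400-game match; Komodo's LOS in each match is one minus that value.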
