Komodo 3 almost here - windows 64 now available

Laskos · Post by **Laskos** » Mon Aug 22, 2011 11:33 am

Don wrote:
Laskos wrote:
Don wrote:
I would not trust the "speed" to tell you anything. Komodo is slower than many of the programs that it can beat, and the speed varies from version to version depending on many factors that we tinker with, such as extensions, LMR or other pruning.

In a previous post I mentioned that we have found fast time controls are too unreliable for determining what the true improvement is. We have experimented with game in 1 second + 0.01 for example. It's good for getting huge samples quickly but it's not very reliable - it shows huge improvements where there are none and regressions when there are improvements.

A program is not fully exercised until it is getting out well past 9 or 10 ply. There are even some evaluation issues that don't register at small ply - and some pruning ideas based on sub-searches (even stuff like IID and singular extensions) need a few ply to start working smoothly and consistently.

Another factor in your test - what is the move overhead set to? At those time controls you are taking a big hit - so check komodo 2.03 and komodo 3 and set them to be the same - I suggest just setting it to 1.

Don
I increased the TC by a factor of 2 (to 3s + 0.1s), setting move overhead to 1. My result on AMD is almost identical to IPON result:
    Program                            Score       %     Elo    +   -    Draws

  1 Komodo64 3                     : 2880.0/5570  51.7   3006    7   7   36.0 %
  2 Komodo64 2.03 JA               : 2690.0/5570  48.3   2994    7   7   36.0 %
12 +/-7 Elo points improvement 95% confidence. The average depth was ~11.0. It still seems to me that fast controls could do the job.

Kai
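
For readers who want to check the arithmetic behind the "12 +/-7" figure, here is a minimal sketch that derives an Elo difference and an approximate 95% margin from the score line in the table above. It assumes the usual logistic Elo model and a normal approximation for the score, and it takes the rounded 36.0 % draw rate at face value; the function names are made up for this illustration and are not from any rating tool mentioned in the thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(points: float, games: int, draw_ratio: float, z: float = 1.96):
    """Return (elo, margin) for a match result.

    points     -- points scored by the first engine (win = 1, draw = 0.5)
    games      -- number of games played
    draw_ratio -- fraction of drawn games (assumed exact, here 0.36)
    z          -- normal quantile; 1.96 corresponds to roughly 95%
    """
    score = points / games
    win = score - draw_ratio / 2.0
    loss = 1.0 - win - draw_ratio
    # Per-game variance of the result around the mean score.
    variance = (
        win * (1.0 - score) ** 2
        + draw_ratio * (0.5 - score) ** 2
        + loss * (0.0 - score) ** 2
    )
    std_err = math.sqrt(variance / games)
    elo = elo_from_score(score)
    low = elo_from_score(score - z * std_err)
    high = elo_from_score(score + z * std_err)
    # Report half the width of the interval as a symmetric margin.
    return elo, (high - low) / 2.0


elo, margin = elo_with_margin(2880.0, 5570, 0.36)
print(f"{elo:+.1f} Elo, +/- {margin:.1f} (approx. 95%)")
```

Under these assumptions the result lands close to the 12 +/- 7 quoted in the post.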
But your fast result showed only 2 ELO gain so I don't understand why you think there is no difference. I think you just proved that testing too fast is unreliable - something we proved a long time ago to ourselves.

When I say it's unreliable I don't mean that it never correlates - I mean that you can't trust it. Sometimes it does and sometimes it doesn't. And that probably has something to do with the nature of the change we are testing.

I'll tell you what can happen if you rely on 3 second testing for example: You make 10 changes and find that you have "improved" the program by 20 ELO. Then you do a long test and find out that you have regressed! We have been there and done that! So then you end up re-testing all the changes you made in order to see which one(s) hurt you and that takes longer than if you had done it right.

So in general Larry and I DO run tests very fast as a pre-test - but we don't use those tests to determine which changes to keep - only whether it's interesting enough to continue with a longer test. When we do this we tend to use either a fixed depth level or something in the 5 second range, as we like Fischer time controls. We also bias our decisions based on the nature of the change - for evaluation improvements you can get away with much faster tests, and an improvement is almost always an improvement at all depths. If it's a search-related change all bets are off. Even ideas that only affect nodes near the leaves are not safe.

Although we do some self testing we primarily like to test against 3 other programs that are in the same general strength range as komodo (we used to have to handicap the foreign programs but we are now at a point where we might have to handicap komodo.) When you show a 5 ELO improvement with this kind of test it is more likely to be an actual 5 ELO improvement at longer time controls. If you show 5 ELO at self testing it tends to be a smaller improvement against other programs.

The scaling property of programs varies enormously however. Critter starts out like gangbusters, so if you do a really fast test with 3 or 4 different programs Critter is going to crush them, even if they are roughly the same strength at long time controls. Try it and see what I mean.

Agreed on most, the most important point being that if it's an eval change then fast testing will probably work better than for a search change. My results were consistent: 2 +/- 12 points and 12 +/- 7 points at 95% confidence. They are consistent even at 70% confidence intervals, and there is nothing unusual about such discrepancies. I am mostly interested in self-testing and in gauntlets with different engines, for relative strength, not absolute. I do know that Critter seems way stronger at fast controls, so the direct comparison of Komodo and Critter at 1.5s + 0.05s is useless; what could help is Komodo X versus Critter compared to Komodo Y versus Critter. Yes, self-testing tends to increase differences, but in fact very often that's the goal: to see a difference of 2-3 points in fewer than 40,000 games.

I don't say that fast-testing is a mantra, but sometimes there is no other option.

Kai
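
The two statistical claims in the post above (that 2 +/- 12 and 12 +/- 7 are compatible, and that 2-3 Elo can be resolved in under 40,000 games) can be sanity-checked with a short sketch. It assumes the two runs are independent, that the quoted margins are 95% normal intervals, that the engines are near equal strength, and that the draw rate is the 36% seen in the earlier table; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def quantile(confidence: float) -> float:
    """Two-sided normal quantile, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def z_of_difference(elo_a, margin_a, elo_b, margin_b, confidence=0.95):
    """How many standard errors apart two independent Elo estimates are."""
    sigma_a = margin_a / quantile(confidence)
    sigma_b = margin_b / quantile(confidence)
    return abs(elo_a - elo_b) / math.hypot(sigma_a, sigma_b)


def margin_for_games(games, draw_ratio=0.36, confidence=0.95):
    """Approximate Elo margin for a match between near-equal engines."""
    # Per-game standard deviation of the result when the mean score is 0.5.
    per_game_sd = math.sqrt((1.0 - draw_ratio) * 0.25)
    # Slope of the logistic Elo curve at a 50% score (Elo per unit of score).
    slope = 400.0 / (math.log(10.0) * 0.25)
    return quantile(confidence) * per_game_sd * slope / math.sqrt(games)


z = z_of_difference(2.0, 12.0, 12.0, 7.0)
print(f"difference between the two runs: {z:.2f} standard errors")
print(f"95% margin after 40,000 games: +/- {margin_for_games(40_000):.1f} Elo")
```

A difference well under two standard errors is what one would expect from noise alone, and the margin after 40,000 games comes out a little under 3 Elo, which is in line with both statements.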

PS: The 1.5s + 0.05s control was too fast, pushing the limit of the Windows C timer precision of ~19ms, so 3s + 0.1s should be more reliable. Besides that, the move overhead was 20ms in the first test and 1ms in the second. Generally, on my computer, testing eval changes at >100ms per move works pretty well, for the reasons you gave too.

Don · Post by **Don** » Mon Aug 22, 2011 2:39 pm

Laskos wrote: I don't say that fast-testing is a mantra, but sometimes there is no other option.

For us there is no other option - we have to test at fast time controls. Of course when I say fast I don't mean 3 seconds, I just mean much faster than most people test at.

nionita · Post by **nionita** » Mon Aug 22, 2011 10:25 pm

The message is in German, but it means something like "the program does not work anymore... searching for a solution", and then it probably tries to find something in the Microsoft problem database.
