about hash tables and illogical behaviour of chess programs

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: about hash tables and illogical behaviour of chess progr

Post by Ralph Stoesser »

mcostalba wrote: BTW it could be that, under your test conditions, lowering the default margins for pruning and razoring gives better results _independently_ of the fact that you do it only for refined values. IOW, perhaps lowering the margins without any further consideration could give an even better result; this would mean that the good result of refined-value-dependent pruning is just an artifact of lowering the margins in absolute terms.
This is quite possible, if not probable. I've already considered it.
I would certainly test this assumption if I could actually get an Elo gain out of my changes.

But after 1086 games from my second test I see

+241 -233 =612

down from 53.7% to 50.4% within about 400 games.
So it's quite possible that there is nothing there anyway.

What is the likelihood of the following:
engine1 plays 1000 games against engine2.
We know both engines are exactly equally strong.
But one of the two engines is always ahead in the tournament.
How likely is it?
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: about hash tables and illogical behaviour of chess progr

Post by Houdini »

Ralph Stoesser wrote: What is the likelihood of the following:
engine1 plays 1000 games against engine2.
We know both engines are exactly equally strong.
But one of the two engines is always ahead in the tournament.
How likely is it?
In my experience this happens quite often - by chance one engine gets off to a good start and the other never catches up.
To ascertain a 1% (7 Elo) improvement you probably need at least 10,000 games.
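The "always ahead" effect is easy to check with a quick Monte Carlo sketch. The snippet below is my own illustration, not anyone's actual test framework; the 50% draw rate and even win/loss split are assumptions. It estimates how often one of two exactly equal engines gets through a 1000-game match without ever being behind on score:

```python
import random

def one_engine_never_behind(n_games=1000, draw_rate=0.5):
    """Play n_games between two exactly equal engines (decisive games
    split evenly).  Return True if either engine goes through the whole
    match without ever being behind on the running score."""
    diff = 0          # engine1 points minus engine2 points
    lo = hi = 0       # running minimum and maximum of the difference
    for _ in range(n_games):
        r = random.random()
        if r >= draw_rate:                              # decisive game
            diff += 1 if r < draw_rate + (1 - draw_rate) / 2 else -1
        lo = min(lo, diff)
        hi = max(hi, diff)
    # lo >= 0: engine1 never trailed; hi <= 0: engine2 never trailed.
    return lo >= 0 or hi <= 0

random.seed(1)
trials = 2000
p = sum(one_engine_never_behind() for _ in range(trials)) / trials
print(f"P(one engine leads or is tied throughout): {p:.3f}")
```

Under these assumptions the estimate comes out in the ballpark of 5-10%, i.e. noticeably more common than intuition suggests; the weaker statement "one engine is in front for nearly the whole match" is more common still (the arcsine-law effect).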
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: about hash tables and illogical behaviour of chess progr

Post by mcostalba »

Ralph Stoesser wrote:
mcostalba wrote: BTW it could be that, under your test conditions, lowering the default margins for pruning and razoring gives better results _independently_ of the fact that you do it only for refined values. IOW, perhaps lowering the margins without any further consideration could give an even better result; this would mean that the good result of refined-value-dependent pruning is just an artifact of lowering the margins in absolute terms.
This is quite possible, if not probable. I've already considered it.
I would certainly test this assumption if I could actually get an Elo gain out of my changes.

But after 1086 games from my second test I see

+241 -233 =612

down from 53.7% to 50.4% within about 400 games.
So it's quite possible that there is nothing there anyway.

What is the likelihood of the following:
engine1 plays 1000 games against engine2.
We know both engines are exactly equally strong.
But one of the two engines is always ahead in the tournament.
How likely is it?
If after 1000 games the modified engine fails to show some acceptable improvement, we simply give up and move on to something else.

Admittedly this can make you miss a good change (though one with a very small Elo increase), but from a cost/benefit point of view we prefer to occasionally miss a minor improvement in exchange for giving test coverage to a larger number of ideas.

Testing time is a scarce resource for us and has to be part of the equation. So our normal approach is "one shot only". In exceptional cases we have repeated the test, but then someone else has re-run the match so as to have an independent result.
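For context on what 1000 games can and cannot resolve, here is a rough back-of-the-envelope sketch of my own (the 50% draw rate is an assumption): the one-sigma noise on a match score between equal engines, and the Elo difference that score offset corresponds to under the usual logistic model.

```python
import math

def score_std_error(n_games, draw_rate=0.5):
    """Standard error of the overall score fraction for two equal engines.
    Per-game score s is 0, 0.5, or 1; with E[s] = 0.5 the variance is
    E[s^2] - 0.25 = w*1 + d*0.25 - 0.25, where w is the win rate."""
    w = (1 - draw_rate) / 2
    var = w * 1.0 + draw_rate * 0.25 - 0.25
    return math.sqrt(var / n_games)

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400 * math.log10(1 / score - 1)

for n in (1000, 10000):
    se = score_std_error(n)
    print(f"{n:>6} games: 1 sigma = {100 * se:.2f}% "
          f"(~{elo_from_score(0.5 + se):.1f} Elo)")
```

Under these assumptions, 1000 games give a one-sigma noise of about 1.1% (roughly 8 Elo), while 10,000 games tighten it to about 0.35% (roughly 2.5 Elo), which is consistent with the "at least 10,000 games for a 7 Elo claim" rule of thumb above.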
Ralph Stoesser
Posts: 408
Joined: Sat Mar 06, 2010 9:28 am

Re: about hash tables and illogical behaviour of chess progr

Post by Ralph Stoesser »

mcostalba wrote:
If after 1000 games the modified engine fails to show some acceptable improvement, we simply give up and move on to something else.

Admittedly this can make you miss a good change (though one with a very small Elo increase), but from a cost/benefit point of view we prefer to occasionally miss a minor improvement in exchange for giving test coverage to a larger number of ideas.

Testing time is a scarce resource for us and has to be part of the equation. So our normal approach is "one shot only". In exceptional cases we have repeated the test, but then someone else has re-run the match so as to have an independent result.
Assume only search changes are involved.
What is an acceptable improvement after 1000 games? 51%, 52% ...?
Do you use only self-play for the initial test?
What time control do you use?

Earlier you wrote that a more aggressive pruning and razoring scheme could help at fast time controls but would hurt at longer ones. That somewhat contradicts your assumption that my attempt to use a less aggressive scheme in 90% of cases (refinedValue coming from evaluate()) could lead to a better result in fast games independently of the original idea, which was to somehow exploit the information from TT entries with tte.depth() < actual depth.
Also, I would tend to think that the razor depth and the margins your team has chosen should be pretty accurate, because you have tested them in an automated way. Or not :?:

Sorry, I'm very curious, maybe too curious ...
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: about hash tables and illogical behaviour of chess progr

Post by mcostalba »

Ralph Stoesser wrote: Assume only search changes are involved.
What is an acceptable improvement after 1000 games? 51%, 52% ...?
Do you use only self-play for the initial test?
What time control do you use?
If it reduces pruning, even 51% is OK. But I am more used to reading the full result in the form of wins / draws / losses.

Yes, just self-play.

At least 1'+0", if your computer is fast enough to reach at least search depth 13-15 in middlegame positions. Never below 1 minute.

Ralph Stoesser wrote: Earlier you wrote that a more aggressive pruning and razoring scheme could help at fast time controls but would hurt at longer ones. That somewhat contradicts your assumption that my attempt to use a less aggressive scheme in 90% of cases (refinedValue coming from evaluate()) could lead to a better result in fast games independently of the original idea, which was to somehow exploit the information from TT entries with tte.depth() < actual depth.
Also, I would tend to think that the razor depth and the margins your team has chosen should be pretty accurate, because you have tested them in an automated way. Or not :?:
If reduced pruning gives an advantage in super-fast games then this is _very_ good; perhaps I have misunderstood your patch (actually I have not seen your patch ;-) )

We don't have an automatic tuning framework for search parameters, only for evaluation parameters, so no, we have not tested them in an automated way. Actually I think there is big potential in pruning-parameter tuning, because we have simply chosen a set that seems to perform well but have made very few attempts to optimize it. The main problem is that you need to test a pruning patch at different time controls with many games, so it is very time-consuming. So normally, when we find a good setup we stay with it at least for the current release. This reduces the risk of introducing a blunder.
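For what it's worth, the refined-value-dependent margin idea under discussion might look something like the following sketch. This is purely illustrative Python pseudocode, not Stockfish source; the margin values, the depth scaling, and all function names are invented for the example.

```python
# Hypothetical sketch of refined-value-dependent razoring.
# All numbers and names are made up for illustration.

RAZOR_MARGIN_DEFAULT = 300   # centipawns when the eval is plain evaluate()
RAZOR_MARGIN_REFINED = 200   # tighter margin when a TT entry refines the eval

def razor_margin(depth, refined):
    """Margin required above the static eval before we razor."""
    base = RAZOR_MARGIN_REFINED if refined else RAZOR_MARGIN_DEFAULT
    return base + 16 * depth          # grow the margin with remaining depth

def should_razor(static_eval, beta, depth, tt_hit):
    # A TT hit "refines" the eval even when the stored entry is shallower
    # than the current node (tte.depth() < actual depth); without a hit
    # the value is just evaluate(), so the wider default margin applies.
    return static_eval + razor_margin(depth, tt_hit) < beta

# With a refined eval the tighter margin lets razoring trigger where the
# default margin would not:
print(should_razor(static_eval=0, beta=250, depth=1, tt_hit=True))   # margin 216
print(should_razor(static_eval=0, beta=250, depth=1, tt_hit=False))  # margin 316
```

The point of the sketch is only to make the two-margin structure concrete: whether this gains anything, and whether the effect survives simply lowering both margins uniformly, is exactly what the testing discussed above would have to establish.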
Ralph Stoesser wrote: Sorry, I'm very curious, maybe too curious ...
Your questions are very much to the point and very practical, and it is a pleasure to answer them. I have problems with hand-waving arguments or with theoretical ideas. People who start with "I think that...", "You should try...", "I expect that..." typically make me nervous. I admit it is a limitation of mine, but I only understand patches and test results... nothing more... and I am glad of it :-)