Using engines close in strength or not

nczempin

Using engines close in strength or not

Post by nczempin »

Bob Hyatt says I am wrong to test improvements against engines that are about the same strength as, or slightly stronger than, my particular version of Eden.

I agree that if you use merely one engine that is close in strength, the variance can be quite high and make it difficult to determine any improvements.

However, if I test against a decent number of opponents (for the sake of discussion let's say 10) that are close in strength, this individual variance should be reduced.

Then, when testing the new version against the same engines, I can fairly quickly decide whether any possible improvement is significant or not.

The key is to look only at the total result against all the engines, and not at the individual results, which naturally vary more.


I think it is more important to play against more engines than to play more games against each engine.

So I try to find as many engines that are close in strength as possible. I usually have to weed out engines that are too strong by more than a certain margin, as well as those that are too weak.

Once I have found those engines I would play as many matches as possible, but usually I find significant results after only one round.

Again, this is because the variability is lowered across the collection of engines.
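
To make "look only at the total result" concrete, here is a minimal sketch of pooling the per-opponent results and comparing two versions. It assumes the games are independent (which is only approximately true in practice), and the result counts and function names are made up for illustration; they are not Eden's actual numbers.

```python
import math

def score_and_stderr(wins, draws, losses):
    """Return (mean score per game, standard error of that mean)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance over the three possible outcomes 1, 0.5 and 0.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    return mean, math.sqrt(var / n)

def pool(per_opponent):
    """Add up per-opponent (wins, draws, losses) tuples into one total."""
    return tuple(sum(column) for column in zip(*per_opponent))

def z_score(old_total, new_total):
    """Difference between two pooled scores in units of its standard error."""
    old_mean, old_se = score_and_stderr(*old_total)
    new_mean, new_se = score_and_stderr(*new_total)
    return (new_mean - old_mean) / math.sqrt(old_se ** 2 + new_se ** 2)

# Hypothetical one-round results against three opponents, for illustration only.
old_version = [(4, 2, 4), (3, 3, 4), (5, 1, 4)]
new_version = [(6, 2, 2), (5, 3, 2), (6, 1, 3)]
z = z_score(pool(old_version), pool(new_version))
print(f"z = {z:.2f}")  # |z| above roughly 2 is a common rule of thumb
```

The individual matches here are far too short to say anything on their own; only the pooled totals are compared.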
Alessandro Scotti

Re: Using engines close in strength or not

Post by Alessandro Scotti »

That is exactly what I do. I try not to have engines too strong or too weak, and if my engine progresses "too much" then I replace some (usually two) of the weakest engines with stronger ones. Then I rerun the last test to have a new reference for the following tests.
In general, my engine will score between 48% and 57% in the test group. I currently play 800 games per test, trying to "manually" handle the cases where the result is inconclusive.
Martin Bryant has recently made a great post about his test methodology, well worth reading.
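
For a feel of what 800 games can resolve, the following sketch estimates the 95% margin on the score and converts it to Elo with the usual logistic formula. It assumes independent games and a score near 50%; the draw ratios are assumptions, not measured values, and the function names are only illustrative.

```python
import math

def score_to_elo(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def margin_95(games, score=0.5, draw_ratio=0.0):
    """Approximate 95% margin on the score fraction after `games` games."""
    # Per-game variance for outcomes 1, 0.5, 0 with the given mean and draw share.
    variance = score * (1.0 - score) - draw_ratio / 4.0
    return 1.96 * math.sqrt(variance / games)

for draw_ratio in (0.0, 0.3):  # assumed draw ratios, for illustration only
    m = margin_95(800, draw_ratio=draw_ratio)
    elo = score_to_elo(0.5 + m)
    print(f"draws {draw_ratio:.0%}: +/- {m:.1%} score, about +/- {elo:.0f} Elo")
```

With no draws at all, the margin for 800 games is about 3.5 percentage points; draws make it smaller. Results inside that band are the "inconclusive" cases.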
bob

Re: Using engines close in strength or not

Post by bob »

There is absolutely nothing wrong with having more engines. But you still need to add some that are significantly stronger to the mix. If you do something that makes you play better against weaker or nearly equal engines, but kills you against stronger engines, you need to know that. And yes, it can happen. For example, add an aggressive king safety and you may blow up your weaker opponents by attacking, but get murdered by an engine that is tactically stronger.

You need to know that this is happening. Otherwise you end up wondering "Why do I do so well in my testing but get ripped apart in CCT/WCCC or other tournaments?"

As a human, I personally learned far more playing significantly stronger opponents, rather than playing equal or weaker opponents. There's a good reason for that and it holds true in comp vs comp games as well...
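
One way to catch that situation is to keep the totals split by opponent group as well as pooled. A minimal sketch, with hypothetical group names and made-up results chosen so that the overall score goes up while the score against the stronger group drops:

```python
def score(wins, draws, losses):
    """Score fraction for a (wins, draws, losses) record."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def per_group_delta(old, new):
    """Change in score per opponent group; inputs map group -> (W, D, L)."""
    return {group: score(*new[group]) - score(*old[group]) for group in old}

def regressions(old, new, threshold=0.0):
    """Groups where the new version scores worse by more than `threshold`."""
    deltas = per_group_delta(old, new)
    return [group for group, delta in deltas.items() if delta < -threshold]

def overall(results):
    """Pooled score across all groups."""
    totals = [sum(column) for column in zip(*results.values())]
    return score(*totals)

# Hypothetical numbers, for illustration only.
old = {"similar": (40, 20, 40), "stronger": (10, 10, 30)}
new = {"similar": (55, 20, 25), "stronger": (4, 6, 40)}

print("overall:", round(overall(old), 3), "->", round(overall(new), 3))
print("worse against:", regressions(old, new))
```

The per-group numbers are noisier than the pooled total, so a drop in one group is a reason to play more games against that group, not proof by itself.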
nczempin

Re: Using engines close in strength or not

Post by nczempin »

I agree that it is useful to play against stronger opponents, but I think the stronger opponents that I already include fulfil the purpose you are describing.

The same holds on the human side: if I play a Grandmaster, he will take me apart, and I may well not have a clue exactly where I went wrong. Even if he then tells me the details, it will not help much. I see this so many times: established club players try to teach strategy to beginners, who then run around with all that knowledge about KPK endgames in their heads but lose their games because they leave the queen en prise.

So it is best, even for humans, to play opponents who are significantly stronger, but not too strong. And given that the only explanation a stronger engine will give you is the better move (and not the joint post-game analysis that you could have with a human), I think that the opposing engines shouldn't be too advanced.

There is one thing wrong with using more engines (just as there is with more games): it takes more time. I am trying to find the "sweet spot": enough games and enough opponents to get significant results, in a feasible amount of time.
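
As a rough guide to where that sweet spot lies, this sketch estimates how many games each version needs before a given score gain stands out from the noise, and how that total splits across opponents. It assumes independent games and uses the worst-case per-game variance of 0.25 (no draws); the function names and the choice of 10 opponents are illustrative only.

```python
import math

def games_needed(score_gain, per_game_variance=0.25, z=1.96):
    """Games per version so that `score_gain` reaches `z` standard errors."""
    # Difference of two independent means has se = sqrt(2 * variance / n).
    return math.ceil(2.0 * per_game_variance * z ** 2 / score_gain ** 2)

def games_per_opponent(total_games, opponents):
    """Spread a total evenly over the opponents, rounding up."""
    return math.ceil(total_games / opponents)

total = games_needed(0.05)  # a gain of 5 percentage points
print(total, "games per version,",
      games_per_opponent(total, 10), "against each of 10 opponents")
```

Halving the gain you want to detect roughly quadruples the number of games, which is why small improvements are so expensive to confirm.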

Your example regarding aggressive king safety is very fitting, because it is exactly what coaches (to be more specific, the coaches of the German national youth teams) advocate: before you have reached a certain level of tactical maturity, you should focus on tactics, on tactical openings, on tactical play. Only from a rating of about 2000 is the tactical maturity sufficient that one should spend more time on strategy beyond the very basics.

It is exactly what I'm trying to do with my engine, and my engine's tactical maturity is still very low.

Why are they advocating this?

Because if you play tactical (open) games, your mistakes come to light much more obviously and much more quickly. That makes them easier to analyse and easier to stop making. Subtle mistakes in positional play may come back to haunt you only much, much later, perhaps in an even subtler endgame.
bob

Re: Using engines close in strength or not

Post by bob »

I don't agree. When I was learning chess, before I had reached 1600 USCF, I was already reading and studying "My System", which is all about strategy and not tactics. I learned a lot about various ideas, from center control to pawn structure, even though I was nowhere near 2000.

I learned tactics through playing because your mistakes show up quickly. I learned strategy first by studying and then by playing...

I think both go hand in hand...