Ali Baba and the 40 positions

nczempin · Post by **nczempin** » Mon Sep 17, 2007 11:05 am

If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?

In addition, using those 40 positions' results equally weighted will likely result in differences from the underlying proposition that is to be proven.
For example, if (another extreme example, for illustration purposes only; I have not looked at the actual positions) one of the positions were a pawn ending, and such pawn endings occur less frequently in actual games than 1/40, the result of evaluating that position will be over-represented.

One could also somehow (theoretically, I have not examined how this would be possible in practice) find the contribution of each position to the underlying set of all games, and again either take the position out of the test suite if the contribution is insignificant, or at least reduce its weight in the analysis?
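As a rough illustration of the weighting idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the position names, the per-position scores, and the frequency estimates are made up, and `weighted_match_score` is a hypothetical helper, not something anyone in this thread has actually used.

```python
def weighted_match_score(results, weights):
    """Average the per-position scores, weighting each position.

    results: position name -> fraction of points scored from that position (0..1)
    weights: position name -> relative weight (e.g. an estimate of how often
             that kind of position shows up in real games)
    """
    total_weight = sum(weights[name] for name in results)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(results[name] * weights[name] for name in results) / total_weight


# Hypothetical data: three positions, one of them a rare pawn ending.
results = {"middlegame_a": 0.5, "middlegame_b": 0.75, "pawn_ending": 0.0}

# Equal weighting, as in a plain match total.
equal = {name: 1 for name in results}

# Assumed relative frequencies of each position type in real games.
by_frequency = {"middlegame_a": 45, "middlegame_b": 45, "pawn_ending": 10}

print(weighted_match_score(results, equal))
print(weighted_match_score(results, by_frequency))
```

With these made-up numbers the equally weighted score is about 0.42, while the frequency-weighted score is 0.5625, because the badly played but rare pawn ending counts for less.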

Pradu · Post by **Pradu** » Mon Sep 17, 2007 11:17 am

nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?

How would you mathematically test if two different positions are even and that they test "orthogonal" parameters? Would you create it in such a way that all parameters, say evaluation terms, are the same except for the parameters you are changing? What if two parameters are interdependent, say the value of a bishop and bishop mobility.

In addition, using those 40 positions' results equally weighted will likely result in differences from the underlying proposition that is to be proven.
For example, if (another extreme example, for illustration purposes only; I have not looked at the actual positions) one of the positions were a pawn ending, and such pawn endings occur less frequently in actual games than 1/40, the result of evaluating that position will be over-represented.

One could also somehow (theoretically, I have not examined how this would be possible in practice) find the contribution of each position to the underlying set of all games, and again either take the position out of the test suite if the contribution is insignificant, or at least reduce its weight in the analysis?

nczempin · Post by **nczempin** » Mon Sep 17, 2007 11:33 am

Pradu wrote:
nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?
How would you mathematically test if two different positions are even and that they test "orthogonal" parameters? Would you create it in such a way that all parameters, say evaluation terms, are the same except for the parameters you are changing? What if two parameters are interdependent, say the value of a bishop and bishop mobility.

No, I would not try to test anything that is intrinsic to the position.

I would run games out of this position between two engines, and run games out of the other position, and see if I find a significant correlation.
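A minimal Python sketch of what such a check could look like, under stated assumptions: each position has one score per engine pairing (points from the two games with colors reversed, so 0 to 2), the pairings are listed in the same order for every position, and the data below is invented. The helper names `pearson` and `correlated_pairs` and the 0.9 threshold are hypothetical choices for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as no evidence.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(scores, threshold=0.9):
    """Return (pos_a, pos_b, r) for every pair of positions with r >= threshold.

    scores: position name -> list of scores, one entry per engine pairing.
    """
    found = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if r >= threshold:
            found.append((a, b, r))
    return found


# Hypothetical results for three positions over five engine pairings.
scores = {
    "pos01": [2.0, 1.5, 0.5, 1.0, 0.0],
    "pos02": [2.0, 1.5, 0.5, 1.0, 0.0],  # behaves exactly like pos01
    "pos03": [0.0, 1.0, 2.0, 0.5, 1.5],
}

for a, b, r in correlated_pairs(scores):
    print(a, b, round(r, 3))
```

Here only pos01 and pos02 are flagged, so one of them would be a candidate for removal. With only a handful of pairings a high coefficient can easily occur by chance, so a real analysis would need many more pairings and a proper significance test.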

bob · Post by **bob** » Mon Sep 17, 2007 7:44 pm

nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

John Nunn (grandmaster) first addressed this many years ago. His idea was to take a set of popular opening systems, and pick a key position from each one, where things were fairly equal even if not balanced (one side is attacking, perhaps). The positions were all different enough that there was little chance that a game starting at one position would transpose to an existing position in the set.

Albert Silver did something similar a few years back and posted the positions here. They are not positions that differ only by one or two pieces/pawns. They come from different opening systems, with different themes, different plans/goals, etc. The idea being that if you can play them all well, you have a well-rounded program.

In reality, programs have problems with some of the positions with a specific color. The Sicilian is an example, but there are others. You might well find that for some positions, you lose from both sides. Which just means you have a basic problem in your evaluation or search when dealing with the goal/plan for that specific opening. But the set is very good for exposing you to attacking/defending, endgame play, middlegame play, positional play, etc. Too few positions and you miss key aspects of the game. Too many and the match becomes intractable due to the number of games required.

Best way to learn from them is to play the matches, and then look at the positions where you lose every game and fix that first. That will probably help results in other positions as well. But once you don't lose every game for any single position, at least now you know you have no major holes in your eval or search that will kill you in tournaments.

Alternatively, you can use the same positions as I do, to play matches to determine if a recent change helped or hurt overall. And not look at the specific positions and games played from them. A better result is good. And quick to do. Analyzing a game to try to fix a positional hole is much more time-consuming. Both need to be done from time to time.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?

In addition, using those 40 positions' results equally weighted will likely result in differences from the underlying proposition that is to be proven.
For example, if (another extreme example, for illustration purposes only; I have not looked at the actual positions) one of the positions were a pawn ending, and such pawn endings occur less frequently in actual games than 1/40, the result of evaluating that position will be over-represented.

One could also somehow (theoretically, I have not examined how this would be possible in practice) find the contribution of each position to the underlying set of all games, and again either take the position out of the test suite if the contribution is insignificant, or at least reduce its weight in the analysis?

nczempin · Post by **nczempin** » Mon Sep 17, 2007 7:47 pm

bob wrote:They are not positions that differ only by one or two pieces/pawns.

Bob, please, are you serious? Did I claim that they did? How many disclaimers do I have to make when I take an extreme example to illustrate a point so I can keep you from latching onto that example and using it as a straw man argument?

nczempin · Post by **nczempin** » Mon Sep 17, 2007 7:48 pm

bob wrote:
nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

John Nunn (grandmaster) first addressed this many years ago. His idea was to take a set of popular opening systems, and pick a key position from each one, where things were fairly equal even if not balanced (one side is attacking, perhaps). The positions were all different enough that there was little chance that a game starting at one position would transpose to an existing position in the set.

Albert Silver did something similar a few years back and posted the positions here. They are not positions that differ only by one or two pieces/pawns. They come from different opening systems, with different themes, different plans/goals, etc. The idea being that if you can play them all well, you have a well-rounded program.

In reality, programs have problems with some of the positions with a specific color. The Sicilian is an example, but there are others. You might well find that for some positions, you lose from both sides. Which just means you have a basic problem in your evaluation or search when dealing with the goal/plan for that specific opening. But the set is very good for exposing you to attacking/defending, endgame play, middlegame play, positional play, etc. Too few positions and you miss key aspects of the game. Too many and the match becomes intractable due to the number of games required.

Best way to learn from them is to play the matches, and then look at the positions where you lose every game and fix that first. That will probably help results in other positions as well. But once you don't lose every game for any single position, at least now you know you have no major holes in your eval or search that will kill you in tournaments.

Alternatively, you can use the same positions as I do, to play matches to determine if a recent change helped or hurt overall. And not look at the specific positions and games played from them. A better result is good. And quick to do. Analyzing a game to try to fix a positional hole is much more time-consuming. Both need to be done from time to time.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?

In addition, using those 40 positions' results equally weighted will likely result in differences from the underlying proposition that is to be proven.
For example, if (another extreme example, for illustration purposes only; I have not looked at the actual positions) one of the positions were a pawn ending, and such pawn endings occur less frequently in actual games than 1/40, the result of evaluating that position will be over-represented.

One could also somehow (theoretically, I have not examined how this would be possible in practice) find the contribution of each position to the underlying set of all games, and again either take the position out of the test suite if the contribution is insignificant, or at least reduce its weight in the analysis?

How about you just answer my question?

This style of mentioning names and wandering off into the distance of the actual question reminds me of Philosophy majors that argue with names rather than facts, and I find it difficult to believe that a CS professor would use the same principles.

Perhaps my question(s) have been lost in my background explanations, I can try and extract them so they stand there purely by themselves.

nczempin · Post by **nczempin** » Mon Sep 17, 2007 7:55 pm

nczempin wrote: Perhaps my question(s) have been lost in my background explanations, I can try and extract them so they stand there purely by themselves.

Here are my questions. For explanations please look at the very first post of this thread:

For those 40 positions you use in your test (or any other set of such positions), has anyone done a correlation analysis?

For those same 40 positions, has anyone analysed the relevance of each position to the general set of games in computer (or human, or any kind) chess?

Oh, and one more question (because I assume the answer would be "no" for both of them, and I don't want to be accused of asking rhetorical questions): Do you agree with me that such an analysis would be a worthwhile undertaking?

bob · Post by **bob** » Mon Sep 17, 2007 11:41 pm

nczempin wrote:
bob wrote:They are not positions that differ only by one or two pieces/pawns.
Bob, please, are you serious? Did I claim that they did? How many disclaimers do I have to make when I take an extreme example to illustrate a point so I can keep you from latching onto that example and using it as a straw man argument?

Here is some blunt advice. Grow up. I responded to this:

"My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't. "

I didn't say you claimed anything. I don't require disclaimers. I simply pointed out that the positions Albert posted don't suffer from this similarity problem you seemed to be concerned about. Why so defensive and argumentative?

I pointed out that the test positions I use come from a wide variety of openings, so that you will get tactics, attacks, defenses, endgames, middlegames, repetitions, you-name-it.

Don't take everything as an insult or attack. That's kid stuff.

bob · Post by **bob** » Mon Sep 17, 2007 11:47 pm

nczempin wrote:
nczempin wrote: Perhaps my question(s) have been lost in my background explanations, I can try and extract them so they stand there purely by themselves.
Here are my questions. For explanations please look at the very first post of this thread:

For those 40 positions you use in your test (or any other set of such positions), has anyone done a correlation analysis?

What are you going to analyze? Results? What would that mean? I have looked at each and every position carefully, I have analyzed them with multiple programs, and I have played thousands of games starting with each one. But what does "correlation analysis" mean here? I don't see anything that fits. So to continue: do you analyze results by position? (There must be correlation there, since there are only 3 possible outcomes and 80 possible games to play, ignoring randomness.) Do you analyze the positions to see how different they are by some sort of Hamming distance measure? Material left? Pawn structure (which is hard to quantify)? Etc.? So no, no one has done that that I am aware of, because I don't see how it makes sense unless you explain something completely different from what I am thinking of.
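For reference, one crude way to put a number on "how different" two positions are is a square-by-square Hamming distance on the piece placement. The Python sketch below is only an illustration under assumptions: positions are given as FEN strings, only the piece-placement field is compared, and the helper names are hypothetical. It says nothing about material balance or pawn structure.

```python
def expand_placement(fen):
    """Turn the piece-placement field of a FEN string into a list of 64 squares."""
    placement = fen.split()[0]
    squares = []
    for ch in placement:
        if ch == "/":
            continue  # rank separator
        if ch.isdigit():
            squares.extend("." * int(ch))  # a run of empty squares
        else:
            squares.append(ch)  # a piece letter
    if len(squares) != 64:
        raise ValueError("piece placement does not describe 64 squares")
    return squares


def hamming_distance(fen_a, fen_b):
    """Count the squares whose contents differ between the two positions."""
    a = expand_placement(fen_a)
    b = expand_placement(fen_b)
    return sum(x != y for x, y in zip(a, b))


# The extreme example from the first post: the start position versus the
# same position after 1. Nf3 Nf6.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
after = "rnbqkb1r/pppppppp/5n2/8/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 2 2"

print(hamming_distance(start, after))
```

The two positions differ on four squares (g1, f3, g8, f6), so the distance is 4.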

For those same 40 positions, has anyone analysed the relevance of each position to the general set of games in computer (or human, or any kind) chess?

Yes. Nunn chose "representative positions" based on opening system popularity. Silver did the same thing. Positions that are very common in each type of opening one might choose.

Oh, and one more question (because I assume the answer would be "no" for both of them, and I don't want to be accused of asking rhetorical questions): Do you agree with me that such an analysis would be a worthwhile undertaking?

For the first case, I don't even understand what it means in this context. For the second, it's already been done. You could easily google "nunn test" and discover what he did, then look at CCC archives to see the discussion about the Silver positions when they were released. They are not just picked out of thin air on a whim...

BTW this would be 10 times easier for everyone to follow if you just answer _once_. And not write three answers one minute apart.

bob · Post by **bob** » Tue Sep 18, 2007 5:02 am

nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

Shouldn't it be possible to find this out, playing enough games with a wide enough selection of engines, to be able to find such correlations? And if there are such correlations, it would be feasible to remove one of the positions, yet still get a result that is very similar to the previous result, yet reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?

In addition, using those 40 positions' results equally weighted will likely result in differences from the underlying proposition that is to be proven.
For example, if (another extreme example, for illustration purposes only; I have not looked at the actual positions) one of the positions were a pawn ending, and such pawn endings occur less frequently in actual games than 1/40, the result of evaluating that position will be over-represented.

One could also somehow (theoretically, I have not examined how this would be possible in practice) find the contribution of each position to the underlying set of all games, and again either take the position out of the test suite if the contribution is insignificant, or at least reduce its weight in the analysis?

I missed one key point in your post. There are _no_ endgame positions in this set. All are early to late opening positions. The middlegame is still to be played before reaching endgames. So they cover the gamut of chess knowledge and tactics.
