Ali Baba and the 40 positions

nczempin · Post by **nczempin** » Tue Sep 18, 2007 10:34 am

bob wrote:
nczempin wrote:
bob wrote:They are not positions that differ only by one or two pieces/pawns.
Bob, please, are you serious? Did I claim that they did? How many disclaimers do I have to make when I take an extreme example to illustrate a point so I can keep you from latching onto that example and using it as a straw man argument?
Here is some blunt advice. Grow up. I responded to this:

It seems to be you who is getting defensive, first attacking hgm personally, now me. I will not return the favour.

"My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't. "

I didn't say you claimed anything. I don't require disclaimers. I simply pointed out that the positions Albert posted don't suffer from this similarity problem you seemed to be concerned about. Why so defensive and argumentative?

Well, why did I say "For an extreme illustration of the principle"? To make sure that you don't take that example literally, but to illustrate a point that there are still possibly dependencies between positions, albeit at a much subtler level (that is not as obvious as the one I mentioned, in fact so non-obvious that it would only be revealed by statistical analysis).

Yet despite my deliberate and obvious attempt at explaining that it is only an extreme example, you latch onto that very example and dismiss it. You could have just given me the benefit of the doubt, and that I'm not trying to attack you, but I am seriously interested in finding out if a statistical analysis would find that, say, a Sicilian Dragon position and a King's Indian position somehow have a correlation, because those engines good/bad in one will be good/bad in the other.

And I know who John Nunn is, and I know that he came up with these positions. I hadn't heard of Silver before, but I assume he also took good care in selecting his positions.

That doesn't rule out that the effectiveness of these positions for engine-engine tests could be increased further.

I pointed out that the test positions I use come from a wide variety of openings, so that you will get tactics, attacks, defenses, endgames, middlegames, repetitions, you-name-it.

I know all that. Perhaps you should acknowledge that I am not merely a kid asking stupid questions. I know they have all been selected for a number of principles. I knew all that before I posted my questions. Perhaps you can take that into account and answer my question under the assumption that I can be taken seriously.

Don't take everything as an insult or attack. That's kid stuff.

Well, I guess I, at 37, am a kid to you. Still no need to tell me to grow up.

I did not take your statement as an attack, I was just disappointed that you would do exactly what I wanted you not to do, namely latch onto that one example. I don't know how else I could have done it. And, yes, saying "Bob, please, are you serious?" is slightly more emotional than most of the rest of the stuff I write.

Regarding the writing within minutes, well, I try to take each point individually; you sometimes do that, too. And since we are in different time zones, the effect that you find lots of posts by me on the next day is the same for me the other way round, even if you take longer breaks between your posts.

nczempin · Post by **nczempin** » Tue Sep 18, 2007 12:50 pm

bob wrote: I missed one key point in your post. There are _no_ endgame positions in this set. All are early to late opening positions. The middlegame is still to be played before reaching endgames. So they cover the gamut of chess knowledge and tactics.

Except for the gamut of endgames, which, if you wanted to test them more thoroughly, would need to be included specifically, rather than hoping that they will occur by chance.

Essentially, this is the logic you are using against just using the starting position.

bob · Post by **bob** » Tue Sep 18, 2007 7:28 pm

nczempin wrote:
bob wrote:
nczempin wrote:
bob wrote:They are not positions that differ only by one or two pieces/pawns.
Bob, please, are you serious? Did I claim that they did? How many disclaimers do I have to make when I take an extreme example to illustrate a point so I can keep you from latching onto that example and using it as a straw man argument?
No strawman argument in my post. You made a statement about a possible problem. I pointed out that these positions are not "closely related". Can they be after many moves? Certainly. I have seen d4 and e4 openings transpose to the same position if the players choose to do so. But they were intentionally chosen as "equal" positions from a large cross-section of standard openings, some before castling, some after castling, etc.

So, you posted a hypothesis, I simply pointed out it was not applicable in this case. Ditto for the endgame comment. There are _no_ endgame positions in the test set. The positions are randomly chosen positions with a reasonable cross-section of opening systems being covered, deep enough into the game so that the positions are not too similar, but not too deep into the game so that the program's opening skills are unimportant. "broad" is the word that best comes to mind.

Not just chosen at random without any thought. But chosen at random points in real openings to give a reasonable distribution of themes to work on.

Here is some blunt advice. Grow up. I responded to this:

It seems to be you who is getting defensive, first attacking hgm personally, now me. I will not return the favour.

You do not believe that suggesting that I made the data up is "personal"? I don't see how you could get much more personal than that.

"My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't. "

I didn't say you claimed anything. I don't require disclaimers. I simply pointed out that the positions Albert posted don't suffer from this similarity problem you seemed to be concerned about. Why so defensive and argumentative?

Well, why did I say "For an extreme illustration of the principle"? To make sure that you don't take that example literally, but to illustrate a point that there are still possibly dependencies between positions, albeit at a much subtler level (that is not as obvious as the one I mentioned, in fact so non-obvious that it would only be revealed by statistical analysis).

Look... either your statement was made with respect to the test at hand, or it had no business being made at all. Do we have to deal with just random noise tossed out from time to time in the middle of a bunch of factual discussions? So since you made the statement, I addressed it. The "for an extreme example" was irrelevant. Whether it was "extreme" or "typical" didn't matter. It didn't apply and I explained why.

Yet despite my deliberate and obvious attempt at explaining that it is only an extreme example, you latch onto that very example and dismiss it. You could have just given me the benefit of the doubt, and that I'm not trying to attack you, but I am seriously interested in finding out if a statistical analysis would find that, say, a Sicilian Dragon position and a King's Indian position somehow have a correlation, because those engines good/bad in one will be good/bad in the other.

And I know who John Nunn is, and I know that he came up with these positions. I hadn't heard of Silver before, but I assume he also took good care in selecting his positions.

Albert Silver. A long-time poster here. And I assume exactly the same about him because there was a long discussion about the positions back when he started collecting them.

That doesn't rule out that the effectiveness of these positions for engine-engine tests could be increased further.

Of course it doesn't. However, these are what I am using, and what others have used, and I spend more than enough time already working on testing, without taking a month off to find yet another set of positions that would probably be no better than Silver's. Nunn's were just fine, I just preferred to have more.

I pointed out that the test positions I use come from a wide variety of openings, so that you will get tactics, attacks, defenses, endgames, middlegames, repetitions, you-name-it.

I know all that. Perhaps you should acknowledge that I am not merely a kid asking stupid questions. I know they have all been selected for a number of principles. I knew all that before I posted my questions. Perhaps you can take that into account and answer my question under the assumption that I can be taken seriously.

Read your own posts. You don't _seem_ to understand the above. You talked about endgame positions with very few pieces. Which doesn't apply. You hypothesized about positions differing only in the placement of a knight. Which doesn't apply. So exactly how do I figure out what you do know, when you turn the discussion that far afield???

Don't take everything as an insult or attack. That's kid stuff.
Well, I guess I, at 37, am a kid to you. Still no need to tell me to grow up.

I did not take your statement as an attack, I was just disappointed that you would do exactly what I wanted you not to do, namely latch onto that one example. I don't know how else I could have done it. And, yes, saying "Bob, please, are you serious?" is slightly more emotional than most of the rest of the stuff I write.

Regarding the writing within minutes, well, I try to take each point individually; you sometimes do that, too. And since we are in different time zones, the effect that you find lots of posts by me on the next day is the same for me the other way round, even if you take longer breaks between your posts.

Just edit your post. Multiple responses make it difficult for everyone to follow when the posts all line up on the right-hand side.

Albert Silver · Post by **Albert Silver** » Tue Sep 18, 2007 8:30 pm

nczempin wrote:And I know who John Nunn is, and I know that he came up with these positions. I hadn't heard of Silver before, but I assume he also took good care in selecting his positions.

Yes, I took exhaustive care. I tested many engines, and played thousands of games, making sure they didn't all play the same first moves. Some positions were removed, changed, or added as a result of this testing. I wasn't happy with Nunn's set for 3 reasons:

1) Some positions led to the same opening moves 90% of the time, so what was the point?

2) There wasn't enough variety in the types of positions, whether in openings or strategic themes.

3) Several were very boring to watch play out.

Here are the comments I posted when I released it:

The selection is designed to provide typical opening situations, testing the range of openings as well as types of positions (closed, open, hedgehog, stonewall, isolani, etc.). The positions were chosen and tested to allow several possible moves, so there shouldn't be any single move to find, nor should all (or almost all) engines choose the same continuation.

I pointed out that the test positions I use come from a wide variety of openings, so that you will get tactics, attacks, defenses, endgames, middlegames, repetitions, you-name-it.

Then our purpose was different. I didn't choose any position for tactics (including repetitions). Tactics will always appear, especially in computer chess, without any special effort on our part. No endgames since it is much harder to find good ones to play out. They would have to have more than one good playable first move (otherwise it is a find-the-best-move suite, and not a play-from-here suite).

Another unspoken aspect of this suite IMO is one can use it to help build an opening book for an engine. I don't mean to use the suite as a basis per se, but to test how an engine handles various openings, as well as strategic themes.

You can find the PGN text at the Winboard forum or as a download here.

You can also see the positions (diagrams) from the full suite, with the main theoretic moves, as well as the rough theoretic evaluation, and the main alternatives at the Winboard forum

As a sidenote, now speaking as a moderator, a little less bickering with other members would be appreciated. Thanks.

Albert

bob · Post by **bob** » Wed Sep 19, 2007 4:41 am

nczempin wrote:
Pradu wrote:
nczempin wrote:If you were to use 40 positions and define a match between two engines (regardless of level, whether very high or potato-like) to encompass one game with each color:

My guess is that there will be many positions that will show very similar results. For an extreme illustration of the principle, just assume that two of the positions differ only in that in one Nf3 and Nf6 have been played, and in the other it hasn't.

Shouldn't it be possible to find this out by playing enough games with a wide enough selection of engines to reveal such correlations? And if there are such correlations, wouldn't it be feasible to remove one of the positions and still get a result that is very similar to the previous one, while reducing the necessary effort?

It would be ideal if the positions are as independent as possible, say one highly tactical position and one that involves the finer points of knight maneuvering and/or rook endgames.

Has this kind of analysis been done (mathematically, not intuitively like I assume it has been) for the Nunn positions or that set of 40 positions that Bob uses for his tests?
How would you mathematically test if two different positions are even and that they test "orthogonal" parameters? Would you create it in such a way that all parameters, say evaluation terms, are the same except for the parameters you are changing? What if two parameters are interdependent, say the value of a bishop and bishop mobility?
No, I would not try to test anything that is intrinsic to the position.

I would run games out of this position between two engines, and run games out of the other position, and see if I find a significant correlation.

Correlation in what? The results? What would that show? That program A loses to program B in two different positions, and that if A loses in position 1, then it also loses in position 2? I don't see what that would mean. Suppose A loses _all_ games to B. That would look correlated when it is not at all, A is just much worse than B and can't win a game under any circumstances.
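One way to make both the proposal and the objection concrete, as a rough Python sketch rather than anyone's actual test procedure: collect each position's score over many different engine pairings, subtract every pairing's average score so that a plain strength gap does not register as agreement between positions, and only then correlate two positions. The `scores` layout, the function names, and the numbers are all made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; None when either series has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # constant results carry no correlation information
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def residuals(scores):
    """Subtract each pairing's mean score from its per-position scores."""
    out = []
    for row in scores:
        mean = sum(row) / len(row)
        out.append([s - mean for s in row])
    return out

def position_correlation(scores, i, j):
    """Correlate positions i and j across pairings, after removing strength gaps."""
    res = residuals(scores)
    return pearson([r[i] for r in res], [r[j] for r in res])

# Hypothetical data: one row per engine pairing, one column per position,
# each entry the points engine A scored in the two games (one per color).
scores = [
    [2.0, 2.0, 1.0],
    [1.0, 1.0, 2.0],
    [0.0, 0.0, 0.0],  # A loses every game in this pairing
]

print(position_correlation(scores, 0, 1))
print(position_correlation(scores, 0, 2))
print(pearson([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

In this sketch a pairing where A loses everything contributes only zero residuals, and a data set consisting solely of such results yields `None` rather than a spurious correlation. Two caveats: with only a handful of positions, subtracting the per-pairing mean itself pushes the residuals toward slightly negative correlation, and a real analysis would need far more pairings than three to say anything significant.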

bob · Post by **bob** » Wed Sep 19, 2007 4:46 am

Albert Silver wrote:
nczempin wrote:And I know who John Nunn is, and I know that he came up with these positions. I hadn't heard of Silver before, but I assume he also took good care in selecting his positions.
Yes, I took exhaustive care. I tested many engines, and played thousands of games, making sure they didn't all play the same first moves. Some positions were removed, changed, or added as a result of this testing. I wasn't happy with Nunn's set for 3 reasons:

1) Some positions led to the same opening moves 90% of the time, so what was the point?

2) There wasn't enough variety in the types of positions, whether in openings or strategic themes.

3) Several were very boring to watch play out.

Here are the comments I posted when I released it:

The selection is designed to provide typical opening situations, testing the range of openings as well as types of positions (closed, open, hedgehog, stonewall, isolani, etc.). The positions were chosen and tested to allow several possible moves, so there shouldn't be any single move to find, nor should all (or almost all) engines choose the same continuation.

I pointed out that the test positions I use come from a wide variety of openings, so that you will get tactics, attacks, defenses, endgames, middlegames, repetitions, you-name-it.
Then our purpose was different. I didn't choose any position for tactics (including repetitions). Tactics will always appear, especially in computer chess, without any special effort on our part. No endgames since it is much harder to find good ones to play out. They would have to have more than one good playable first move (otherwise it is a find-the-best-move suite, and not a play-from-here suite).

I hope I didn't imply that you tried to supply tactical positions explicitly, although you certainly do implicitly since some openings are tactical by their very nature. My point was that your positions are so scattered across opening theory that the resulting games will cross over lots of different chess themes during play. A few positions could easily just be tactical wars, or positional skirmishes. But these seem to "have it all". Some are definitely quite tactical, with king-side attacks brewing quickly. Others are not. I could probably run thru a few games and at least point out the ones that get wild quickly.

Another unspoken aspect of this suite IMO is one can use it to help build an opening book for an engine. I don't mean to use the suite as a basis per se, but to test how an engine handles various openings, as well as strategic themes.

You can find the PGN text at the Winboard forum or as a download here.

You can also see the positions (diagrams) from the full suite, with the main theoretic moves, as well as the rough theoretic evaluation, and the main alternatives at the Winboard forum

As a sidenote, now speaking as a moderator, a little less bickering with other members would be appreciated. Thanks.

Albert

bob · Post by **bob** » Wed Sep 19, 2007 4:59 am

nczempin wrote:
bob wrote: I missed one key point in your post. There are _no_ endgame positions in this set. All are early to late opening positions. The middlegame is still to be played before reaching endgames. So they cover the gamut of chess knowledge and tactics.
Except for the gamut of endgames, which, if you wanted to test them more thoroughly, would need to be included specifically, rather than hoping that they will occur by chance.

Essentially, this is the logic you are using against just using the starting position.

I have no idea what that means. I have played _millions_ of games with these positions. I have looked at tens of thousands of games. I have seen attacks, defenses, tactics, positional play, and endgames of all kinds. I mean these are strong chess players. Give them reasonable opening positions and you will see all sorts of game conclusions.

Otherwise we would never see such games since we all start in the same starting position in real games. Whether Silver's positions cross over every one of my eval terms or not is both impossible to judge and irrelevant to whether the testing methodology works or not.

Again we drift far from the original "how to test" discussion with respect to how many games are enough. We are now wandering around in a fog of

not enough positions or too many positions or not varied enough positions...

correlation between games and how one could screw up a test to cause that.

Are the results made up or are they real...

Why did I choose this particular set (mainly because at that instant in time it was the only set of that kind of data I had) as opposed to another set, or did I pick and choose to give the worst cases... or something else...

And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
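The sample-size point can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the procedure used in the tests discussed here; it assumes every game is independent (the very assumption the correlation sub-discussion questions), uses a normal approximation for the score, and converts scores to rating differences with the standard logistic Elo formula. The function names and the 35/40/25 win/draw/loss split are hypothetical.

```python
from math import sqrt, log10

def score_interval(wins, draws, losses, z=1.96):
    """Score fraction with an approximate confidence interval (z=1.96 ~ 95%)."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    s = (wins + 0.5 * draws) / n
    # Per-game variance: E[x^2] - E[x]^2 with x in {1, 0.5, 0}
    var = (wins + 0.25 * draws) / n - s * s
    half = z * sqrt(var / n)
    return s, s - half, s + half

def elo_from_score(s):
    """Elo difference implied by an expected score s (logistic model)."""
    if not 0 < s < 1:
        raise ValueError("score must be strictly between 0 and 1")
    return -400 * log10(1 / s - 1)

# Same 55% score at three sample sizes: only the uncertainty changes.
for n in (100, 1000, 10000):
    w = n * 35 // 100
    d = n * 40 // 100
    l = n - w - d
    s, lo, hi = score_interval(w, d, l)
    print(n, round(elo_from_score(s), 1),
          round(elo_from_score(lo), 1), round(elo_from_score(hi), 1))
```

Because the interval width shrinks only with the square root of the number of games, halving the uncertainty takes roughly four times as many games, which is why small matches so often fail to separate two close versions.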

nczempin · Post by **nczempin** » Wed Sep 19, 2007 11:54 am

bob wrote: Again we drift far from the original "how to test" discussion with respect to how many games are enough. We are now wandering around in a fog of

No, I explicitly opened this new thread to discuss a question that is not directly related to the testing discussion, but came to my mind there.

You don't need to participate if it doesn't interest you.

nczempin · Post by **nczempin** » Wed Sep 19, 2007 11:59 am

bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...

I don't contest your point that "it takes more games than most believe"; I agree that many are too quick in their judgement.

But not all are, and you seem to have fixated on putting hgm and me in that crowd, and are not open to discussing that perhaps there is a middle ground.

nczempin · Post by **nczempin** » Wed Sep 19, 2007 12:01 pm

bob wrote:
nczempin wrote:
bob wrote: I missed one key point in your post. There are _no_ endgame positions in this set. All are early to late opening positions. The middlegame is still to be played before reaching endgames. So they cover the gamut of chess knowledge and tactics.
Except for the gamut of endgames, which, if you wanted to test them more thoroughly, would need to be included specifically, rather than hoping that they will occur by chance.

Essentially, this is the logic you are using against just using the starting position.
I have no idea what that means. I have played _millions_ of games with these positions. I have looked at tens of thousands of games. I have seen attacks, defenses, tactics, positional play, and endgames of all kinds. I mean these are strong chess players. Give them reasonable opening positions and you will see all sorts of game conclusions.

You could have played millions of games from the starting position, and it would not change one iota of your statement.

Okay, I overlooked the next paragraph, where you specifically claim that indeed it does change the kinds of games you would see.

The question is: Is your goal to see as many situations as possible that don't even occur when starting from the starting position (how useful is that?) or is it your goal to write an engine that plays as well as possible from the starting position?

And for my engine this argument is even stronger. Many of the positions are ones my engine would never get into, so all the finer points of, say, an isolated QP, and all the millions of games, would not make it stronger.

Of course as a human player or an engine gets stronger, they need to become more well-rounded. But again, that usually starts to become an issue not before an Elo of around 2000 in human terms.

I don't understand why you find it so hard to take one step back and empathise with an engine programmer who has an engine weaker than 2000, NOW (and not in 1978 or whenever your engine last had that strength, in a completely different environment, when that strength was considered state of the art).

I don't have any problems in empathising with you and acknowledging all the problems you have at the top. I try not to claim that I know anything about the top, except that I know a few things that I know are true for my situation, and where you have made clear that they are an issue at your level.
