Ali Baba and the 40 positions

nczempin · Post by **nczempin** » Wed Sep 19, 2007 12:13 pm

bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...

Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.

Albert Silver · Post by **Albert Silver** » Wed Sep 19, 2007 4:12 pm

nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.

I suppose it depends on whether you find 2-game matches to be acceptable. I presume it isn't the same position repeated 200 times (100 with each color). The point of a suite like Nunn's is that it creates reproducible equal conditions for each engine without book quality interference. In other words, it tests only the engine. If you allow random opening choices, you are increasing the variance in the results, no?

Albert

bob · Post by **bob** » Wed Sep 19, 2007 4:37 pm

nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
I don't contest your point that "it takes more games than most believe"; I agree that many are too quick in their judgement.

But not all are, and you seem to have fixated on putting hgm and me in that crowd, and are not open to discussing that perhaps there is a middle ground.

If you use 100 games, you are "in that crowd". If you use 1000+ then you are doing things fairly reasonably.

where do you fit?

100 games clearly won't cut it unless you have a completely deterministic program. Most are not... I'm not sure why hgm is interested since his program is obviously not designed for strength in the first place, but has a different "guiding constraint". But real programs are a bit more sophisticated in timing (among other things) and that introduces a level of non-determinism that makes 100 game decisions highly inaccurate.
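To put rough numbers on the 100 versus 1000+ distinction, here is a minimal sketch of the Elo error margin after a given number of games. It assumes independent games between equally strong opponents, a 30% draw ratio, and a 95% interval (z = 1.96); none of those values come from this thread, and the function names are hypothetical. The 40-game case corresponds to the gauntlet of 20 engines with two games each that was asked about above.

```python
import math

def score_to_elo(score):
    """Convert an expected score (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(n_games, draw_ratio=0.3, z=1.96):
    """Approximate Elo error margin around an even score after n_games.

    Assumes independent games between equal opponents. draw_ratio and z
    are illustrative assumptions, not figures taken from the discussion.
    """
    # Per-game variance of the score (win=1, draw=0.5, loss=0) at a 50% mean.
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / n_games)
    # Translate the upper end of the score interval into Elo points.
    return score_to_elo(0.5 + z * std_error)

for n in (40, 100, 1000):
    print(f"{n:5d} games: about +/- {elo_margin(n):.0f} Elo")
```

The margin shrinks only with the square root of the number of games, so ten times as many games buys roughly a threefold tighter interval. If games are not independent, or the engines are non-deterministic in ways that correlate results, the real uncertainty may differ from this estimate.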

bob · Post by **bob** » Wed Sep 19, 2007 4:45 pm

nczempin wrote:
bob wrote:
nczempin wrote:
bob wrote: I missed one key point in your post. There are _no_ endgame positions in this set. All are early to late opening positions. The middlegame is still to be played before reaching endgames. So they cover the gamut of chess knowledge and tactics.
Except for the gamut of endgames, which, if you wanted to test them more thoroughly, would need to be included specifically, rather than hoping that they will occur by chance.

Essentially, this is the logic you are using against just using the starting position.
I have no idea what that means. I have played _millions_ of games with these positions. I have looked at tens of thousands of games. I have seen attacks, defenses, tactics, positional play, endgames of all kinds. I mean these are strong chess players. Give them reasonable opening positions and you will see all sorts of game conclusions.
You could have played millions of games from the starting position, and it would not change one iota of your statement.

Okay, I overlooked the next paragraph, where you specifically claim that indeed it does change the kinds of games you would see.

The question is: Is your goal to see as many situations as possible that don't even occur when starting from the starting position (how useful is that??) or is it your goal to write an engine that plays as well as possible from the starting position?

Question does not compute for me, in any context I think about with respect to my program. I use an opening book. I have a fairly wide book to avoid "cooked lines" by my opponent. If I just started from the initial position with no book, this might be an issue. But I don't. And most other programs don't either. So the positions in the Silver test are positions I would probably encounter regularly playing on ICC.

So I don't understand how there could be "many situations that don't even occur from the starting position" when all the positions in the test are standard opening positions. What exactly are you asking or thinking here??

And for my engine this argument is even stronger. Many of the positions my engine would never get into, and thus all the finer points of, say an isolated QP and all the millions of games would not make it stronger.

Just continue working for another year or two. You will find that either you are going to go into iso-qp openings, or your opening choices are going to be so restricted that everyone is going to cook your book and you won't win any tournament games whatsoever.

Of course as a human player or an engine gets stronger, they need to become more well-rounded. But again, that usually starts to become an issue not before an Elo of around 2000 in human terms.

If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...

I don't understand why you find it so hard to take one step back and empathise with an engine programmer who has an engine weaker than 2000, NOW (and not in 1978 or whenever your engine last had that strength, in a completely different environment, when that strength was considered state-of-the-art).

This sounds like faulty development to me. My program played at a 1600 level in 1970. It had a large opening book, and it played many different sorts of openings including 1. e4, 1. d4 and 1. f4 to name three. In 1979 it was around 2000. Still played the same openings. Had evaluation terms for handling isolated pawns (and other pawn structure issues like backward pawns and the like).

So I don't follow where you believe you can ignore most of that until you pass 2000. I am not sure you can _reach_ 2000 without most of that in place.

I don't have any problems in empathising with you and acknowledging all the problems you have at the top. I try not to claim that I know anything about the top, except that I know a few things that I know are true for my situation, and where you have made clear that they are an issue at your level.

bob · Post by **bob** » Wed Sep 19, 2007 4:51 pm

nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.

I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue. As does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will be only looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only evaluate changes that were made to the engine itself.

To make the point, look at a rating list where the programs include parallel engines. Now they have random book effects, non-determinism in the search caused by the parallel search effects, non-determinism in the timing as already mentioned, and sometimes you see a report like "dual-cpu X is weaker than single cpu X when we played a 30 game match." 30 rounds is not enough to make that kind of determination with any accuracy at all. 300 would not be enough.
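The 30-game and 300-game figures can be checked against a back-of-the-envelope estimate of how many games it takes before a given Elo difference stands clear of the noise. This is only a sketch under stated assumptions: independent games, a 30% draw ratio, a 95% interval (z = 1.96), and the per-game variance of an even match used as an approximation. The function names and the sample Elo differences are hypothetical, chosen for illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio=0.3, z=1.96):
    """Games needed before an elo_diff edge exceeds the z-sigma noise band.

    elo_diff must be positive. Uses the variance of an even match as an
    approximation; draw_ratio and z are assumptions, not thread data.
    """
    variance = (1.0 - draw_ratio) / 4.0
    edge = expected_score(elo_diff) - 0.5
    # Solve z * sqrt(variance / n) <= edge for n.
    return math.ceil(z * z * variance / (edge * edge))

for diff in (10, 20, 50):
    print(f"{diff:3d} Elo difference: roughly {games_needed(diff)} games")
```

Because the required count grows with the inverse square of the score edge, halving the Elo difference you want to resolve roughly quadruples the games. Non-determinism from book choice, timing, or parallel search would add to the variance assumed here, not reduce it.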

Uri Blass · Post by **Uri Blass** » Wed Sep 19, 2007 6:22 pm

programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri

nczempin · Post by **nczempin** » Wed Sep 19, 2007 6:52 pm

bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...

Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?

nczempin · Post by **nczempin** » Wed Sep 19, 2007 6:57 pm

Uri Blass wrote: programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri

I know it is very slow (and I know a large number of things that I can change easily that will make it faster), and I know that the evaluation will not have to be changed considerably. I would like it to make more use of the good attacking position into which it gets because the other programs violate some of the very basics, grabbing material but disregarding king safety. Once I get the speed up to a decent level, those programs will be crushed. For now they get by, because at a certain level the attack fizzles out, because open lines alone don't cut it, you need to take into account some of Ed Schroeder's ideas regarding squares attacked near the other king for Eden to actually want to do anything with its usually well-compensated position.

bob · Post by **bob** » Wed Sep 19, 2007 7:38 pm

nczempin wrote:
bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...

Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?

I suspect you will have changed your methodology by then so it will be moot. It is going to be _very_ difficult to get to 2000 on tactics alone. I've not tried to play games with crafty using no eval but material in the last 10 years, but when I last did that, it was very ugly. Not every position has a tactical solution. In fact, most don't. Making horrible moves in those positions means you only reach positions where you will finally see tactically that you are lost.

I was doing positional stuff in my 1600 program. I had "square of the king" stuff for endgames in 1976 when my program was under 1600 on hardware of the era. I put it in because I saw humans tricking me in too many games into winning a pawn but by doing so I ended up in a position where one of his pawns ran and the depth I could reach back then was not enough to see the consequence.

nczempin · Post by **nczempin** » Wed Sep 19, 2007 7:56 pm

bob wrote: I suspect you will have changed your methodology by then so it will be moot. It is going to be _very_ difficult to get to 2000 on tactics alone. I've not tried to play games with crafty using no eval but material in the last 10 years, but when I last did that, it was very ugly. Not every position has a tactical solution. In fact, most don't. Making horrible moves in those positions means you only reach positions where you will finally see tactically that you are lost.

Bob:
What do you know about the positional strength of my engine?
Where did you get this information?
Did I mention anything about the relative strengths of my positional play vs. my tactical play anywhere?
Have you ever even touched my engine?
