Ali Baba and the 40 positions

nczempin · Post by **nczempin** » Wed Sep 19, 2007 7:57 pm

bob wrote:
nczempin wrote:
bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...

Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?
I suspect you will have changed your methodology by then so it will be moot.

But I thought that "by then" will never happen?

nczempin · Post by **nczempin** » Wed Sep 19, 2007 9:10 pm

bob wrote:
nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.
I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue, as does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will only be looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only to evaluate changes that were made to the engine itself.

So even a result of 0-40 for version A, and 40-0 for version A', wouldn't let you conclude that version A' is stronger, at your chosen confidence level (or whatever the confidence level would be for this result; you do the math)?
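For a rough sense of what a sweep like that would mean, here is a minimal Python sketch built on simplifying assumptions that nobody in the thread stated: games are independent, draws are ignored, and if the two versions are equal in strength every game is a coin flip (p = 0.5). It computes the one-sided binomial tail, i.e. the chance of scoring at least a given number of wins by luck alone. It says nothing about book effects or correlated games, which are the kinds of issues Bob raises.

```python
from math import comb


def binomial_tail(wins: int, games: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(games, p): the chance of scoring at least
    `wins` when every game is an independent trial won with probability p."""
    return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


# A 40-0 sweep, assuming the two versions are really equal in strength:
print(f"P(40/40 | equal strength)  = {binomial_tail(40, 40):.3e}")

# A result that merely looks convincing, 24-16:
print(f"P(>=24/40 | equal strength) = {binomial_tail(24, 40):.3f}")
```

Under these assumptions a 40-0 sweep has a probability of about 9 × 10^-13, while a 24-16 score would still turn up roughly 13% of the time between two equal versions.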

nczempin · Post by **nczempin** » Wed Sep 19, 2007 9:18 pm

Does anybody besides Robert understand what I was trying to say? If not, I must assume it is my fault for being unable to get the point across.

Jan Brouwer · Post by **Jan Brouwer** » Wed Sep 19, 2007 9:20 pm

So I don't follow where you believe you can ignore most of that until you pass 2000. I am not sure you can _reach_ 2000 without most of that in place.

Here I can provide a data point for a program with a very simple evaluation function (Rotor 0.1a).
The evaluation function consists of the following features (all weights have a midgame and an endgame value and are combined in the Fruit way):

- material balance (according to Kaufman's article)
- piece square tables
- doubled pawns (constant weight)
- "open" pawns (pawns with no enemy pawn in front, weight linear function of rank)
- "king safety" (counting friendly pawns on six squares in front of king)

This, combined with reasonably high NPS, and a reasonably standard search (PVS, hash-table, null-move, check extension) was sufficient to reach about 2200-2300 (on UEL / CEGT / CCRL).
My guess is that it is possible to reach 2100-2200 with just material balance + piece square tables.
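Jan's feature list maps onto a compact evaluator. The Python below is a hypothetical sketch, not Rotor's code: the board representation (a dict from (file, rank) to piece letters), every weight, and the centralisation function standing in for real piece-square tables are invented for illustration, and "open" is taken to mean no enemy pawn ahead on the same file. "The Fruit way" is approximated as a linear interpolation between the midgame and endgame sums, weighted by remaining non-pawn material; that is the general idea of a tapered evaluation, but the exact phase formula here is an assumption.

```python
from typing import Dict, Tuple

Square = Tuple[int, int]   # (file 0-7, rank 0-7); rank 0 is White's back rank
Board = Dict[Square, str]  # "P", "N", ... for White; "p", "n", ... for Black

# Hypothetical (midgame, endgame) weights in centipawns, not Rotor's or Kaufman's.
PIECE_VALUE = {"P": (80, 100), "N": (320, 320), "B": (330, 330),
               "R": (500, 500), "Q": (950, 950), "K": (0, 0)}
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24            # phase value with every minor, rook and queen on the board
DOUBLED = (-10, -20)      # per extra pawn on a file (constant weight)
OPEN_PER_RANK = (4, 10)   # "open" pawn bonus, multiplied by the pawn's relative rank
SHIELD = (10, 0)          # per friendly pawn on the six squares in front of the king


def centre_bonus(kind: str, f: int, r: int) -> Tuple[int, int]:
    """Stand-in for piece-square tables: a simple centralisation score."""
    closeness = 3 - max(abs(2 * f - 7), abs(2 * r - 7)) // 2  # 0 at edge, 3 in centre
    if kind == "K":
        return (-5 * closeness, 10 * closeness)  # stay back early, centralise late
    if kind in ("N", "B"):
        return (5 * closeness, 5 * closeness)
    return (0, 0)


def evaluate(board: Board) -> int:
    """Tapered static evaluation in centipawns, from White's point of view."""
    pawns = {True: set(), False: set()}  # is_white -> squares of that side's pawns
    for sq, p in board.items():
        if p.upper() == "P":
            pawns[p.isupper()].add(sq)

    mg = eg = phase = 0
    for (f, r), p in board.items():
        white, kind = p.isupper(), p.upper()
        sign = 1 if white else -1
        step = 1 if white else -1          # direction this side's pawns move
        rel_r = r if white else 7 - r      # rank counted from this side's back rank
        m, e = PIECE_VALUE[kind]
        cm, ce = centre_bonus(kind, f, rel_r)
        mg += sign * (m + cm)
        eg += sign * (e + ce)
        phase += PHASE_WEIGHT.get(kind, 0)
        if kind == "P":
            ahead = range(r + step, 8 if white else -1, step)
            if not any((f, rr) in pawns[not white] for rr in ahead):
                mg += sign * OPEN_PER_RANK[0] * rel_r
                eg += sign * OPEN_PER_RANK[1] * rel_r
        elif kind == "K":
            shield = sum((ff, r + dr * step) in pawns[white]
                         for ff in (f - 1, f, f + 1) for dr in (1, 2))
            mg += sign * SHIELD[0] * shield
            eg += sign * SHIELD[1] * shield

    for white, sign in ((True, 1), (False, -1)):
        files = [f for f, _ in pawns[white]]
        extra = sum(files.count(f) - 1 for f in set(files))
        mg += sign * DOUBLED[0] * extra
        eg += sign * DOUBLED[1] * extra

    phase = min(phase, MAX_PHASE)          # MAX_PHASE = pure midgame, 0 = pure endgame
    return (mg * phase + eg * (MAX_PHASE - phase)) // MAX_PHASE


def start_position() -> Board:
    board: Board = {}
    for f, p in enumerate("RNBQKBNR"):
        board[(f, 0)], board[(f, 7)] = p, p.lower()
        board[(f, 1)], board[(f, 6)] = "P", "p"
    return board


print(evaluate(start_position()))  # symmetric start position, prints 0
```

With only kings and pawns left the phase drops to 0, so only the endgame terms count, including the larger endgame bonus for an open pawn.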

bob · Post by **bob** » Wed Sep 19, 2007 10:00 pm

nczempin wrote:
bob wrote: I suspect you will have changed your methodology by then so it will be moot. It is going to be _very_ difficult to get to 2000 on tactics alone. I've not tried to play games with crafty using no eval but material in the last 10 years, but when I last did that, it was very ugly. Not every position has a tactical solution. In fact, most don't. Making horrible moves in those positions means you only reach positions where you will finally see tactically that you are lost.
Bob:
What do you know about the positional strength of my engine?
Where did you get this information?
Did I mention anything about the relative strengths of my positional play vs. my tactical play anywhere?
Have you ever even touched my engine?

1. absolutely nothing. And I intend to keep it that way.

2. nowhere.

3. Yes. You at least mentioned that you believed it better to teach tactics first, and then positional/strategic play. You also said your local chess federation teaches that way. And then you have said several times that you are not making positional changes, you are making speedup and/or basic search changes. So based on past comments by you, you certainly implied that your eval was very simplistic at present.

4. No, and I intend to keep it that way as well.

Now, for my question: What is your problem? You want to pick minute points and argue those to death, without worrying about the larger picture I am trying to paint.

bob · Post by **bob** » Wed Sep 19, 2007 10:02 pm

nczempin wrote:
bob wrote:
nczempin wrote:
bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...

Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?
I suspect you will have changed your methodology by then so it will be moot.
But I thought that "by then" will never happen?

Can't read between the lines? You are going to have a tough time passing 2000 without working on the evaluation some. So by the time you pass 2000, you will have figured that out and will have been working on it for a while.

I thought that was pretty intuitive from my comment...

I do assume that you can figure out when an approach is not working and modify the approach???

bob · Post by **bob** » Wed Sep 19, 2007 10:04 pm

Jan Brouwer wrote:
So I don't follow where you believe you can ignore most of that until you pass 2000. I am not sure you can _reach_ 2000 without most of that in place.
Here I can provide a data point for a program with a very simple evaluation function (Rotor 0.1a).
The evaluation function consists of the following features (all weights have a midgame and an endgame value and are combined in the Fruit way):

- material balance (according to Kaufman's article)
- piece square tables
- doubled pawns (constant weight)
- "open" pawns (pawns with no enemy pawn in front, weight linear function of rank)
- "king safety" (counting friendly pawns on six squares in front of king)

This, combined with reasonably high NPS, and a reasonably standard search (PVS, hash-table, null-move, check extension) was sufficient to reach about 2200-2300 (on UEL / CEGT / CCRL).
My guess is that it is possible to reach 2100-2200 with just material balance + piece square tables.

I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...

I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...

mhull · Post by **mhull** » Wed Sep 19, 2007 10:06 pm

bob wrote: I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...

I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...

Scrappy returns?

bob · Post by **bob** » Wed Sep 19, 2007 10:07 pm

nczempin wrote:
bob wrote:
nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.
I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue, as does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will only be looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only to evaluate changes that were made to the engine itself.

So even a result of 0-40 for version A, and 40-0 for version A', wouldn't let you conclude that version A' is stronger, at your chosen confidence level (or whatever the confidence level would be for this result; you do the math)?

First, I don't expect such a result, nor have I ever seen one. I've never lost a match of any significant length with zero wins or draws, so I can't go that far in speculating. But one thing is for sure: the more random features you add in (books, pondering and SMP are big ones), the more games you have to play to produce a result with a reasonable level of confidence.
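One way to make "far more games than most believe" concrete is a back-of-the-envelope estimate. This Python sketch is an illustration under assumptions of its own, not the method used in the thread: it treats games as independent, uses the normal approximation and the standard logistic Elo curve, and ignores books, pondering and SMP entirely. It only shows how quickly the required game count grows as the Elo difference to be detected shrinks.

```python
from math import ceil


def elo_to_score(elo_diff: float) -> float:
    """Expected score for an Elo advantage under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, draw_rate: float = 0.0, z: float = 1.96) -> int:
    """Games until z standard errors of the mean score fit inside the score gap
    implied by elo_diff (normal approximation, independent games, 1/0.5/0 scoring)."""
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    s = elo_to_score(elo_diff)
    win = s - draw_rate / 2          # chosen so that win + draw_rate / 2 == s
    loss = 1.0 - win - draw_rate
    if win < 0 or loss < 0:
        raise ValueError("draw_rate is inconsistent with this Elo difference")
    # Per-game variance of the score around its mean s.
    variance = win * (1 - s) ** 2 + draw_rate * (0.5 - s) ** 2 + loss * s ** 2
    gap = s - 0.5
    return ceil(z ** 2 * variance / gap ** 2)


for diff in (10, 20, 50, 100):
    print(diff, games_needed(diff), games_needed(diff, draw_rate=0.3))
```

With no draws this estimate puts a 10 Elo difference at several thousand games, and the count roughly quadruples each time the difference to be detected is halved. Draws lower the per-game variance in this model and so reduce the count somewhat, but not by an order of magnitude.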

bob · Post by **bob** » Wed Sep 19, 2007 11:07 pm

mhull wrote:
bob wrote: I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...

I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...
Scrappy returns?

Scrappy has been on some but nobody plays it. I'm not sure I would want to take a program with a 3500 rating or whatever it is on ICC right now and lobotomize it and let it play. Someone would use that to abuse the rating and ICC would probably not be so happy about it happening.

Main problem, probably, is that Scrappy runs on my Core 2 laptop, which is _way_ faster than my dual Xeon in the office, by a factor of 3+.
