bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...
nczempin wrote: Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?
bob wrote: I suspect you will have changed your methodology by then so it will be moot.

But I thought that "by then" will never happen?
Ali Baba and the 40 positions
Moderator: Ras
- 
				nczempin
Re: Ali Baba and the 40 positions
- 
				nczempin
Re: Ali Baba and the 40 positions
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
nczempin wrote: Okay, I'll try to ask the question the other way round: Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.
bob wrote: I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue. As does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will be only looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only evaluate changes that were made to the engine itself.

So even a result of 0-40 for version A, and 40-0 for version A', wouldn't let you conclude that version A' is stronger, at your chosen confidence level (or whatever the confidence level would be for this result, you do the math)?
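For what it's worth, the "you do the math" part of the 0-40 versus 40-0 question can be sketched in a few lines. This is only a rough illustration, not anyone's actual test procedure: it assumes every game is an independent win/non-win trial (draws lumped in with losses), that both versions play the same gauntlet, and it uses a one-sided Fisher exact test. The function name `fisher_one_sided` is made up for this example.

```python
from math import comb

def fisher_one_sided(wins_a, games_a, wins_b, games_b):
    """P-value for 'B wins at least this often' if A and B were equally strong.

    Assumptions (not from the thread): games are independent win/non-win
    trials, draws count as non-wins, and both versions face the same opponents.
    """
    total_games = games_a + games_b
    total_wins = wins_a + wins_b
    denominator = comb(total_games, total_wins)
    p_value = 0.0
    # Hypergeometric tail: every way of splitting the observed wins that
    # gives B at least as many wins as it actually scored.
    for k in range(wins_b, min(games_b, total_wins) + 1):
        p_value += comb(games_b, k) * comb(games_a, total_wins - k) / denominator
    return p_value

if __name__ == "__main__":
    print(fisher_one_sided(0, 40, 40, 40))   # the extreme case from the question
    print(fisher_one_sided(18, 40, 22, 40))  # a much more typical small gap
```

Under these assumptions the extreme case comes out at 1 / C(80, 40), which is far below any usual significance threshold, while the 18 versus 22 case is nowhere near significant. The real disagreement in the thread is about the second kind of result, not the first.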
- 
				nczempin
Re: Ali Baba and the 40 positions
Does anybody besides Robert understand what I was trying to say? If not, I must assume it must be my fault for being unable to bring the point across.
						- 
				Jan Brouwer
- Posts: 201
- Joined: Thu Mar 22, 2007 7:12 pm
- Location: Netherlands
Re: Ali Baba and the 40 positions
Quote: So I don't follow where you believe you can ignore most of that until you pass 2000. I am not sure you can _reach_ 2000 without most of that in place.

Here I can provide a data point for a program with a very simple evaluation function (Rotor 0.1a).
The evaluation function consists of the following features (all weights have a midgame and an endgame value and are combined in the Fruit way):
- material balance (according to Kaufman's article)
- piece square tables
- doubled pawns (constant weight)
- "open" pawns (pawns with no enemy pawn in front, weight linear function of rank)
- "king safety" (counting friendly pawns on six squares in front of king)
This, combined with reasonably high NPS, and a reasonably standard search (PVS, hash-table, null-move, check extension) was sufficient to reach about 2200-2300 (on UEL / CEGT / CCRL).
My guess is that it is possible to reach 2100-2200 with just material balance + piece square tables.
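To make the "midgame and endgame value, combined in the Fruit way" part concrete, here is a minimal sketch of phase-based interpolation as it is commonly described. It is not Rotor's code: the phase weights, the feature tuple layout, and the names `game_phase`, `tapered` and `evaluate` are assumptions made for illustration only.

```python
# Hypothetical phase weights per piece type; pawns and kings do not count.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4

def game_phase(piece_counts):
    """Return 24 with all pieces on the board, falling to 0 as they come off."""
    phase = sum(PHASE_WEIGHT[piece] * count
                for piece, count in piece_counts.items()
                if piece in PHASE_WEIGHT)
    return min(phase, MAX_PHASE)  # clamp in case of promotions

def tapered(mg, eg, phase):
    """Blend a midgame and an endgame score linearly by remaining material."""
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE

def evaluate(features, piece_counts):
    """features maps a name to (count, midgame_weight, endgame_weight)."""
    phase = game_phase(piece_counts)
    mg = sum(count * mg_w for count, mg_w, _ in features.values())
    eg = sum(count * eg_w for count, _, eg_w in features.values())
    return tapered(mg, eg, phase)

if __name__ == "__main__":
    counts = {"N": 2, "B": 2, "R": 4, "Q": 0}  # queens already traded
    feats = {"doubled_pawns": (1, -10, -20)}   # made-up weights
    print(evaluate(feats, counts))
```

Each feature in the list (doubled pawns, "open" pawns, king safety and so on) would contribute one such pair of weights, so the whole evaluation stays a single weighted sum per phase.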
- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Ali Baba and the 40 positions
bob wrote: I suspect you will have changed your methodology by then so it will be moot. It is going to be _very_ difficult to get to 2000 on tactics alone. I've not tried to play games with crafty using no eval but material in the last 10 years, but when I last did that, it was very ugly. Not every position has a tactical solution. In fact, most don't. Making horrible moves in those positions means you only reach positions where you will finally see tactically that you are lost.
nczempin wrote: Bob:
What do you know about the positional strength of my engine?
Where did you get this information?
Did I mention anything about the relative strengths of my positional play vs. my tactical play anywhere?
Have you ever even touched my engine?

1. Absolutely nothing. And I intend to keep it that way.
2. Nowhere.
3. Yes. You at least mentioned that you believed it better to teach tactics first, and then positional/strategic play. You also said your local chess federation teaches that way. And then you have said several times that you are not making positional changes, you are making speedup and/or basic search changes. So based on past comments by you, you certainly implied that your eval was very simplistic at present.
4. No, and I intend to keep it that way as well.
Now, for my question: What is your problem? You want to pick minute points and argue those to death, without worrying about the larger picture I am trying to paint.
- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Ali Baba and the 40 positions
bob wrote: If you wait until you get to 2000+ before you start to plan on doing things a 2000+ program has to do, something tells me you are never going to get there...
nczempin wrote: Gee, thanks for the encouragement. How about you invite me to dinner when I reach that level?
bob wrote: I suspect you will have changed your methodology by then so it will be moot.
nczempin wrote: But I thought that "by then" will never happen?

Can't read between the lines? You are going to have a tough time passing 2000 without working on the evaluation some. So by the time you pass 2000, you will have figured that out and will have been working on it for a while.
I thought that was pretty intuitive from my comment...
I do assume that you can figure out when an approach is not working and modify the approach???
- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Ali Baba and the 40 positions
Jan Brouwer wrote:
Quote: So I don't follow where you believe you can ignore most of that until you pass 2000. I am not sure you can _reach_ 2000 without most of that in place.
Here I can provide a data point for a program with a very simple evaluation function (Rotor 0.1a).
The evaluation function consists of the following features (all weights have a midgame and an endgame value and are combined in the Fruit way):
- material balance (according to Kaufman's article)
- piece square tables
- doubled pawns (constant weight)
- "open" pawns (pawns with no enemy pawn in front, weight linear function of rank)
- "king safety" (counting friendly pawns on six squares in front of king)
This, combined with reasonably high NPS, and a reasonably standard search (PVS, hash-table, null-move, check extension) was sufficient to reach about 2200-2300 (on UEL / CEGT / CCRL).
My guess is that it is possible to reach 2100-2200 with just material balance + piece square tables.

I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...
I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...
- 
				mhull  
- Posts: 13447
- Joined: Wed Mar 08, 2006 9:02 pm
- Location: Dallas, Texas
- Full name: Matthew Hull
Re: Ali Baba and the 40 positions
bob wrote: I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...
I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...

Scrappy returns?

Matthew Hull
						- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Ali Baba and the 40 positions
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
nczempin wrote: Okay, I'll try to ask the question the other way round: Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.
bob wrote: I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue. As does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will be only looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only evaluate changes that were made to the engine itself.
nczempin wrote: So even a result of 0-40 for version A, and 40-0 for version A', wouldn't let you conclude that version A' is stronger, at your chosen confidence level (or whatever the confidence level would be for this result, you do the math)?

First, I don't ever expect, nor have I seen, such a result. I've never lost a match of any significant length with zero wins or draws, so I can't go that far in speculating. But one thing is for sure: the more random features you add in (and books, pondering, and SMP are big ones), the more games you have to play to produce a result with a reasonable level of confidence.
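The "more games for a reasonable level of confidence" point can be illustrated with a back-of-the-envelope error bar. This is a sketch under loudly stated assumptions, none of which come from the thread: games are independent, there are no draws (the worst case for variance), the normal approximation holds, and the usual logistic Elo formula applies. The names `score_to_elo` and `elo_margin` are hypothetical.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference implied by an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin(games, score=0.5, z=1.96):
    """Approximate upper half-width of the confidence interval, in Elo.

    Assumes independent win/loss games (no draws) and a normal
    approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    std_error = math.sqrt(score * (1.0 - score) / games)
    upper = min(score + z * std_error, 0.999)  # keep the logarithm finite
    return score_to_elo(upper) - score_to_elo(score)

if __name__ == "__main__":
    for n in (40, 160, 1000, 10000):
        print(n, round(elo_margin(n), 1))
```

With these assumptions a 40-game gauntlet near 50% has an error bar on the order of 110 Elo, and it takes about four times as many games to halve it. Draws shrink the variance somewhat, and extra randomness from books, pondering or SMP does not change the formula but makes the independence assumption shakier.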
- 
				bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Ali Baba and the 40 positions
bob wrote: I would personally rather see games against real humans. I have seen 2000 humans stomp 2200 programs when the 2200 programs make blind positional mistakes and don't understand passed pawns or long-term kingside attacks, and go pawn-grabbing instead...
I suppose I could try Crafty with just material and piece/square tables on ICC for a while to see what happens...
mhull wrote: Scrappy returns?

Scrappy has been on some but nobody plays it. I'm not sure I would want to take a program with a 3500 rating or whatever it is on ICC right now and lobotomize it and let it play. Someone would use that to abuse the rating and ICC would probably not be so happy about it happening.
Main problem, probably, is that scrappy runs on my core-2 laptop which is _way_ faster than my dual xeon in the office. By a factor of 3+