Ali Baba and the 40 positions

bob · Post by **bob** » Thu Sep 20, 2007 4:33 am

Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri

I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long. But for more serious games at longer time controls, I don't believe just centralizing pieces and being tactically sharp is good enough. It is very easy to play into a dead lost position before you have any idea the roof is falling in. And by that time it is all over.

I'm not sure my idea of taking crafty and cutting out everything but the piece/square tables is reasonable for this test, but it is the best I can do. It is probably a bit more mature in other areas such as the null-move stuff, reductions, extensions, etc.

nczempin · Post by **nczempin** » Thu Sep 20, 2007 10:47 am

bob wrote: Can't read between the lines? You are going to have a tough time passing 2000 without working on the evaluation some. So by the time you pass 2000, you will have figured that out and will have been working on it for a while.

If I am making minute comments, why do you perpetuate them?

You claim that I have an insufficient evaluation.

This is very far from the truth. I explicitly stated that my engine regularly beats engines that go two plies deeper. How is that going to happen if my evaluation is so bad?

I'll spell it out for you: The eval is the strongest part; it is highly likely that it will be sufficient till 2000.

That, and the comment about my local association explicitly talked about strategy "beyond the basics". Not about having no evaluation.

nczempin · Post by **nczempin** » Thu Sep 20, 2007 10:49 am

bob wrote:
Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri
I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long.

How can you test THAT hypothesis if you are not going to touch my engine? Oh, I see, you're going to test a different hypothesis and then claim it'll apply to my engine.

nczempin · Post by **nczempin** » Thu Sep 20, 2007 10:53 am

bob wrote:
nczempin wrote:
bob wrote: I suspect you will have changed your methodology by then so it will be moot. It is going to be _very_ difficult to get to 2000 on tactics alone. I've not tried to play games with crafty using no eval but material in the last 10 years, but when I last did that, it was very ugly. Not every position has a tactical solution. In fact, most don't. Making horrible moves in those positions means you only reach positions where you will finally see tactically that you are lost.
Bob:
What do you know about the positional strength of my engine?
Where did you get this information?
Did I mention anything about the relative strengths of my positional play vs. my tactical play anywhere?
Have you ever even touched my engine?
1. absolutely nothing. And I intend to keep it that way.

This is yet another rude, uncalled-for comment.

If you intend to stay away from my engine, I ask you to stay away from threads where I explicitly talk about it. Feel free to discuss similar questions in other threads.

2. nowhere.

3. Yes. You at least mentioned that you believed it better to teach tactics first, and then positional/strategic play. You also said your local chess federation teaches that way. And then you have said several times that you are not making positional changes, you are making speedup and/or basic search changes. So based on past comments by you, you certainly implied that your eval was very simplistic at present.

It is far less simplistic than that of a large majority of the engines I play against. Without you knowing anything about my engine, please refrain from commenting.

If you're not interested, leave me alone.

4. No, and I intend to keep it that way as well.

Now, for my question: What is your problem? You want to pick minute points and argue those to death, without worrying about the larger picture I am trying to paint.

nczempin · Post by **nczempin** » Thu Sep 20, 2007 11:08 am

bob wrote:
nczempin wrote:
bob wrote:
nczempin wrote:
bob wrote: And we dance around how many games are needed, quoting statistics that depend on an infinite number of games, or on a few games, or on the phase of the moon. When the only point I started out with was that it takes far more games than most believe to really understand which program is better or which version is better or which programming change is better. And nothing has changed that point at all, we've just wasted a ton of time on side-issues...
Okay, I'll try to ask the question the other way round:
Under what, if any, circumstances would you deem a low, but not too low, number of games (let's say, a gauntlet against 20 engines, match of two against each), to be acceptable? Be as extreme as you like.
I can't think of one personally. Normal engine-engine matches produce even more variance than I am seeing, because the opening book quality becomes a _major_ issue. As does the algorithm you use to choose moves from your book. I chose to eliminate this significant part of the randomness for these tests. I will certainly, at some point, play games using a book, but then I will be only looking at how the book affects the results against several opponents, so that I have an idea of whether the book needs work or not. But I am not trying to test that in these matches, only evaluate changes that were made to the engine itself.

So even a result of 0-40 for version A, and 40-0 for version A', wouldn't let you conclude that version A' is stronger, at your chosen confidence level (or whatever the confidence level would be for this result, you do the math)?
First I don't ever expect, nor have I seen such a result. I've never lost a match of any significant length with zero wins or draws, so I can't go that far in speculating. But one thing is for sure, the more random features you add in, and books, pondering, SMP are big ones, the more games you have to play to produce a result with a reasonable level of confidence.

Just answer the question please. Or just say explicitly that you're not interested in an objective discussion. No, come to think of it, you don't need to do that, you've made it very clear already.

I have certainly learned a few things in this discussion.
1. My intuitive approach is sound at closer inspection.
2. My image of Prof. Hyatt as a person, and as a researcher, has considerably deteriorated. I know he will care about this as much as if a bicycle in China has fallen over, but it makes me sad.
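
For anyone who wants to actually do the math on the 40-game gauntlet discussed above, here is a rough Python sketch. It assumes independent games with no draws, a normal approximation for the score margin, and the standard logistic Elo formula; the function names are made up for illustration, and none of this is a test that anyone in the thread proposed.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1
    (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_margin(n_games, z=1.96, p=0.5):
    """Approximate half-width of a 95% interval on the score fraction.

    Assumption: every game is an independent win/loss trial with win
    probability p. Draws are ignored, which overstates the variance.
    """
    return z * math.sqrt(p * (1.0 - p) / n_games)


def sweep_probability(n_games, p_win=0.5):
    """Chance of winning every one of n_games when each is won with p_win."""
    return p_win ** n_games


margin = score_margin(40)
print("score margin for 40 games:", margin)
print("same margin in Elo near 50%:", elo_from_score(0.5 + margin))
print("chance of a 40-0 sweep between equals:", sweep_probability(40))
```

Under those assumptions, 40 games leave a margin of roughly 0.15 in score fraction around an even result, on the order of 110 Elo, while a 40-0 sweep between equal opponents would have a probability of about 9e-13.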

hgm · Post by **hgm** » Thu Sep 20, 2007 1:18 pm

bob wrote:
Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri
I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long. But for more serious games at longer time controls, I don't believe just centralizing pieces and being tactically sharp is good enough. It is very easy to play into a dead lost position before you have any idea the roof is falling in. And by that time it is all over.

I'm not sure my idea of taking crafty and cutting out everything but the piece/square tables is reasonable for this test, but it is the best I can do. It is probably a bit more mature in other areas such as the null-move stuff, reductions, extensions, etc.

Well, not all rating scales are comparable, of course, as they can have arbitrary offsets.

But I would say that the position of uMax 4.8 on the CCRL scale pretty much proves the point. And uMax does not even have piece-square tables. It just has a single centralization table that tabulates the square of the distance between each board square and the point between e4 and e5. That table is then used by P, N, B and K alike (and ignored by R and Q).
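
To make that description concrete, here is a minimal sketch of such a table in Python. It is built only from the description above and is not uMax's actual code: the 0-based coordinates, the function names, and the idea of scoring a move as the change in table value are illustrative assumptions.

```python
# Sketch of a single centralization table (assumed layout, not uMax's code).
# Files a..h map to 0..7 and ranks 1..8 map to 0..7.
CENTER_FILE = 4.0   # the e-file
CENTER_RANK = 3.5   # halfway between rank 4 (index 3) and rank 5 (index 4)

# Piece types that consult the table; rooks and queens ignore it.
TABLE_PIECES = frozenset("PNBK")


def build_center_table():
    """Return table[rank][file] = squared distance to the e4/e5 midpoint."""
    return [
        [(f - CENTER_FILE) ** 2 + (r - CENTER_RANK) ** 2 for f in range(8)]
        for r in range(8)
    ]


def square(name):
    """Convert a square name such as 'e4' to (file, rank) indices."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def centralization_gain(piece, from_sq, to_sq, table):
    """Hypothetical move bonus: how much closer to the centre the piece gets.

    Positive when the move reduces the squared distance, 0.0 for R and Q.
    """
    if piece.upper() not in TABLE_PIECES:
        return 0.0
    from_file, from_rank = square(from_sq)
    to_file, to_rank = square(to_sq)
    return table[from_rank][from_file] - table[to_rank][to_file]
```

With this layout e4 and e5 both get the smallest value (0.25), a knight move from g1 to f3 scores positive, and any rook or queen move scores zero.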

And uMax isn't even particularly fast, as it uses a mailbox move generator that scans the board looking for own pieces, TSCP style. Of course having no true eval compensates for that a lot, but having virtually no move ordering is again very bad.
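
For readers unfamiliar with the term, the board-scanning approach mentioned above can be sketched roughly like this in Python. This is an illustration only, not TSCP's or uMax's code: the 64-character board layout, the helper names, and the restriction to knight moves are assumptions made to keep it short.

```python
# Square-array ("mailbox"-style) board: index 0 is a1, index 63 is h8.
# Uppercase letters are white pieces, lowercase are black, "." is empty.
START_BOARD = list(
    "RNBQKBNR" + "P" * 8 + "." * 32 + "p" * 8 + "rnbqkbnr"
)

KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def is_own(piece, white_to_move):
    """True if the square holds a piece of the side to move."""
    return piece != "." and piece.isupper() == white_to_move


def knight_moves(board, white_to_move):
    """Pseudo-legal knight moves as (from, to) pairs, found by a full scan."""
    moves = []
    for sq in range(64):                      # visit every square, own or not
        piece = board[sq]
        if not is_own(piece, white_to_move) or piece.upper() != "N":
            continue
        file, rank = sq % 8, sq // 8
        for d_file, d_rank in KNIGHT_STEPS:
            new_file, new_rank = file + d_file, rank + d_rank
            if 0 <= new_file < 8 and 0 <= new_rank < 8:
                target = new_rank * 8 + new_file
                if not is_own(board[target], white_to_move):
                    moves.append((sq, target))
    return moves
```

The point is the outer loop: every call visits all 64 squares just to find the side's own pieces, which is the cost that a separate piece list avoids.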

Yet it is able to compete at a level of 2000 CCRL-Elo in 40/40' games. (Oh, and it does finish all iterations... ) On the ICC server it even got blitz and standard ratings (28min games) of 2260.

mhull · Post by **mhull** » Thu Sep 20, 2007 4:57 pm

bob wrote:Scrappy has been on some but nobody plays it.

I've noticed that programs that haven't played in a long time won't show up for example with "who C2000-9999" because their ratings are out of date. Only after the program satisfies the [need] quota will it be displayed. Some people check this way for active computers instead of consulting the seek lists.

xsadar · Post by **xsadar** » Thu Sep 20, 2007 7:28 pm

nczempin wrote:
bob wrote:
Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri
I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long.
How can you test THAT hypothesis if you are not going to touch my engine? Oh, I see, you're going to test a different hypothesis and then claim it'll apply to my engine.

You seem to be confused about what THAT hypothesis is. The hypothesis is:

Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

This has nothing to do with YOUR engine, but engines today in general. Would you please stop annoying the rest of us by desperately trying to argue against everything Bob says, or if you insist on arguing worthless points, do it in private messages where the rest of us don't have to see it. But if you refuse to do either of those, at least you could actually try to understand what he's saying before you try to argue.

bob · Post by **bob** » Fri Sep 21, 2007 4:29 am

nczempin wrote:
bob wrote:
Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri
I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long.
How can you test THAT hypothesis if you are not going to touch my engine? Oh, I see, you're going to test a different hypothesis and then claim it'll apply to my engine.

Again, put brain in gear before putting keyboard in motion. I stated _exactly_ how _I_ would test that hypothesis. I didn't stutter. I didn't speak in a foreign language. I said _specifically_ :

I am going to take crafty, and cut out all eval but material and piece/square values, and run it on ICC to see what that does. If Crafty doesn't reach 2000 that way, I can guarantee you your program won't reach 2000 that way either. If Crafty does reach 2000, then it would support Uri's hypothesis that material + PC/sq + today's hardware is good enough.

Not one person mentioned _your_ program in this particular idea, so why do you keep coming up with that kind of crap???

I am not claiming, and never have claimed, that _anything_ applied to your engine, except for normal sound software engineering principles which apply to _all_ program development efforts...

sheesh...

bob · Post by **bob** » Fri Sep 21, 2007 4:34 am

hgm wrote:
bob wrote:
Uri Blass wrote:programs today do not need more than piece square table evaluation and fast search to reach 2000

One of the main problems of Eden is that it is very slow, and when I say very slow I mean that it is probably possible to make it more than 10 times faster.

If you also add better order of moves and hash then I believe that it can get above 2000 with no evaluation change.

Uri
I remain unconvinced of that. In blitz games, on a chess server, perhaps, and I plan on testing this hypothesis in real games myself before long. But for more serious games at longer time controls, I don't believe just centralizing pieces and being tactically sharp is good enough. It is very easy to play into a dead lost position before you have any idea the roof is falling in. And by that time it is all over.

I'm not sure my idea of taking crafty and cutting out everything but the piece/square tables is reasonable for this test, but it is the best I can do. It is probably a bit more mature in other areas such as the null-move stuff, reductions, extensions, etc.
Well, not all rating scales are comparable, of course, as they can have arbitrary offsets.

But I would say that the position of uMax 4.8 on the CCRL scale pretty much proves the point. And uMax does not even have piece-square tables. It just has a single centralization table that tabulates the square of the distance between each board square and the point between e4 and e5. That table is then used by P, N, B and K alike (and ignored by R and Q).

And uMax isn't even particularly fast, as it uses a mailbox move generator that scans the board looking for own pieces, TSCP style. Of course having no true eval compensates for that a lot, but having virtually no move ordering is again very bad.

Yet it is able to compete at a level of 2000 CCRL-Elo in 40/40' games. (Oh, and it does finish all iterations... ) On the ICC server it even got blitz and standard ratings (28min games) of 2260.

Actually it doesn't prove a thing IMHO. When we talk about ratings, most of us are thinking in terms of some consistent sort of human scale we can appreciate. The rating list ratings don't have a thing to do with that. And then computer vs computer games are far different animals from computer vs human games as well.

So my intent remains unchanged. I think tactics + bad positional play can only get you so far up the scale before the humans begin to rip you up, so long as they are at least good enough to avoid the tactical problems such a program will present.

I'm even going to play some games against that "mindless crafty" myself...
