How effective is move ordering from TT?


Uri Blass
Posts: 10785
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: How effective is move ordering from TT?

Post by Uri Blass »

Rebel wrote:
lkaufman wrote:
Rebel wrote:
Don wrote:
Rebel wrote: And another fun experiment, the "best" full static eval. No QS, but in the middle of the chaos on the board, with multiple hanging pieces and counter-attacks, one just evaluates; and checking moves need special code to see that the move is not mate, cool.

So we have two challenges: the best dynamic eval (with QS) and the best static eval. Now we need participants 8-)
This is just a silly experiment. Probably all the top programs would lose badly to any program that makes an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)
Then don't make statements you cannot prove :wink:

New idea:

1. depth=8 (9 or 10) match. As long as it is fast, many games can be played.
2. Brute force
3. No extensions.
4. Standard QS.

So 100% equal search.

Not much work involved: in my case (2) and (3) are parameter-driven, and for (4) I don't expect more than an hour of work.
The winning program will be the one that does the most tactical work in the eval function, so probably Diep. So what? This tells us nothing about which program finds better moves in non-tactical positions, which is (or should be) the relevant question.
Quite funny you think Diep would win such a match, I would put my money on Komodo. One requirement of a top-engine programmer is accuracy and punctuality. I see that in Don's postings, I don't see that quality in Vincent's postings. I think it matters.
I am not so sure here.

It is possible that Komodo wins at big fixed depth, but I am afraid that the depths you can practically test are not big enough.

Vincent claims that Diep does not make moves in the evaluation.
I think that depends on definitions, and it is also possible to say that Diep does make moves in the evaluation: if Diep evaluates a trapped knight based on the fact that the knight has no safe squares to move to, then its evaluation effectively searches all the knight moves to see that they lose the knight.
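To make that point concrete, here is a minimal, self-contained sketch (my own illustration, not Diep's actual code) of such "search inside the evaluation": a trapped-knight test that generates the knight's pseudo-legal moves and asks whether any destination is safe from enemy pawns.

```cpp
#include <cstdint>

using Bitboard = uint64_t;

// Knight attacks from 'sq' (0..63, a1 = 0), computed with shifted masks
// that prevent wrap-around across the board edges.
static Bitboard knightAttacks(int sq) {
    Bitboard b = 1ULL << sq;
    const Bitboard notA  = 0xFEFEFEFEFEFEFEFEULL, notAB = 0xFCFCFCFCFCFCFCFCULL;
    const Bitboard notH  = 0x7F7F7F7F7F7F7F7FULL, notGH = 0x3F3F3F3F3F3F3F3FULL;
    return ((b << 17) & notA)  | ((b << 15) & notH)
         | ((b << 10) & notAB) | ((b <<  6) & notGH)
         | ((b >> 17) & notH)  | ((b >> 15) & notA)
         | ((b >> 10) & notGH) | ((b >>  6) & notAB);
}

// Squares guarded by black pawns (black captures toward lower ranks).
static Bitboard blackPawnGuards(Bitboard blackPawns) {
    const Bitboard notA = 0xFEFEFEFEFEFEFEFEULL, notH = 0x7F7F7F7F7F7F7F7FULL;
    return ((blackPawns >> 9) & notH) | ((blackPawns >> 7) & notA);
}

// True if the white knight on 'sq' has no move to a square that is
// neither occupied by its own pieces nor guarded by an enemy pawn --
// the eval term has, in effect, searched all the knight's moves.
static bool whiteKnightTrapped(int sq, Bitboard whitePieces, Bitboard blackPawns) {
    Bitboard targets = knightAttacks(sq) & ~whitePieces;
    return (targets & ~blackPawnGuards(blackPawns)) == 0;
}
```

Whether one calls this "making moves in the evaluation" or just an eval term is exactly the definitional question above; the loop over knight destinations is there either way.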
Rebel
Posts: 7297
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: How effective is move ordering from TT?

Post by Rebel »

chrisw wrote:
Rebel wrote:
lkaufman wrote:
Rebel wrote:
Don wrote:
Rebel wrote: And another fun experiment, the "best" full static eval. No QS, but in the middle of the chaos on the board, with multiple hanging pieces and counter-attacks, one just evaluates; and checking moves need special code to see that the move is not mate, cool.

So we have two challenges: the best dynamic eval (with QS) and the best static eval. Now we need participants 8-)
This is just a silly experiment. Probably all the top programs would lose badly to any program that makes an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)
Then don't make statements you cannot prove :wink:

New idea:

1. depth=8 (9 or 10) match. As long as it is fast, many games can be played.
2. Brute force
3. No extensions.
4. Standard QS.

So 100% equal search.

Not much work involved: in my case (2) and (3) are parameter-driven, and for (4) I don't expect more than an hour of work.
The winning program will be the one that does the most tactical work in the eval function, so probably Diep. So what? This tells us nothing about which program finds better moves in non-tactical positions, which is (or should be) the relevant question.
Quite funny you think Diep would win such a match, I would put my money on Komodo. One requirement of a top-engine programmer is accuracy and punctuality. I see that in Don's postings, I don't see that quality in Vincent's postings. I think it matters.
Interesting. I think we should not underestimate Vincent; he is one of the smartest guys here. His problem, and his strength, is the great speed with which he sees things. He can't wait for the rest of the world to catch up, and he makes blind leaps into false conclusions when a little more thought would lead him to stop and consider other possibilities. It's as if Vincent plays the world the way he plays blitz chess: gotta make a move, any move, all that matters is speed and who makes the final mistake, when often it is better to stop and think, and sometimes to do nothing.

The idea that we can see our failures and flaws in designing complex monsters such as chess programs within our written texts and posts is cool. I see your strengths as attention to detail, determination, and preparedness for hard work. For example, I am far too lazy to have done all that web page work on the Rybka case, but you are not, and it worked well.
Vincent is like Lucky Luke: he thinks and programs faster than his shadow. IMO, with someone looking over his shoulder, Diep would be a lot stronger. Chess programming is most of the time a lonely experience; a sounding board, a person who isn't afraid to confront you with the errors you are not aware of yourself, is a valuable treasure. In my case those were Jan Louwman in the beginning and Jeroen later.
I won't go into your flaws though ;-)
You won't beat the list my wife maintains :wink:
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: How effective is move ordering from TT?

Post by Don »

Houdini wrote:
Uri Blass wrote:Based on this logic houdini1.5 64 bits has a better evaluation than houdini1.5 32 bits because houdini1.5 64 bits wins at very fast time control.
My main point was that Don made a foolish claim about using tactical test suite results to deduce the quality of the evaluation function. It's just marketing drivel which shouldn't appear in a technical thread.
I did not make any claim. Here is what I said:

I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs, and yet we are probably only top 10 on tactical problem sets. We must be doing something right.

So I did not make ANY claims about ANYTHING here, and in fact I drew attention to the fact that this is just a hypothesis. I don't see any reason for you to take this shot, as I have been nice to you in this thread.

My secondary point was that it's pointless to compare the "quality of the evaluation function" between different engines; one would need to remove the search completely. It's a meaningless concept - the only relevant issue is the Elo strength of the engine.

Robert
Thank you. That is EXACTLY what I keep saying. So finally we agree on something? The one ply search competition would be a meaningless competition.

Having said that, I cannot help but wonder how well Diep would do - as irrelevant as it would actually be, he claims that his program would crush everyone else at 1 ply due to massive knowledge engineering.
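For reference, the "1-ply root search with standard QS" the proposed match would run amounts to something like the sketch below. This is a hedged illustration under assumed interfaces: Move, Position, and qsearch() are hypothetical names, not any engine's real API.

```cpp
#include <limits>
#include <vector>

struct Move { int from = 0, to = 0; };        // engine-specific in reality

// Assumed interface: move generation and make/unmake are engine-specific.
struct Position {
    std::vector<Move> legalMoves() const;
    void make(const Move&);
    void unmake(const Move&);
};

// Assumed: a standard quiescence search (stand-pat eval plus captures).
int qsearch(Position& pos, int alpha, int beta);

// Score every root move with QS only -- no main search, no extensions --
// and play the one with the best score.
Move search1Ply(Position& pos) {
    const int INF = std::numeric_limits<int>::max();
    int bestScore = -INF;
    Move best{};
    for (const Move& m : pos.legalMoves()) {
        pos.make(m);
        int score = -qsearch(pos, -INF, -bestScore); // negamax convention
        pos.unmake(m);
        if (score > bestScore) { bestScore = score; best = m; }
    }
    return best;
}
```

Everything such a match measures lives in the evaluation reached through qsearch(); that is the whole point of the proposal, and also the whole objection to it.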
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

lkaufman wrote:
bob wrote: You are paying too much attention to the "L" in LMR. I was talking about MR instead, where you reduce a move based on some static analysis of the move itself, rather than on how the move compares to the rest of the moves in the list. I've not been a big fan of LMR, due to one specific example: where all moves are good, equally good, yet the "L" causes those appearing later in the list to be reduced more than those up front. My limited approach to this has been simple static rules as to whether a move should be reduced or not, regardless of whether it is sorted first or last. I've discovered, during testing, that one can safely reduce checks if reasonable static analysis is done (for example, Qxh7+ where the queen is undefended and instantly lost is very safe to reduce).
This moves the discussion back to ideas for improving our engines. We never got good results for reducing "bad-SEE" checks. I'm wondering why it works for you. Do you have further conditions beyond "bad-SEE"? What was the typical search depth in the tests that showed a benefit for reducing checks? How much elo gain did you observe?
The Elo gain was not major. Looking at old results, it was +6, with a small error bar. My rule is simply: "Extend SEE-safe checks. Do not extend the rest, and reduce them as well."

This was tested on fast games, and then on 5'+5" games. I have NO idea what the average depth was, but for the 5-minute + 5-second-increment games, it was not really shallow... The Elo gain was stable across both, so I did not test at very long time controls, as a 30K-game match takes two weeks at 60+60.
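The rule can be stated in a few lines. The sketch below is a paraphrase under assumed interfaces (givesCheck() and see() are hypothetical names for the engine's check detection and static exchange evaluation), not Crafty's actual code.

```cpp
struct Move { int from = 0, to = 0; };   // engine-specific in reality

struct Position {
    bool givesCheck(const Move&) const;  // assumed interface
    int  see(const Move&) const;         // assumed static exchange eval
};

// Depth adjustment for a checking move: extend SEE-safe checks by one
// ply; reduce SEE-losing checks instead of merely not extending them.
// Non-checking moves are handled elsewhere.
int checkExtensionOrReduction(const Position& pos, const Move& m) {
    if (!pos.givesCheck(m))
        return 0;
    // e.g. Qxh7+ where the queen is undefended and simply lost: see() < 0.
    return pos.see(m) >= 0 ? +1 : -1;
}
```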
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

chrisw wrote:
bob wrote:
chrisw wrote:
Don wrote:
lkaufman wrote:
Rebel wrote:
Don wrote: I think you are ducking a REAL match. Trying to make a 1 ply match happen is way down on my list of priorities and I would not allow myself to be distracted by such a thing.
Don,

On several occasions you have said Komodo has the best eval in the world. I think you should prove it now that you have a challenger.

In good old Roman tradition we want to see the gladiators' blood flow :mrgreen:
We have a different definition of eval than Vince. He refers to the eval function, while we are talking about eval in positions where search won't help appreciably. Probably Diep has a better eval function, because it gives up search depth for evaluating tactics. We claim that Komodo has the best eval when tactics don't matter, and I don't know of a way to prove this. When tactics do matter, search depth is extremely important, and comparing us to Diep at equal depth has no value.
Just to illustrate how different our definitions really are, Vincent proposes to "prove" which program has the most sophisticated evaluation function by doing a 1-ply search.


As almost anyone in this field knows, a 1 ply search is completely dominated by tactics and fine positional understanding is almost completely irrelevant.
I'm trying really hard to parse this sentence.

"as almost anyone in the field knows" is an attempt to democratise and thus legitimise the text which follows. Except that unverifiable woffle can't do that.

"a 1 ply search is completely dominated by tactics", actually what does this mean? A one ply search has no tactics? A one ply search would be overwhelmed by an N ply search? The game would appear to be tactical? No reason why it should be. "Completely" sounds strong, but with what justification? "Dominated"? Again strong word, but what justification? Heavy adjectival use but no backup for them. Are you in marketing mode?

"fine positional understanding is almost completely irrelevant" Is it really? Well you qualified "completely" this time, so you are not too sure it seems. Actually positional understanding seems perfectly relevent, what would you suggest as an alternative? Random understanding? Partially right and partially wrong understanding? Would they be better?

And yet he believes that is a legitimate test of how strong a program is in the positional sense.
False straw man. It's a legitimate test of the evaluation function. It really is intellectual cheating to switch his argument from "eval" to the "whole program", "in a positional sense" (what does that mean btw?) and then attack that. Don't you think?

I can only say that his definition of what is positional is different than ours.
It would be different when your positional definition keeps changing at will: from part (the non-tactical) to ALL (see below), for example.

I think the best test is to simply play a long match at long time controls.
Yes, that finds the strongest program, but this thread is about the strongest eval function. Anyway, you won't change your tune, for why would a marketeer enter a competition to find scientific truth which by its nature runs the risk of making his product appear dumb?

The program that wins is playing the best chess, and we don't have to debate "positional" vs "tactical" play; as has been pointed out, it is ALL positional play, right?

Don
All this "experiment" proves is "who wrote the most complex eval, including several different tactical motifs such as pins, trapped pieces and such?"

One can evaluate pins, or let the search "evaluate" them. Ditto for trapped pieces. So this "test" doesn't find the "best" evaluation. It finds the most complex one, which likely has many bugs due to the complexity.

You've been on this "heavy eval" theme for years, calling it by different names, including "new paradigm." Yet the original "bean-counter" approach is STILL at the top of the performance heap. Most of us understand exactly why this is. Chess is a combination of search AND evaluation. It won't be dominated by an "evaluation only" program, any more than a GM player will ever forgo calculation and make all his moves based on instant evaluation.

This experiment is flawed in exactly the same way that fixed-depth comparisons between different programs are flawed. No one wants to use SEE inside the evaluation for every piece and pawn, yet that is exactly what an eval-only, 1-ply, no-q-search test like this would call for... What would it prove???
Yes, I have indeed been on about "heavy eval" for years. Curious now how my "heavy eval" from the early 1990s is the basis of Fruit. I think you use it too now.

On GM play, the sad truth is that you have no idea.
Fruit? Heavy eval? You need to stop using whatever recreational drugs/beverages you are using. Fruit is a typical fast bean-counter. You will find zero tactical analysis in Fruit's eval, nor in mine. Everything is designed for speed in BOTH, hence the NPS numbers they produce.

As far as GM play goes, you might try talking to Dzhindi a bit to see what I do/don't know... One never stops learning.
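Since SEE keeps coming up in this exchange, here is a simplified, self-contained illustration of static exchange evaluation (a sketch of the classic "swap list", not the code of any engine mentioned here): given the value of the captured piece and each side's attackers on the target square, least valuable first, it computes the material outcome of the capture sequence, letting either side stand pat when recapturing is bad.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 'captured' is the value of the piece taken by the first capture;
// 'ours'/'theirs' hold each side's attacker values in ascending order
// (the least valuable attacker captures first). We capture first.
int seeSwap(int captured, const std::vector<int>& ours,
                          const std::vector<int>& theirs)
{
    if (ours.empty()) return 0;          // no capture is possible
    std::vector<int> gain{captured};     // gain[d]: balance at swap depth d
    int onSquare = ours[0];              // our attacker now sits on the square
    std::size_t i = 1, j = 0;            // next attacker index for each side
    bool ourTurn = false;                // the opponent recaptures next
    while (true) {
        const std::vector<int>& side = ourTurn ? ours : theirs;
        std::size_t& idx = ourTurn ? i : j;
        if (idx >= side.size()) break;   // no recapture available
        gain.push_back(onSquare - gain.back()); // speculative recapture
        onSquare = side[idx++];
        ourTurn = !ourTurn;
    }
    // Minimax the swap list backwards: either side may decline to
    // recapture when doing so would lose material.
    for (std::size_t d = gain.size() - 1; d > 0; --d)
        gain[d - 1] = -std::max(-gain[d - 1], gain[d]);
    return gain[0];
}
```

With pawn = 100 and rook = 500, seeSwap(100, {500}, {100}) returns -400: the rook takes a pawn that is guarded by a pawn, a losing exchange. This is the kind of static analysis the "reduce SEE-losing checks" rule above relies on.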
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising on the horizon: who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise caused by the lack of search.

Details to be worked out of course.
Great idea, Ed. We need an independent tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll then see how strongly mobility and coordinated piece evaluation play.

Oh, I remember - Diep also knows everything about pins, and has extensive king safety that will directly attack the opponent's king with all pieces, probably with the usual computer bug of not using many pawns to do so. It will give spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch, all you need do is set the "xtm" terms ("pawn table multiplier" etc.) to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

King safety is also tactical, mobility is also tactical, and evaluating attacks, which Diep does massively - that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh, come on, we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy: search = tactics, eval = positional. Nonsense, of course. I'd take it further: ban the QS, which can contain all manner of check-search tricks btw, and force the beancounters to write a SEE. Then we'll see how really crap their evals are ;-)

One way you can also test, btw, is to put the zero-search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 Elo out of raw evaluation alone.
There is a major flaw in your reasoning.
There may be, but your rambling texts do not show it.

You are going back to the 70's,
No, I am not; I was suggesting a static eval comparison.

when the mantra was "you must do chess on a computer the way a human does it." The problem was then, and still is now, that "we don't know HOW a human plays chess."
Well, you may not, but I used to play chess rather well, and I know how I played, well enough to design an evaluation function based on my own playing style.

"playing chess rather well" and "knowing HOW you make chess decisions" are NOT the same thing. I have a good friend that STILL plays chess "rather well" (at the GM level for those that know who I am talking about.) and he STILL can not tell my why this move is better than that move in many cases. "It just is". So don't give me this "I know how I do it" because you don't. If we DID know how humans played chess, computers would be absolutely unbeatable today, period. Because of the speed advantage silicon has over electro-chemical reactions. But we don't know, YET.


So saying "no search" is a meaningless constraint.
which itself is a meaningless non-sequitur. What's up with you?

Not to mention the obvious dichotomy where one can write a complex eval, or use search to fill in issues the eval doesn't handle well, and either should eventually reach exactly the same level of skill.
That is not a dichotomy. If you want to copy my language usage, perhaps a dictionary would be useful first?

Pick up a dictionary:
Noun:

1. A division or contrast between two things that are or are represented as being opposed or entirely different.

Search and eval are completely different issues.


But with computers, it is easier to rely on recursive high-speed stuff than on overly complex code that contains too many bugs to ever work well...
This might be true for an old-style software engineer who generates buggy code, but working with the Japanese taught me that it is possible to produce complex code that works well and does more, reliably. It's all a question of testing and quality control. Ask Sony.
Don't need to "ask". I read enough software engineering horror stories when teaching a SE course here. Given the choice of complex or simple, simple wins every time when they are equivalent in final results...
We do know how humans play chess and why they play particular moves. Computer engineers are not very good at understanding this, and appear to have very little idea of how to implement it, but the idea that a GM plays a move because "it just is" is nonsense. He may say that to you, rather as I would say the same to my wife if she asked me to explain an "exponential squash function": I can explain, but no explanation will get through without the lower-level knowledge needed.
Total nonsense, as anyone in the neurosciences will tell you. We barely understand the basic processes that go on inside the brain. Barely.

What you described is not a dichotomy. A dichotomy is a splitting of a whole into exactly two non-overlapping parts: a partition of a whole (or a set) into two parts (subsets) that are:
jointly exhaustive: everything must belong to one part or the other, and
mutually exclusive: nothing can belong simultaneously to both parts.

Which is why I previously termed it a FALSE dichotomy. Tactical/positional are neither jointly exhaustive nor mutually exclusive. Look it up again.
Don't need to. Some things are solved via search. Some are solved via evaluation. RARELY can something be solved by either, with equal effectiveness and efficiency. Close enough to a dichotomy for me...



I won't bother arguing about Japanese QC and engineering complexity, but suffice it to say, your attitude is strikingly similar to that of the old-fashioned car and motorbike designers who were comprehensively wiped out by Japanese engineering.
I am just aware of the very simple principle, "Given two options, one complex and one simple, choose the simple option unless there is a significant advantage to using the other." In the case of hanging pieces and such, search is the trick. In the case of positional concepts such as pawn structure, evaluation is the trick.
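To illustrate the "simple option" for a positional concept like pawn structure, here is a self-contained bitboard sketch of doubled- and isolated-pawn detection; the penalty values are invented for illustration and are not from any engine discussed here.

```cpp
#include <cstdint>

using Bitboard = uint64_t;
constexpr Bitboard FileA = 0x0101010101010101ULL;

// GCC/Clang builtin; std::popcount from <bit> works as well in C++20.
inline int popcount(Bitboard b) { return __builtin_popcountll(b); }

// Penalty (in centipawns) for doubled and isolated pawns of one side.
int pawnStructurePenalty(Bitboard pawns) {
    int penalty = 0;
    for (int f = 0; f < 8; ++f) {
        Bitboard file = FileA << f;
        int n = popcount(pawns & file);
        if (n > 1) penalty += 15 * (n - 1);     // doubled (illustrative value)
        Bitboard neighbors = 0;                 // the adjacent files
        if (f > 0) neighbors |= FileA << (f - 1);
        if (f < 7) neighbors |= FileA << (f + 1);
        if (n > 0 && (pawns & neighbors) == 0)
            penalty += 10 * n;                  // isolated (illustrative value)
    }
    return penalty;
}
```

A dozen lines, no search, few places for bugs to hide: exactly the kind of term where evaluation, not search, is the natural tool.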
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: How effective is move ordering from TT?

Post by Houdini »

Don wrote:I did not make any claim. Here is what I said:

I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs, and yet we are probably only top 10 on tactical problem sets. We must be doing something right.

So I did not make ANY claims about ANYTHING here, and in fact I drew attention to the fact that this is just a hypothesis. I don't see any reason for you to take this shot, as I have been nice to you in this thread.
If you don't want your beliefs to be interpreted as claims, don't talk about them in technical threads but keep them for the marketing brochure.

But the issue is not whether you made a claim or expressed a belief - that's just semantics. The issue was the sentence that I put in bold, which doesn't make sense, as you acknowledged in a later post.

Robert
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: How effective is move ordering from TT?

Post by Don »

Houdini wrote:
Don wrote:I did not make any claim. Here is what I said:

I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs, and yet we are probably only top 10 on tactical problem sets. We must be doing something right.

So I did not make ANY claims about ANYTHING here, and in fact I drew attention to the fact that this is just a hypothesis. I don't see any reason for you to take this shot, as I have been nice to you in this thread.
If you don't want your beliefs to be interpreted as claims, don't talk about them in technical threads but keep them for the marketing brochure.

But the issue is not whether you made a claim or expressed a belief - that's just semantics. The issue was the sentence that I put in bold, which doesn't make sense, as you acknowledged in a later post.

Robert
I can put this another way. I DO believe our evaluation function is one of the best, and possibly the best. I do not feel nearly as confident about the search. It's just a belief that I cannot back up, nor can I construct any test to prove it, and I agree that it's difficult even to define in formal terms, but I have no reason to make an issue out of this either way. In either case I would be admitting that something could be better, and a lot of people want Komodo to be better at tactics. So I don't understand why you think that claiming Komodo's tactics are not up to par is marketing hype. Would you put something like that on your site? Would you say, "Get Houdini now! Tactics not quite up to par - but it does other stuff really well"????

So it really escapes me how this can be interpreted as some sort of sales pitch.
Rebel
Posts: 7297
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: How effective is move ordering from TT?

Post by Rebel »

Don wrote:
Houdini wrote:
Uri Blass wrote:Based on this logic houdini1.5 64 bits has a better evaluation than houdini1.5 32 bits because houdini1.5 64 bits wins at very fast time control.
My main point was that Don made a foolish claim about using tactical test suite results to deduce the quality of the evaluation function. It's just marketing drivel which shouldn't appear in a technical thread.
I did not make any claim. Here is what I said:

I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs, and yet we are probably only top 10 on tactical problem sets. We must be doing something right.

So I did not make ANY claims about ANYTHING here, and in fact I drew attention to the fact that this is just a hypothesis. I don't see any reason for you to take this shot, as I have been nice to you in this thread.

My secondary point was that it's pointless to compare the "quality of the evaluation function" between different engines; one would need to remove the search completely. It's a meaningless concept - the only relevant issue is the Elo strength of the engine.

Robert
Thank you. That is EXACTLY what I keep saying. So finally we agree on something? The one ply search competition would be a meaningless competition.
Meaningless is, in my view, inaccurate; a better word would be inconclusive, provided a 1-ply match ended between 40-60%. With a classic 10-ply brute-force search I would draw the inconclusive line at 45-55%.
Having said that, I cannot help but wonder how well Diep would do - as irrelevant as it would actually be, he claims that his program would crush everyone else at 1 ply due to massive knowledge engineering.
I cannot help but wonder :wink: how well you are undermining your own initial statement in this discussion. Take the challenge and win; you will, I am sure.
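As a back-of-envelope check on where such "inconclusive" lines could reasonably sit (my arithmetic, not Ed's): the standard error of a match score over N games is about sqrt(p(1-p)/N), and draws only shrink it further.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double p = 0.5;  // near-even match
    for (int n : {1000, 25000, 50000}) {
        double se = std::sqrt(p * (1.0 - p) / n) * 100.0; // score SE in %
        std::printf("%6d games: sigma = %.2f%%, 2*sigma = %.2f%%\n",
                    n, se, 2.0 * se);
    }
    return 0;
}
```

At 25,000 games, two sigma is roughly 0.6%, so the proposed 40-60% band is far wider than the statistical noise; a 60% result would be a real signal, not luck.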
syzygy
Posts: 5675
Joined: Tue Feb 28, 2012 11:56 pm

Re: How effective is move ordering from TT?

Post by syzygy »

Don wrote:I can put this another way. I DO believe our evaluation function is one of the best and possibly the best.
"The best" relative to what... How do you compare the strength of two evaluation functions of two different engines?