How effective is move ordering from TT?

lkaufman · Post by **lkaufman** » Sun Aug 12, 2012 6:35 pm

bob wrote: You are paying too much attention to the "L" in LMR. I was taking about a MR instead, where you reduce a move based on some static analysis of the move itself, rather than how the move compares to the rest of the moves in the list. I've not been a big fan of LMR, due to one specific example where all moves are good, equally good, yet the "L" causes those appearing later in the list to be reduced more than those up front. My limited approach to this has been simple static rules as to whether a move should be reduced or not, regardless of whether it is sorted first or last. I've discovered, during testing, that one can safely reduce checks, if reasonable static analysis is done (for example, Qxh7+ where the queen is undefended and instantly lost is very safe to reduce.

This moves the discussion back to ideas for improving our engines. We never got good results for reducing "bad-SEE" checks. I'm wondering why it works for you. Do you have further conditions beyond "bad-SEE"? What was the typical search depth in the tests that showed a benefit for reducing checks? How much elo gain did you observe?

bob · Post by **bob** » Sun Aug 12, 2012 6:39 pm

chrisw wrote:
bob wrote:
chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy, search=tactics, eval=positional. Nonsense of course. I'ld take it further, ban the QS, which can contain all manner of check search tricks btw and force the beancounters to write a SEE. Then we'll see how really crap their evals are

One way you can also test btw, is put the zero search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 ELO out of raw evaluation only.
There is a major flaw in your reasoning.

There may be but your rambling texts do not show it

You are going back to the 70's,

No I am not, I was suggesting a static eval comparison

when the mantra was "you must do chess on a computer like a human does it." Problem was, then, and still is, now, "We don't know HOW a human plays chess."

Well, you may not but I used to play chess rather well, and I know how I played, well enough to design an evaluation function based on my own play style

"playing chess rather well" and "knowing HOW you make chess decisions" are NOT the same thing. I have a good friend that STILL plays chess "rather well" (at the GM level for those that know who I am talking about.) and he STILL can not tell my why this move is better than that move in many cases. "It just is". So don't give me this "I know how I do it" because you don't. If we DID know how humans played chess, computers would be absolutely unbeatable today, period. Because of the speed advantage silicon has over electro-chemical reactions. But we don't know, YET.

So saying "no search" is a meaningless constraint.

which itself is a meaningless non-sequitur. What's up with you?

Not to mention the obvious dichotomy where one can write a complex eval, or use search to fill in issues the eval doesn't handle well, and either should eventually reach exactly the same level of skill.

That is not a dichotomy. If you want to copy my language usage, perhaps a dictionary would be useful first?

Pick up a dictionary:
Noun:

1. A division or contrast between two things that are or are represented as being opposed or entirely different.

Search and eval are completely different issues.

But with computers, it is easier to rely on recursive high-speed stuff rather than on overly complex code that contains too many bugs to ever work well..

this might be true for an old-style software engineer who generates buggy code, but working with Japanese taught me that it is possible to produce complex code that works well and does more, reliably. It's all a question of testing and quality control. Ask Sony.

Don't need to "ask". I read enough software engineering horror stories when teaching a SE course here. Given the choice of complex or simple, simple wins every time when they are equivalent in final results...

Uri Blass · Post by **Uri Blass** » Sun Aug 12, 2012 6:57 pm

Houdini wrote:
Don wrote:I have written many time that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs and yet we are probably only top 10 in tactical problem sets. We must be doing something right.
Apparently you don't understand your own engine .

Being poor in tactics but having a strong engine over-all doesn't demonstrate the quality of the evaluation, it's a by-product of the LMR and null move reductions. Tactics are based on playing non-obvious, apparently unsound moves. If you LMR/NMR much, you'll miss tactics, it's as simple as that.
Stockfish is, probably to an even higher degree than Komodo, relatively poor in tactical tests but very good over-all, for exactly the same reason.

Instead I would measure the quality of the evaluation function by the performance at very fast TC. If you take out most of the search, what remains is evaluation.

Robert

Based on this logic houdini1.5 64 bits has a better evaluation than houdini1.5 32 bits because houdini1.5 64 bits wins at very fast time control.

chrisw · Post by **chrisw** » Sun Aug 12, 2012 7:19 pm

bob wrote:
chrisw wrote:
Don wrote:
lkaufman wrote:
Rebel wrote:
Don wrote: I think you are ducking a REAL match. Trying to make a 1 ply match happen is way down on my list of priorities and I would not allow myself to be distracted by such a thing.
Don,

On several occasions you have said Komodo has the best eval in the world. I think you should proof it now that you have a challenger.

In good old Rome tradition we want to see the gladiators blood flow
We have a different definition of eval than Vince. He refers to the eval function, while we are talking about eval in positions where search won't help appreciably. Probably Diep has a better eval function, because it gives up search depth for evaluating tactics. We claim that Komodo has the best eval when tactics don't matter, and I don't know of a way to prove this. When tactics do matter, search depth is extremely important, and comparing us on equal depth to Diep has no value.
Just to illustrate how differently our definition really is, Vincent proposes to "prove" which program has the most sophisticated evaluation function by doing a 1 ply search.

As almost anyone in this field knows, a 1 ply search is completely dominated by tactics and fine positional understanding is almost completely irrelevant.

I'm trying really hard to parse this sentence.

"as almost anyone in the field knows" is an attempt to democratise and thus legitimise the text which follows. Except that unverifiable woffle can't do that.

"a 1 ply search is completely dominated by tactics", actually what does this mean? A one ply search has no tactics? A one ply search would be overwhelmed by an N ply search? The game would appear to be tactical? No reason why it should be. "Completely" sounds strong, but with what justification? "Dominated"? Again strong word, but what justification? Heavy adjectival use but no backup for them. Are you in marketing mode?

"fine positional understanding is almost completely irrelevant" Is it really? Well you qualified "completely" this time, so you are not too sure it seems. Actually positional understanding seems perfectly relevent, what would you suggest as an alternative? Random understanding? Partially right and partially wrong understanding? Would they be better?

Any yet he believes that is a legitimate test of how strong a program is in the positional sense.
False straw man. It's a legitimate test of the evaluation function. It really is intellectual cheating to switch his argument from "eval" to the "whole program", "in a positional sense" (what does that mean btw?) and then attack that. Don't you think?

I can only say that his definition of what is positional is different than ours.

it would be different when your positional definition keeps changing at will. From part (the non-tactical) to ALL (see below), for example

I think the best test is to simply play a long match at long time controls.

yes, that finds the stongest program, but this thread is about strongest eval function. Anyway, you won't change your tune, for why would a marketeer enter a competition to find scientific truth which by its nature runs the risk of his product appearing dumb?

The program that wins is playing the best chess and we don't have to debate "positional" vs "tactical" play, as has been pointed out it is ALL positional play, right?

Don
All this "experiment" proves is "who wrote the most complex eval, including several different tactical motifs such as pins, trapped pieces and such?"

One can evaluate pins, or let the search "evaluate" them. Ditto for trapped pieces. So this "test" doesn't find the "best" evaluation. It finds the most complex one, that likely has many bugs due to the complexity.

You've been on this "heavy eval" them for years, calling it by different names including "new paradigm." Yet the original "bean-counter" approach is STILL at the top of the performance heap. Most of us understand exactly why this is. Chess is a combination of search AND evaluation. It won't be dominated by an "evaluation only" program any more than a GM player will never rely on search and just make all his moves based on instant evaluation.

This experiment is flawed in exactly the same way as fixed-depth comparisons between different programs are flawed. No one wants to use SEE inside the evaluation for every piece and pawn, yet this would be a good idea for an eval-only 1-ply - no q-search test like this... What would it prove???

Yes, I have indeed been on about "heavy eval" for years. Curious now how my "heavy eval" from early 1990s is the basis of Fruit. I think you use it too now.

On GM play, the sad truth is that you have no idea.

syzygy · Post by **syzygy** » Sun Aug 12, 2012 7:29 pm

Uri Blass wrote:
Houdini wrote:
Don wrote:I have written many time that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs and yet we are probably only top 10 in tactical problem sets. We must be doing something right.
Apparently you don't understand your own engine .

Being poor in tactics but having a strong engine over-all doesn't demonstrate the quality of the evaluation, it's a by-product of the LMR and null move reductions. Tactics are based on playing non-obvious, apparently unsound moves. If you LMR/NMR much, you'll miss tactics, it's as simple as that.
Stockfish is, probably to an even higher degree than Komodo, relatively poor in tactical tests but very good over-all, for exactly the same reason.

Instead I would measure the quality of the evaluation function by the performance at very fast TC. If you take out most of the search, what remains is evaluation.

Robert
Based on this logic houdini1.5 64 bits has a better evaluation than houdini1.5 32 bits because houdini1.5 64 bits wins at very fast time control.

And of course it has, because its evaluation function is faster

Houdini · Post by **Houdini** » Sun Aug 12, 2012 7:34 pm

Uri Blass wrote:Based on this logic houdini1.5 64 bits has a better evaluation than houdini1.5 32 bits because houdini1.5 64 bits wins at very fast time control.

My main point was that Don made a foolish claim about using tactical test suite results to deduce the quality of the evaluation function. It's just marketing drivel which shouldn't appear in a technical thread.

My secondary point was that it's pointless to compare "quality of evaluation function" between different engines, one would need to remove the search completely. It's a meaningless concept - the only relevant issue is the Elo strength of the engine.

Robert

chrisw · Post by **chrisw** » Sun Aug 12, 2012 7:34 pm

bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy, search=tactics, eval=positional. Nonsense of course. I'ld take it further, ban the QS, which can contain all manner of check search tricks btw and force the beancounters to write a SEE. Then we'll see how really crap their evals are

One way you can also test btw, is put the zero search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 ELO out of raw evaluation only.
There is a major flaw in your reasoning.

There may be but your rambling texts do not show it

You are going back to the 70's,

No I am not, I was suggesting a static eval comparison

when the mantra was "you must do chess on a computer like a human does it." Problem was, then, and still is, now, "We don't know HOW a human plays chess."

Well, you may not but I used to play chess rather well, and I know how I played, well enough to design an evaluation function based on my own play style

"playing chess rather well" and "knowing HOW you make chess decisions" are NOT the same thing. I have a good friend that STILL plays chess "rather well" (at the GM level for those that know who I am talking about.) and he STILL can not tell my why this move is better than that move in many cases. "It just is". So don't give me this "I know how I do it" because you don't. If we DID know how humans played chess, computers would be absolutely unbeatable today, period. Because of the speed advantage silicon has over electro-chemical reactions. But we don't know, YET.

So saying "no search" is a meaningless constraint.

which itself is a meaningless non-sequitur. What's up with you?

Not to mention the obvious dichotomy where one can write a complex eval, or use search to fill in issues the eval doesn't handle well, and either should eventually reach exactly the same level of skill.

That is not a dichotomy. If you want to copy my language usage, perhaps a dictionary would be useful first?

Pick up a dictionary:
Noun:

1. A division or contrast between two things that are or are represented as being opposed or entirely different.

Search and eval are completely different issues.

But with computers, it is easier to rely on recursive high-speed stuff rather than on overly complex code that contains too many bugs to ever work well..

this might be true for an old-style software engineer who generates buggy code, but working with Japanese taught me that it is possible to produce complex code that works well and does more, reliably. It's all a question of testing and quality control. Ask Sony.
Don't need to "ask". I read enough software engineering horror stories when teaching a SE course here. Given the choice of complex or simple, simple wins every time when they are equivalent in final results...

We do know how humans play chess and why they play particular moves. Computer engineers are not very good at understanding this, and appear to have very little idea of how to implement, but the idea that a GM plays a move because "it just is" is nonsense. He may say that to you, rather like I would say the same to my wife if she asked me to explain "exponential squash function", I can explain but no explanation will get through the lack of lower knowledge needed.

What you described is not a dichotomy. A dichotomy is any splitting of a whole into exactly two non-overlapping parts, meaning it is a procedure in which a whole is divided into two parts. It is a partition of a whole (or a set) into two parts (subsets) that are:
jointly exhaustive: everything must belong to one part or the other, and
mutually exclusive: nothing can belong simultaneously to both parts.
Which is why I previously termed it a FALSE dichotomy. Tactical/positional are neither jointly exhaustive nor mutually exclusive. Look it up again.

I won't bother arguing Japanese QC and engineering complexity, but suffice to say, your attitude is strikingly similar to the old fashioned car and motorbike designers who were comprehensively wiped out by Japanese engineering.

Rebel · Post by **Rebel** » Sun Aug 12, 2012 7:51 pm

lkaufman wrote:
Rebel wrote:
Don wrote:
Rebel wrote: And another fun expiriment, the "best" full static eval. No QS but in the middle of the chaos on the board with multiple hanging pieces and counter attacks one evaluates and checking moves needs special code to see if the move is not mate, cool.

So we have 2 challenges, the best dynamic eval (with QS) and the best static eval. Now we need participants
This is just a silly experiment. Probably all the top programs would lose badly to any program that make an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)
Then don't make statements you can not proof

New idea:

1. depth=8 (9 or 10) match. As long as it is fast so many games can be played.
2. Brute force
3. No extensions.
4. Standard QS.

So 100% equal search.

Not much work involved, in my case (2) and (3) are parameter driven and (4) I don't expect more than an hour work.
The winning program will be the one that does the most tactical work in the eval function, so probably Diep. So what? This tells us nothing about which program finds better moves in non-tactical positions, which is (or should be) the relevant questiion.

Quite funny you think Diep would win such a match, I would put my money on Komodo. One requirement of a top-engine programmer is accuracy and punctuality. I see that in Don's postings, I don't see that quality in Vincent's postings. I think it matters.

Jan Brouwer · Post by **Jan Brouwer** » Sun Aug 12, 2012 8:10 pm

Rebel wrote:Quite funny you think Diep would win such a match, I would put my money on Komodo. One requirement of a top-engine programmer is accuracy and punctuality. I see that in Don's postings, I don't see that quality in Vincent's postings. I think it matters.

Agreed!
Vincent regularly complains about the complexity (== bugs) and mis-tuned weights of his evaluation function.
Komodo has fought its way to the top with a methodical approach, so should be relatively bug-free.
My money would also be on Komodo, even with possible tactical handicap.

Come on, let's all agree that such a match proves nothing, and do it.

chrisw · Post by **chrisw** » Sun Aug 12, 2012 8:11 pm

Rebel wrote:
lkaufman wrote:
Rebel wrote:
Don wrote:
Rebel wrote: And another fun expiriment, the "best" full static eval. No QS but in the middle of the chaos on the board with multiple hanging pieces and counter attacks one evaluates and checking moves needs special code to see if the move is not mate, cool.

So we have 2 challenges, the best dynamic eval (with QS) and the best static eval. Now we need participants
This is just a silly experiment. Probably all the top programs would lose badly to any program that make an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)
Then don't make statements you can not proof

New idea:

1. depth=8 (9 or 10) match. As long as it is fast so many games can be played.
2. Brute force
3. No extensions.
4. Standard QS.

So 100% equal search.

Not much work involved, in my case (2) and (3) are parameter driven and (4) I don't expect more than an hour work.
The winning program will be the one that does the most tactical work in the eval function, so probably Diep. So what? This tells us nothing about which program finds better moves in non-tactical positions, which is (or should be) the relevant questiion.
Quite funny you think Diep would win such a match, I would put my money on Komodo. One requirement of a top-engine programmer is accuracy and punctuality. I see that in Don's postings, I don't see that quality in Vincent's postings. I think it matters.

Interesting. I think we should not underestimate Vincent, he is one of the smartest guys here. His problem, and his strength, is the great speed with which he sees things. He can't wait for the rest of the world to catch up and he makes blind leaps into false conclusions, when a little more thought would lead him to stop and consider other possibilities. Its as if Vincent plays the world as if he plays blitz chess, gotta make a move, any move, all that matters is speed and who makes the final mistake, when often it is better to stop and think and sometimes do nothing.

The idea that we can see our failures and flaws when designing complex monsters such as chess programs within our written texts and posts is cool. I see your strengths as attention to details, determination and preparedness for hard work. For example, I am far too lazy to have done all that web page work on the Rybka case, but you not, and it worked well. I won't go into your flaws though

How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?