How effective is move ordering from TT?

diep · Post by **diep** » Sun Aug 12, 2012 2:32 pm

Don wrote:
Rebel wrote: And another fun experiment, the "best" full static eval. No QS, but in the middle of the chaos on the board, with multiple hanging pieces and counterattacks, one evaluates, and checking moves need special code to see if the move is not mate. Cool.

So we have 2 challenges, the best dynamic eval (with QS) and the best static eval. Now we need participants
This is just a silly experiment. Probably all the top programs would lose badly to any program that makes an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)

Counting material is not tactics?

by the way... I don't make moves in diep's evaluation. I EVALUATE. Just like a human is doing that as well.

But we already know you try to talk your way out with an evaluation that looks a lot like the other derivatives, which I classify as beancounters.

Don · Post by **Don** » Sun Aug 12, 2012 2:43 pm

diep wrote:
Don wrote:
Rebel wrote: And another fun experiment, the "best" full static eval. No QS, but in the middle of the chaos on the board, with multiple hanging pieces and counterattacks, one evaluates, and checking moves need special code to see if the move is not mate. Cool.

So we have 2 challenges, the best dynamic eval (with QS) and the best static eval. Now we need participants
This is just a silly experiment. Probably all the top programs would lose badly to any program that makes an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1 ply search. I don't know what it will prove - but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all.)
Counting material is not tactics?

If you want my cooperation then we should just play a head to head match. We can use any time control you choose. I am not interested in some highly artificial contest that has nothing to do with real chess.

Why won't you play a real match with Komodo? You made several posts explaining how horrible Komodo was and how it will make all sorts of blunders in the tree, and yet you won't do anything to back this up.

I want Komodo vs Diep - a REAL match - you choose the time control.

Don

lucasart · Post by **lucasart** » Sun Aug 12, 2012 2:45 pm

how did this thread degenerate into this ? the original question was about move ordering...

Don · Post by **Don** » Sun Aug 12, 2012 2:50 pm

lucasart wrote:how did this thread degenerate into this ? the original question was about move ordering...

It started with Vincent claiming that LMR works better with bad move ordering, followed by several disparaging remarks he made about Komodo.

He attacks other programs but will not expose his own program to any sort of serious test. He is willing to play a 1 ply match if we modify our programs to be weaker at 1 ply.

Houdini · Post by **Houdini** » Sun Aug 12, 2012 2:59 pm

lucasart wrote:how did this thread degenerate into this ? the original question was about move ordering...

Whenever I read Vincent's replies, I have visions of Monty Python and the Holy Grail, with horse sounds made using coconuts.
O, Knights of Nee, we have brought you your shrubbery. May we go now?

Don · Post by **Don** » Sun Aug 12, 2012 3:02 pm

Houdini wrote:
lucasart wrote:how did this thread degenerate into this ? the original question was about move ordering...
Whenever I read Vincent's replies, I have visions of Monty Python and the Holy Grail, with horse sounds made using coconuts.
O, Knights of Nee, we have brought you your shrubbery. May we go now?

Yes! You hit the nail on the head. The humor in the Holy Grail to me was the fact that these characters took themselves so seriously while being so ridiculous.

I hate to admit how many times I have seen that one!

diep · Post by **diep** » Sun Aug 12, 2012 3:04 pm

Don wrote:
lucasart wrote:how did this thread degenerate into this ? the original question was about move ordering...
It started with Vincent claiming that LMR works better with bad move ordering, followed by several disparaging remarks he made about Komodo.

He attacks other programs but will not expose his own program to any sort of serious test. He is willing to play a 1 ply match if we modify our programs to be weaker at 1 ply.

Don't misrepresent me too much.

I'm claiming that LMR searches DEEPER when ordering the most stupid moves first.

Which makes the classic 'depth tests', to see whether something gains search depth and THEREFORE works, very difficult.

Furthermore Ed proposes to play 1 ply matches. This after your claim that your evaluation function is better, whereas I honestly don't see Komodo 5 evaluation function even 1 penny better than deepsjeng 2011, but well I probably missed another "cut'n pasting isn't legal but doing the same thing is" type thread here.

I said yes to those 1 ply matches to prove which evaluation is better.

Now with a lame excuse you seem to not want to compare evaluation head-on.

Vincent

Don · Post by **Don** » Sun Aug 12, 2012 3:51 pm

diep wrote:
Don wrote:
lucasart wrote:how did this thread degenerate into this ? the original question was about move ordering...
It started with Vincent claiming that LMR works better with bad move ordering, followed by several disparaging remarks he made about Komodo.

He attacks other programs but will not expose his own program to any sort of serious test. He is willing to play a 1 ply match if we modify our programs to be weaker at 1 ply.
Don't misrepresent me too much.

I'm claiming that LMR searches DEEPER when ordering the most stupid moves first.

I don't see how this is relevant to anything even if it's true. But you are basically saying that bad move ordering will make the search faster, which seems like an odd thing.

Larry and I have had the same discussion when working on move ordering. We generally assume as a first estimate that if the program does a fixed depth search faster then we have improved the move ordering and thus improved the program. Although that is probably true in the general case it's not 100% obvious to us that it ALWAYS holds. Of course we test everything ANYWAY and don't make too many assumptions. The counter hypothesis is that you could possibly improve the move ordering in some way that harms LMR slightly - perhaps because certain types of moves are more fragile with respect to LMR and putting them up front shields them from problems even if on average they are slightly weaker.
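To make the "fixed depth search faster" estimate concrete, here is a minimal sketch in Python. It is a toy, not Komodo or Diep code: a small random game tree searched by plain alpha-beta with no LMR, where children are sorted by an exact minimax oracle either best first or worst first. All function names are made up for illustration. It only shows that the same fixed-depth search returns the same value while visiting fewer leaves when ordering is good; it says nothing about the LMR interaction discussed above.

```python
import random

def build_tree(branching, depth, rng):
    """Uniform game tree as nested lists; leaves are distinct integer scores."""
    leaves = list(range(branching ** depth))
    rng.shuffle(leaves)
    leaf_iter = iter(leaves)

    def make(d):
        if d == 0:
            return next(leaf_iter)
        return [make(d - 1) for _ in range(branching)]

    return make(depth)

def minimax(node, maximizing):
    """Exact value with no pruning; used here only as an ordering oracle."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def ordered(node, maximizing, best_first):
    """Sort children by true value, best or worst first for the side to move."""
    descending = maximizing if best_first else not maximizing
    return sorted(node, key=lambda c: minimax(c, not maximizing), reverse=descending)

def alphabeta(node, alpha, beta, maximizing, best_first, counter):
    """Fail-soft alpha-beta; counter[0] counts the leaves actually evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in ordered(node, maximizing, best_first):
        value = alphabeta(child, alpha, beta, not maximizing, best_first, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff: the remaining siblings cannot change the result
    return best

def compare_orderings(branching=4, depth=4, seed=1):
    """Return the exact value plus (value, leaves visited) for each ordering."""
    tree = build_tree(branching, depth, random.Random(seed))
    result = {"exact": minimax(tree, True)}
    for name, best_first in (("best_first", True), ("worst_first", False)):
        counter = [0]
        value = alphabeta(tree, float("-inf"), float("inf"), True, best_first, counter)
        result[name] = (value, counter[0])
    return result

if __name__ == "__main__":
    print(compare_orderings())
```

A real engine has no oracle, of course; the TT move, killers and history tables are cheap guesses at the same ordering.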

Which makes the classic 'depth tests', to see whether something gains search depth and THEREFORE works, very difficult.

You should know that we NEVER base a change on whether it gains depth. If we make some change to pruning and it now searches 1/2 ply more, we still have no opinion on whether it helped the program. We run actual games and progressively scale up to longer time controls. Due to serious lack of testing resources we have to combine many changes for the longer tests but each version has to be proved (at least statistically) in order to become our new working dev version.
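As an illustration of what "proved (at least statistically)" can look like for a match result, here is a rough Python sketch. It is not the procedure Don and Larry use, which the post does not describe; it only applies two widely used approximations, the logistic Elo model and the normal-approximation likelihood of superiority (LOS), to a win/draw/loss count. The function names are invented for this example.

```python
import math

def score_fraction(wins, draws, losses):
    """Match score in [0, 1], counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_difference(wins, draws, losses):
    """Elo gap implied by the match score under the logistic rating model."""
    s = score_fraction(wins, draws, losses)
    if s <= 0.0 or s >= 1.0:
        raise ValueError("a 0% or 100% score gives an unbounded Elo estimate")
    return -400.0 * math.log10(1.0 / s - 1.0)

def likelihood_of_superiority(wins, losses):
    """Normal approximation; draws say nothing about which side is better."""
    if wins + losses == 0:
        return 0.5
    z = (wins - losses) / math.sqrt(2.0 * (wins + losses))
    return 0.5 * (1.0 + math.erf(z))

if __name__ == "__main__":
    w, d, l = 420, 1200, 380  # made-up result of a 2000-game test run
    print(elo_difference(w, d, l), likelihood_of_superiority(w, l))
```

A small Elo gain needs thousands of games before the LOS gets close to 1, which is why limited testing resources force changes to be batched.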

Furthermore Ed proposes to play 1 ply matches. This after your claim that your evaluation function is better, whereas I honestly don't see Komodo 5 evaluation function even 1 penny better than deepsjeng 2011, but well I probably missed another "cut'n pasting isn't legal but doing the same thing is" type thread here.

I said yes to those 1 ply matches to prove which evaluation is better.

Now with a lame excuse you seem to not want to compare evaluation head-on.

Vincent

Ed proposed a 1 ply match to "prove" which program has the best evaluation function but any serious program author knows this is a fool's errand. This won't test the general sophistication of the evaluation function - but it would be a good test to see if the evaluation knows that a piece is hanging. In fact it would be ONLY about that.

Komodo probably would play a reasonable 1 ply game because it does give small penalties for hanging pieces and even has a term for pins - but it's not designed to be a full static analysis in the style of ancient programs like Sargon 1. As far as I know NONE of the good programs do that any longer, as a quies search is a far more reliable way to accomplish the same thing. So IF Diep does a full static analysis to avoid any tactical errors (even without a quies search) then it would be hard to win a 1 ply battle, but that doesn't prove your EF is sophisticated in the positional sense.

I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs and yet we are probably only top 10 in tactical problem sets. We must be doing something right.

Whatever the case, it's all semantics. Many have emailed me saying that we are a bit weaker tactically than the other top programs and the assumption is that we are doing something wrong - but it could be that these other programs are doing something wrong. I don't really know.

Do you remember Dick Fosbury? He is the "inventor" of the Fosbury flop - a style of high jumping that nobody else was doing except for him. Was he doing it wrong or was everyone else doing it wrong? He won gold in the '68 Olympics and now EVERY high jumper uses the Fosbury flop. I bring this up because several of your posts were critical of Komodo's approach and tactical strength and you directly equated that with "weak play", an obsession with tactics that strong players usually get over. 1300 players are impressed with finding mate in 2 and sometimes you even hear them announce it to their opponents in tournaments. Strong players think strategically.

But the truth of the matter is that Komodo is still one of the best tactical programs in the world, it just happens that this is not its primary strength. I remember when Ivan Lendl dominated tennis, serve and volley was NOT the strong point of his game but he didn't suck at it - he could come to the net and probably was as good or better than most of the other players at this but compared to his awesome power and groundstrokes his volley didn't stand out. Whatever he was doing worked quite well as he was one of the most dominant players in the history of tennis. He was at number 1 longer than any player before him.

lkaufman · Post by **lkaufman** » Sun Aug 12, 2012 3:54 pm

Rebel wrote:
lkaufman wrote: We have a different definition of eval than Vince. He refers to the eval function, while we are talking about eval in positions where search won't help appreciably. Probably Diep has a better eval function, because it gives up search depth for evaluating tactics. We claim that Komodo has the best eval when tactics don't matter, and I don't know of a way to prove this. When tactics do matter, search depth is extremely important, and comparing us on equal depth to Diep has no value.
So a redefined claim which you cannot prove? What purpose does that serve?

Well, it's like claiming to be the best artist or best musician; it is a subjective claim, judged by the opinion of those who see the art or hear the music. So it could be judged by a poll here, if we can trust people to answer only by their own experience and not by what they have read. It won't work for Diep, because no one has it. If you really want an objective test, I suggest a direct match at a long time control using a testbook with positional openings only, nothing sharp or tactical, and reversing colors. There will still be tactics, so it's not perfect, but it's the best I can suggest.

Houdini · Post by **Houdini** » Sun Aug 12, 2012 4:06 pm

Don wrote: I have written many times that I BELIEVE we have the best positional program in the world. There is no test that can prove that I am right or wrong. I base this on the fact that we are one of the top 2 programs and yet we are probably only top 10 in tactical problem sets. We must be doing something right.

Apparently you don't understand your own engine.

Being poor in tactics but having a strong engine over-all doesn't demonstrate the quality of the evaluation, it's a by-product of the LMR and null move reductions. Tactics are based on playing non-obvious, apparently unsound moves. If you LMR/NMR much, you'll miss tactics, it's as simple as that.
Stockfish is, probably to an even higher degree than Komodo, relatively poor in tactical tests but very good over-all, for exactly the same reason.
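A rough Python sketch of why heavy LMR costs tactics: the reduction grows with depth and with how late a move comes in the ordering, so a quiet-looking sacrifice that is ordered late gets searched several plies shallower. The formula and its constants below are invented for illustration and are not taken from Houdini, Komodo or Stockfish.

```python
import math

def lmr_reduction(depth, move_index, is_tactical=False):
    """Plies to subtract for the move at move_index (0 = first move tried).

    Hypothetical log-based rule: no reduction at shallow depth, for the
    first three moves, or for moves flagged as tactical (captures, checks).
    """
    if depth < 3 or move_index < 3 or is_tactical:
        return 0
    r = int(0.5 + math.log(depth) * math.log(move_index + 1) / 2.0)
    return min(r, depth - 2)  # always leave at least 1 ply for the child

def child_depth(depth, move_index, is_tactical=False):
    """Depth at which the child node is searched after the reduction."""
    return depth - 1 - lmr_reduction(depth, move_index, is_tactical)

if __name__ == "__main__":
    # The same quiet move searched from a depth-12 node at different ordering slots.
    for index in (0, 3, 10, 30):
        print(index, child_depth(12, index))
```

A move that only works because of a follow-up several plies later is seen at full depth when it is ordered early and may be missed when it is ordered late, until a re-search or a deeper iteration finds it.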

Instead I would measure the quality of the evaluation function by the performance at very fast TC. If you take out most of the search, what remains is evaluation.

Robert
