How effective is move ordering from TT?

Discussion of chess software programming and technical issues.

Moderator: Ras

lkaufman
Posts: 6217
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: How effective is move ordering from TT?

Post by lkaufman »

diep wrote: If they don't accept any match as proving anything, then it doesn't make sense to do any test, Uri.

If I have a normal match against Komodo, they claim Diep is parallel and uses all cores while they use just one, so logically they lose.

On the other hand, I didn't optimize Diep for single-core contests, as just forward pruning a lot and a super-selective search helps you there; today's evaluations seem to need 20 plies (selective plies - not really comparable with real plies) to get their maximum elo scaling, and above 20 plies you see most engines hardly win elo per extra ply.

So the struggle is to get those 20 plies quickly, no matter how dubious your search. If you search SMP you can get there at rapid levels if you have a bunch of cores.

In most of those superbullet tests they do nowadays, of course, no one gets that 20 plies yet.
Okay, Vincent, here is my proposal. If you look at the CCRL 40/40 rating list you will see that Komodo 5 on one core outrates a recent Ivanhoe running on FOUR cores by 13 elo. So let's have a match between Komodo 5 on one core and Diep on FOUR cores. If you win the match it will at least imply that Diep is stronger than Ivanhoe and is therefore one of the top 6 engines. Regarding time limit, I suggest 30' + 30" increment, since repeat time control testing is a big waste of time, playing out dead drawn endings at the same slow pace as the early middlegame. Any fairly short opening book/test suite is fine, as long as you reverse colors with the same opening after each game. Maybe a fifty-game match?
Note that this test does not require us to do anything, it only requires that you send a version of DIEP to the person who will run the match.
I expect Komodo will win the match. I say this because I believe that if Diep already had the level of Ivanhoe you would be selling it now. If you win the match you can safely go commercial and expect decent sales.
chrisw
Posts: 4624
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: How effective is move ordering from TT?

Post by chrisw »

bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
Don wrote:
lkaufman wrote:
Rebel wrote:
Don wrote: I think you are ducking a REAL match. Trying to make a 1 ply match happen is way down on my list of priorities and I would not allow myself to be distracted by such a thing.
Don,

On several occasions you have said Komodo has the best eval in the world. I think you should prove it now that you have a challenger.

In good old Rome tradition we want to see the gladiators blood flow :mrgreen:
We have a different definition of eval than Vince. He refers to the eval function, while we are talking about eval in positions where search won't help appreciably. Probably Diep has a better eval function, because it gives up search depth for evaluating tactics. We claim that Komodo has the best eval when tactics don't matter, and I don't know of a way to prove this. When tactics do matter, search depth is extremely important, and comparing us on equal depth to Diep has no value.
Just to illustrate how different our definitions really are, Vincent proposes to "prove" which program has the most sophisticated evaluation function by doing a 1 ply search.


As almost anyone in this field knows, a 1 ply search is completely dominated by tactics and fine positional understanding is almost completely irrelevant.
I'm trying really hard to parse this sentence.

"as almost anyone in the field knows" is an attempt to democratise and thus legitimise the text which follows. Except that unverifiable woffle can't do that.

"a 1 ply search is completely dominated by tactics", actually what does this mean? A one ply search has no tactics? A one ply search would be overwhelmed by an N ply search? The game would appear to be tactical? No reason why it should be. "Completely" sounds strong, but with what justification? "Dominated"? Again strong word, but what justification? Heavy adjectival use but no backup for them. Are you in marketing mode?

"fine positional understanding is almost completely irrelevant" Is it really? Well you qualified "completely" this time, so you are not too sure it seems. Actually positional understanding seems perfectly relevent, what would you suggest as an alternative? Random understanding? Partially right and partially wrong understanding? Would they be better?

And yet he believes that is a legitimate test of how strong a program is in the positional sense.
False straw man. It's a legitimate test of the evaluation function. It really is intellectual cheating to switch his argument from "eval" to the "whole program", "in a positional sense" (what does that mean btw?) and then attack that. Don't you think?

I can only say that his definition of what is positional is different than ours.
it would be different when your positional definition keeps changing at will. From part (the non-tactical) to ALL (see below), for example

I think the best test is to simply play a long match at long time controls.
yes, that finds the strongest program, but this thread is about the strongest eval function. Anyway, you won't change your tune, for why would a marketeer enter a competition to find scientific truth which by its nature runs the risk of his product appearing dumb?

The program that wins is playing the best chess and we don't have to debate "positional" vs "tactical" play, as has been pointed out it is ALL positional play, right?

Don
All this "experiment" proves is "who wrote the most complex eval, including several different tactical motifs such as pins, trapped pieces and such?"

One can evaluate pins, or let the search "evaluate" them. Ditto for trapped pieces. So this "test" doesn't find the "best" evaluation. It finds the most complex one, that likely has many bugs due to the complexity.

You've been on this "heavy eval" theme for years, calling it by different names including "new paradigm." Yet the original "bean-counter" approach is STILL at the top of the performance heap. Most of us understand exactly why this is. Chess is a combination of search AND evaluation. It won't be dominated by an "evaluation only" program, any more than a GM would rely on instant evaluation alone and never calculate.

This experiment is flawed in exactly the same way as fixed-depth comparisons between different programs are flawed. No one wants to use SEE inside the evaluation for every piece and pawn, yet it would be a good idea in an eval-only, 1-ply, no-q-search test like this. What would that prove?
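The SEE being argued over here is the standard static exchange evaluation: resolving a capture sequence on one square statically, without search. A minimal swap-list sketch (generic, with illustrative piece values in centipawns - not Fruit's, Crafty's, or any other engine's actual code):

```python
# Generic static exchange evaluation (SEE) via the classic swap list.
# Piece values are illustrative centipawns, not taken from any engine.
def see(target, attackers, defenders):
    """Score of a capture sequence on one square, from the first capturer's view.

    target:    value of the piece initially on the square
    attackers: values of the capturing side's pieces, least valuable first
               (attackers[0] makes the initial capture)
    defenders: values of the recapturing side's pieces, least valuable first
    """
    gain = [target]                  # gain[d] = material swing at swap depth d
    on_square = attackers[0]         # piece now sitting on the square
    queues = [list(defenders), list(attackers[1:])]
    side = 0                         # 0 = defenders recapture next
    while queues[side]:
        gain.append(on_square - gain[-1])   # capture the piece on the square
        on_square = queues[side].pop(0)
        side ^= 1
    # Back up the swap list: each side may decline to recapture when losing.
    for d in range(len(gain) - 1, 0, -1):
        gain[d - 1] = -max(-gain[d - 1], gain[d])
    return gain[0]

print(see(900, [100], [100]))   # PxQ, queen guarded by a pawn -> 800
print(see(100, [500], [100]))   # RxP, pawn guarded by a pawn  -> -400
```

Running this per piece and pawn inside the static eval is exactly the cost nobody wants to pay in a normal engine, which is the point being made: a q-search resolves the same captures far more cheaply.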
Yes, I have indeed been on about "heavy eval" for years. Curious now how my "heavy eval" from early 1990s is the basis of Fruit. I think you use it too now.

On GM play, the sad truth is that you have no idea.
Fruit? Heavy Eval? You need to stop using whatever recreational drugs/beverages you are using. Fruit is typical bean-counter fast. You will find zero tactical analysis in Fruit's eval, nor in mine. Everything is designed for speed in BOTH. Hence the NPS numbers they produce.

As far as GM play goes, you might try talking to Dzhindi a bit to see what I do/don't know... One never stops learning.
Still with the low-level insulting? It drips from your pen as if it were the most natural thing in the world, doesn't it? For the record, although quite why I should be put into a position of having to deny your smears, I do not do recreational drugs, and I have my own rule which precludes me from posting if I drink wine with dinner.

We understand that your chess knowledge is very poor; that is reflected in your personal rating and in the lack of original chess ideas in your software. The fact that you "know someone who knows", apart from an irony George Speight will pick up on, will do you no good, because your low level of chess knowledge means that a GM will find it almost impossible to find a way past the basic low level. Same problem as trying to explain differential calculus to my wife.

Fruit has exponentially ramped king attack code, testing for queen plus one unit, as described by Mark Watkins in evalcomp. Rybka has it also, and it seems now to be general practice, including wrapping in pawn-shelter destruction. Even you do this now, but not back in the early 1990s, and nor did anyone else as far as I was able to determine. It may have been in closed-source commercials, but I did not notice it when looking at games played. Together with detailed material imbalance code, I claim that these ideas were unique to Chess System Tal in the mid 90s, and therefore that current evals are a composite of old ideas since the year dot and unique ideas from CSTal, certainly in the eval department and presumably with less mania. I'm not complaining and I don't mind; this is how comp chess moves on.
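The "exponentially ramped king attack with a queen-plus-one-unit test" described above can be sketched generically: sum attack units per attacker of the king zone, require the queen plus at least one more piece, then score through a steeply rising table. All weights and table values below are invented for illustration - real engines tune these numbers extensively:

```python
# Hypothetical king-safety sketch. ATTACK_UNITS and SAFETY_TABLE are
# made-up illustration values, not Fruit's or CSTal's actual numbers.
ATTACK_UNITS = {'N': 2, 'B': 2, 'R': 3, 'Q': 5}

# Nonlinear ramp: small attacks score almost nothing, big ones explode.
SAFETY_TABLE = [0, 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 400, 500, 550, 600]

def king_attack_score(attacking_pieces):
    """attacking_pieces: letters of the pieces bearing on the king zone."""
    # The "queen plus one" test: no queen, or queen alone, scores nothing.
    if 'Q' not in attacking_pieces or len(attacking_pieces) < 2:
        return 0
    units = sum(ATTACK_UNITS[p] for p in attacking_pieces)
    return SAFETY_TABLE[min(units, len(SAFETY_TABLE) - 1)]

print(king_attack_score(['Q']))            # queen alone -> 0
print(king_attack_score(['Q', 'N']))       # 7 units -> 32
print(king_attack_score(['Q', 'R', 'N']))  # 10 units -> 256
```

The table, not the unit sum, carries the "exponential ramp": adding one more attacker can multiply the score rather than merely add to it.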

Ironic, though, that with your blindness it took something over ten years to get it into Crafty. Perhaps I should ask for a credit?
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: How effective is move ordering from TT?

Post by diep »

Oh, you are sure you get 20 plies at 30 minutes a move, I guess?

As for playing Diep: in some months from now it should play online, so you can play whatever time control there, I suppose.

But the whole point is, for a few months now I have seen you guys post about how 'strong' Komodo's evaluation is.

To me it seems very similar to DeepSjeng's evaluation, though, which in turn looks from a distance like the Rybka/Fruit evaluation with a few small additions - and no, I don't claim anything has been copied.

I say *similar*.

You already admitted that you took over the material evaluation from Rybka.

Now, all those months you claim that something which is much of a beancounter like that has the world's 'strongest' evaluation and a 'lot of knowledge' (my own words).

In fact Diep has 20x more knowledge. Sure, very badly tuned, despite big experiments from my side already to see what bugs I can get out using automatic parameter tuning.

Those claims are what I responded against.
In fact it's obvious that you both don't even know how things have been tuned in Komodo and what the effect of parameter tuning is; otherwise you would have grabbed the opportunity for a 1-ply match blindfolded.

In fact I'm sure that had Christophe Théron still been on this forum, he would have directly posted to play 1-ply matches against some very old Tiger (and I bet Tiger would win every single match), as he knows something about making your own evaluation and you guys obviously do NOT.

That's what I wanted to demonstrate, and it's more than obvious to everyone.

There are like a hundred engines out there now with nearly identical evaluations, only some minor search differences, and some of them, like DeepSjeng and Komodo, seemingly know a tad more about passed pawns. Those are such tiny differences - it's not even funny to make any claim about your evaluation function, I'd say.

Vincent
lkaufman
Posts: 6217
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: How effective is move ordering from TT?

Post by lkaufman »

So I guess you don't want to play the match Diep 4 cores vs. Komodo 1 core? I'm not fussy about the time control; I didn't even think about how many plies we search at 30" per move, as I don't think it makes any difference. I know we scale well compared to other top programs; I have no idea about Diep, so I have no idea whether more time would favor us or Diep. Anyway, you don't even need my permission; you can make your own rules, as our program is public and yours is not.
I never said anything about "taking over the material evaluation from Rybka". I said there are of course similarities in the eval of Rybka and Komodo as I was deeply involved with both, but the similarities are much less than the similarities between Ippolit and Rybka for example.
Regarding DeepSjeng, neither Don nor I even have that engine, as it wasn't rated high enough to interest us.
If you believe that 1 ply searches measure positional knowledge and not tactical knowledge built into eval, all I can say is that we disagree.
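The disagreement over what a 1-ply search measures can be made concrete with a toy sketch (hypothetical scores, no real engine): at depth 1 an engine simply maximizes the static score of the resulting position, so a capture whose refutation lies one ply deeper dominates any plausible positional bonus.

```python
# Toy illustration with made-up centipawn scores: why a 1-ply search is
# dominated by tactics. Each move leads to a position with a static score
# (from the mover's point of view) and, for depth 2, opponent-reply scores.
moves = {
    "QxP (pawn is defended)": {"static": +100, "replies": [-800]},
    "quiet positional move":  {"static": +30,  "replies": [+25]},
}

def score_1ply(m):
    return m["static"]                      # just trust the static eval

def score_2ply(m):
    # at depth 2 the opponent picks the reply that is worst for us
    return min(m["replies"]) if m["replies"] else m["static"]

best_1ply = max(moves, key=lambda name: score_1ply(moves[name]))
best_2ply = max(moves, key=lambda name: score_2ply(moves[name]))
print(best_1ply)  # grabs the poisoned pawn
print(best_2ply)  # sees the queen falls and plays the quiet move
```

Whether such blunders say more about "tactical knowledge built into eval" or about "positional knowledge" is exactly the point both sides dispute; the sketch only shows why the two are hard to separate at depth 1.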
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: How effective is move ordering from TT?

Post by Don »

Please note that when Larry started working with me I ALREADY had an evaluation function, and even though we made significant changes it was NOT to copy Rybka. Larry worked with the terms I already had; we modified a few of them, added several more, etc.

I don't think Larry even had complete knowledge of the implementation details of Rybka's evaluation function, but Larry can correct me on that. He did not have Rybka source code; he basically tuned what Vas provided and suggested many other terms, which Vas either added to the program or declined to add.

Probably a lot of what made Rybka good that Larry was responsible for is now in Komodo, but that makes sense since Larry is one of the Komodo authors.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: How effective is move ordering from TT?

Post by michiguel »

[MODERATION]
Just a friendly reminder to all members of this distinguished forum that this is the technical sub-forum. Before any fight even starts, we would really like to prevent it. So please, it would be of great help to refrain from introducing any comment that is designed to unnecessarily irritate the other person as a person, or anything that has nothing to do with programming issues.

You can punch the other programmer with your technical arguments, of course.

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

Rebel wrote:
bob wrote:
chrisw wrote: Yes, I have indeed been on about "heavy eval" for years. Curious now how my "heavy eval" from early 1990s is the basis of Fruit. I think you use it too now.

On GM play, the sad truth is that you have no idea.
Fruit? Heavy Eval?
Chris did not say the Fruit eval is heavy.
How does that jibe with the following quote?

Curious now how my "heavy eval" from early 1990s is the basis of Fruit. I think you use it too now.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

chrisw wrote:
bob wrote:
chrisw wrote:
bob wrote:
chrisw wrote:
Don wrote:
lkaufman wrote:
Rebel wrote:
Don wrote: I think you are ducking a REAL match. Trying to make a 1 ply match happen is way down on my list of priorities and I would not allow myself to be distracted by such a thing.
Don,

On several occasions you have said Komodo has the best eval in the world. I think you should prove it now that you have a challenger.

In good old Roman tradition we want to see the gladiators' blood flow :mrgreen:
We have a different definition of eval than Vince. He refers to the eval function, while we are talking about eval in positions where search won't help appreciably. Probably Diep has a better eval function, because it gives up search depth for evaluating tactics. We claim that Komodo has the best eval when tactics don't matter, and I don't know of a way to prove this. When tactics do matter, search depth is extremely important, and comparing us on equal depth to Diep has no value.
Just to illustrate how different our definitions really are, Vincent proposes to "prove" which program has the most sophisticated evaluation function by doing a 1-ply search.


As almost anyone in this field knows, a 1 ply search is completely dominated by tactics and fine positional understanding is almost completely irrelevant.
I'm trying really hard to parse this sentence.

"as almost anyone in the field knows" is an attempt to democratise and thus legitimise the text which follows. Except that unverifiable waffle can't do that.

"a 1 ply search is completely dominated by tactics" - actually, what does this mean? That a one-ply search has no tactics? That a one-ply search would be overwhelmed by an N-ply search? That the game would appear to be tactical? No reason why it should be. "Completely" sounds strong, but with what justification? "Dominated"? Again a strong word, but what justification? Heavy adjectival use, but no backup for it. Are you in marketing mode?

"fine positional understanding is almost completely irrelevant" Is it really? Well, you qualified "completely" this time, so you are not too sure, it seems. Actually, positional understanding seems perfectly relevant; what would you suggest as an alternative? Random understanding? Partially right and partially wrong understanding? Would they be better?

And yet he believes that is a legitimate test of how strong a program is in the positional sense.
False straw man. It's a legitimate test of the evaluation function. It really is intellectual cheating to switch his argument from "eval" to the "whole program", "in a positional sense" (what does that mean, btw?) and then attack that. Don't you think?

I can only say that his definition of what is positional is different from ours.
It would be, when your positional definition keeps changing at will - from part (the non-tactical) to ALL (see below), for example.

I think the best test is to simply play a long match at long time controls.
Yes, that finds the strongest program, but this thread is about the strongest eval function. Anyway, you won't change your tune; why would a marketeer enter a competition to find scientific truth which by its nature runs the risk of his product appearing dumb?

The program that wins is playing the best chess, and we don't have to debate "positional" vs "tactical" play; as has been pointed out, it is ALL positional play, right?

Don
All this "experiment" proves is "who wrote the most complex eval, including several different tactical motifs such as pins, trapped pieces and such?"

One can evaluate pins, or let the search "evaluate" them. Ditto for trapped pieces. So this "test" doesn't find the "best" evaluation. It finds the most complex one, which likely has many bugs due to the complexity.
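The choice between evaluating pins statically or letting the search find them can be made concrete. Below is a minimal, hypothetical sketch of the static approach (names like `find_pins` are invented for illustration; this is not Diep's, Crafty's, or any engine's actual code): scan outward from the king along each sliding direction; if the first piece met is friendly and the next is an enemy slider that moves on that ray, the friendly piece is pinned.

```python
# Toy sketch of static pin detection on an 8x8 board given as a dict
# {(file, rank): piece}. Uppercase = White, lowercase = Black.

SLIDERS = {
    (1, 0): "RQ", (-1, 0): "RQ", (0, 1): "RQ", (0, -1): "RQ",   # rook rays
    (1, 1): "BQ", (1, -1): "BQ", (-1, 1): "BQ", (-1, -1): "BQ", # bishop rays
}

def find_pins(board, king_sq, own_is_upper):
    """Return squares of own pieces pinned against the king at king_sq."""
    pinned = []
    for (df, dr), attackers in SLIDERS.items():
        f, r = king_sq
        blocker = None
        while True:
            f, r = f + df, r + dr
            if not (0 <= f < 8 and 0 <= r < 8):
                break
            piece = board.get((f, r))
            if piece is None:
                continue
            if piece.isupper() == own_is_upper:
                if blocker is not None:      # two own pieces: no pin on this ray
                    break
                blocker = (f, r)             # first own piece: pin candidate
            else:
                # Enemy piece: a pin only if it slides along this ray.
                if blocker is not None and piece.upper() in attackers:
                    pinned.append(blocker)
                break
    return pinned

# White king e1, white knight e4, black rook e8: the knight is pinned.
board = {(4, 0): "K", (4, 3): "N", (4, 7): "r"}
print(find_pins(board, (4, 0), True))   # -> [(4, 3)]
```

A real engine would also distinguish absolute pins (against the king) from relative ones, and feed the result into mobility or exchange evaluation; the sketch only shows the ray scan itself.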

You've been on this "heavy eval" theme for years, calling it by different names, including "new paradigm." Yet the original "bean-counter" approach is STILL at the top of the performance heap. Most of us understand exactly why this is. Chess is a combination of search AND evaluation. It won't be dominated by an "evaluation only" program, any more than a GM player will rely on no search at all and make every move from instant evaluation.

This experiment is flawed in exactly the same way as fixed-depth comparisons between different programs are flawed. No one wants to use SEE inside the evaluation for every piece and pawn, yet it would be a good idea for an eval-only, 1-ply, no-q-search test like this... What would it prove?
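For what it's worth, the SEE idea mentioned here can be sketched abstractly. The toy function below (illustrative only, not any engine's actual SEE) takes the value of the piece on the target square plus each side's attacker values, and resolves the exchange with the standard "each side may stand pat" recursion. A real implementation derives the attacker lists from the board and must handle x-rays and pins, which are ignored here.

```python
def see(target_value, own_attackers, their_attackers):
    """Net material gain of capturing a piece worth target_value,
    assuming both sides recapture with their cheapest attacker and
    may stop (stand pat) whenever continuing would lose material."""
    def score(captured, attackers, defenders):
        # Side to move may take the piece worth `captured` or decline (0).
        if not attackers:
            return 0
        capturer, rest = attackers[0], attackers[1:]
        return max(0, captured - score(capturer, defenders, rest))

    own = sorted(own_attackers)      # cheapest piece captures first
    their = sorted(their_attackers)
    if not own:
        return 0                     # no capture is possible at all
    return target_value - score(own[0], their, own[1:])

# Pawn=100, knight=300, rook=500.
print(see(100, [300], []))          # -> 100  (free pawn)
print(see(100, [300], [100]))       # -> -200 (NxP, pxN)
print(see(100, [300, 500], [300]))  # -> 100  (NxP, NxN, RxN)
```

This is exactly the trade-off being argued about: running something like this for every piece on every evaluated node is expensive, which is why most engines leave exchange resolution to the quiescence search instead.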
Yes, I have indeed been on about "heavy eval" for years. Curious now how my "heavy eval" from early 1990s is the basis of Fruit. I think you use it too now.

On GM play, the sad truth is that you have no idea.
Fruit? Heavy Eval? You need to stop using whatever recreational drugs/beverages you are using. Fruit is typical bean-counter fast. You will find zero tactical analysis in Fruit's eval, nor in mine. Everything is designed for speed in BOTH. Hence the NPS numbers they produce.

As far as GM play goes, you might try talking to Dzhindi a bit to see what I do/don't know... One never stops learning.
Still with the low-level insulting? It drips from your pen as if it were the most natural thing in the world, doesn't it? For the record, although quite why I should be put into a position of having to deny your smears escapes me, I do not do recreational drugs, and I have my own rule which precludes me from posting if I drink wine with dinner.

We understand that your chess knowledge is very poor; that is reflected in your personal rating and the lack of original chess ideas in your software. The fact that you "know someone who knows", apart from an irony George Speight will pick up on, will do you no good, because your low level of chess knowledge means that a GM will find it almost impossible to find a way past that basic low level. Same problem as trying to explain differential calculus to my wife.

My "low level chess knowledge" was good enough to produce a 2250+ USCF rating in the early 1970s. I'm not an IM/GM player, and I do not know whether I could have become one or not. But I am far from a patzer...

Fruit has exponentially ramped king attack code, testing for queen plus one unit, as described by Mark Watkins in evalcomp. Rybka does too, and it now seems to be general practice, including wrapping in pawn-shelter destruction. Even you do this now, but not back in the early 1990s, and nor did anyone else as far as I was able to determine. It may have been in closed-source commercial programs, but I did not notice it when looking at the games they played. Together with detailed material-imbalance code, I claim that these ideas were unique to Chess System Tal in the mid 90s, and therefore that current evals are a composite of old ideas since the year dot and unique ideas from CSTal, certainly in the eval department and presumably with less mania. I'm not complaining and I don't mind; this is how computer chess moves on.

The irony, though, is that it took your blindness something over ten years to get it into Crafty. Perhaps I should ask for a credit?
Sorry, but we ALWAYS had ramped king safety (not exponentially ramped, either) that was based on the closeness of pieces to the kings, the number of pieces, the pawn shelter, etc. My king safety has not changed much over the years, except in the tuning. The idea of a second-order evaluation for king safety was not new in Fruit. And I do not consider this a "heavy" evaluation at all.

If you go back in time, you NEVER gave any details about what YOU were doing - always vague comments like "CSTal steers toward the fog" and such. The current king safety in Crafty was developed in Crafty, not copied/borrowed from ANYONE at all. And it was, in various forms, present in versions of Crafty well prior to Fruit... We did similar things in Cray Blitz, which predates most everything in this discussion.
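The "ramped" king-attack idea both sides are arguing about can be illustrated with a toy model. The weights and curve below are invented for illustration and are not taken from Fruit, Crafty, or CSTal: each attacker of the king zone contributes "attack units", missing shelter pawns add more, and the total indexes a nonlinear table so that danger grows much faster than linearly as attackers coordinate.

```python
# Hypothetical attack-unit weights per piece attacking the king zone.
ATTACK_UNITS = {"N": 2, "B": 2, "R": 3, "Q": 5}

# Quadratic ramp, capped: a lone attacker costs little, a coordinated
# attack by several pieces costs a lot.
SAFETY_TABLE = [min(u * u, 500) for u in range(100)]

def king_danger(attackers, shelter_pawns_missing):
    """attackers: list of piece letters attacking the enemy king zone."""
    units = sum(ATTACK_UNITS.get(p, 0) for p in attackers)
    units += 2 * shelter_pawns_missing          # wrecked shelter feeds the ramp
    return SAFETY_TABLE[min(units, len(SAFETY_TABLE) - 1)]

print(king_danger(["N"], 0))            # -> 4   (one knight: small penalty)
print(king_danger(["Q", "R", "N"], 2))  # -> 196 (coordinated attack: large)
```

The point of the second-order table is exactly the "exponential ramp" under discussion: three attackers are worth far more than three times one attacker, whether the curve is quadratic, exponential, or hand-tuned.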
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

diep wrote:
bob wrote:
chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising on the horizon: who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise caused by the lack of search.

Details to be worked out of course.
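Rules 1-3 above can be sketched in code. The toy below works on abstract nodes rather than a real board, since the point is the control flow, not a move generator; the names (`Node`, `root_1ply`, `qsearch`) are invented for illustration and are not any engine's actual API.

```python
class Node:
    """A toy position: static eval (side-to-move view) plus capture replies."""
    def __init__(self, eval_score, captures=()):
        self.eval_score = eval_score
        self.captures = list(captures)

INF = 10**9

def qsearch(node, alpha, beta):
    """Rule 2: standard quiescence -- stand pat, then try captures only."""
    best = node.eval_score                 # stand-pat score
    if best >= beta:
        return best
    alpha = max(alpha, best)
    for child in node.captures:
        score = -qsearch(child, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                  # beta cutoff
            break
    return best

def root_1ply(moves):
    """Rules 1 & 3: a single root ply, no extensions; QS scores each reply."""
    best = None
    for name, node in moves:
        score = -qsearch(node, -INF, INF)  # negate: node is opponent's view
        if best is None or score > best[1]:
            best = (name, score)
    return best

# Grabbing a defended pawn looks good statically (-100 for the opponent),
# but QS sees the recapture; the quiet move is chosen instead.
quiet = Node(-50)
greedy = Node(-100, captures=[Node(-200)])
print(root_1ply([("quiet", quiet), ("grab_pawn", greedy)]))  # -> ('quiet', 50)
```

This also shows why rule 2 needs "mutual agreement": everything in such a contest hinges on what the quiescence search is allowed to look at (captures only, checks, promotions), since QS is the only search left.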
Great idea, Ed. We need an independent tester who also verifies that no cheating occurs. Do you volunteer?

With some luck we'll then see how strongly mobility and coordinated-piece evaluation play.

Oh, I remember - Diep also knows everything about pins, and has extensive king safety that will directly attack the opponent's king with all its pieces, probably with the usual computer bug of not using many pawns to do so. It will give spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST: Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch, all you need do is set the "xtm" terms ("pawn table multiplier" etc.) to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

King safety is also tactical, mobility is also tactical, and evaluating attacks, which Diep does massively, is also tactical?

Yet evaluating material suddenly is the most important 'positional term' of an evaluation?

Oh, come on, we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy: search=tactics, eval=positional. Nonsense, of course. I'd take it further: ban the QS, which can contain all manner of check-search tricks btw, and force the beancounters to write a SEE. Then we'll see how really crap their evals are ;-)

One way you can also test, btw, is to put the zero-search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 Elo out of raw evaluation only.
There is a major flaw in your reasoning. You are going back to the 70s, when the mantra was "you must do chess on a computer like a human does it." The problem was then, and still is now, that we don't know HOW a human plays chess. So saying "no search" is a meaningless constraint.

Not to mention the obvious dichotomy: one can write a complex eval, or use search to fill in issues the eval doesn't handle well, and either should eventually reach exactly the same level of skill. But with computers, it is easier to rely on recursive high-speed stuff than on overly complex code that contains too many bugs to ever work well...
We know very well how a human plays chess.

In fact we already know the most important clue from De Groot's research in 1946.

That clue is simply that it's knowledge based and not search based.

That also explains why so many players who are analytically really strong still lose games - they make search mistakes, sometimes even 2-ply ones, simply missing the opponent's move entirely.

After 1946, the research usually focused on the wrong persons.

Everyone always wants to research the world champion. From a scientific viewpoint, the world champion is NOT interesting to research.

In computer chess we also know very clearly that if you have a bug somewhere, you ARE going to lose everything based upon it. So avoiding bugs is what you want.

So in such research they always make the same beginner's mistake, a zillion times over - you want to avoid making the same mistakes as the common man who plays chess.

But researching a guy who is 1200 Elo is not sexy, huh?

All the ladies who work for government always want to research those in society that are weird/more intelligent right?

What is interesting is researching the common guys who, without knowing anything, can still win a game.

What is interesting is knowing why a correspondence player who is 1100 Elo himself over the board, not even knowing which endgames are a clear win for him, is nevertheless world top in correspondence chess.

No one is researching those cases.

Simply because we all already know the truth since 1946. There is nothing secret there.

It's all knowledge based with human players.

Of course the really interesting question then is: to what extent is accurate parameter tuning important to humankind?

To me it seems chess engines are far better tuned than the knowledge any human has tuned in his brain.

Yet THAT is an open discussion and an interesting one.

My claim is that the beancounter chess engines have been tuned far beyond what is interesting from a research viewpoint. Totally useless to even spend money on developing further engines like Komodo, Stockfish, Rybka or any similar engine. It's totally trivial that they play far above the Elo strength that their knowledge supports. Not interesting, in short.

So if someone claims his evaluation function is better, then we can simply do the 1-ply test.

They now back off and claim their eval is ok when they add that 30 ply search they're doing :)

Next attempt will be: "but we must both run on 1 core, as Diep nearly always wins against engines when it has a big hardware advantage, and being SMP is a hardware advantage" (that was the case even when Diep was rated a lot lower than it is now).

Then after that the attempt will be: "but the only thing that matters is superbullet".

And after all, they're still not better in evaluation, not even a penny, than DeepSjeng 2011 :)

(which definitely is much better evaluation than rybka)

In fact it's very similar to it as well.

Yet DeepSjeng 2011 is from spring 2011, and Komodo5 is far more recent...

Who copied who?
Simple question... If we know how a human plays chess, how is this done in the human mind?

Someone asks you "name George Washington's wife"? Most in the US would think a bit and say "Martha". How did they know they knew, when they couldn't recall it instantly? Then ask them "Who was Henry Ford's wife" and they instantly say "no clue." How did they know that they didn't know? That kind of associative memory process is not understood today in terms of how the brain does it. We know more than we did when De Groot wrote his book, but we are no closer to understanding how a human can apparently search so few positions and produce moves of such high quality, when no computer comes even close.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How effective is move ordering from TT?

Post by bob »

Don wrote:
Rebel wrote: And another fun experiment: the "best" full static eval. No QS, but in the middle of the chaos on the board, with multiple hanging pieces and counterattacks, one evaluates statically, and checking moves need special code to see whether the move is not mate. Cool.

So we have 2 challenges, the best dynamic eval (with QS) and the best static eval. Now we need participants 8-)
This is just a silly experiment. Probably all the top programs would lose badly to any program that makes an attempt to resolve tactics in some way. I'm not going to devote the next week to fixing up Komodo to win some competition that has nothing to do with real chess skill.

There is a contest you can have right now - just test several programs doing a 1-ply search. I don't know what it will prove, but it's a lot better than trying to design an experiment that requires everyone except Vincent to dumb down their search (presumably to prove that Vincent really does have the best program of all).
What such a 1-ply test would prove is that "The King" is the strongest chess program on the planet. Of course, MOST programs will move instantly at one ply, while The King will sometimes take minutes to report a score.

That would really tell a lot, wouldn't it? Or would it just show that "a ply is not the same in all programs"?

A chess engine is a search integrated with an evaluation. What is the point of lobotomizing the engine by removing one of those two highly integrated and interdependent components? Then it becomes an issue of how each program was integrated and where different things get done. It doesn't make any sense to me...