How effective is move ordering from TT?

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: How effective is move ordering from TT?

Post by Don » Sat Aug 11, 2012 4:26 pm

diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
I think you are ducking a REAL match. Trying to make a 1 ply match happen is way down on my list of priorities and I would not allow myself to be distracted by such a thing. If you want to play a REAL match, Diep vs Komodo then I'm all for it.

I can understand why YOU might want a 1 ply match however because you would have a realistic chance of winning such a match. But you know that you would have no chance of winning a real match with your program vs Komodo.

I would not be making an issue of this but you have had the audacity to demean Komodo, calling it a beancounter and hurling other insults without any evidence whatsoever, so I suggest that you should be willing to put your reputation on the line and allow a real match, not some meaningless 1 ply match with ambiguous rules which would not prove anything about the actual strength of the program.

I know that your program is strong - but strong is relative. I think Larry's estimate is probably reasonable and perhaps even generous, but there must be a reason why you are unwilling to allow it to be tested.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.

Uri Blass
Posts: 8730
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: How effective is move ordering from TT?

Post by Uri Blass » Sat Aug 11, 2012 4:27 pm

lkaufman wrote:
Uri Blass wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
I think that pins are not only tactical knowledge because it is possible to have a pin for many moves without winning material.

I also think that tactical knowledge is part of the knowledge that humans use in their evaluation function.

For example, if I see a white knight at c7 and a black rook at a8 that cannot move, then I may know that the rook is probably trapped without calculating all possible moves of black, and it is clearly fair to say that being able to see it without searching all the moves of black means a better evaluation function.

Practically capturing the a8 rook may be hidden even with some plies of additional search because black can delay the capture by some threats against white queen or by checks and it is also possible that one of these threats can also save the a8 rook later so you cannot be sure by evaluation that you win the rook but you can give some bonus for the fact that maybe you are going to win the rook.
Scoring of pins could be considered positional, I agree, but for example scoring pawn attacks on pinned pieces is basically tactical, it's an attempt to save (usually) one ply of search. We could very easily make our program look much better on one ply searches by including various tactical ideas like this in eval, but we have found that doing this sort of thing makes the program weaker on balance due to the slowdown.
I think one possible idea is to have 2 evaluations (one of them optimized for one ply matches) and to use the difference between them for pruning decisions: if the difference between the evaluations is small, you can be more sure that there are no tactics and prune more; if the difference is big, you prune less.
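
A minimal sketch of that two-evaluation idea, assuming centipawn scores and a futility-style margin. The names `pruning_margin` and `can_prune` and the numbers 100 and 400 are hypothetical, not taken from any engine discussed here:

```python
def pruning_margin(fast_eval, tactical_eval, base_margin=100, max_margin=400):
    """Margin in centipawns; it grows when the two evaluations disagree."""
    disagreement = abs(fast_eval - tactical_eval)
    return min(base_margin + disagreement, max_margin)


def can_prune(fast_eval, tactical_eval, alpha):
    """True if the node looks too far below alpha to be worth searching.

    A small disagreement gives a small margin (more pruning); a large
    disagreement hints at tactics and gives a large margin (less pruning).
    """
    return fast_eval + pruning_margin(fast_eval, tactical_eval) <= alpha
```

With `alpha = 150`, a node scored 0 by both evaluations (or nearly so) is pruned, while a node where the second evaluation says +300 is kept. The extra cost is that the slower evaluation must be computed at every node where pruning is considered.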

chrisw
Posts: 2947
Joined: Tue Apr 03, 2012 2:28 pm

Re: How effective is move ordering from TT?

Post by chrisw » Sat Aug 11, 2012 4:31 pm

diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy: search=tactics, eval=positional. Nonsense of course. I'd take it further: ban the QS, which can contain all manner of check search tricks btw, and force the beancounters to write a SEE. Then we'll see how really crap their evals are ;-)

One way you can also test btw is to put the zero search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 Elo out of raw evaluation only.
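
For readers unfamiliar with SEE (static exchange evaluation): it estimates the material outcome of a capture sequence on one square without searching. Below is a simplified sketch, assuming the pieces attacking the square are already known and listed least valuable first, and ignoring x-rays, pins and promotions. The function names and the piece values (100/320/500/900) are illustrative assumptions:

```python
def best_exchange(target_value, attackers, defenders):
    """Best material result for the side to move, which may decline to capture.

    attackers: values of the side-to-move pieces attacking the square,
               least valuable first; defenders: the same for the opponent.
    """
    if not attackers:
        return 0
    # Capture with the cheapest attacker; it becomes the next target.
    gain = target_value - best_exchange(attackers[0], defenders, attackers[1:])
    return max(0, gain)


def see(target_value, attackers, defenders):
    """Material balance of actually making the first capture (may be negative)."""
    if not attackers:
        raise ValueError("no piece can capture on this square")
    return target_value - best_exchange(attackers[0], defenders, attackers[1:])
```

For example, `see(320, [100], [100])` scores a pawn taking a knight that is defended by a pawn: the knight is won and the pawn is lost, for +220. A queen taking a pawn defended by a pawn, `see(100, [900], [100])`, comes out at -800.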

Desperado
Posts: 638
Joined: Mon Dec 15, 2008 10:45 am

Re: How effective is move ordering from TT?

Post by Desperado » Sat Aug 11, 2012 4:36 pm

lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. ...
Sorry Larry, i simply cannot follow you.
In most cases it is not possible to classify a feature exclusively as tactical,
positional or whatever type you think of. If you think of what a pin is
doing in the first place, it is just restricting the opponent's piece mobility.
That would mean we are talking about some kind of mobility feature, which
i am pretty sure is not an exclusively tactical feature.
Further, positional, dynamic and also tactical features all belong to
evaluation, so you can evaluate double pawn attacks, mobility or
badly placed pieces. It doesn't matter what category a feature belongs to,
it just builds a heuristic named evaluation.

The second point is that the attempt to use a heuristic (evaluation) independently of the search is nonsense.
(I already pointed out some things in my last post.) They simply belong together.
Search <-> Heuristic
Any conclusion, no matter how the experiment is set up,
would be misleading, and would not really identify the evaluation
with the better chess knowledge at all.
After such an experiment the discussion will continue with, "but if the search..., our evaluation function will behave ..."

Much more interesting for me would be the outcome
if Komodo and Diep exchanged their evaluations and attached them
as a module to their own framework. My guess would be that both engines
would drop in strength, simply because there is a strong dependence between eval and search,
but maybe the opposite could happen. It doesn't matter; that would at least be much more interesting
to me than a 1 ply search contest, which doesn't provide any information
at all.

imo

Michael

diep
Posts: 1780
Joined: Thu Mar 09, 2006 10:54 pm
Location: The Netherlands

Re: How effective is move ordering from TT?

Post by diep » Sat Aug 11, 2012 4:50 pm

lkaufman wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Certainly "evaluating attacks" is tactical, we do some of that too, but probably much less than you. It's basically an attempt to save one ply of search in specific situations. I didn't say anything about "material" being the most important positional term. Pawn structure, mobility, king safety, and many specific positional terms are also important. But anything that attempts to save a ply of search is in my opinion tactical. I don't object to anyone running one ply search matches. But they tell us nothing about which program is better able to evaluate positions with no tactics in them, which is what I would consider the question here.
Oh sure, the whole strength of all the derivatives right now is the material evaluation in combination with piece square tables and well tuned (yet simplistic) passer knowledge.

In fact that code is easy to build; tuning it is very difficult though. That's a special field of parameter optimization, very mathematical.

It's wrong to say that material is a positional term and that pins are a tactical term. Not sure i have as many chess books as you, but in classifying knowledge, 99.9% of all titled players are total beginners. They don't know how to factorize.

I remember analyzing with Anand and analyzing with some Russian world top players. It's all about 'feeling'. Not capable of distinguishing patterns.

The books reflect that.

In computer chess you have a difference between SEARCH and EVALUATION.
Search is actually playing moves on the board.

If something is inside your evaluation it's EVALUATION. A function that estimates how good your position is.

In the end in chess the goal is winning that king, so every evaluation pattern you can classify as tactical of course as its goal is winning the king.

A big problem is that most titled players, especially GMs, are not capable of distinguishing between positional factors and strategic factors. They sure consider them, they just don't distinguish, and when they play chess the distinction doesn't matter much, let's face it.

However if those factors are inside your evaluation function it's the EVALUATION FUNCTION.

Now in the 90s many, among them Professor Bob, doubted that an evaluation function can make up for more than 1 ply of SEARCH; it sure wouldn't be worth 2 plies.

You have a program whose evaluation function seems surprisingly similar to that of the Rybka beancounters; i would actually want to compare Komodo 5 with Deep Sjeng 2011 in terms of evaluation. It is all very close to each other: 0.01 difference here, 0.05 there, and search makes up for some differences as well.

It's hard to maintain that Komodo has a lot of knowledge man.

If you compare with Diep, its STATIC EVALUATION is not seldom pawns away from that of the derivatives.

Now that might be wrong sometimes or it might be right, but it sure is having more knowledge and it gives it a far more human style.

I want the 1 ply challenge.

All the sick lies about how parameter tuning happens, let's skip those. By now even the biggest fool here will realize you would need a million games to accurately tune even 1 parameter against the rest of the evaluation function (which doesn't mean it then actually has the CORRECT VALUE).

It also seems to me that the neural nets did do a bad job for tuning compared to what i'm guessing is the 'objective tuning truth' (which still is a guess obviously).
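
For a sense of scale on the game counts mentioned in this thread, here is a rough sketch of the 95% error margin, in Elo, of a match result. It assumes a score near 50% and uses the normal approximation; the default per-game standard deviation of 0.5 is the worst case with no draws (draws make it smaller). The function name is hypothetical:

```python
import math


def elo_margin_95(games, per_game_std=0.5):
    """Approximate 95% margin of error in Elo for a match scored near 50%."""
    # Elo(s) = -400 * log10(1/s - 1); its slope at s = 0.5 is 1600 / ln(10).
    slope = 400.0 / (math.log(10) * 0.25)
    standard_error = per_game_std / math.sqrt(games)
    return 1.96 * slope * standard_error
```

Under these assumptions 25,000 games resolve a difference of roughly 4 Elo, and a million games get the margin below 1 Elo, which is the order of magnitude a single well-chosen parameter change tends to be worth.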

lkaufman
Posts: 3883
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: How effective is move ordering from TT?

Post by lkaufman » Sat Aug 11, 2012 5:17 pm

diep wrote:Oh sure, the whole strength of all the derivatives right now is the material evaluation in combination with piece square tables and well tuned (yet simplistic) passer knowledge.

In fact that code is easy to build; tuning it is very difficult though. That's a special field of parameter optimization, very mathematical.

It's wrong to say that material is a positional term and that pins are a tactical term. Not sure i have as many chess books as you, but in classifying knowledge, 99.9% of all titled players are total beginners. They don't know how to factorize.

I remember analyzing with Anand and analyzing with some Russian world top players. It's all about 'feeling'. Not capable of distinguishing patterns.

The books reflect that.

In computer chess you have a difference between SEARCH and EVALUATION.
Search is actually playing moves on the board.

If something is inside your evaluation it's EVALUATION. A function that estimates how good your position is.

In the end in chess the goal is winning that king, so every evaluation pattern you can classify as tactical of course as its goal is winning the king.

A big problem is that most titled players, especially GMs, are not capable of distinguishing between positional factors and strategic factors. They sure consider them, they just don't distinguish, and when they play chess the distinction doesn't matter much, let's face it.

However if those factors are inside your evaluation function it's the EVALUATION FUNCTION.

Now in the 90s many, among them Professor Bob, doubted that an evaluation function can make up for more than 1 ply of SEARCH; it sure wouldn't be worth 2 plies.

You have a program whose evaluation function seems surprisingly similar to that of the Rybka beancounters; i would actually want to compare Komodo 5 with Deep Sjeng 2011 in terms of evaluation. It is all very close to each other: 0.01 difference here, 0.05 there, and search makes up for some differences as well.

It's hard to maintain that Komodo has a lot of knowledge man.

If you compare with Diep, its STATIC EVALUATION is not seldom pawns away from that of the derivatives.

Now that might be wrong sometimes or it might be right, but it sure is having more knowledge and it gives it a far more human style.

I want the 1 ply challenge.

All the sick lies about how parameter tuning happens, let's skip those. By now even the biggest fool here will realize you would need a million games to accurately tune even 1 parameter against the rest of the evaluation function (which doesn't mean it then actually has the CORRECT VALUE).

It also seems to me that the neural nets did do a bad job for tuning compared to what i'm guessing is the 'objective tuning truth' (which still is a guess obviously).
If you are simply saying that your evaluation function, including all these tactical themes, is bigger/stronger than ours, ok, you are right. Our claim is that Komodo's eval is as good or better if you take out everything that is done in the eval to improve tactics. We can both be right. A one ply match won't clarify anything.
As far as any similarity between Komodo's eval and Rybka's and Ippolit's and Houdini's, yes, there should be some similarity, because I was primarily responsible for the terms and weights (not the coding) in Rybka 3, which were the basis for these other programs. So although we made the eval in Komodo from scratch, there are bound to be similarities as my basic beliefs haven't changed. Maybe your program is vastly different than all the strong programs of today, but as it's private only you know this.

diep
Posts: 1780
Joined: Thu Mar 09, 2006 10:54 pm
Location: The Netherlands

Re: How effective is move ordering from TT?

Post by diep » Sat Aug 11, 2012 5:18 pm

Desperado wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. ...
Sorry Larry, i simply cannot follow you.
In most cases it is not possible to classify a feature exclusively as tactical,
positional or whatever type you think of. If you think of what a pin is


We see too many engine projects that have very little knowledge despite having been 'helped' by titled players.

It is not easy however, let me assure you that.

doing in the first place, it is just restricting the opponent's piece mobility.
That would mean we are talking about some kind of mobility feature, which
i am pretty sure is not an exclusively tactical feature.
Further, positional, dynamic and also tactical features all belong to
evaluation, so you can evaluate double pawn attacks, mobility or
badly placed pieces. It doesn't matter what category a feature belongs to,
it just builds a heuristic named evaluation.

The second point is that the attempt to use a heuristic (evaluation) independently of the search is nonsense.
This is what chess programs are doing of course, yet the algorithms used obviously depend on what your evaluation is capable of.

(I already pointed out some things in my last post.) They simply belong together.
Search <-> Heuristic
Any conclusion, no matter how the experiment is set up,
would be misleading, and would not really identify the evaluation
with the better chess knowledge at all.
I disagree there.
After such an experiment the discussion will continue with, "but if the search..., our evaluation function will behave ..."

Much more interesting for me would be the outcome
if Komodo and Diep exchanged their evaluations and attached them
as a module to their own framework. My guess would be that both engines
would drop in strength, simply because there is a strong dependence between eval and search,
Komodo is doing all kinds of pruning which is not possible with a knowledgeable evaluation.

Also realize that Komodo is a factor 20 or so faster than Diep in nps, in terms of evaluation speed.

In Diep i simply CANNOT afford to just see the mainline deep.

Seeing mainline deep and playing defensive is something that works especially well if you have a program with a simple evaluation function.

Not having the knowledge means that every aggressive move will possibly weaken your own position long term, and with a small evaluation the odds of a mistake are then too big.

Search is much more there. Search also reflects how much you can trust upon your evaluation function.

If it has little knowledge, go take care you see 40 ply you know.

If you have more knowledge, go try find better moves!

Realize clearly that forward pruning in the last few plies basically throws away the knowledge in the evaluation function, using just a few meta-rules.

No matter how clever your evaluation function is: if you prune here, then a single RULE with a margin overrules your ENTIRE evaluation function.

As it appears, the beancounters have reached very high Elo by means of accurate tuning. The tuning basically takes care that your program plays much better moves than it understands positionally. Not seldom a totally silly move is only 0.01 worse than the move it actually plays.

You see this clearly in todays beancounters. So tuning compensates there for knowledge. The advantage of such tuning and simple knowledge is you can easier forward prune last X plies.

In Diep i just cannot do this that easy.

So where there is a huge nps difference already, they win another 3 plies by means of forward pruning the last few plies with whatever method (futility, razoring or something they grew themselves).

This appears to be very powerful combination.
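
To make the "single rule with a margin" point concrete, here is a minimal sketch of futility pruning near the leaves. The function name, the 150 centipawn margin per ply and the 3 ply limit are illustrative assumptions, not values from Komodo, Diep or any other engine:

```python
def futility_prune(static_eval, alpha, depth, margin_per_ply=150, max_depth=3):
    """True if quiet moves at this node can be skipped without searching them.

    The rule trusts the static evaluation: if even an optimistic bonus of
    margin_per_ply * depth cannot lift the score above alpha, give up.
    """
    if depth > max_depth:
        return False  # only applied in the last few plies
    return static_eval + margin_per_ply * depth <= alpha
```

The margin has to cover how far the evaluation can swing in the remaining plies. A small, smooth evaluation swings little, so the margin can be tight and much gets pruned; an evaluation that moves by pawns between neighbouring positions needs a wide margin, and then the rule prunes far less.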

Basically i need to do the parameter tuning myself with tens of thousands of parameters.

For gaining some insight in how Komodo5 with a 20 times slower evaluation function would do, first slow down Komodo factor 10, then report back how much elo that loses for Komodo in games against others.

Then you start to realize what Diep's evaluation makes up for.

In general the amazing test already for a while is when i give diep a hardware advantage. But we see more amazements.

For example TheBaron is interesting amazement. First strong engine that didn't copy half of the evaluation, in most cases most copy entire evaluation function, from some ivanhoe type fruit/rybka clone.

Best clone example is of course houdini. Entire evaluation 100% identical to ivanhoe in fact as it seems.

If we see how Baron is doing, we see 2 clear things. It's doing relative much better than you'd expect against the rybka clones, this despite a very weak endgame.

We also see directly the weakness. And how hard it is to fix the endgame.

If you want to recognize a clone, just test it in endgame. If it is same strength (elowise, not testsets as most testsets you solve by advancing passed pawns in stupid manner which then tactical somehow works) you know it's a clone. Not a single dude on the planet who didn't write a top engine before ever managed to get endgame strong.

This past year i was very amazed by how far Diep was behind the Rybka clones in endgame strength. This whereas at the start of this century, when i defined more parameters to distinguish the game phase for every parameter (in Diep i have 256 phases, which i compress into 16 phases later on), Diep totally dominated in the endgame as long as it wasn't about running passed pawns (as some engines always used to extend those to the bitter end, so search then takes over from evaluation). Accurate parameter tuning really kicks butt there.
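
A sketch of what phase-dependent parameters can look like: a fine-grained phase number in 0..255 is derived from the remaining material and then compressed into 16 buckets, with one parameter value per bucket. This is not Diep's actual scheme; the piece weights, the names and the bucketing by bit shift are all assumptions for illustration:

```python
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
FULL_MATERIAL = 24  # 4 knights + 4 bishops + 4 rooks + 2 queens, weighted


def game_phase(piece_counts):
    """Map remaining material to 0 (all pieces on the board) .. 255 (none left)."""
    material = sum(PHASE_WEIGHT[p] * n for p, n in piece_counts.items())
    material = min(material, FULL_MATERIAL)  # promotions can exceed the start total
    return (FULL_MATERIAL - material) * 255 // FULL_MATERIAL


def phase_bucket(phase):
    """Compress the 256 fine phases into 16 coarse ones."""
    return phase >> 4


def phased_value(table16, piece_counts):
    """Look up a parameter that has one tuned value per coarse phase."""
    return table16[phase_bucket(game_phase(piece_counts))]
```

Every evaluation term then carries a 16-entry table instead of a single weight, which is where the large parameter counts mentioned above come from.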

Yet it's obvious that knowledge works better in the end, provided it doesn't have major bugs.

But all jokes aside - all those derivatives - they just copy especially the material evaluation from each other.
but maybe the opposite could happen. It doesn't matter; that would at least be much more interesting
to me than a 1 ply search contest, which doesn't provide any information
at all.

imo

Michael

Uri Blass
Posts: 8730
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: How effective is move ordering from TT?

Post by Uri Blass » Sat Aug 11, 2012 5:19 pm

chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy, search=tactics, eval=positional. Nonsense of course. I'ld take it further, ban the QS, which can contain all manner of check search tricks btw and force the beancounters to write a SEE. Then we'll see how really crap their evals are ;-)

One way you can also test btw, is put the zero search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 ELO out of raw evaluation only.
Note that humans use search even in bullet games of 1+0, so I do not understand how we can test the quality of evaluation based on games with no search.

I also think that evaluating things like trapped pieces can be done only by some type of selective search even if you do not use the make move function.

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: How effective is move ordering from TT?

Post by Don » Sat Aug 11, 2012 5:22 pm

chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Great idea Ed. We need an independant tester who also verifies no cheating occurs. Do you volunteer?

With some luck we'll see then how strong mobility and coordinated piece evaluation plays.

Oh i remember - diep also knows everything about pins, and has extensive kingsafety that will directly attack the opponent king with all pieces, probably with the usual computer bug not using many pawns to do so. Will be giving spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So probably Diep would look great on a one ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function, but whether it justifies the slowdown in search is the question.
Regarding your request for a Komodo 5 version without PST, Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch all you need do is set the "xtm" terms ("pawn table multiplier" etc.), to zero and you'll have what you want.
You are trying to talk your way out of the 1 ply match?

kingsafety is also tactical, mobility is also tactical, evaluating attacks which diep is doing massively that's also tactical?

Yet evaluating the material suddenly is the most important 'positional term' of an evaluation?

Oh comeon we can call everything tactical.

I want a 1 ply match :)

Ed?
Make some noise!
Completely agree with Vincent.
I don't think you see the irony of that.

Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy, search=tactics, eval=positional. Nonsense of course. I'ld take it further, ban the QS, which can contain all manner of check search tricks btw and force the beancounters to write a SEE. Then we'll see how really crap their evals are ;-)

One way you can also test btw, is put the zero search program onto ICC and test it against rated players. Then shoot any programmer who can't get 2000 ELO out of raw evaluation only.
You don't need me to run this test nor do you need my permission. So run the test and then interpret the results and pretend it means something.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.

diep
Posts: 1780
Joined: Thu Mar 09, 2006 10:54 pm
Location: The Netherlands

Re: How effective is move ordering from TT?

Post by diep » Sat Aug 11, 2012 5:38 pm

Rebel wrote:
hgm wrote:Micro-Max has a very poor move ordering (no killer, no history; it just does 1) null move, 2) hash move, 3) best MVV/LVA capture, 4) other captures in unspecified order, 5) non-captures in unspecified order). Yet LMR worked great. It reduces all non-captures that are not Pawn moves.
LMR might give no doubt, but the better the move ordering the better LMR will perform. Much of course will depend on the scheme in use for 1.0, 1.5, 2.0, 2.5, 3.0 reductions in relationship with the number of the move list. I am not using the latter myself but that's what I have understood what many programs do nowadays.
Ed, if I just look at Rebel and compare it with the derivatives, the trick is their material evaluation.

It's more advanced than Rebel's. This is where they indeed use a few simple concepts to get it done.

If none of your positional factors, other than passed pawn evaluation, can ever overrule the material evaluation, then you can easily forward prune the last 4 plies. I'm not sure how this has been set in Komodo.

That's what they do.

They win 3 plies there that I do not win.
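
To make the forward pruning idea concrete: one common form is static null move (reverse futility) pruning, where a node close to the leaves is cut off when the static eval already beats beta by a margin that the remaining plies are unlikely to overturn. The post does not say which scheme the derivatives or Komodo actually use, so this is only a sketch. The function name and the margin values are made up for illustration.

```python
# Hypothetical safety margins in centipawns, keyed by remaining depth in plies.
# If positional terms can never outweigh material, margins like these can stay small.
FUTILITY_MARGIN = {1: 100, 2: 200, 3: 300, 4: 400}


def can_forward_prune(static_eval, beta, depth, in_check=False):
    """Return True if the node may be cut off without searching its moves.

    static_eval and beta are in centipawns from the side to move's view.
    Pruning is only considered in the last 4 plies and never while in check.
    """
    if in_check or depth not in FUTILITY_MARGIN:
        return False
    # Even after giving up the margin, the position still fails high.
    return static_eval - FUTILITY_MARGIN[depth] >= beta
```

With these made-up margins, a node at depth 2 with a static eval of 350 and beta of 100 is pruned, while the same node at depth 4 is searched normally.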

Then they use null move with a reduction factor that soon goes up: 3, 4, 5, 6, 7, etc.

That they combine with aggressive LMR.

If you'd retune Rebel, all you have to do is improve your material evaluation, have a 2 layer PSQ table and retune every parameter.

So layer 1 of the PSQ table is for opening, layer 2 is for endgame and then you find a formula to know how far in opening or endgame you are.

You cannot tune those parameters playing games.

You'd then beat the crap out of Komodo in superbullet with Rebel and your search framework, if you managed that.

As for search, their strong point is Rebel's strong point as well, namely seeing the main line deep quickly. In addition to that you see more tactics with Rebel, so you'd win superbullet big time against Komodo 5 in that case.

You just lose it on the combination of material and piece-square tables. They have copied Fruit's idea, some of them in fact just cut'n pasted it, to have a piece-square table for every piece in both opening and endgame. Then they average over that using 32 intervals or so (with minor piece = 1, rook = 3, queen = 6; so in a full-board position it gets 100% from the first PSQ, the opening table, with 2 * (6 + 3 + 3 + 1 + 1 + 1 + 1) = 2 * 16 = 32 points).

Fruit was using 1,2,4 by the way as values instead of 1,3,6...
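
The two-layer PSQ interpolation can be sketched as follows, using the 1/3/6 weights and the 32-point full board described above. The function and variable names are made up for illustration, and the linear blend is an assumption about how the averaging is done. Swapping in Fruit's 1, 2, 4 weights would make the full-board total 24 instead of 32.

```python
# Phase weights as described in the post: minor piece = 1, rook = 3, queen = 6.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 3, "Q": 6}
FULL_PHASE = 32  # 2 * (6 + 3 + 3 + 1 + 1 + 1 + 1) for the starting position


def game_phase(piece_counts):
    """Phase from piece counts of both sides combined, e.g. {"N": 4, "R": 4}.

    Pawns and kings carry no weight. The result is capped at FULL_PHASE so
    that promotions cannot push the phase past 100% opening.
    """
    phase = sum(PHASE_WEIGHT.get(piece, 0) * count
                for piece, count in piece_counts.items())
    return min(phase, FULL_PHASE)


def tapered(opening_score, endgame_score, phase):
    """Blend the opening and endgame PSQ scores linearly by phase."""
    return (opening_score * phase
            + endgame_score * (FULL_PHASE - phase)) / FULL_PHASE
```

At phase 32 the result is purely the opening-table score, at phase 0 purely the endgame-table score, and in between it is the weighted average.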

This concept is surprisingly strong simply for endgame.

You don't manage to tune that at home that accurately though.

I've been toying with 80 cores and now at home I have a 64-core cluster with 2 Tesla cards, each having 448 cores. So in total I have nearly 1000 cores for tuning.

In my experiments on how to improve Diep's tuning, I encountered many problems.

The PSQ tables play a totally crucial role in the derivatives. Remove them and they're dead. As for the material evaluation, none of them would have invented it themselves.

It has been tuned, it seems, by a neural net.

Now I'm not too impressed by that neural net, but the tuning of your values is highly dependent upon how much knowledge you have.

In an engine with no knowledge except material, a bishop will be worth like 2.85 or so.

Your first goal is to figure out how to mathematically tune.

Playing games won't work.

If you have quite some knowledge, you need a million games to figure out whether something needs to be a 0.013 pawn bonus or 0.014.

Forget it.
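
A rough back-of-the-envelope check of that order of magnitude, as a sketch only: it assumes a simple normal approximation, the standard logistic Elo curve, and a per-game standard deviation of 0.5 (the no-draw upper bound; draws lower it). How many Elo a 0.001 pawn change is actually worth is unknown, so the Elo gap fed in is purely an assumption.

```python
import math


def expected_score(elo_diff):
    """Expected score for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, per_game_sd=0.5, z=1.96):
    """Games needed before a positive Elo gap stands out from noise.

    Requires the measured edge over 50% to exceed z standard errors,
    where the standard error of the mean score is per_game_sd / sqrt(N).
    elo_diff must be greater than zero.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * per_game_sd / edge) ** 2)
```

Under these assumptions, resolving a 1 Elo difference already takes several hundred thousand games, and a tiny parameter change is likely worth less than that.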

LMR is not the holy grail. It's the tuning and a simple yet effective setup of the evaluation. Material for the material. PSQs for the positional evaluation and simple passed pawn knowledge for the passers.

Kingsafety?

Ah, that's the big bummer. In many positions in chess it simply APPEARS that you can get away with relatively simple kingsafety provided you get 20 ply. Most positional factors in kingsafety give a TEMPORARY advantage.

That doesn't stop me from still working on Diep's kingsafety by the way.

Yet nearly all fixes I do right now have to do with simple things. For example, I'm now fixing Diep's willingness to castle quickly. Sure, it wants to castle, but the derivatives simply give a big penalty for a king in the center.

Tuning fixes it for them everywhere....

In Diep the material evaluation alone is already 300+ parameters or so. Diep's material concept is totally different from the Fruit concept.

I really use knowledge rules to decide whether I give a bonus. I don't use a neural net tuned entity.

Though this material evaluation is all tuned by hand, I get the impression Diep is doing it better than the derivatives. But that feeling is very recent and I'm not sure it's already OK for most endgames.

As Diep has more parameters, I can more easily tune specific endgames independently. More possibilities also mean more problems for tuning though.

Tuning Diep is far more complex I'd say - even so, the effort needed is similar in terms of expertise.
Last edited by diep on Sat Aug 11, 2012 5:49 pm, edited 1 time in total.
