How effective is move ordering from TT?

Rebel · Post by **Rebel** » Sat Aug 11, 2012 8:38 am

Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.

I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.

Rebel · Post by **Rebel** » Sat Aug 11, 2012 9:14 am

hgm wrote:Micro-Max has a very poor move ordering (no killer, no history; it just does 1) null move, 2) hash move, 3) best MVV/LVA capture, 4) other captures in unspecified order, 5) non-captures in unspecified order). Yet LMR worked great. It reduces all non-captures that are not Pawn moves.

LMR might give no doubt, but the better the move ordering the better LMR will perform. Much of course will depend on the scheme in use for 1.0, 1.5, 2.0, 2.5, 3.0 reductions in relationship with the number of the move list. I am not using the latter myself but that's what I have understood what many programs do nowadays.

Desperado · Post by **Desperado** » Sat Aug 11, 2012 10:54 am

Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.

here are just some thoughts on that:

what does mean "best" ?

here are some contents i believe a good evaluation function should include,
are essential somehow for the definition of "best",
but would not be measured in such a test frame.

* The evaluation "score" is not the only search controller.
Many evaluations may set flags to allow nullmove,lmr or whatever,
controlled by little parts (eg kingSafty) of the complete evaluation.
Such things wont be measured in that frame (one ply search).

* another point is the move ordering. Ordering the moves at the root
is simply done differently than in the complete search. In a search
tree heuristics like history are effected by the evaluation of course, too.
Doesnt matter if there is direct influence or somehow indirect.
This test would fail here too.

* not to talk about the implementation of features with respect to time
costs...

* to have a common QS is by far not enough. There are many other "little"
things which have to be considered, IMO. eg: search all moves with an
open window, or use nullwindow search.

Finally, IMO the better solution would be, to have, lets say, a
common engine, with an "Interface" for the evaluation.
So, we would come closer to what the effect of the evaluation finally is.
(But again, this would not catch some of the thoughts i already
mentioned above).

* well, the intention to know which evaluation represents the "best
chess knowledge" is only one attribute, but unfortunatelly does not
suffice to tell us sth about its real target(s).
Namely to guide the search into the right direction as fast as possible.
Or simply to transform a tree into a perfect ordered tree (as good as
possible) i am simply not sure that best chess knowledge and guiding
the search into the right direction are really congruent contents.

i really would like to see such a "fun" test, BUT to conclude which
is the "best" evaluation, will simply fail.
if all these points doesnt matter at all, i would suggest to
create a common engine (EvalTester) with an interface for the evaluation components.
Then you can say we are close as possible, with the only
difference produced by an evaluation score.

Just some thoughts adhoc...

Michael

Don · Post by **Don** » Sat Aug 11, 2012 2:29 pm

hgm wrote:Micro-Max has a very poor move ordering (no killer, no history; it just does 1) null move, 2) hash move, 3) best MVV/LVA capture, 4) other captures in unspecified order, 5) non-captures in unspecified order). Yet LMR worked great. It reduces all non-captures that are not Pawn moves.

I think LMR works on almost any program, but how well it works is the issue and what you have to do to make it work and how aggressive you can be. I remember when I was first experimenting with an ancient version of Doch, had a difficult time proving that it helped but it was a primitive program, with primitive evaluation and move ordering. I remember also that there were people claiming that it was worth very little. In fact I think I remember you saying it was questionable whether it helped your program - am I remember that correctly? That would have been a few years ago and I cannot find it but I clearly remember you making some such comment.

Don · Post by **Don** » Sat Aug 11, 2012 2:38 pm

Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.

Would be very difficult to do this and guarantee fairness. To attempt to do this fairly the program would have to be supplied to them and they provide only the evaluation function as an independent module. However it would be difficult to prevent a contenstant from building a "recusive" evaluation function - something that actually does a small search of it's own! So given a 1 ply search, if my evaluation function also did a 1 ply search secretly it would crush any of the other programs given that tactics is so dominant at these depths.

Anyway, I like your idea but I think it would be difficult getting enough people interested in doing the work. But as you say, even if we each modify our own programs using "scout's honor" rules we would have to agree on the exact form of the 1 ply search. It would be someone of a pain for me because we have some quies extensions and such that would have to be carefully eliminated.

diep · Post by **diep** » Sat Aug 11, 2012 2:52 pm

lkaufman wrote:
diep wrote:If we would do next experiment Diep versus Komodo. We both remove our piece square tables and have the engines play.

Komodo probably loses 1000 elopoints or so then playing Diep. Diep will lose really a lot less than Komodo elowise. It's still 2600+ then or so in case of Diep i bet.

It's not even a contest. It's gonna be a 100% butchering of Komodo.

Now you might say this is not fair compare, and it isn't, as i have far more chessknowledge in Diep than you have in Komodo, probably a factor 100+ more or so. Yet that should give you an indication that ordering Diep's moves doesn't work with a simple heuristic like history table, which basically reflects your piece square tables and tuning for that given position.
Just for fun I decided to check out your claim. I'll assume for round numbers that Diep on one core is 2600 by IPON standards (correct me if I'm way off) and we'll round Komodo down to 3000. I made a version of Komodo 5 with all the piece square tables zeroed out, and rated it against the normal Komodo at 10" + 0.1". Fast games, but still average depth in the 11 to 12 ply range, probably good enough to earn the IM title in human tournaments, or at least FM. The elo loss was 190, giving it a rating of 2810, still miles above DIEP with piece square tables. I only played a couple hundred games, so the margin of error is large, but I'm not interested in a precise rating, just ballpark. But even this is not fair, because Komodo does depend on the tables somewhat for the king; in particular it may not want to castle without piece square tables. So I made another version that restored piece square tables just for the king, and the rating came out 2920, only 80 below normal and 320 above DIEP. Now I admit that at longer time limits the elo loss might be somewhat greater, but I doubt that it would grow by more than 50% at a time limit such as IPON uses for example (5' +3"). As the time limit grows, the number of draws increases, so this should counteract any tendency of the elo loss to grow with depth. Also the rating loss was probably much exaggerated by the self-testing, as you yourself have often pointed out. If both were run against DIEP, the loss should be much smaller.
So while Komodo may lose far more than DIEP if piece square tables are removed (I'll take your word for this), the loss will still leave Komodo far above DIEP, with or without such tables.

Please email me the version of Komodo 5 without piece square tables. I'll carefully check then whether you have turned off the PSQ's, as i simply don't believe your claim here.

Also you 'test' it by accident within a few minutes after my posting. No one will believe you.

You're kind of wrong about Diep's elo by the way. A 500 elopoints or so.

I have plenty of 8 core Xeon machines here to test. Each program can get its own machine of course.

Don · Post by **Don** » Sat Aug 11, 2012 2:54 pm

diep wrote:
lkaufman wrote:
diep wrote:If we would do next experiment Diep versus Komodo. We both remove our piece square tables and have the engines play.

Komodo probably loses 1000 elopoints or so then playing Diep. Diep will lose really a lot less than Komodo elowise. It's still 2600+ then or so in case of Diep i bet.

It's not even a contest. It's gonna be a 100% butchering of Komodo.

Now you might say this is not fair compare, and it isn't, as i have far more chessknowledge in Diep than you have in Komodo, probably a factor 100+ more or so. Yet that should give you an indication that ordering Diep's moves doesn't work with a simple heuristic like history table, which basically reflects your piece square tables and tuning for that given position.
Just for fun I decided to check out your claim. I'll assume for round numbers that Diep on one core is 2600 by IPON standards (correct me if I'm way off) and we'll round Komodo down to 3000. I made a version of Komodo 5 with all the piece square tables zeroed out, and rated it against the normal Komodo at 10" + 0.1". Fast games, but still average depth in the 11 to 12 ply range, probably good enough to earn the IM title in human tournaments, or at least FM. The elo loss was 190, giving it a rating of 2810, still miles above DIEP with piece square tables. I only played a couple hundred games, so the margin of error is large, but I'm not interested in a precise rating, just ballpark. But even this is not fair, because Komodo does depend on the tables somewhat for the king; in particular it may not want to castle without piece square tables. So I made another version that restored piece square tables just for the king, and the rating came out 2920, only 80 below normal and 320 above DIEP. Now I admit that at longer time limits the elo loss might be somewhat greater, but I doubt that it would grow by more than 50% at a time limit such as IPON uses for example (5' +3"). As the time limit grows, the number of draws increases, so this should counteract any tendency of the elo loss to grow with depth. Also the rating loss was probably much exaggerated by the self-testing, as you yourself have often pointed out. If both were run against DIEP, the loss should be much smaller.
So while Komodo may lose far more than DIEP if piece square tables are removed (I'll take your word for this), the loss will still leave Komodo far above DIEP, with or without such tables.
Please email me the version of Komodo 5 without piece square tables.

You're kind of wrong about Diep's elo by the way. A 500 elopoints or so.

Send me your best Diep and I will check it out.

Don

diep · Post by **diep** » Sat Aug 11, 2012 2:57 pm

My claim is a simple one. Komodo relies upon getting huge search depth with accurate tuned PSQ's.

Whatever nonsense you posted here directy in response to me, claiming that 'by accident' you had tested this - the assembler of your engine will tell another story.

As for Diep it is a private engine.

diep · Post by **diep** » Sat Aug 11, 2012 3:02 pm

Don wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising at the horizon, who has the best eval?

Its basic framework:

1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.

25,000 - 50,000 games (or so) to weed out most of the noise because the lack of search.

Details to be worked out of course.
Would be very difficult to do this and guarantee fairness. To attempt to do this fairly the program would have to be supplied to them and they provide only the evaluation function as an independent module. However it would be difficult to prevent a contenstant from building a "recusive" evaluation function - something that actually does a small search of it's own! So given a 1 ply search, if my evaluation function also did a 1 ply search secretly it would crush any of the other programs given that tactics is so dominant at these depths.

Anyway, I like your idea but I think it would be difficult getting enough people interested in doing the work. But as you say, even if we each modify our own programs using "scout's honor" rules we would have to agree on the exact form of the 1 ply search. It would be someone of a pain for me because we have some quies extensions and such that would have to be carefully eliminated.

I directly take it.

0 extensions, just normal qsearch. Note diep's qsearch can only do captures and a few giving checks and i severely limit those. I wouldn't modify it. All i would need to turn off is the check extensions in fact.

Not sure about Komodo - most of the derivatives produced now, not sure where Komodo is in that list - they extend the PV a lot. Diep is not doing that at all.

This is an easy manner to see how your beancounter is doing against a program that's having really some chessknowledge using mobility.

Realize accurate tuning is a big advantage in such contests - diep is not automatic tuned so far. Only weeded out buggy patterns.

If i were you i'd take it Don.

We can ship Ed a version doing a 1 ply search. I would allow Ed to verify that the engines aren't cheating (meaning allowance to use a disassembler only in order to verify the engines aren't cheating).

Don · Post by **Don** » Sat Aug 11, 2012 3:11 pm

diep wrote:My claim is a simple one. Komodo relies upon getting huge search depth with accurate tuned PSQ's.

Whatever nonsense you posted here directy in response to me, claiming that 'by accident' you had tested this - the assembler of your engine will tell another story.

As for Diep it is a private engine.

You are being ridiculous, trying to claim you have the best program in the world without actually exposing it to real testing. I think Diep would have no chance whatsoever against Komodo at long time control and I'm willing to let a third party test this assertion. Are you?

How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?

Re: How effective is move ordering from TT?