Most important eval elements

Uri Blass · Post by **Uri Blass** » Mon Sep 20, 2010 8:47 pm

OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.

Looking at the CCRL 40/40 rating list I do not see it and I see that Crafty20.14 is more than 100 elo stronger even after considering the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1

Crafty 20.14 32-bit 2631 +34 −34 41.9% +56.6 32.4% 296
OliThink 5.3.0 2459 +35 −35 47.7% +16.1 26.0% 285

OliverBr · Post by **OliverBr** » Mon Sep 20, 2010 8:51 pm

Uri Blass wrote:
OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.
Looking at the CCRL 40/40 rating list I do not see it and I see that Crafty20.14 is more than 100 elo stronger even after considering the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1

Crafty 20.14 32-bit 2631 +34 −34 41.9% +56.6 32.4% 296
OliThink 5.3.0 2459 +35 −35 47.7% +16.1 26.0% 285

It would be great to quote the newest results, that say ELO 2520 for OliThink. Other ELO rating systems say ELO>2500, too:

http://computerchess.org.uk/ccrl/404/cg ... 0%2064-bit

Dann Corbit · Post by **Dann Corbit** » Mon Sep 20, 2010 9:42 pm

Don wrote:
OliverBr wrote:
bhlangonijr wrote: I think what you are saying is not 100% accurate as your evaluation function takes into account free and hanging pawns too, right?
Correct, I mentioned it in my source code, that is the only non-mobil eval. But actually it's an indirect mobility eval. Hanging Pawn can't move, so very bad mobility and free pawn have a good mobility, of course.

I plan to implement it that way and so I have the 100% mobility eval.
Does your program count material? Then it's still not 100% mobility.
It does count material, but it does not use piece square tables. I suspect that for a program this strong, it contains the minimum evaluation.
{snip}

michiguel · Post by **michiguel** » Mon Sep 20, 2010 9:52 pm

bhlangonijr wrote:
Don wrote: It makes me wonder why you include material in your evaluation since mobility also crudely approximates material. Why did you make that concession to simplicity but not others? You don't mind having a program that is much weaker than it needs to be to keep it conceptually simple, but you were not willing to go all the way. I'm not criticizing you, I think it's cool but I still wonder why you chose to do it like you did. For example you could have included a few cheap evaluation terms and added 100-200 ELO without adding much to the program.
Don, I am not Oliver but I think it is an easy question to answer. There are some authors who doesn't have as main goal gaining more Elo points no matter what.
For example, some authors valuates more the reliability of their programs even at the cost of some sacrifice of overall performance. I think Gaviota from Miguel is one of them.

True, in terms of time spent by me, not necessarily the code. Any time I spend on this, is less time I spend on improving the search

Programmer's time is an important parameter that generally it is not consider into the equation For instance, 2 ELO points are important, but not if you spent 1 year coding the change. But I am willing to spent some time in tools or code that will catch bugs or decrease the chances to lose on time etc. If my engine crashes, I take it as a failure, not as 0.1 elo point out of the 999 games it does not crash. In terms of code, for instance, I am not willing to sacrifice a sane long PV for 3 elo points.

I think it is very good to have engines with different priorities. All of them are living experiments with different styles. Olithink shows us how important is mobility. Micromax shows us it is possible to be concise in an amazing way. I think that asking Oliver to add PSTs in order to increase 20 ELO points is like asking HG to increase 20 point of micromax adding 1k lines of code

It is not going to happen, and probably shouldn't.

Miguel

OliThink, Sungorus and Micro-Max are in my list of best chess engines because it seems the authors have the goal of creating very strong and minimalist chess programs. Just to reinforce Olithink is competing with Crafty 10 who has tens of thousands more lines of code. Well I think we can learn something from that.
I read in some posts that Bob took out from Crafty the fractional plies and all non-check extensions code and his program is doing well. Maybe it really doesn't help much and we are just adding junk code to our programs.

Regards,

michiguel · Post by **michiguel** » Mon Sep 20, 2010 9:57 pm

Don wrote:
OliverBr wrote:
Don wrote: It makes me wonder why you include material in your evaluation since mobility also crudely approximates material. Why did you make that concession to simplicity but not others?
Actually I have tried this one and it doesn't work.

Best example is the Rook.

The Rook has a crappy mobility the first 15-20 moves of the game, yet he has a high value and you won't want to give it away at any time of the game. You must (!) give him some value even though he has 0 mobility. That is called material value.

His mobility value comes very, very late in the game, far beyond any search horizon. So the approach just doesn't work. The Rook is only the clearest example for this.

Edit: Of course in the case that search depth is much, much bigger than now we will be able to rethink doing this approach. Anyway I personally think, that there is a limit(frontier) of searching depth and we will never reach this.
I think the relevant thing is how accurate the evaluation function is. I would not be looking for more and more excuses to drop features thinking that search will make up for it.

Based on what I have learned in the past 2 or 3 years, we want MORE evaluation in Komodo, not less. I too used to be more brute force in my thinking but I have changed

I am glad to read this because It makes me feel I am not crazy (or not the only one). I too believe that there is a big future in improving evaluations.

Miguel

Have you considered a pure material only search? I wonder how deeply you must search before the program would have reasonable looking PV's in the first 3 or 4 ply?

Uri Blass · Post by **Uri Blass** » Tue Sep 21, 2010 12:35 am

OliverBr wrote:
Uri Blass wrote:
OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.
Looking at the CCRL 40/40 rating list I do not see it and I see that Crafty20.14 is more than 100 elo stronger even after considering the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1

Crafty 20.14 32-bit 2631 +34 −34 41.9% +56.6 32.4% 296
OliThink 5.3.0 2459 +35 −35 47.7% +16.1 26.0% 285
It would be great to quote the newest results, that say ELO 2520 for OliThink. Other ELO rating systems say ELO>2500, too:

http://computerchess.org.uk/ccrl/404/cg ... etails.cgi?
print=Details&each_game=1&eng=OliThink%205.3.0%2064-bit

Your list is 40/4 and I talked about 40/40
It seems that Olithink is relatively weaker at long time control(or maybe at 40/40 the 32 bit version was tested) but even at 40/4 Crafty20.14 is clearly better even when I compare Crafty 32 bits with Olithink 64 bits

http://computerchess.org.uk/ccrl/404.li ... t_all.html

Crafty 20.14 32-bit 2628 +26 −26 38.2% +89.5 23.4% 561
OliThink 5.3.0 64-bit 2524 +16 −16 46.7% +24.0 23.6% 1493

jarkkop · Post by **jarkkop** » Tue Sep 21, 2010 12:56 pm

That would indicate that the slope of Crafty's evaluation function is very low and it could be the problem that hinders it reach its full potential.

Can evaluation function somehow be made narrower?

At least the top programs can't allow this wide material fluctuation to only cost 20 ELO.

Don · Post by **Don** » Tue Sep 21, 2010 2:01 pm

Uri Blass wrote:
OliverBr wrote:
Uri Blass wrote:
OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.
Looking at the CCRL 40/40 rating list I do not see it and I see that Crafty20.14 is more than 100 elo stronger even after considering the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1

Crafty 20.14 32-bit 2631 +34 −34 41.9% +56.6 32.4% 296
OliThink 5.3.0 2459 +35 −35 47.7% +16.1 26.0% 285
It would be great to quote the newest results, that say ELO 2520 for OliThink. Other ELO rating systems say ELO>2500, too:

http://computerchess.org.uk/ccrl/404/cg ... etails.cgi?
print=Details&each_game=1&eng=OliThink%205.3.0%2064-bit
Your list is 40/4 and I talked about 40/40
It seems that Olithink is relatively weaker at long time control(or maybe at 40/40 the 32 bit version was tested) but even at 40/4 Crafty20.14 is clearly better even when I compare Crafty 32 bits with Olithink 64 bits

http://computerchess.org.uk/ccrl/404.li ... t_all.html

Crafty 20.14 32-bit 2628 +26 −26 38.2% +89.5 23.4% 561
OliThink 5.3.0 64-bit 2524 +16 −16 46.7% +24.0 23.6% 1493

I believe that evaluation scales. Hans Berliner did an experiment many years ago that gave some empirical evidence that the program with a better evaluation function scales better. It was the hitech vs lotech study he did.

Unfortunately, the number of games he played were not enough to give convincing statistical proof.

But I believe that it is probably true. Imagine a program that does only material and mobility trying to do aggressive LMR. If the program relies on depth to discoverer positions "truths" then it seems to me that it would suffer more from reductions.

jacobbl · Post by **jacobbl** » Wed Sep 22, 2010 4:02 pm

It's impressing to hear how much OilThink gets from mobility. In my engine (Sjakk) I don't get much gain from mobility at all. So I was wondering if anyone could give me some advice on what are the most important factors to have in a mobility evaluation?

Regards
Jacob

Michael Sherwin · Post by **Michael Sherwin** » Wed Sep 22, 2010 5:04 pm

Don wrote:
Uri Blass wrote:
OliverBr wrote:
Uri Blass wrote:
OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.
Looking at the CCRL 40/40 rating list I do not see it and I see that Crafty20.14 is more than 100 elo stronger even after considering the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1

Crafty 20.14 32-bit 2631 +34 −34 41.9% +56.6 32.4% 296
OliThink 5.3.0 2459 +35 −35 47.7% +16.1 26.0% 285
It would be great to quote the newest results, that say ELO 2520 for OliThink. Other ELO rating systems say ELO>2500, too:

http://computerchess.org.uk/ccrl/404/cg ... etails.cgi?
print=Details&each_game=1&eng=OliThink%205.3.0%2064-bit
Your list is 40/4 and I talked about 40/40
It seems that Olithink is relatively weaker at long time control(or maybe at 40/40 the 32 bit version was tested) but even at 40/4 Crafty20.14 is clearly better even when I compare Crafty 32 bits with Olithink 64 bits

http://computerchess.org.uk/ccrl/404.li ... t_all.html

Crafty 20.14 32-bit 2628 +26 −26 38.2% +89.5 23.4% 561
OliThink 5.3.0 64-bit 2524 +16 −16 46.7% +24.0 23.6% 1493
I believe that evaluation scales. Hans Berliner did an experiment many years ago that gave some empirical evidence that the program with a better evaluation function scales better. It was the hitech vs lotech study he did.

Unfortunately, the number of games he played were not enough to give convincing statistical proof.

But I believe that it is probably true. Imagine a program that does only material and mobility trying to do aggressive LMR. If the program relies on depth to discoverer positions "truths" then it seems to me that it would suffer more from reductions.

IIRC, Oliver added more aggressive reductions (LMR and Null Move) for Olithink ver 5.30 (5.29) and got a rather large ELO gain.

Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements

Re: Most important eval elements