Most important eval elements


Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Most important eval elements

Post by Uri Blass »

OliverBr wrote:
Don wrote: I am surprised to hear that Oli is the same as Crafty in strength. I see a 350 ELO difference looking at one of the rating lists for single CPU programs. Is Olithink under a different name?
OliThink 5.3.0 has ELO 2500, so has Crafty 20.14.
Newer versions of Crafty are definitely stronger, I would say about 2750, so they are 250 points better.
Looking at the CCRL 40/40 rating list, I do not see that. Crafty 20.14 is more than 100 Elo stronger, even after allowing for the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1


Engine                  Rating    +    −   Score   Av.Op.   Draws   Games
Crafty 20.14 32-bit       2631  +34  −34   41.9%    +56.6   32.4%     296
OliThink 5.3.0            2459  +35  −35   47.7%    +16.1   26.0%     285
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Most important eval elements

Post by OliverBr »

Uri Blass wrote:
Looking at the CCRL 40/40 rating list, I do not see that. Crafty 20.14 is more than 100 Elo stronger, even after allowing for the possible statistical error.

http://computerchess.org.uk/ccrl/4040.l ... ons_only=1


Engine                  Rating    +    −   Score   Av.Op.   Draws   Games
Crafty 20.14 32-bit       2631  +34  −34   41.9%    +56.6   32.4%     296
OliThink 5.3.0            2459  +35  −35   47.7%    +16.1   26.0%     285
It would be great to quote the newest results, which give OliThink an Elo of 2520. Other rating lists put it above 2500, too:

http://computerchess.org.uk/ccrl/404/cg ... 0%2064-bit
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Most important eval elements

Post by Dann Corbit »

Don wrote:
OliverBr wrote:
bhlangonijr wrote: I think what you are saying is not 100% accurate as your evaluation function takes into account free and hanging pawns too, right?
Correct, I mentioned in my source code that it is the only non-mobility eval. But it is actually an indirect mobility eval: a hanging pawn can't move, so it has very bad mobility, and a free pawn has good mobility, of course.

I plan to implement it that way and so I have the 100% mobility eval.
Does your program count material? Then it's still not 100% mobility.
It does count material, but it does not use piece square tables. I suspect that for a program this strong, it contains the minimum evaluation.
{snip}
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Most important eval elements

Post by michiguel »

bhlangonijr wrote:
Don wrote: It makes me wonder why you include material in your evaluation since mobility also crudely approximates material. Why did you make that concession to simplicity but not others? You don't mind having a program that is much weaker than it needs to be to keep it conceptually simple, but you were not willing to go all the way. I'm not criticizing you, I think it's cool but I still wonder why you chose to do it like you did. For example you could have included a few cheap evaluation terms and added 100-200 ELO without adding much to the program.
Don, I am not Oliver, but I think it is an easy question to answer. Some authors don't have gaining more Elo points, no matter what, as their main goal.
For example, some authors value the reliability of their programs more, even at the cost of some overall performance. I think Miguel's Gaviota is one of them.
True, in terms of time spent by me, not necessarily the code. Any time I spend on this is time I do not spend on improving the search :-( Programmer's time is an important parameter that is generally not taken into account. For instance, 2 Elo points are important, but not if you spend 1 year coding the change. But I am willing to spend some time on tools or code that will catch bugs, decrease the chances of losing on time, etc. If my engine crashes, I take it as a failure, not as 0.1 Elo points lost out of the 999 games in which it does not crash. In terms of code, for instance, I am not willing to sacrifice a sane long PV for 3 Elo points.

I think it is very good to have engines with different priorities. All of them are living experiments with different styles. OliThink shows us how important mobility is. Micro-Max shows us that it is possible to be concise in an amazing way. I think that asking Oliver to add PSTs to gain 20 Elo points is like asking HG to gain 20 points for Micro-Max by adding 1k lines of code :-) It is not going to happen, and probably shouldn't.

Miguel

OliThink, Sungorus and Micro-Max are on my list of best chess engines because it seems the authors have the goal of creating very strong and minimalist chess programs. Just to reinforce the point: OliThink is competing with Crafty 10, which has tens of thousands more lines of code. :) Well, I think we can learn something from that.
I read in some posts that Bob took the fractional plies and all non-check extension code out of Crafty, and his program is doing well. Maybe it really doesn't help much and we are just adding junk code to our programs.

Regards,
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Most important eval elements

Post by michiguel »

Don wrote:
OliverBr wrote:
Don wrote: It makes me wonder why you include material in your evaluation since mobility also crudely approximates material. Why did you make that concession to simplicity but not others?
Actually I have tried this one and it doesn't work.

Best example is the Rook.

The Rook has a crappy mobility the first 15-20 moves of the game, yet he has a high value and you won't want to give it away at any time of the game. You must (!) give him some value even though he has 0 mobility. That is called material value.

His mobility value comes very, very late in the game, far beyond any search horizon. So the approach just doesn't work. The Rook is only the clearest example for this.

Edit: Of course, if search depth becomes much, much bigger than it is now, we will be able to rethink this approach. Anyway, I personally think that there is a limit (frontier) of search depth and we will never reach it.
I think the relevant thing is how accurate the evaluation function is. I would not be looking for more and more excuses to drop features thinking that search will make up for it.

Based on what I have learned in the past 2 or 3 years, we want MORE evaluation in Komodo, not less. I too used to be more brute force in my thinking but I have changed :-)
I am glad to read this because it makes me feel I am not crazy (or not the only one). I too believe that there is a big future in improving evaluations.

Miguel

Have you considered a pure material only search? I wonder how deeply you must search before the program would have reasonable looking PV's in the first 3 or 4 ply?
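
The rook argument quoted above can be checked with a small sketch. It counts the squares a rook can reach in the starting position and on an open board, then compares a mobility-only score with a material-plus-mobility score. The board encoding, the function names and both weights (`ROOK_VALUE`, `MOBILITY_WEIGHT`) are assumptions made for illustration; they are not taken from OliThink or any other engine.

```python
# Toy board: 8 strings, row 0 is Black's back rank, uppercase = White.
START = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]

# A lone white rook in the middle of an otherwise empty board.
OPEN = ["........"] * 4 + ["...R...."] + ["........"] * 3

ROOK_VALUE = 500       # hypothetical material value in centipawns
MOBILITY_WEIGHT = 10   # hypothetical centipawns per reachable square


def rook_mobility(board, row, col):
    """Count squares the rook on (row, col) can move to or capture on."""
    piece = board[row][col]
    count = 0
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8:
            target = board[r][c]
            if target == ".":
                count += 1                      # empty square
            else:
                if target.isupper() != piece.isupper():
                    count += 1                  # enemy piece: capture
                break                           # any piece blocks the ray
            r, c = r + dr, c + dc
    return count


def mobility_only_score(board, row, col):
    """Score the rook by mobility alone."""
    return MOBILITY_WEIGHT * rook_mobility(board, row, col)


def material_plus_mobility_score(board, row, col):
    """Score the rook by a fixed material value plus mobility."""
    return ROOK_VALUE + mobility_only_score(board, row, col)


print(rook_mobility(START, 7, 0))                 # 0: the a1 rook is boxed in
print(mobility_only_score(START, 7, 0))           # 0: looks worthless
print(material_plus_mobility_score(START, 7, 0))  # 500
print(rook_mobility(OPEN, 4, 3))                  # 14 on an open board
```

With mobility alone, the boxed-in rook scores zero and the search has no reason to keep it, which is the problem described in the quoted post. The material term gives it a floor until its mobility shows up later in the game.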
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Most important eval elements

Post by Uri Blass »

OliverBr wrote:
It would be great to quote the newest results, which give OliThink an Elo of 2520. Other rating lists put it above 2500, too:

http://computerchess.org.uk/ccrl/404/cg ... etails.cgi?
print=Details&each_game=1&eng=OliThink%205.3.0%2064-bit
Your list is 40/4 and I talked about 40/40.
It seems that OliThink is relatively weaker at long time controls (or maybe the 32-bit version was tested at 40/40), but even at 40/4 Crafty 20.14 is clearly better, even when I compare 32-bit Crafty with 64-bit OliThink.

http://computerchess.org.uk/ccrl/404.li ... t_all.html

Engine                  Rating    +    −   Score   Av.Op.   Draws   Games
Crafty 20.14 32-bit       2628  +26  −26   38.2%    +89.5   23.4%     561
OliThink 5.3.0 64-bit     2524  +16  −16   46.7%    +24.0   23.6%    1493
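
A minimal sketch of the arithmetic behind this comparison, assuming the + and − columns can be treated as simple error margins around each rating. It takes the pessimistic case (the stronger engine at the bottom of its range, the weaker one at the top of its range) and also converts the nominal gap into an expected score with the standard logistic Elo formula. The function names are made up for illustration.

```python
def worst_case_gap(rating_a, minus_a, rating_b, plus_b):
    """Gap between A's lower bound and B's upper bound, in Elo."""
    return (rating_a - minus_a) - (rating_b + plus_b)


def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


# CCRL 40/40 figures quoted in the thread: Crafty 2631 (−34), OliThink 2459 (+35)
gap_4040 = worst_case_gap(2631, 34, 2459, 35)
# CCRL 40/4 figures quoted in the thread: Crafty 2628 (−26), OliThink 2524 (+16)
gap_404 = worst_case_gap(2628, 26, 2524, 16)

print(gap_4040)   # 103
print(gap_404)    # 62
print(round(expected_score(2631 - 2459), 3))
```

Under these assumptions the 40/40 gap stays above 100 Elo even in the pessimistic case, and the 40/4 gap stays clearly positive.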
jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Re: Most important eval elements

Post by jarkkop »

That would indicate that the slope of Crafty's evaluation function is very low, and that could be the problem that keeps it from reaching its full potential.

Can the evaluation function somehow be made narrower?

At least the top programs can't allow such a wide material fluctuation to cost only 20 Elo.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Most important eval elements

Post by Don »

Uri Blass wrote:
Your list is 40/4 and I talked about 40/40.
It seems that OliThink is relatively weaker at long time controls (or maybe the 32-bit version was tested at 40/40), but even at 40/4 Crafty 20.14 is clearly better, even when I compare 32-bit Crafty with 64-bit OliThink.

http://computerchess.org.uk/ccrl/404.li ... t_all.html

Engine                  Rating    +    −   Score   Av.Op.   Draws   Games
Crafty 20.14 32-bit       2628  +26  −26   38.2%    +89.5   23.4%     561
OliThink 5.3.0 64-bit     2524  +16  −16   46.7%    +24.0   23.6%    1493
I believe that evaluation scales. Hans Berliner did an experiment many years ago that gave some empirical evidence that the program with a better evaluation function scales better. It was the HiTech vs. LoTech study.

Unfortunately, the number of games he played was not enough to give convincing statistical proof.

But I believe that it is probably true. Imagine a program that does only material and mobility trying to do aggressive LMR. If the program relies on depth to discover positional "truths", then it seems to me that it would suffer more from reductions.
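
The point about reductions can be illustrated with a toy sketch. It is a fail-soft negamax alpha-beta over a hand-built tree, with a deliberately crude late-move reduction: the third and later moves are searched one ply shallower and only re-searched at full depth if the reduced score beats alpha. The tree, the `Node` class, the reduction rule and all the values are hypothetical and chosen only to show the effect; real engines use more refined LMR conditions.

```python
from dataclasses import dataclass, field
from typing import List

INF = 10**9


@dataclass
class Node:
    static: int                                   # static eval, side to move's view
    children: List["Node"] = field(default_factory=list)


def search(node, depth, alpha, beta, use_lmr):
    """Fail-soft negamax alpha-beta with a toy late-move reduction."""
    if depth <= 0 or not node.children:
        return node.static
    best = -INF
    for i, child in enumerate(node.children):
        # Toy LMR rule (an assumption): reduce the third and later moves by one ply.
        reduction = 1 if (use_lmr and i >= 2 and depth >= 3) else 0
        score = -search(child, depth - 1 - reduction, -beta, -alpha, use_lmr)
        if reduction and score > alpha:
            # The reduced search looked promising, so verify at full depth.
            score = -search(child, depth - 1, -beta, -alpha, use_lmr)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


def make_tree(quiet_eval):
    """Root with three moves; the third hides a gain of 5 two plies deeper.

    quiet_eval is what the static evaluation says about the intermediate
    position (our side to move) before the gain becomes visible.
    """
    deep = Node(static=-5)                        # opponent to move, 5 down
    quiet = Node(static=quiet_eval, children=[deep])
    return Node(0, [
        Node(0),                                  # move 1: dead equal
        Node(1),                                  # move 2: slightly worse for us
        Node(0, [quiet]),                         # move 3: late move, hidden gain
    ])


weak_eval = make_tree(quiet_eval=-1)              # eval misjudges the position
better_eval = make_tree(quiet_eval=4)             # eval already senses the gain

print(search(weak_eval, 3, -INF, INF, use_lmr=False))    # 5: full depth finds it
print(search(weak_eval, 3, -INF, INF, use_lmr=True))     # 0: reduction hides it
print(search(better_eval, 3, -INF, INF, use_lmr=True))   # 5: re-search triggers
```

In this toy case the weaker evaluation only finds the gain through depth, so the reduction cuts it off. The better evaluation scores the intermediate position well enough to trigger the full-depth re-search.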
jacobbl
Posts: 80
Joined: Wed Feb 17, 2010 3:57 pm

Re: Most important eval elements

Post by jacobbl »

It's impressive to hear how much OliThink gets from mobility. In my engine (Sjakk) I don't get much gain from mobility at all. So I was wondering if anyone could give me some advice on which factors are most important in a mobility evaluation?

Regards
Jacob
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Most important eval elements

Post by Michael Sherwin »

Don wrote:
I believe that evaluation scales. Hans Berliner did an experiment many years ago that gave some empirical evidence that the program with a better evaluation function scales better. It was the HiTech vs. LoTech study.

Unfortunately, the number of games he played was not enough to give convincing statistical proof.

But I believe that it is probably true. Imagine a program that does only material and mobility trying to do aggressive LMR. If the program relies on depth to discover positional "truths", then it seems to me that it would suffer more from reductions.
IIRC, Oliver added more aggressive reductions (LMR and Null Move) for Olithink ver 5.30 (5.29) and got a rather large ELO gain.