Most important eval elements

hgm · Post by **hgm** » Sat Sep 18, 2010 10:57 am

Bob posted once that having the Queen value off by 2 Pawns (i.e. using 750 instead of 950) would only cost about 10 Elo...

Having the Bishop base value equal to the Knight, rather than half a Pawn higher, seemed to cost Joker80 nearly 100 Elo in Gothic Chess, though.

Mincho Georgiev · Post by **Mincho Georgiev** » Sat Sep 18, 2010 12:10 pm

The most important thing of all (IMO) is the correlation between the values of the evaluation parameters. For example, take two engines A and A' with the same evaluation terms: in A the terms are well balanced against each other, while in A' they are rough and unbalanced. There will be a SIGNIFICANT strength difference. When I first added center control to my mobility functions, for example, that brought me a pure 40 ELO gain. Now, I am very far from thinking that control of e4, e5, d4, d5 and their immediate neighboring squares is worth 40 ELO by itself; it is just that in my code the value of controlling them balances the final middlegame value better than without it.
Here is a paradox in itself: if you add a term to your evaluation that is at odds with chess principles but does not significantly change the final result, you may actually get an ELO gain even though it adds a wrong +/- to the score, simply because it may correct a part of the evaluation that another term got wrong. The conclusion: the evaluation parameters ARE correlated and cannot be examined independently with high confidence.

Dann Corbit · Post by **Dann Corbit** » Sat Sep 18, 2010 12:40 pm

silentshark wrote:
Dann Corbit wrote:
phhnguyen wrote:
silentshark wrote:Hi all,

I've been looking again at my eval. It got me thinking, what are the most important things to include in an evaluation function, and what are they worth, ELO wise? I appreciate this isn't an exact science.
IMO, the most important thing is very clear and simple, and programmers usually forget it: material. It can cost 99% of a program's ELO if wrongly set.
And yet it is sometimes surprising how well a program can play with the evaluation constants badly set away from their correct values of:

opening:
Pawn = 100
Knight = 413
Bishop = 422
Rook = 641
Queen = 1273

endgame:
Pawn = 130
Knight = 427
Bishop = 433
Rook = 645
Queen = 1292
Interesting.. why do you feel these are correct?

I've been using

pawn=100
knight=bishop=400
rook=600
queen=1200

for a long time now, but maybe these values are smarter.

The strongest open source program uses those values. Actually, the exact weights will depend on other details of your evaluation (bad bishop, bishop pair and knight outpost implementations clearly change the value of these chess pieces, along with many other factors). I also believe that depth of search mitigates the importance of getting the right values. I theorize that if you could search infinitely deep, the only chessman that would need any value at all is the king.
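
As a rough illustration of how separate opening and endgame piece values like the ones quoted above are often used, here is a minimal sketch. The two value tables come from the post; the phase weights, `MAX_PHASE`, and the `material` function are assumptions for illustration (a commonly used linear blend by remaining material), not something the thread specifies.

```python
# Piece values quoted in the post (centipawn-like units).
OPENING = {"P": 100, "N": 413, "B": 422, "R": 641, "Q": 1273}
ENDGAME = {"P": 130, "N": 427, "B": 433, "R": 645, "Q": 1292}

# ASSUMPTION: a simple game-phase measure based on remaining pieces.
PHASE_WEIGHT = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # both sides with 2N, 2B, 2R, 1Q


def material(white: str, black: str) -> int:
    """Blend opening and endgame material scores by game phase.

    `white` and `black` are strings of piece letters (kings omitted),
    e.g. "QRRBBNNPPPPPPPP". The result is from White's point of view.
    """
    phase = min(MAX_PHASE, sum(PHASE_WEIGHT[p] for p in white + black))

    def side(pieces: str, table: dict) -> int:
        return sum(table[p] for p in pieces)

    opening_score = side(white, OPENING) - side(black, OPENING)
    endgame_score = side(white, ENDGAME) - side(black, ENDGAME)
    # phase == MAX_PHASE -> pure opening values, phase == 0 -> pure endgame.
    return (opening_score * phase + endgame_score * (MAX_PHASE - phase)) // MAX_PHASE
```

With this blend, a single extra pawn counts as 100 with all pieces on the board and as 130 once only pawns remain.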

Piotr Cichy · Post by **Piotr Cichy** » Sat Sep 18, 2010 5:34 pm

silentshark wrote:Hi all,

I've been looking again at my eval. It got me thinking, what are the most important things to include in an evaluation function, and what are they worth, ELO wise? I appreciate this isn't an exact science.

For instance, how many ELO points will your engine gain if it understands about isolated pawns? How many will it gain if it understands the concept of a bishop pair? Rooks on the 7th rank? Mobility?

I'm sure there are many other variables.. search being one.. but it'd be fascinating to take a certain engine, strip out most of its eval, work out its ELO, then add in the various "standard" eval elements, seeing the effect on ELO..

About 2 years ago I made some tests for my engine nanoSzachy. I tested some more general terms (like pawn structure rather than doubled or isolated pawns). In each test I disabled one of the eval's terms and played a match between that version and the original one. Each match was 1000-3000 games at a very fast time control of 100K nodes per move. Then I converted the match result into an ELO gain. Here are the results:

Mobility                                                 +130 ELO
Passed pawns                                              +60 ELO
Positional (rook on (half)open file, knight outpost etc)  +60 ELO
Bishop pair                                               +45 ELO
PST                                                       +45 ELO
Pawn shield around king                                   +21 ELO
Pawn structure                                            +20 ELO
King safety                                                +3 ELO

Note that with 1000-3000 games the results are not accurate, just estimates. Note also that my engine has a very simple evaluation; in other engines one can expect a bigger ELO gain from the same eval term.
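
For readers who want to reproduce this kind of measurement, here is a minimal sketch of converting a match result into an ELO difference with an approximate error margin. It assumes the usual logistic Elo model and a normal approximation for the interval; the function names and the example win/draw/loss counts are hypothetical and are not taken from the nanoSzachy tests.

```python
import math


def elo_diff(score: float) -> float:
    """ELO difference implied by a score fraction, with 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation (z=1.96 ~ 95%)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which is 1, 0.5 or 0 points.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(variance / n)
    # Clamp so the logarithm stays defined for lopsided matches.
    low = min(max(score - z * stderr, 1e-6), 1.0 - 1e-6)
    high = min(max(score + z * stderr, 1e-6), 1.0 - 1e-6)
    return elo_diff(score), elo_diff(low), elo_diff(high)


# Hypothetical 1000-game match: 400 wins, 300 draws, 300 losses.
elo, low, high = elo_with_margin(400, 300, 300)
print(f"{elo:+.0f} ELO (approx. 95% interval {low:+.0f} .. {high:+.0f})")
```

At around 1000 games the interval spans several tens of ELO points, which is why a measured difference of a few ELO at this sample size cannot be separated from noise.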

silentshark · Post by **silentshark** » Sat Sep 18, 2010 7:11 pm

that's really interesting.. and surprising.. King safety only worth 3 ELO? I guess it depends on how sophisticated it is.

The other figures make interesting reading, too. Many thanks for posting them

Piotr Cichy · Post by **Piotr Cichy** » Sat Sep 18, 2010 7:40 pm

Yes, my king safety is very simple, based on the number of attacked squares around the king. But the +3 ELO gain was very surprising to me too. It may be the result of:
- not enough games in test
- wrong parameter tuning
- maybe king safety is not so important in blitz?
- maybe pawn shield is much more important than pressure on king?

I tried other, more sophisticated king safety evaluations, but never got satisfying results.
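
A king safety term of the simple kind described above (counting attacked squares around the king) could look roughly like the following sketch. The square representation, the `king_zone` and `king_pressure` names, and the penalty of 10 per square are all assumptions for illustration; this is not nanoSzachy's actual code.

```python
def king_zone(king_sq):
    """Squares adjacent to the king; squares are (file, rank) pairs in 0..7."""
    file, rank = king_sq
    return {
        (f, r)
        for f in range(file - 1, file + 2)
        for r in range(rank - 1, rank + 2)
        if 0 <= f < 8 and 0 <= r < 8 and (f, r) != king_sq
    }


def king_pressure(king_sq, enemy_attacks, penalty_per_square=10):
    """Penalty proportional to the number of zone squares the enemy attacks.

    `enemy_attacks` is the set of squares attacked by the opponent, which
    a real engine would get from its attack/move generator.
    """
    attacked = king_zone(king_sq) & set(enemy_attacks)
    return -penalty_per_square * len(attacked)


# King on g1; the opponent attacks f2 and g2 plus an unrelated square a1.
print(king_pressure((6, 0), {(5, 1), (6, 1), (0, 0)}))
```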

Uri Blass · Post by **Uri Blass** » Sat Sep 18, 2010 8:23 pm

I am sure that the value of PST is clearly more than 45 elo if you first remove the other positional factors (a PST program is better than a material-only program by more than 45 elo).

I believe this is only a special case. In general, the value of a specific factor is lower when the program already has more knowledge, so the value of all the factors together is clearly higher than the sum of the values measured for each single factor.

Don · Post by **Don** » Sun Sep 19, 2010 9:27 pm

What matters and doesn't matter can be really surprising and of course impossible to measure with any real accuracy.

Material and some sort of general sense of where the pieces go, even if very crude, are really important, and everything else is of lesser importance. "A general sense of where the pieces go" can be satisfied by a simple piece square table, or it could be handled by a mobility function or a combination of both, and of course this can be refined and improved on forever. A simple sense will not give very good play, but even if it's done badly it will add hundreds of ELO to an engine that only has material score.

The order you consider stuff will affect how important it "seems" to be because almost everything is at least slightly redundant. For example if you have only material and you add pawn structure, it will add a lot more ELO than if you do pawn structure later when you already have other things.

A lot of old programs did not have dynamic mobility but simulated it in other ways. Still, I think real mobility is pretty important on a deep searching modern program.

If all you did is material and good mobility the program starts to act like it knows how to play - you will see a lot of good looking moves but still you will see a lot of moves that will make you cringe in horror.

When you get to pawn structure, you need to cover weak pawns and doubled pawns. I hesitate to say "backward" pawns because that has many definitions but a lot of pawns can be "weak" in some sense. You need to identify those cases and cover them. Isolated pawns, etc.

Passed pawns are pretty important too, and so is getting that term done well. King safety is huge too. It's really impossible to answer your question with anything very definitive because everything works together. I keep finding that I can improve the evaluation and have probably squeezed over 100 ELO out of evaluation beyond the point that I thought it was already very good. I was either naive to think it was good, or perhaps there is a never ending source of improvement possible by constant work on the evaluation function.

In my mind, the old idea that you could write a strong chess program by building a super search combined with a "relatively simple" evaluation function is wrong. I have seen non-computer chess people make statements like that. I think it's not possible without a really good evaluation function.

We found that a lot of things that you might believe are really important cannot be measured as an improvement even though you might think it should be huge.

silentshark · Post by **silentshark** » Mon Sep 20, 2010 2:29 pm

This is an extremely interesting post. Particularly the last paragraph! Do you care to share any of these things "which should be good, but aren't"?

I find it fascinating that mobility is now seen - pretty much - as a must have. Back in the 1990's, there were some very strong programs which had no concept of mobility. I remember that Ferret was one of these. There was an amusing incident (I think 1995, Paderborn), which went along the lines of:

David Levy: "that's an interesting move. Looks like Ferret is trying to increase the mobility of its rooks"
Bruce Moreland: "Ferret has no concept of mobility"
David Levy: "oh.."

Don · Post by **Don** » Mon Sep 20, 2010 2:43 pm

I think most of them have been talked about. One example is the king and pawn vs king database. A very tiny, small and cheap database that gives you perfect evaluation when you reach this ending. A master told me years ago that every good player should know this because it's so fundamental: the most common win is being a pawn up, and one technique when a pawn up is to trade down to this. But when tested I have not been able to demonstrate that it makes the program stronger.

This is also true of a bunch of fundamental drawn endings, such as king+knight vs king. It cannot be won, but if your program doesn't know that it might trade a won ending for this, since it's a piece up! But once again I cannot prove this helps the program.

However, all of these endings make the program look better.

A lot of positional terms that seem really important often do not help the program. A LOT of times I think it's because the concept is partially handled by other terms.

An example of one that helped Doch a LOT was pawn mobility. When Larry said we should implement it I figured it was another waste of time (most ideas do not pay off). It is defined simply as having an empty square in front of a pawn. This turned out to be worth more than 10 ELO, measured with a high degree of accuracy (tens of thousands of games). This same term might not help some other program that has a different evaluation. Maybe it addressed something else that should be in the program.
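
The pawn mobility definition given above (an empty square in front of a pawn) is simple enough to sketch in a few lines. The board representation and the `pawn_mobility` name below are assumptions for illustration; the post does not say how the term was implemented or weighted.

```python
def pawn_mobility(pawns, occupied, direction):
    """Count pawns whose square directly in front is empty.

    `pawns` and `occupied` are sets of (file, rank) squares in 0..7.
    `direction` is +1 for White (moving up the ranks) and -1 for Black.
    """
    count = 0
    for file, rank in pawns:
        front = (file, rank + direction)
        if 0 <= front[1] < 8 and front not in occupied:
            count += 1
    return count


# White pawns on e2 and d4; a black pawn on d5 blocks the d-pawn.
white_pawns = {(4, 1), (3, 3)}
occupied = white_pawns | {(3, 4)}
print(pawn_mobility(white_pawns, occupied, +1))
```

An evaluation would typically score the difference between the two sides' counts, multiplied by a tuned weight.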
