The present evaluation function of my chess program (Rotor) is really a hack, and I am looking for just this kind of discussion
on how to develop a good evaluation function.
Searching through the archives, I found the following ideas:
- use symmetries to catch bugs (see the first sketch after this list)
- divide evaluation features into "static" and "dynamic" ones; static features (e.g. material imbalance, pawn structure)
change less often, are important, can be determined quite reliably, and cannot easily be replaced by deeper search (Don Dailey)
- consider non-linear functions
- combat horizon effects by avoiding "step" functions; try to "smell" big changes in evaluation far in advance (a sketch of such smooth, non-linear terms follows this list)
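
One way I could imagine putting the symmetry idea into practice is an automated self-check: mirror the files and colour-flip the position, then assert that the score stays the same, respectively exactly changes sign. The sketch below (C++) is only illustrative - the 8x8 character board and the material-only evaluate() are stand-ins for whatever the engine really uses, and the mirror test only holds for terms that treat the two wings alike (castling or wing-specific terms break it):

// A hypothetical symmetry self-check.  The 8x8 character board and the
// material-only evaluate() are placeholders for the engine's real code;
// only the testing idea matters.
#include <array>
#include <cassert>
#include <cctype>
#include <cstdio>

using Board = std::array<std::array<char, 8>, 8>; // upper = white, lower = black, '.' = empty

// Toy evaluation from White's point of view (material only).
int evaluate(const Board& b) {
    auto value = [](char p) {
        switch (std::tolower(p)) {
            case 'p': return 100; case 'n': return 320; case 'b': return 330;
            case 'r': return 500; case 'q': return 900; default: return 0;
        }
    };
    int score = 0;
    for (const auto& rank : b)
        for (char p : rank)
            score += std::isupper(p) ? value(p) : -value(p);
    return score;
}

// Mirror files a-h <-> h-a: a wing-agnostic evaluation must not change.
Board mirror(const Board& b) {
    Board m = b;
    for (int r = 0; r < 8; ++r)
        for (int f = 0; f < 8; ++f)
            m[r][f] = b[r][7 - f];
    return m;
}

// Swap colours and flip ranks: the evaluation must exactly change sign.
Board colorFlip(const Board& b) {
    Board m{};
    for (int r = 0; r < 8; ++r)
        for (int f = 0; f < 8; ++f) {
            char p = b[7 - r][f];
            m[r][f] = std::isupper(p) ? std::tolower(p)
                    : std::islower(p) ? std::toupper(p) : p;
        }
    return m;
}

int main() {
    Board start = {{
        {'r','n','b','q','k','b','n','r'}, {'p','p','p','p','p','p','p','p'},
        {'.','.','.','.','.','.','.','.'}, {'.','.','.','.','.','.','.','.'},
        {'.','.','.','.','.','.','.','.'}, {'.','.','.','.','.','.','.','.'},
        {'P','P','P','P','P','P','P','P'}, {'R','N','B','Q','K','B','N','R'},
    }};
    assert(evaluate(mirror(start)) == evaluate(start));     // symmetric in the files
    assert(evaluate(colorFlip(start)) == -evaluate(start)); // antisymmetric in colour
    std::puts("symmetry checks passed");
}

In practice one would run such checks over thousands of positions taken from real games, not just the start position.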
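
For the non-linear and "no step functions" points, here is an illustrative pair of terms. The quadratic shapes are common, but the constants are invented and would need tuning: instead of a bonus that only fires once a passed pawn reaches the 7th rank, let it grow with every rank so the search smells the promotion long before the horizon, and let king danger grow faster than linearly with the number of attackers.

// Illustrative smooth, non-linear evaluation terms; the constants are
// invented for this sketch and would need tuning.
#include <cstdio>

// Passed-pawn bonus by rank (2..7 from the owner's point of view).
// Quadratic growth: small early, large near promotion, never a sudden jump.
int passedPawnBonus(int rank) {
    int steps = rank - 2;              // 0 on the pawn's starting rank
    return 10 + 7 * steps * steps;     // 10, 17, 38, 73, 122, 185 centipawns
}

// King danger rising quadratically with the number of attackers near the
// king: two attackers are worth far more than twice one attacker.
int kingDanger(int attackers) {
    return 5 * attackers * attackers;  // 0, 5, 20, 45, 80, ...
}

int main() {
    for (int rank = 2; rank <= 7; ++rank)
        std::printf("passed pawn on rank %d -> %3d cp\n", rank, passedPawnBonus(rank));
    for (int a = 0; a <= 4; ++a)
        std::printf("%d attackers near the king -> %3d cp penalty\n", a, kingDanger(a));
}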
Is it possible to measure how large the noisy part of a particular feature is?
And how do you measure the goodness of an evaluation function in general? By playing many games?
Vasik Rajlich wrote: “The key to having a good evaluation is coming up with some way to test it, piece by piece.
Self-play is not enough, you'll never play enough games to show a 10-point improvement.”
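
To get a feel for why self-play alone rarely resolves a 10-point improvement, one can estimate how many games a given Elo edge needs before it rises out of the noise. The sketch below uses the usual logistic Elo model and a crude two-sigma criterion, and it ignores draws (which lower the per-game variance somewhat); the helper names are made up and the numbers are rough, but the order of magnitude - several thousand games for +10 Elo - is the point.

// Rough estimate of how many games are needed before a genuine Elo edge
// shows up as a 2-sigma deviation from an even score.  Assumes every game
// is decisive; with draws the real numbers are somewhat smaller but stay
// in the same ballpark.
#include <cmath>
#include <cstdio>

// Expected score under the logistic Elo model.
double expectedScore(double elo) {
    return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
}

// Games N such that 2 * sigma / sqrt(N) <= (expected score - 0.5).
double gamesNeeded(double elo) {
    double s     = expectedScore(elo);        // e.g. ~0.514 for +10 Elo
    double edge  = s - 0.5;                   // margin to resolve
    double sigma = std::sqrt(s * (1.0 - s));  // per-game standard deviation
    return std::pow(2.0 * sigma / edge, 2.0);
}

int main() {
    const double elos[] = {5.0, 10.0, 20.0, 50.0};
    for (double elo : elos)
        std::printf("+%4.0f Elo: expected score %.4f, roughly %6.0f games\n",
                    elo, expectedScore(elo), gamesNeeded(elo));
}

For +10 Elo this comes out near 5,000 games, and for +5 Elo around 20,000, which is why Rajlich's advice to test the evaluation piece by piece is so attractive compared to relying on self-play matches alone.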