Thanks a lot for this great post, Tord. I too made (or still make) some of the mistakes you mention.
It also took me quite some time to figure out that giving a bonus for something does not necessarily produce the same behaviour as giving a penalty for not doing it. A good explanation of that phenomenon on your part.
Roman
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: The Art of Evaluation (long)
Hello Jan,
Thanks to you, Harm Geert and Roman for your kind words! While writing long posts like the one you replied to, I sometimes wonder why I bother, and fear that no one will make the effort to read them. That intelligent people like you not only read what I write, but even appreciate it, is very encouraging.
Tord
Jan Brouwer wrote: One idea I had was to consider an evaluation feature to consist of the feature proper, and of a noise component. Is it possible to measure how large the noisy part of a particular feature is?

I haven't tried, and I am not even sure I understand the idea correctly. Do you mean that each evaluation term should consist not only of a value, but also of an estimate of its possible inaccuracy? Perhaps this might be useful, but I am not quite sure how I would use the information.
Jan Brouwer wrote: And how do you measure the goodness of an evaluation function in general? By playing many games?

This is a very difficult question, and I'm afraid I don't have a good answer. Quite often I have to trust my intuition, and I am sure my intuition is very often wrong where the evaluation function is concerned. It is quite common for some new piece of knowledge to make the program play "optically" better than before, in the sense that its play looks more intelligent and purposeful, even though its practical strength drops by a few Elo points. You can often see clearly that your program wins a few games because of the newly added knowledge, but it is not so easy to notice the many unexpected ways in which that knowledge causes it to lose games.
Vasik Rajlich wrote: “The key to having a good evaluation is coming up with some way to test it, piece by piece.
Self-play is not enough, you'll never play enough games to show a 10-point improvement.”
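That point about game counts can be made concrete with a quick back-of-the-envelope calculation. The sketch below uses the standard trinomial error model for match results; the function name is mine, not from any engine's test framework:

```python
import math

def elo_with_error(wins, draws, losses):
    """Estimate the Elo difference of a match and a rough 95% error margin."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n                  # mean score per game
    # Per-game variance of the score under the trinomial model.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)                       # std. error of the mean
    to_elo = lambda s: -400.0 * math.log10(1.0 / s - 1.0)
    elo = to_elo(score)
    margin = to_elo(min(score + 1.96 * stderr, 0.9999)) - elo
    return elo, margin

# 1000 games at a 51.4% score is roughly a +10 Elo result...
elo, margin = elo_with_error(314, 400, 286)
# ...but the 95% margin comes out around +/- 17 Elo, so 1000 self-play
# games cannot distinguish a 10 Elo improvement from noise.
```

Scaling the same score up to 100,000 games shrinks the margin by a factor of ten, which is why serious eval testing needs game counts of that order.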
Tord
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: The Art of Evaluation (long)
Thank you, Tord, for this reply to Stan Arts, as I did not have time to reply until now. You did a better job than I would have done anyway. I would just add that the eval cannot be divorced from the time element: if the eval has fantastic and correct chess knowledge but is too slow, then it is not a super good eval. It is one thing to have a slightly under-par search, and quite another to have a slightly under-par search that is also crippled by a too-slow eval.
Edit: If anyone thinks that I am contradicting myself to some degree, go back and look at my original post; you will see that I indicated that a super good eval does not necessarily mean a complicated one.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 318
- Joined: Thu Mar 09, 2006 1:07 am
Re: The Art of Evaluation (long)
Tord Romstad wrote: Thanks to you, Harm Geert and Roman for your kind words! While writing long posts like the one you replied to, I sometimes wonder why I bother, and fear that no one will make the effort to read them. That intelligent people like you not only read what I write, but even appreciate it, is very encouraging.

I like to read posts with good technical information in them. Even if I do not answer them, I store them in a big chess programming folder on my hard disc, in this case in the subfolder 'evaluation'. If I ever want to improve or rewrite my own engine, I will have plenty of ideas to think about. There is about 800 MByte of material in that folder, and there are others with chess-related papers, chess engines, sources and so on. I do not believe I can read all 2.5 GByte in 23000 files in the rest of my life, but collecting is fun, too.
Harald
-
- Posts: 201
- Joined: Thu Mar 22, 2007 7:12 pm
- Location: Netherlands
Re: The Art of Evaluation (long)
Hi Tord,
I'm sure that quite a few amateur chess engine authors like myself are eager to understand why Glaurung is so strong and searches so efficiently!
Btw, I found that Wikipedia contains quite a few good articles about chess (as far as I can judge).
Jan Brouwer wrote: One idea I had was to consider an evaluation feature to consist of the feature proper, and of a noise component. Is it possible to measure how large the noisy part of a particular feature is?

Tord Romstad wrote: I haven't tried, and I am not even sure I understand the idea correctly. Do you mean that each evaluation term should consist not only of a value, but also of an estimate of its possible inaccuracy? Perhaps this might be useful, but I am not quite sure how I would use the information.

I was thinking about treating an evaluation feature as a "black box": apply a lot of smart number crunching to it, and out comes an answer like "this feature correlates 60% with playing strength, 40% is random noise". Now the only tricky part that remains is defining the number crunching needed. If this were possible, it would provide a way of optimizing evaluation features. Anyway, it is just a vague idea.
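One crude way to put numbers on this "noise component" idea is to correlate a feature's value with game results over a set of labelled positions and treat the unexplained variance as noise. A sketch with invented data; `feature_signal_fraction` is not from any engine:

```python
import random

def feature_signal_fraction(samples):
    """samples: (feature_value, game_result) pairs, result in {0.0, 0.5, 1.0}.
    Returns r^2, the fraction of result variance the feature 'explains';
    the remaining 1 - r^2 is treated as the feature's noise component."""
    n = len(samples)
    mx = sum(f for f, _ in samples) / n
    my = sum(r for _, r in samples) / n
    cov = sum((f - mx) * (r - my) for f, r in samples) / n
    vx = sum((f - mx) ** 2 for f, _ in samples) / n
    vy = sum((r - my) ** 2 for _, r in samples) / n
    return 0.0 if vx == 0 or vy == 0 else (cov * cov) / (vx * vy)

# Synthetic check: a feature that partly tracks the result should score
# well above a purely random one.
rng = random.Random(7)
results = [rng.choice((0.0, 0.5, 1.0)) for _ in range(5000)]
tracking = [(r + rng.gauss(0.0, 0.5), r) for r in results]
pure_noise = [(rng.gauss(0.0, 1.0), r) for r in results]
```

Of course this only captures linear correlation with the outcome, and in a real engine features interact, so it is at best a first filter.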
Tord Romstad wrote: This is a very difficult question, and I'm afraid I don't have a good answer. Quite often I have to trust my intuition, and I am sure my intuition is very often wrong where the evaluation function is concerned. It is quite common for some new piece of knowledge to make the program play "optically" better than before, in the sense that its play looks more intelligent and purposeful, even though its practical strength drops by a few Elo points. You can often see clearly that your program wins a few games because of the newly added knowledge, but it is not so easy to notice the many unexpected ways in which that knowledge causes it to lose games.

Here I am at a disadvantage: I know next to nothing about playing chess. It is only recently that I learned about the importance of (candidate) passed pawns. Playing chess, for me, is mainly about concentrating hard enough not to give away a piece!
Vasik Rajlich wrote: “The key to having a good evaluation is coming up with some way to test it, piece by piece.
Self-play is not enough, you'll never play enough games to show a 10-point improvement.”
-
- Posts: 442
- Joined: Wed Mar 08, 2006 8:54 pm
Re: The Art of Evaluation (long)
For tuning evaluation terms there are publications on reinforcement learning for chess; temporal difference learning is one useful technique. There are papers on reinforcement learning of proper values for evaluation terms and for extension policies.
Automatically acquiring new knowledge is a harder field. There you get into Markov chains, Monte Carlo simulation, Bayesian inference and genetic algorithms ...
Citeseer link:
http://citeseer.ist.psu.edu/cs
Good search subjects:
Temporal difference learning
Yngvi Björnsson
Reinforcement learning
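For the curious, the core of temporal-difference tuning for a linear evaluation fits in a few lines. This is a toy TD(0) update, not code from any of the papers above; all names are mine:

```python
def evaluate(w, phi):
    """Linear evaluation: dot product of weight and feature vectors."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def td0_game_update(w, phis, result, alpha=0.05):
    """One TD(0) pass over a single game.
    phis: feature vectors of the successive positions, all from the same
    side's point of view; result: final game score in [0, 1].
    Each weight is nudged so earlier evaluations predict later ones."""
    values = [evaluate(w, phi) for phi in phis] + [result]
    for t, phi in enumerate(phis):
        delta = values[t + 1] - values[t]        # temporal-difference error
        for j, fj in enumerate(phi):
            w[j] += alpha * delta * fj           # gradient of a linear eval is phi
    return w

# Toy training run: one feature that is always on, in games that are won.
# The weight climbs toward the observed outcome of 1.0.
w = [0.0]
for _ in range(200):
    w = td0_game_update(w, [[1.0], [1.0], [1.0]], 1.0)
```

Methods like TDLeaf extend this by updating toward the value of the principal-variation leaf found by search, and by adding an eligibility-trace parameter lambda that spreads each error over earlier positions.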
MvH Dan Andersson
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: The Art of Evaluation (long)
Tord Romstad wrote: I haven't tried, and I am not even sure I understand the idea correctly. Do you mean that each evaluation term should consist not only of a value, but also of an estimate of its possible inaccuracy? Perhaps this might be useful, but I am not quite sure how I would use the information.

One possibility would be to use a sort of "fuzzy eval". Eval could produce both an upper and a lower bound on the score, with the size of the gap between them representing the uncertainty in the evaluation. Some features of the eval might introduce more uncertainty into the position than others. Both of these values would be backed up through the search, and maybe you would do something clever when comparing them (e.g. if the bounds of one evaluation completely enclose the bounds of the other, you could do some probabilistic thing; otherwise the one with the higher upper bound wins). Maybe you could think of some tricks to make the program favour positions whose eval it is highly certain about (lowerBound - (difference / 4), or something...).
I have no idea if this sort of thing works well or not, and it might not fit nicely into most programs.
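For what it's worth, the comparison rule suggested above might look something like this. A sketch of one possible design, not code from any actual engine:

```python
class IntervalScore:
    """An evaluation as a [lo, hi] interval; the gap encodes uncertainty."""

    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def point(self, risk=0.25):
        # Risk-adjusted point value: dip below the lower bound in
        # proportion to the gap, so wide (uncertain) intervals are
        # penalised, in the spirit of the lowerBound - difference/4 idea.
        return self.lo - risk * (self.hi - self.lo)

    def preferred_over(self, other, risk=0.25):
        return self.point(risk) > other.point(risk)

# A certain, modest score beats a wildly uncertain one that merely
# *might* be better, while a clearly superior interval still wins.
a = IntervalScore(0.40, 0.50)
b = IntervalScore(0.10, 1.00)
c = IntervalScore(0.90, 1.00)
```

Backing two bounds up through alpha-beta is the hard part: cutoffs become fuzzy, so the search logic changes, not just the leaf eval.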
-
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: The Art of Evaluation (long)
wgarvin wrote: One possibility would be to use a sort of "fuzzy eval". Eval could produce both an upper and a lower bound on the score, with the size of the gap between them representing the uncertainty in the evaluation. Some features of the eval might introduce more uncertainty into the position than others.

Yes, there are some programs which do (or at least did) something like this. Instead of an exact score, the evaluation function returns a probability distribution. This allows some interesting search algorithms, different from classical alpha-beta. I am not aware of any current top programs which use such techniques, but I think Hans Berliner's old program HiTech did.
Tord
-
- Posts: 658
- Joined: Wed Mar 08, 2006 8:58 pm
Re: The Art of Evaluation (long)
Hi Tord,
thank you for your very good and clear post.
Many of your thoughts can be found in an article written by C. Donninger in the Swiss magazine KARL.
For example, he finds that removing a bug may make the program play weaker, because the bug is in some sense a part of the program.
In KARL he reports an experiment with programs which he calls OLA_n.
OLA is the name of an ape in Sweden.
OLA_m is a program which searches m moves deep; its eval simply returns random values. OLA_n is the same program searching n moves deep, again with a random eval.
He finds that OLA_n is significantly better than OLA_m for n > m.
As we do not fully understand a position, we cannot know its exact evaluation. Therefore we have to live with some randomness.
A special evaluation term which should help in some positions may lead to special stupidity in others. Therefore it may be impossible to make a program significantly stronger by adding something. IMHO some programmers have noticed this and therefore stopped developing their engine; they have started a new program.
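The OLA result matches what Don Beal studied as "random minimaxing": maximising over random leaf values implicitly rewards mobility, so even a nonsense eval gains strength from search. A tiny one-ply illustration with invented numbers:

```python
import random

def one_ply_value(mobility, rng):
    """Value of a position searched one ply deep with a purely random
    leaf eval in [-1, 1]: the best of `mobility` random replies."""
    return max(-rng.uniform(-1.0, 1.0) for _ in range(mobility))

rng = random.Random(42)
trials = 4000
busy = sum(one_ply_value(20, rng) for _ in range(trials)) / trials
cramped = sum(one_ply_value(2, rng) for _ in range(trials)) / trials
# The side with 20 legal moves scores far higher on average than the
# side with 2: the random eval has turned into a mobility term, and
# deeper search compounds the effect.
```

So OLA_n beating OLA_m is less mysterious than it first looks: the random values are not the knowledge; the move choice itself is.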
kind regards
Bernhard
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: The Art of Evaluation (long)
Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away.