The evaluation value and value returned by minimax search
The evaluation value and value returned by minimax search
Hi, all.
These days I have been thinking about using reinforcement learning to train my evaluation function, and I came up with a question about how to understand the relationship between the evaluation and the search.
Let us go back to the original definitions of evaluation and minimax search. We know that for each position, the evaluation gives us an estimate of the current position. If this estimate were exact (which means the evaluation is *perfect*), the engine would be strong even searching to just depth 1, picking the best move whose resulting position has the maximum evaluation. Unfortunately, it is impossible to make such a 100% exact estimate. However, a minimax search to depth D can make the evaluation more exact, and it is intuitive to believe that the value of a minimax search with depth D (D > 1) should be more accurate than calling the evaluation function directly.
Now the question comes: suppose we have an evaluation function Eval() which is not accurate enough (for example, an evaluation that only sums the material values in the position). First we use this evaluation function to evaluate a position; suppose the value returned by Eval(pos) is H. Then we run a minimax search of depth D with the same evaluation function (which is called when the search reaches the leaf nodes), with the position above as the root. The search uses qsearch and check extensions to avoid the horizon effect. The search also returns a value for this position, say V. My question is: can we say that V is more accurate than H?
Notice that the evaluation function itself is not accurate at all, so the search result may also not be accurate.
Another, more interesting topic is the delta between V and H. Suppose Delta = V − H; what conclusions can we draw from this Delta? Can we say that Delta reflects the degree of accuracy of an evaluation function? Or is it just a necessary condition for an accurate evaluation function, but not a sufficient one (we can say “an accurate evaluation => a small Delta”, but cannot say “a small Delta => an accurate evaluation”)? Or neither?
Does the evaluation of Stockfish or Crafty have such a “small Delta” property? Has someone made such a test? (The comparison may need some normalization to a percentage-of-a-pawn score scale.)
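The setup in the question can be sketched concretely. Below is a minimal, hypothetical Python illustration: a toy game tree stands in for chess positions (dicts are internal positions, integers are the true values of terminal ones, always from the side-to-move's perspective), a deliberately crude static evaluation stands in for Eval(), and a plain negamax computes H, V and Delta. The tree, names and values are all invented for illustration only.

```python
# Hypothetical toy game tree: dicts map moves to successor positions,
# integers are terminal values from the side-to-move's perspective.
TREE = {
    "a": {"a1": 3, "a2": -2},
    "b": {"b1": 1, "b2": 4},
}

def evaluate(node):
    """A crude static eval standing in for Eval(pos): it guesses 0 for any
    internal position (like a material-only eval blind to the tactics below),
    and only knows the exact value of terminal positions."""
    return 0 if isinstance(node, dict) else node

def negamax(node, depth):
    """Plain negamax: calls evaluate() at depth 0 or at terminal nodes."""
    if depth == 0 or not isinstance(node, dict):
        return evaluate(node)
    return max(-negamax(child, depth - 1) for child in node.values())

H = evaluate(TREE)    # static value of the root
V = negamax(TREE, 2)  # depth-2 search value, backed up from the leaves
delta = V - H         # the Delta the question asks about
```

Here the search value V differs from the static guess H exactly because the search lets the (perfect) leaf values override the (blind) interior guess, which is the intuition behind expecting V to be more accurate than H.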
Re: The evaluation value and value returned by minimax search
“Can we say that V is more accurate than H respectively?”
It depends on the domain. In chess, when you search deep enough, sooner or later you end up in a quiet position where all tactical complications are resolved and only material matters. That's why it's generally true to say that V is much more accurate than H. Of course there are exceptions, for example fortress positions.
However, in the game of Go, for example, where tactical complications can stay on the board for over 100 moves and writing a good evaluation function is almost impossible, searching deeper with minimax doesn't necessarily help (or the gain is minimal).
I think I saw a paper a long time ago which demonstrated games where searching deeper made a program weaker, but I'm not sure about this.
Btw, one funny result: around 12 years ago I tried to tune Stockfish's evaluation function to make Delta = V − H smaller on average, but this change made SF weaker!!
Joona Kiiski
Re: The evaluation value and value returned by minimax search
Hi Joona and Ma Chao!
The big problem, as I see it, with trying to minimize the absolute difference between a shallow and a deep search is that doing so will IMO lead to a simplification of the evaluation function. It might help, though, if you can determine whether the deeper search is a win, a draw or a loss.
The reason I think it will lead to a simplification of the evaluation is this: an evaluation function f such that f(position) = 0 if the position is neither a win nor a loss, f(position) = 0.01 if the position is a win for you, and f(position) = −0.01 if the position is a win for the opponent, will look like a great function, since the absolute difference between a deep and a shallow search will always be small.
Joona, if I am guessing right, your attempt failed because the evaluation function you ended up with decimated all the terms that tend to fluctuate, for example king safety and maybe mobility. Am I right? I am really curious whether my thought experiment holds. Also, I do not understand the logic of using the absolute value of the difference as a measure of error.
An interesting idea I have thought about is trying to improve the endgame search and evaluation parameters of an engine by using an endgame tablebase as an oracle. What you could do is set up lots of positions and measure the fitness of the evaluation as a function of how many solutions (mates and draws) you can find within a fixed number of moves searched. I think you should also give a bonus for finding the solutions in fewer steps than the fixed number, a bonus if the function is close to the true value, and of course a penalty if it is not. Why not try the same method on other positions that can be proven to be checkmates or draws in the opening and middle stages of the game?
Joona, I would not be surprised if you could mathematically prove (assuming the leaf nodes have some error distribution around their true values (1, 0, −1)) that a deeper search leads to a worse result than a shallow one. I think it has a lot to do with the branching factor and the conditional probabilities of the game tree; a low branching factor would propagate the errors a lot more, IMO. If the search is deep enough the argument fails, since you will then find the true value.
Ma Chao, thank you for a really interesting topic. I think your questions are really good.
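The 0 / ±0.01 thought experiment above can be pushed to its extreme with a completely flat evaluation. This hypothetical Python sketch (toy tree and all names invented for illustration) shows that an evaluation carrying no information at all achieves a "perfect" Delta of zero:

```python
# Hypothetical toy tree: dicts are internal positions, ints are leaf values.
TREE = {"a": {"a1": 3, "a2": -2}, "b": {"b1": 1, "b2": 4}}

def negamax(node, depth, evaluate):
    """Plain negamax, parameterized over the evaluation function."""
    if depth == 0 or not isinstance(node, dict):
        return evaluate(node)
    return max(-negamax(child, depth - 1, evaluate) for child in node.values())

# A useless evaluation that calls every position dead equal...
flat_eval = lambda node: 0

# ...scores a perfect "delta": the shallow and deep values agree exactly,
H = negamax(TREE, 0, flat_eval)
V = negamax(TREE, 2, flat_eval)
# even though the evaluation carries no information about the position.
```

This is exactly the "small Delta does not imply an accurate evaluation" direction of the original question: minimizing the shallow/deep difference rewards evaluations that are merely self-consistent, not ones that are correct.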

Re: The evaluation value and value returned by minimax search
Most of the time the eval function is used for cutoffs, so a different eval makes for a different playing style, I think.
Re: The evaluation value and value returned by minimax search
Traditionally the evaluation is given the meaning of a measure of the position in terms of centipawns, or anyhow a number that more or less resembles the result of a GM's analysis of the position. This concept is IMHO completely misleading.
Chess engines don't use evaluation to "analyze" a position, but to decide which moves to try first. If, for instance, we had an evaluation function that returned random numbers with the single constraint that, given a position and its list of possible moves, the position reached by playing the strongest move always has the highest evaluation, this would be the perfect function from an Elo point of view, yet it could be totally unusable for classical chess "analysis".
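This point has a simple illustration: any strictly increasing distortion of a "true" evaluation picks the same move, however useless its absolute numbers are as centipawn-style analysis. A hypothetical Python sketch (move names and values invented for illustration):

```python
import math

# Hypothetical true values of the positions reached by each root move.
true_values = {"e4": 0.30, "d4": 0.28, "a3": -0.10}

def distorted(v):
    """An arbitrary strictly increasing transform of the true value: the
    numbers it produces are meaningless as centipawns, but the ordering of
    the moves is preserved."""
    return 1000 * math.tanh(5 * v) + 7

best_true = max(true_values, key=lambda m: true_values[m])
best_distorted = max(true_values, key=lambda m: distorted(true_values[m]))
# Both evaluations choose the same move, because move choice only depends
# on the ordering of the scores, not on their absolute scale.
```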

Re: The evaluation value and value returned by minimax search
A = "small delta => accurate evaluation"
For the perfect evaluation, delta is 0. Now look at the worst possible evaluation: say it always reports a loss when the position is a win and vice versa, and reports a win for every drawn position. What about delta now? Delta is obviously 0 again, because the worst possible evaluation is just as self-consistent as the best possible one. Now make the best possible evaluation a little weaker and the worst one a little stronger: say there is one position p for which the best evaluation randomly reports the wrong outcome and the worst evaluation the right one. In both cases delta would be the same small value. Therefore I think A is not true.
Re: The evaluation value and value returned by minimax search
nkg114mc wrote:
[snip] My question is: “Can we say that V is more accurate than H respectively?”

It doesn't matter whether we speak about Go or chess; in all cases a search with D > 1 will statistically return a much better score. A statistically better score doesn't mean it's true for all positions, but there is no question about this.

nkg114mc wrote:
Notice that the evaluation function itself is not accurate at all. So the search result may also not be accurate.

That would be a heuristic with a weak positive relationship.

nkg114mc wrote:
Another more interesting topic is about the delta between V and H. Suppose Delta = V − H, what conclusion can we get from this Delta? Can we say that Delta reflects the degree of accuracy of an evaluation function?

A stronger assumption would be that the position isn't quiet. In classic game-tree search the assumption is that we strive to return an evaluation of the position with the possible tactics removed, i.e. a quiet evaluation.

nkg114mc wrote:
Or is it just a necessary condition for an accurate evaluation function, but not a sufficient one (we can say “an accurate evaluation => a small Delta”, but cannot say “a small Delta => an accurate evaluation”)? Or neither?

Now that said, I must note that the 1970s/80s razoring technique is very fashionable again, which contradicts the above statement.

nkg114mc wrote:
Does the evaluation of Stockfish or Crafty have such a “small Delta” property?

I didn't check out the latest Crafty; as for Stockfish, it razors the last plies big time.

nkg114mc wrote:
Has someone made such a test? (The comparison may need some normalization to the percentage pawn score scale)

You don't need to test it: the relationship is there, it's just a weak positive one, so in the first few plies filtering out tactics has an overwhelmingly bigger impact on the score difference than the heuristic relationship above. Yet it is there. So if you compared a search to depth n with one to depth n+i, with n rather big and n+i immensely bigger, the initially weak positive relationship would get a lot stronger.
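For readers unfamiliar with the razoring idea mentioned above, here is a rough sketch of the classic formulation, not Stockfish's actual implementation; the margin, the toy tree interface and all names are invented for illustration. The idea: near the leaves, when the static evaluation is hopelessly below alpha, drop straight into quiescence search instead of searching full width.

```python
RAZOR_MARGIN = 300  # hypothetical margin, in centipawn-like units
INF = 10**9

def qsearch(pos, alpha, beta, evaluate):
    # Stand-in quiescence search: for this sketch, just the static eval.
    return evaluate(pos)

def search(pos, depth, alpha, beta, evaluate, children):
    """Toy alpha-beta negamax with a razoring step near the leaves."""
    kids = children(pos)
    if depth == 0 or not kids:
        return qsearch(pos, alpha, beta, evaluate)
    # Razoring: if the static eval plus a margin still cannot reach alpha,
    # skip the full-width search and settle for the quiescence value.
    if depth <= 2 and evaluate(pos) + RAZOR_MARGIN <= alpha:
        return qsearch(pos, alpha, beta, evaluate)
    best = -INF
    for child in kids:
        best = max(best, -search(child, depth - 1, -beta, -alpha,
                                 evaluate, children))
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # beta cutoff
    return best

# Toy tree: dicts are internal positions, ints are terminal values.
TREE = {"a": {"a1": 3, "a2": -2}, "b": {"b1": 1, "b2": 4}}
evaluate = lambda pos: 0 if isinstance(pos, dict) else pos
children = lambda pos: list(pos.values()) if isinstance(pos, dict) else []

full = search(TREE, 2, -INF, INF, evaluate, children)    # razor never fires
razored = search(TREE, 2, 400, 500, evaluate, children)  # razor fires at once
```

With the full window the search backs up the true negamax value; with a window far above the static eval, the razoring condition fires immediately and the quiescence (here static) value is returned instead. This is why aggressive razoring sits uneasily with the "search always refines the static eval" intuition discussed above.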
Re: The evaluation value and value returned by minimax search
nkg114mc wrote:
Hi, all.
These days I was thinking about using reinforcement learning to train my evaluation function.
[snip]

You might be interested in digging up some old Deep Blue papers, as I seem to remember they wrote that they did something like that for Deep Blue.
Re: The evaluation value and value returned by minimax search
zamar wrote:
“Can we say that V is more accurate than H respectively?”
It depends on the domain. In chess, when you search deep enough, sooner or later you end up in a quiet position where all tactical complications are resolved and only material matters. That's why it's generally true to say that V is much more accurate than H. Of course there are exceptions, for example fortress positions.
However, in the game of Go, for example, where tactical complications can stay on the board for over 100 moves and writing a good evaluation function is almost impossible, searching deeper with minimax doesn't necessarily help (or the gain is minimal).
I think I saw a paper a long time ago which demonstrated games where searching deeper made a program weaker, but I'm not sure about this.
Btw, one funny result: around 12 years ago I tried to tune Stockfish's evaluation function to make Delta = V − H smaller on average, but this change made SF weaker!!

Total crap and disinformation. It is also easy to disprove: one way to show it is that you are basically claiming a chess program only needs a material evaluation.
I'd say kick out all evaluation except material and see how it plays...
Re: The evaluation value and value returned by minimax search
zamar wrote:
[snip]
However, in the game of Go, for example, where tactical complications can stay on the board for over 100 moves and writing a good evaluation function is almost impossible, searching deeper with minimax doesn't necessarily help (or the gain is minimal).

If just 1% of the effort put into computer chess were put into computer Go, you would also easily beat any Go player with the hardware that gets used for computer chess today.
It just requires a few talented game-tree searchers to work in that field, and those guys want to get paid a salary.
What most people do not realize is that in Go it is much easier to prune moves, so getting very big search depths there is pretty easy with today's hardware.
Now that said, realize how all search algorithms were tuned and invented for computer chess.
Name me one search algorithm, not invented or used before by some computer chess researcher, that was invented in Go or shogi, or even in checkers or draughts.
It has all been invented in chess, as the most brilliant guys worked on computer chess and not computer Go. That won't change quickly either.
Some stronger Go players who also wrote a Go program have estimated they need around a 30-ply search depth to reach world-top-level tactics in Go, not counting ladders of course.
However, this is achievable in Go: even without reductions and without forward pruning, with just nullmove you already get 15 plies.
Try to get 15 plies in chess with just nullmove, knowing that in Go some engines even today already hard-prune moves in the root position.
That's how accurate you can make a statistical function there.
If I could use the evaluation function of some of the existing strong Go programs, it would be rather easy to get 500 Elo above any other Go program. Now don't tell me that any of today's Go engines has dan level yet; I hardly ever play Go, maybe a few games every few years when I test the current state of the Go engines again, and it's hard to see me as someone at X-dan level, as a 5-dan just totally hammered me by 361 points or so.
The first thing to introduce in your Go testing, of course, is the concept of Elo, rather than working with dan and kyu levels.