## A different way of summing evaluation features

### A different way of summing evaluation features

Hi to all of you!

I have an idea that I have not seen before. If you have seen anything similar, please let me know.

The usual way of deciding how good a position is, is to sum up all the different evaluation features. That would be fine if the evaluation features all described the same kind of thing, but I guess that is not the case.

My idea is first to orthogonalize all the evaluation features, so that each orthogonal feature is a linear combination of the original features.

After that, the idea is to describe the goodness of the position as the Euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance) in the coordinate system of the orthogonal features. The goodness of the position could then be represented as the difference of lengths between white and black.
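A minimal numeric sketch of what this might look like. Everything here is invented for illustration: the feature names, the sample values, the unit weights, and the choice of eigenvector-based orthogonalization are assumptions, not a known engine implementation.

```python
# Hypothetical sketch: orthogonalize raw evaluation features, then score a
# position as the difference of weighted Euclidean lengths for White and Black.
import numpy as np

def orthogonalize(samples):
    """Build an orthonormal basis from the eigenvectors of the sample
    covariance of the raw features (one possible orthogonalization)."""
    cov = np.cov(samples, rowvar=False)
    _, basis = np.linalg.eigh(cov)  # columns are orthonormal eigenvectors
    return basis

def score(white_raw, black_raw, basis, weights):
    """Goodness = |white| - |black|, lengths taken in the orthogonal basis."""
    w = basis.T @ white_raw
    b = basis.T @ black_raw
    return np.sqrt(np.sum(weights * w**2)) - np.sqrt(np.sum(weights * b**2))

# Toy data: rows are positions, columns are three made-up raw features
# (say material, mobility, king safety).
samples = np.array([[3.0, 10.0, 1.0],
                    [2.5,  8.0, 0.5],
                    [3.5, 12.0, 1.5],
                    [2.0,  9.0, 1.0]])
B = orthogonalize(samples)

white = np.array([3.0, 10.0, 1.0])
black = np.array([2.5,  9.0, 1.0])
print(score(white, black, B, np.ones(3)))  # positive: White is "longer"
```

Note that the score is antisymmetric between the two sides and zero for identical feature vectors, as the proposal requires.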

Good luck with your engines!

Kind regards

Pio

### Re: A different way of summing evaluation features

> Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.

How do you do that? How do you make the terms "rook material", "rook on open file" and "rook mobility" independent of each other? Just an example.

Thomas...

### Re: A different way of summing evaluation features

Distance tends to lack a sign, unless your plan is distance from the origin of 'positive white features' and 'positive black features' and the difference between them. Even then a 'white feature' could not go negative; you would have to transfer it to a black feature, since distance doesn't care whether a feature is positive or negative: both increase it. This starts to detract from the simplicity of your original idea.

As an example, say your orthogonal features have the white king safe, the black king under assault, black with control of the queenside (in the form of pawn structure, mobility, and piece presence), and black with an extra pawn. It's difficult with a sum of squares to express the goodness of such a position. The distance between the end points would be a good expression of how unbalanced the position is, but not of how good it is.

Furthermore, I'm not sure such a combination makes sense in chess. It seems to me that in chess, 'different features' amplify the strength of your other advantages.

### Re: A different way of summing evaluation features

> tpetzke wrote:
> > Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.
>
> How do you do that? How do you make the terms "rook material", "rook on open file" and "rook mobility" independent of each other? Just an example.
>
> Thomas...

The way to do it is well known:

https://en.wikipedia.org/wiki/Principal ... t_analysis

It may be interesting to study the components of an evaluation function, how they interact or how important they are, but I do not see how it can be useful to a chess program.
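For completeness, the principal-component recipe behind that link can be sketched in a few lines. The sample data below is made up; in a real engine the rows would be feature vectors extracted from many positions.

```python
# Minimal PCA sketch: rows are positions, columns are raw evaluation features.
import numpy as np

features = np.array([[1.0, 2.0, 0.5],
                     [0.8, 1.9, 0.4],
                     [1.2, 2.2, 0.7],
                     [0.9, 2.1, 0.6]])

centered = features - features.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
components = eigvecs[:, ::-1]           # principal axes, largest variance first
projected = centered @ components       # features expressed in the new basis

# In the new basis the features are uncorrelated: the covariance matrix of
# `projected` is (numerically) diagonal.
print(np.round(np.cov(projected, rowvar=False), 10))
```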

Richard

### Re: A different way of summing evaluation features

> Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.

Can you please clarify? It sounds like you are trying to draw a parallel with Gram-Schmidt orthogonalisation...?

Can you please explain, in concrete language, how that can be applied? Just a simple example with a few eval features, explained clearly and precisely, would be worth much more than great vague theories.

> Pio wrote: After that the idea is to describe the goodness of the position as the euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance) in the coordinate system of the orthogonal features. So the goodness of the position could be represented as the difference of lengths between white and black.

Again, what does that mean? What vector space? What orthonormal basis? What would the distance mean?

Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

### Re: A different way of summing evaluation features

> lucasart wrote: Can you please clarify? Sounds like you are trying to make a parallel with the Gram-Schmidt orthogonalisation...?
>
> Can you please explain in concrete language how that can be applied. Just a simple example with a few eval features, if explained clearly and precisely, would be worth much more than great vague theories.
>
> Again what does that mean? What vector space? What orthonormal base? What would the distance mean?

The reason why I think it could be a good idea to orthogonalize the evaluation features and use Euclidean distance is that I think it makes sense. Assume, for example, that you measured the speed of a car as v = v_x + v_y instead of sqrt(v_x^2 + v_y^2). The problem is that measuring the speed as v_x + v_y makes the choice of x-axis and y-axis important: you cannot rotate or transform the coordinate system and still measure the same thing as viewed from another observer.
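The car-speed analogy is easy to check numerically: rotating the axes leaves the Euclidean speed unchanged but changes the component sum. A self-contained illustration, not engine code:

```python
# Rotate a velocity vector and compare the two "speed" measures.
import math

def rotate(vx, vy, theta):
    """Rotate the vector (vx, vy) by angle theta (radians)."""
    return (vx * math.cos(theta) - vy * math.sin(theta),
            vx * math.sin(theta) + vy * math.cos(theta))

vx, vy = 3.0, 4.0
rx, ry = rotate(vx, vy, math.pi / 4)

print(math.hypot(vx, vy), math.hypot(rx, ry))  # both 5.0: rotation-invariant
print(vx + vy, rx + ry)                        # 7.0 vs ~4.24: depends on axes
```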

You could create a basis, for example, as Richard pointed out, by https://en.wikipedia.org/wiki/Principal ... t_analysis

I thought one way of evaluating the position in an orthogonal basis would be:

```
score_whitePointOfView =
    sqrt(c_1*white_1^2 + ... + c_k*white_k^2 + c_(k+1)*white_(k+1)^2 + ... + c_n*white_n^2)
  - sqrt(c_1*black_1^2 + ... + c_k*black_k^2 + c_(k+1)*black_(k+1)^2 + ... + c_n*black_n^2)
```

where c_1 to c_k are positive evaluation scaling weights, white_1 and black_1 describe the size of the evaluation in the first direction, white_2 and black_2 in the second direction, and so on. The scaling weights c_(k+1) to c_n correspond, on White's side, to the directions that are negative for Black, and hence c_(k+1) to c_n will also be positive. The same goes for c_(k+1) to c_n from Black's point of view.

The way I view this formula is that you have some type of n-dimensional plane that both sides try to push. If weight number i (c_i) is negative, it might just be seen as an addition to the other side pushing the plane.

I guess that trying my way will amplify the small weights in your engines when optimized by SPSA or some other automated optimization procedure.

Good luck with your engines

### Re: A different way of summing evaluation features

> Pio wrote: The reason why I think it could be a good idea to orthogonalize the evaluation features and use euclidean distance is because I think it makes sense. Assume for example you would measure the speed of a car as v = v_x + v_y and not as sqrt(v_x^2 + v_y^2). The problem with that is that measuring the speed as v_x + v_y makes the choice of the x-axis and y-axis important and you cannot rotate or transform the coordinate system to measure the same thing as viewed from another observer.

The euclidean metric (and thus the associated norm of a vector) is invariant under the action of the orthogonal group O(n), essentially by definition of O(n). Thus a rotation or reflection doesn't alter how far a point is from the origin. The standard "taxicab metric" does not have this property. For example, the point (1,1) in R^2 is 2 taxi units from the origin, but its image under a pi/4 rotation is sqrt(2) taxi units from the origin.

[EDIT] Maybe a better way to say this is that the orthogonal group depends on the metric. It seems that O(2) for the taxicab metric is simply the group of symmetries of the square (the unit "sphere" in this metric), so much smaller than the standard O(2).

Not yet seeing why this might matter in a chess evaluation function.
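The (1, 1) example can be verified directly with a small numeric check:

```python
# Compare Euclidean and taxicab norms of (1, 1) before and after a
# 45-degree rotation: the Euclidean norm is preserved, the taxicab norm is not.
import math

def rotate(x, y, theta):
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

x, y = 1.0, 1.0
rx, ry = rotate(x, y, math.pi / 4)

print(math.hypot(x, y), math.hypot(rx, ry))  # sqrt(2) both times
print(abs(x) + abs(y), abs(rx) + abs(ry))    # 2.0 vs ~1.414
```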

### Re: A different way of summing evaluation features

> zullil wrote: Not yet seeing why this might matter in a chess evaluation function.

What I think might be beneficial to computer chess with my approach is that I think the evaluation scores will be more accurate. The problem I see with the normal approach is that lots of small bonuses might be hard to rotate to something more concrete, and that many small evaluation errors in the evaluation function might propagate more to the total error.

### Re: A different way of summing evaluation features

> Pio wrote: What I think might be beneficial to CC with my approach is that I think the evaluation scores will be more accurate. The problem I see with the normal approach is that lots of small bonuses might be hard to rotate to something more concrete and that many small evaluation errors in the evaluation function might propagate more to the total error.

Assuming you have chosen (preferably orthogonal) evaluation features and you are free to select the weighting coefficients, all you are doing is combining a number of "one-dimensional" evaluations into a single evaluation. For this, I don't yet see any advantage of the euclidean over the taxicab approach. Since you've already chosen the features to evaluate, you've essentially already chosen a preferred coordinate system for your taxi metric. Why would "rotation" matter at all? But maybe I'm missing something ...

### Re: A different way of summing evaluation features

> zullil wrote: But maybe I'm missing something ...

I don't think you're missing anything. What the OP is suggesting only makes sense as long as you don't try to understand what it might mean.

Some kind of Euclidean distance between "white's position" and "black's position" would only measure the dissimilarity between the two. That is of no use whatsoever as a measure of how good the position is for white or for black. (Well, if white and black's positions are identical, the evaluation should indeed be close to zero, but that's really it.)

Finding "orthogonal evaluation features" can be of some use, but only for tuning the evaluation function. Evaluation features are orthogonal if their weights can be tuned independently. It would basically mean that Elo as a function of the weights is a sum of independent functions, each function taking one weight as an argument. I am certain that such orthogonal features do not actually exist, but clearly certain pairs of features will be less interrelated than other pairs of features.

I don't think the OP is thinking about evaluation weights at all, which seems to confirm that there is no solid theory behind the proposal. But maybe he can give a formal definition of evaluation feature orthogonality so that we know what we are talking about.