A different way of summing evaluation features

Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

A different way of summing evaluation features

Post by Pio »

Hi to all of you!

I have an idea that I have not seen before. If you have seen any similar idea please let me know.

As far as I have seen, the normal way of deciding how good a position is, is to sum up all the different evaluation features. I would have no problem with that if the evaluation features all described the same thing, but I guess that is not the case.

My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.

After that, the idea is to describe the goodness of the position as a Euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance) in the coordinate system of the orthogonal features. The goodness of the position could then be represented as the difference between white's and black's lengths in that coordinate system.

Good luck with your engines!

Kind regards
Pio
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 4:57 pm
Location: Germany

Re: A different way of summing evaluation features

Post by tpetzke »

Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.
How do you do that? How do you make the terms "rook material", "rook on open file" and "rook mobility" independent of each other? Just an example.

Thomas...
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: A different way of summing evaluation features

Post by kbhearn »

Distance tends to lack a sign, unless your plan is to take the distance from the origin of the 'positive white features' and of the 'positive black features' and use the difference between them. Even then a 'white feature' could not go negative; you would have to transfer it to a black feature, because distance doesn't care whether a feature is positive or negative: both increase it. This starts to detract from the simplicity of your original idea.

As an example, say your orthogonal features have the white king safe, the black king under assault, black with control of the queenside (in the form of pawn structure, mobility, and piece presence), and black with an extra pawn. It's difficult to express the goodness of such a position with a sum of squares. The distance between the end points would be a good expression of how unbalanced the position is, but not of how good it is.
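The sign problem can be seen in a two-feature toy case. This is only a sketch: the feature values are made up, positive numbers are assumed to favour white, and `linear_eval` and `length_eval` are hypothetical names.

```python
import math

def linear_eval(features):
    """Conventional evaluation: a plain sum, so signs are preserved."""
    return sum(features)

def length_eval(features):
    """Euclidean length of the feature vector: every sign is squared away."""
    return math.sqrt(sum(f * f for f in features))

# Hypothetical feature values from white's point of view.
both_for_white = [3.0, 4.0]    # both features favour white
one_for_black = [3.0, -4.0]    # the second feature favours black

print(linear_eval(both_for_white), linear_eval(one_for_black))
print(length_eval(both_for_white), length_eval(one_for_black))
```

The sum tells the two positions apart, while the length gives the same value for both, which is why a negative feature would have to be moved over to the other side's vector.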

Furthermore, I'm not sure such a combination makes sense in chess. It seems to me that 'different features' in chess amplify the strength of your other advantages.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: A different way of summing evaluation features

Post by abulmo »

tpetzke wrote:
My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.
How do you do that? How do you make the terms "rook material", "rook on open file" and "rook mobility" independent of each other ? Just an example.

Thomas...
The way to do it is well known:
https://en.wikipedia.org/wiki/Principal ... t_analysis

It may be interesting to study the components of an evaluation function, how they interact or how important they are, but I do not see how it can be useful to a chess program.
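For just two features the principal-component rotation has a closed form, which gives a small sketch of what such an orthogonalization could look like. The per-position feature values below are invented for illustration, and `pca_2d` and `covariance` are hypothetical helper names, not part of any engine.

```python
import math

def covariance(us, vs):
    """Population covariance of two equally long lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n

def pca_2d(xs, ys):
    """Rotate two features onto their principal axes (2-D PCA)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]          # centre each feature
    cy = [y - my for y in ys]
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Each new component is a linear combination of the original features.
    p1 = [c * a + s * b for a, b in zip(cx, cy)]
    p2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return theta, p1, p2

# Hypothetical values of two correlated features over six positions.
rooks_on_open_files = [0.0, 1.0, 1.0, 2.0, 0.0, 2.0]
rook_mobility = [4.0, 7.0, 8.0, 11.0, 5.0, 12.0]

theta, p1, p2 = pca_2d(rooks_on_open_files, rook_mobility)
print(covariance(rooks_on_open_files, rook_mobility))  # clearly positive
print(covariance(p1, p2))                              # zero up to rounding
```

Note that this makes the components uncorrelated over a sample of positions, which is a statistical notion of orthogonality and not necessarily the one meant elsewhere in this thread.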
Richard
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A different way of summing evaluation features

Post by lucasart »

Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.
Can you please clarify? It sounds like you are trying to draw a parallel with Gram-Schmidt orthogonalisation?

Can you please explain in concrete language how that can be applied? Just a simple example with a few eval features, if explained clearly and precisely, would be worth much more than great vague theories.
After that the idea is to describe the goodness of the position as the euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance) in the coordinate system of the orthogonal features. So the goodness of the position could be represented as the difference of lengths between white and black.
Again, what does that mean? What vector space? What orthonormal basis? What would the distance mean?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: A different way of summing evaluation features

Post by Pio »

lucasart wrote:
Pio wrote: My idea is first to orthogonalize all the evaluation features so that each orthogonal feature would be a linear combination of the original features.
Can you please clarify? It sounds like you are trying to draw a parallel with Gram-Schmidt orthogonalisation?

Can you please explain in concrete language how that can be applied? Just a simple example with a few eval features, if explained clearly and precisely, would be worth much more than great vague theories.
After that the idea is to describe the goodness of the position as the euclidean distance (see http://en.wikipedia.org/wiki/Euclidean_distance) in the coordinate system of the orthogonal features. So the goodness of the position could be represented as the difference of lengths between white and black.
Again, what does that mean? What vector space? What orthonormal basis? What would the distance mean?

The reason I think it could be a good idea to orthogonalize the evaluation features and use Euclidean distance is that I think it makes sense. Assume, for example, that you measured the speed of a car as v = v_x + v_y and not as sqrt(v_x^2 + v_y^2). The problem is that measuring the speed as v_x + v_y makes the choice of the x-axis and y-axis important: you cannot rotate or otherwise transform the coordinate system and still measure the same thing as seen by another observer.

You could create such a basis, for example, with the method Richard pointed out: https://en.wikipedia.org/wiki/Principal ... t_analysis

I thought one way of evaluating the position in an orthogonal basis would be:

score_whitePointOfView =
    sqrt(c_1*white_1^2 + c_2*white_2^2 + ... + c_k*white_k^2 + c_(k+1)*white_(k+1)^2 + c_(k+2)*white_(k+2)^2 + ... + c_n*white_n^2)
  - sqrt(c_1*black_1^2 + c_2*black_2^2 + ... + c_k*black_k^2 + c_(k+1)*black_(k+1)^2 + c_(k+2)*black_(k+2)^2 + ... + c_n*black_n^2)

where c_1 to c_k are positive evaluation scaling weights, white_1 and black_1 describe the size of the evaluation in the first direction, white_2 and black_2 in the second direction, and so on. The scaling weights c_(k+1) to c_n correspond, on white's side, to the directions that are negative for black, and hence c_(k+1) to c_n will also be positive. The same goes for c_(k+1) to c_n from black's point of view.
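The formula can be sketched in a few lines. This assumes that the components are already expressed in the orthogonal basis and that all weights are non-negative; `weighted_length` and `score_white_pov` are hypothetical names and the numbers are invented.

```python
import math

def weighted_length(weights, components):
    """sqrt(c_1*x_1^2 + ... + c_n*x_n^2) for non-negative weights c_i."""
    if any(c < 0 for c in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(c * x * x for c, x in zip(weights, components)))

def score_white_pov(weights, white, black):
    """Difference between white's and black's weighted lengths."""
    return weighted_length(weights, white) - weighted_length(weights, black)

# Hypothetical two-direction example.
weights = [1.0, 4.0]
white = [3.0, 2.0]   # weighted length sqrt(1*9 + 4*4) = 5
black = [4.0, 1.5]   # weighted length sqrt(1*16 + 4*2.25) = 5

print(score_white_pov(weights, white, black))
print(score_white_pov(weights, white, [0.0, 0.0]))
```

In this example the two sides have different components but equal weighted lengths, so the score from white's point of view comes out as zero.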

The way I view this formula is that you have some type of n-dimensional plane that both sides try to push. If weight number i (c_i) is negative, it might just be seen as an addition to the other side's push on the plane.

I guess that trying my way will amplify the small weights in your engines when optimized by SPSA or some other automated optimization procedure.

Good luck with your engines
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: A different way of summing evaluation features

Post by zullil »

Pio wrote: The reason I think it could be a good idea to orthogonalize the evaluation features and use Euclidean distance is that I think it makes sense. Assume, for example, that you measured the speed of a car as v = v_x + v_y and not as sqrt(v_x^2 + v_y^2). The problem is that measuring the speed as v_x + v_y makes the choice of the x-axis and y-axis important: you cannot rotate or otherwise transform the coordinate system and still measure the same thing as seen by another observer.
The euclidean metric (and thus the associated norm of a vector) is invariant under the action of the orthogonal group O(n), essentially by definition of O(n). Thus a rotation or reflection doesn't alter how far a point is from the origin. The standard "taxicab metric" does not have this property. For example, the point (1,1) in R^2 is 2 taxi units from the origin, but its image under a pi/4 rotation is sqrt(2) taxi units from the origin. [EDIT] Maybe a better way to say this is that the orthogonal group depends on the metric. Seems like O(2) for the taxicab metric is simply the group of symmetries of the square (the unit "sphere" in this metric). So much smaller than the standard O(2).
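The (1,1) example is easy to check numerically. A minimal sketch, with `rotate`, `taxicab` and `euclidean` as hypothetical helper names:

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point counter-clockwise about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def taxicab(point):
    """Taxicab (L1) distance from the origin."""
    return abs(point[0]) + abs(point[1])

def euclidean(point):
    """Euclidean (L2) distance from the origin."""
    return math.hypot(point[0], point[1])

p = (1.0, 1.0)
q = rotate(p, math.pi / 4)   # lands on the y-axis, up to rounding

print(taxicab(p), taxicab(q))       # 2 before, about sqrt(2) after
print(euclidean(p), euclidean(q))   # about sqrt(2) both times
```

The rotation leaves the Euclidean length unchanged but shrinks the taxicab length from 2 to roughly 1.414.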

Not yet seeing why this might matter in a chess evaluation function.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: A different way of summing evaluation features

Post by Pio »

zullil wrote:
Pio wrote: The reason I think it could be a good idea to orthogonalize the evaluation features and use Euclidean distance is that I think it makes sense. Assume, for example, that you measured the speed of a car as v = v_x + v_y and not as sqrt(v_x^2 + v_y^2). The problem is that measuring the speed as v_x + v_y makes the choice of the x-axis and y-axis important: you cannot rotate or otherwise transform the coordinate system and still measure the same thing as seen by another observer.
The euclidean metric (and thus the associated norm of a vector) is invariant under the action of the orthogonal group O(n), essentially by definition of O(n). Thus a rotation or reflection doesn't alter how far a point is from the origin. The standard "taxicab metric" does not have this property. For example, the point (1,1) in R^2 is 2 taxi units from the origin, but its image under a pi/4 rotation is sqrt(2) taxi units from the origin. [EDIT] Maybe a better way to say this is that the orthogonal group depends on the metric. Seems like O(2) for the taxicab metric is simply the group of symmetries of the square (the unit "sphere" in this metric). So much smaller than the standard O(2).

Not yet seeing why this might matter in a chess evaluation function.
What I think might be beneficial to computer chess (CC) with my approach is that the evaluation scores will be more accurate. The problem I see with the normal approach is that lots of small bonuses might be hard to rotate into something more concrete, and that many small errors in the evaluation function might propagate more strongly into the total error.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: A different way of summing evaluation features

Post by zullil »

Pio wrote:
zullil wrote:
Pio wrote: The reason I think it could be a good idea to orthogonalize the evaluation features and use Euclidean distance is that I think it makes sense. Assume, for example, that you measured the speed of a car as v = v_x + v_y and not as sqrt(v_x^2 + v_y^2). The problem is that measuring the speed as v_x + v_y makes the choice of the x-axis and y-axis important: you cannot rotate or otherwise transform the coordinate system and still measure the same thing as seen by another observer.
The euclidean metric (and thus the associated norm of a vector) is invariant under the action of the orthogonal group O(n), essentially by definition of O(n). Thus a rotation or reflection doesn't alter how far a point is from the origin. The standard "taxicab metric" does not have this property. For example, the point (1,1) in R^2 is 2 taxi units from the origin, but its image under a pi/4 rotation is sqrt(2) taxi units from the origin. [EDIT] Maybe a better way to say this is that the orthogonal group depends on the metric. Seems like O(2) for the taxicab metric is simply the group of symmetries of the square (the unit "sphere" in this metric). So much smaller than the standard O(2).

Not yet seeing why this might matter in a chess evaluation function.
What I think might be beneficial to computer chess (CC) with my approach is that the evaluation scores will be more accurate. The problem I see with the normal approach is that lots of small bonuses might be hard to rotate into something more concrete, and that many small errors in the evaluation function might propagate more strongly into the total error.
Assuming you have chosen (preferably orthogonal) evaluation features and you are free to select the weighting coefficients, all you are doing is combining a number of "one-dimensional" evaluations into a single evaluation. For this, I don't yet see any advantage to the euclidean rather than the taxicab approach. Since you've already chosen the features to evaluate, you've essentially already chosen a preferred coordinate system for your taxi metric. Why would "rotation" matter at all? But maybe I'm missing something ...
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: A different way of summing evaluation features

Post by syzygy »

zullil wrote: But maybe I'm missing something ...
I don't think you're missing anything. What the OP is suggesting only makes sense as long as you don't try to understand what it might mean.

Some kind of Euclidean distance between "white's position" and "black's position" would only measure the dissimilarity between the two. That is of no use whatsoever as a measure of how good the position is for white or for black. (Well, if white and black's positions are identical, the evaluation should indeed be close to zero, but that's really it.)

Finding "orthogonal evaluation features" can be of some use, but only for tuning the evaluation function. Evaluation features are orthogonal if their weights can be tuned independently. It would basically mean that Elo as a function of the weights is a sum of independent functions, each function taking one weight as an argument. I am certain that such orthogonal features do not actually exist, but clearly certain pairs of features will be less interrelated than other pairs of features.
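Written out, this notion of orthogonality is an additive-separability condition. The following is only one way to state it; the symbols w_i (the weight of feature i) and f_i (a hypothetical per-weight contribution to playing strength) are introduced here for illustration:

```latex
\mathrm{Elo}(w_1, \dots, w_n) \;=\; \sum_{i=1}^{n} f_i(w_i)
```

If Elo is assumed to be twice differentiable in the weights, this is equivalent to all mixed second derivatives vanishing:

```latex
\frac{\partial^2\, \mathrm{Elo}}{\partial w_i \, \partial w_j} \;=\; 0
\qquad \text{for all } i \neq j
```

In that case the best value of each w_i does not depend on the values of the other weights, which is what would allow the weights to be tuned one at a time.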

I don't think the OP is thinking about evaluation weights at all, which seems to confirm that there is no solid theory behind the proposal. But maybe he can give a formal definition of evaluation feature orthogonality so that we know what we are talking about.