mjlef wrote: I think curve/formula fitting is the best approach. I actually started doing that, as follows:
Merge the data for Black and White. For example, if you have
KR vs k
and
K vs kr
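just add the totals together to form one KR v k entry (mirroring the results of the color-reversed entry, so that a win for the side with the rook is counted the same way in both). This increases the counts for that imbalance and makes it more statistically relevant.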
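As a rough sketch of the merge step, assuming each imbalance is stored as piece counts per side plus win/draw/loss totals (the `Imbalance` struct and the function names here are made up for illustration, not taken from any engine):

```c
#include <assert.h>
#include <string.h>

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, NPIECES };

/* Hypothetical record for one material signature, e.g. "KR vs k". */
typedef struct {
    int  white[NPIECES];   /* piece counts for White (kings implied) */
    int  black[NPIECES];   /* piece counts for Black */
    long white_wins, draws, black_wins;
} Imbalance;

/* Swap the colors of a record; the results swap along with the pieces. */
static Imbalance mirrored(Imbalance m)
{
    Imbalance r;
    memcpy(r.white, m.black, sizeof r.white);
    memcpy(r.black, m.white, sizeof r.black);
    r.white_wins = m.black_wins;
    r.draws      = m.draws;
    r.black_wins = m.white_wins;
    return r;
}

static int same_material(const Imbalance *a, const Imbalance *b)
{
    return memcmp(a->white, b->white, sizeof a->white) == 0 &&
           memcmp(a->black, b->black, sizeof a->black) == 0;
}

/* Add src into dst if it is the same imbalance, either directly or
   color-reversed. Returns 1 when merged, 0 when the records are unrelated. */
static int merge_into(Imbalance *dst, Imbalance src)
{
    assert(dst != NULL);
    if (!same_material(dst, &src)) {
        src = mirrored(src);
        if (!same_material(dst, &src))
            return 0;
    }
    dst->white_wins += src.white_wins;
    dst->draws      += src.draws;
    dst->black_wins += src.black_wins;
    return 1;
}
```

Merging a K vs kr record into a KR vs k record this way adds the Black wins of the former to the White wins of the latter, which is the point of the mirroring.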
Next, split out and count the number of each piece. You really want these numbers:
P
N
B
R
Q
p
n
b
r
q
which are the counts of each piece type for each color.
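A minimal sketch of the counting step, assuming the imbalance is written as a text signature like "KRPP vs kbp" with uppercase for White and lowercase for Black (the function name and the array layout are my own choices):

```c
#include <assert.h>
#include <string.h>

/* Index order follows the list above: P N B R Q p n b r q. */
static const char PIECE_LETTERS[] = "PNBRQpnbrq";
enum { NCOUNTS = 10 };

/* Count the piece letters in a signature such as "KRPP vs kbp".
   Kings, spaces and the "vs" separator are simply ignored. */
static void count_pieces(const char *signature, int counts[NCOUNTS])
{
    assert(signature != NULL);
    memset(counts, 0, NCOUNTS * sizeof counts[0]);
    for (const char *s = signature; *s != '\0'; s++) {
        const char *hit = strchr(PIECE_LETTERS, *s);
        if (hit != NULL)
            counts[hit - PIECE_LETTERS]++;
    }
}
```

For "KRPP vs kbp" this gives P=2, R=1, p=1, b=1 and zero for everything else.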
Next, you need to come up with a formula for the correction:
C = -100*p + 100*P - 325*n + 325*N - 325*b + 325*B - 500*r + 500*R
- 900*q + 900*Q (these are just whatever standard values you want to give the pieces)
+ A*p + B*P + C*p*n + D*p*N + E*p*b + F*p*B + G*p*r + H*p*R +
I*p*q + J*p*Q + K*n + L*N + ...
and so on, with a constant for each combination of piece type counts
Then just curve fit and find the best value for each term.
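Since the formula is linear in the unknown constants, the fit amounts to ordinary linear least squares once each imbalance is turned into a row of terms. Here is a sketch of building that row and evaluating the correction. It assumes one particular reading of the formula: the 10 single counts plus the 45 products of two different counts. The names, the term ordering and the piece values are placeholders.

```c
#include <assert.h>
#include <stddef.h>

enum { NCOUNTS = 10 };   /* P N B R Q p n b r q */
enum { NFEATS = NCOUNTS + NCOUNTS * (NCOUNTS - 1) / 2 };   /* 10 + 45 = 55 */

/* Fixed "standard" piece values in centipawns (placeholders, pick your own). */
static const double BASE_VALUE[NCOUNTS] = {
    100, 325, 325, 500, 900, -100, -325, -325, -500, -900
};

/* Fill feats[] with the 10 counts followed by the 45 products
   counts[i] * counts[j] for i < j. Returns the number of terms written. */
static size_t build_features(const int counts[NCOUNTS], double feats[NFEATS])
{
    size_t k = 0;
    for (int i = 0; i < NCOUNTS; i++)
        feats[k++] = counts[i];
    for (int i = 0; i < NCOUNTS; i++)
        for (int j = i + 1; j < NCOUNTS; j++)
            feats[k++] = (double)counts[i] * counts[j];
    assert(k == NFEATS);
    return k;
}

/* Correction = fixed material balance + sum of coef[k] * feats[k]. */
static double correction(const int counts[NCOUNTS], const double coef[NFEATS])
{
    double feats[NFEATS];
    double sum = 0.0;
    build_features(counts, feats);
    for (int i = 0; i < NCOUNTS; i++)
        sum += BASE_VALUE[i] * counts[i];
    for (size_t k = 0; k < NFEATS; k++)
        sum += coef[k] * feats[k];
    return sum;
}
```

The rows from `build_features` are what a least-squares solver would consume; `coef` is what it would produce.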
I tried doing this manually, using the square of the difference between the correction and the win probabilities. You need to equate the win probabilities to material difference to do this. I think I set 1 pawn = 100 Elo... very crude.
It would be nice to find a good fit with the material in the "center", where the material imbalance is not great. Who cares how good the fit is when one side is a queen ahead, since it hardly matters. In my case I only tried the manual approach for material differences of 3 pawns or fewer.
The above only uses linear terms. A better system could fit most of the data, and mark some positions that just do not fit well for exception handling. For example, B vs p is most likely a draw, but trying to get that to fit well might mess up other terms, and so it could be culled from the data to get a better fit of other stuff.
Is there a good (and free) package that can do these kinds of fits? I see stuff to handle a few terms, but we need a lot. 10+9+8+7+6+5+4+3+2+1 =55 actually for just the linear terms. Of course, many of these might come out to close to zero, once the data is analyzed. And if higher order terms are needed, there might be a lot more.
Another approach is to do a best fit of each term separately. See, for example, what each term comes out to on its own. If it comes close to zero, it could probably be eliminated.
Mark
I have code to perform a parabolic least squares fit, which is probably what is wanted. I also have code for fitting any arbitrary function (the Levenberg-Marquardt algorithm).
I have used the parabolic fit to do curves to find optimal tactical settings and it works great for that. Unfortunately, the tactical settings do not turn out to be optimal for play.
I have not run the curves on material imbalance. It will be very interesting to me, after we calculate the curves, if we can reason backwards from the equation to deduce the meaning of it.
Here is the parabolic fit:
#include <math.h>
#include <float.h>
/***************************************************
* Parabolic Least Squares Demonstration Program *
* ------------------------------------------------ *
* Reference: BASIC Scientific Subroutines, Vol. II *
* by F.R. Ruckdeschel, BYTE/McGRAW-HILL, 1981. *
* ISBN 0-07-054202-3 (v. 2), Pages 24-31 *
* *
* C++ version by J-P Moreau, Paris *
* Subsequently translated to C and heavily altered *
* by Dann Corbit *
* ------------------------------------------------ *
* This program calculates a parabolic least squares*
* fit to a given data set. *
* *
* INSTRUCTIONS *
* ------------ *
* 1. The number of data coordinates provided must *
* be greater than three points. *
* 2. The data must not be a single point, which *
* is repeated. *
* 3. The data must not be perfectly colinear. *
* If any of these 3 rules are violated, an error *
* flag will be set. The returned data will be *
* invalid and must not be used. *
* *
* SAMPLE DATA: *
* *
* Number of data points : 4 *
* *
* X[0]=1 Y[0]=1 *
* X[1]=2 Y[1]=4 *
* X[2]=3 Y[2]=9 *
* X[3]=5 Y[3]=24.95 *
* *
* Fitted equation is: *
* Y = -0.017727 + 0.022045 X + 0.994318 X^2 *
* *
* Standard deviation of fit: 0.004767 *
***************************************************/
#define DATA_OK 0 // Everything is A-OK
#define NOT_ENOUGH_DATA 1 // Underdetermined system
#define SINGLE_POINT 2 // Degenerate data
#define STRAIGHT_LINE 3 // Degenerate data
/****************************************************************
* Parabolic least squares estimation *
* ------------------------------------------------------------- *
* In: unsigned n = number of points *
* n values xd[i], yd[i] *
* Out: coefficients a,b,c of fit (a+b*x+c*x^2) *
* coefs[0] = multiplier for x^0 {Constant term} *
* coefs[1] = multiplier for x^1 {Linear term} *
* coefs[2] = multiplier for x^2 {Quadratic term} *
* coefs[3] = Standard deviation {How good is the fit?} *
* coefs[4] = x location of the extrapolated minimum/maximum *
* Returns: The x location of the parabola's minimum/maximum. *
****************************************************************/
double plsqf(double *xd, double *yd, unsigned n, double *coefs, int *error)
{
    double a0, a1, a2, a3, a4;
    double b0, b1, b2;
    double d1;
    unsigned i;
    double a, b, c, d, e;
    *error = DATA_OK; // Assume that there are no problems
    // Check for not enough data: the standard deviation below divides
    // by (n - 3), so more than three points are required.
    if (n < 4) {
        *error = NOT_ENOUGH_DATA;
        return 0;
    }
    a0 = 1;
    a1 = 0;
    a2 = 0;
    a3 = 0;
    a4 = 0;
    b0 = 0;
    b1 = 0;
    b2 = 0;
    for (i = 0; i < n; i++) {
        double xi2 = xd[i] * xd[i];
        double xi4 = xi2 * xi2;
        double xy = xd[i] * yd[i];
        a1 += xd[i];
        a2 += xi2;
        a3 += xi2 * xd[i];
        a4 += xi4;
        b0 += yd[i];
        b1 += xy;
        b2 += xy * xd[i];
    }
    a1 /= n;
    a2 /= n;
    a3 /= n;
    a4 /= n;
    b0 /= n;
    b1 /= n;
    b2 /= n;
    d = a0 * (a2 * a4 - a3 * a3) - a1 * (a1 * a4 - a2 * a3) + a2 * (a1 * a3 - a2 * a2);
    // Check for {near} singularity (all data is the same point)
    if (fabs(d) < (DBL_EPSILON * 10.0)) {
        *error = SINGLE_POINT;
        return 0;
    }
    a = (b0 * (a2 * a4 - a3 * a3) + b1 * (a2 * a3 - a1 * a4) + b2 * (a1 * a3 - a2 * a2)) / d;
    b = (b0 * (a2 * a3 - a1 * a4) + b1 * (a0 * a4 - a2 * a2) + b2 * (a1 * a2 - a0 * a3)) / d;
    c = (b0 * (a1 * a3 - a2 * a2) + b1 * (a2 * a1 - a0 * a3) + b2 * (a0 * a2 - a1 * a1)) / d;
    // Check for {near} singularity (the data is perfectly linear)
    if (fabs(c) < (DBL_EPSILON * 10.0)) {
        *error = STRAIGHT_LINE;
        return 0;
    }
    // Evaluation of standard deviation d
    d = 0;
    for (i = 0; i < n; i++) {
        d1 = yd[i] - a - b * xd[i] - c * xd[i] * xd[i];
        d += d1 * d1;
    }
    d = sqrt(d / (n - 3.0));
    // Calculation of the minimum/maximum for x
    e = -b / (c * 2.0);
    // Load the constants into the return array
    coefs[0] = a;
    coefs[1] = b;
    coefs[2] = c;
    coefs[3] = d;
    coefs[4] = e;
    return e; // Return the x location for the minimum/maximum
}