Material Tables and Indexing


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Material Tables and Indexing

Post by bob »

gladius wrote:
bob wrote:Just do what good chess players do and do not trade a piece for three pawns, or two pieces for a rook and pawn, unless you can actually _see_ an advantage in the resulting position. Otherwise it is just an easy way to lose slowly but surely...
My material values are roughly pawn = 950 cp, knight/bishop = 3250 cp, so there has to be at least 400 cp compensation for my program to go for the trade (which, if some of the pawns are part of the pawn shelter, is pretty easy to come by). So, if I had the material table bonus of 750 cp, it would need 1150 cp of compensation, which seems more in line with what should be happening.

I'm not sure what you mean by seeing an advantage though. The dynamic considerations in the position are frequently worth 400cp to my (admittedly not amazing) evaluation function, so it thinks the trade is great. 20 moves down the line however, it usually ends up being not so good :). Other than handling these material imbalances, I guess I could penalize all my non-material weights in the opening, but that has lots of side effects as well.
There are two points here...

1. A piece for three pawns is almost always bad. The opponent ends up with an extra piece, and can slowly overwhelm each and every one of your pawns, since he has one extra piece to attack with.

2. On rare occasions, particularly in late middlegame positions, a knight for three pawns might not be bad at all, if the pawns are mobile and can move freely and quickly, or if giving up the piece for three pawns exposes the king to an attack by rooks/queen on the now-open files. But those cases are extremely rare.

For this reason, I have some very simple code in Crafty that simply tosses in a big penalty if a piece for three pawns, or two minors for rook+pawn, gets played on the board... And it works well in almost all cases. General rules will almost always have exceptions, but if they are rare enough....
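
For illustration, a minimal sketch of such a rule (the function name and the penalty value are made up here; this is not Crafty's actual code):

Code: Select all

/* Sketch of a "bad trade" penalty. The inputs are minor-piece
   (knights+bishops) and major-piece (rooks+queens) counts per side.
   BAD_TRADE is an assumed value, in centipawns. */
#define BAD_TRADE 150

/* Returns a correction from White's point of view. */
int bad_trade(int wminor, int bminor, int wmajor, int bmajor)
{
   /* one side has an extra minor piece and majors are equal: with pawns
      as the only possible compensation, prefer the side with the piece
      (this also fires when a side is simply up a piece, which points
      the same way) */
   if (wminor == bminor + 1 && wmajor == bmajor) return  BAD_TRADE;
   if (bminor == wminor + 1 && wmajor == bmajor) return -BAD_TRADE;

   /* two minors were traded for rook and pawn:
      prefer the side with the two minors */
   if (wminor == bminor + 2 && bmajor == wmajor + 1) return  BAD_TRADE;
   if (bminor == wminor + 2 && wmajor == bmajor + 1) return -BAD_TRADE;

   return 0;
}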
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Material Tables and Indexing

Post by Edsel Apostol »

Mark, thanks for the ideas. I will try to decipher just how these values are being derived.

Maybe someday, if I finally discover the perfect equation for material imbalance in chess, I will post it here.
Edsel Apostol
Posts: 803
Joined: Mon Jul 17, 2006 5:53 am
Full name: Edsel Apostol

Re: Material Tables and Indexing

Post by Edsel Apostol »

Hi Harald,

I don't know if using data from Strelka is illegal. I have read somewhere that data is not under any copyright. If most people here say that it is illegal, then I will not use it anymore.

Anyway, don't worry, as I'm just using it for my little study.
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Material Tables and Indexing

Post by mjlef »

Edsel Apostol wrote:Hi Harald,

I don't know if using data from Strelka is illegal. I have read somewhere that data is not under any copyright. If most people here say that it is illegal, then I will not use it anymore.

Anyway, don't worry, as I'm just using it for my little study.
I do not know either. One idea might be to use the data here:

http://www.ascotti.org/programming/chess/mat_stats.html

and try to derive a good equation to match the data. Once you have that, you could feed sample positions to Strelka or Rybka, have it search, and then see how closely the data compares to the eval. Since that would not involve looking into source code, it would be fair, and this could act as a control, to see how close "real world" data is to whatever Strelka/Rybka used.

One problem with the data is that it is full of holes: most material-imbalance positions never appear in the data, or too few of them occurred to be worth including.

BTW, I did take that data and convert it into a big numeric table, with piece counts. It is fun to play with. Manually, I was able to come up with some terms that got me to within 88% of the measured values, and this did not include pawn terms at all.

My next attempt will be to try to find some automated way of doing a least squares fit. I can then apply this to each piece combination I want, and see how likely that piece term is to change the final values. For example, does a term of, say, WP * BN show any relation to the results? If I can prune down the number of terms, it gets a lot easier to solve for a matching formula.
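
As a first screen, the correlation of each candidate term with the measured scores would already tell you something; a sketch in C (the data layout is an assumption):

Code: Select all

#include <math.h>

/* Pearson correlation between one candidate term (here WP*BN) and the
   measured score of each material configuration. wp[i]/bn[i] are the
   piece counts and score[i] the measured value for configuration i.
   Terms whose correlation is near zero are candidates for pruning
   before the full least squares fit. */
double term_correlation(const int *wp, const int *bn,
                        const double *score, int n)
{
   double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
   double cov, vx, vy;
   int i;

   for (i = 0; i < n; i++) {
      double x = (double) wp[i] * bn[i];   /* the WP*BN cross term */
      double y = score[i];
      sx  += x;      sy  += y;
      sxx += x * x;  syy += y * y;  sxy += x * y;
   }
   cov = sxy - sx * sy / n;       /* n times the covariance */
   vx  = sxx - sx * sx / n;       /* n times the variances  */
   vy  = syy - sy * sy / n;
   if (vx <= 0 || vy <= 0) return 0.0;   /* degenerate data */
   return cov / sqrt(vx * vy);
}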

Mark
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Material Tables and Indexing

Post by hgm »

I don't believe that this kind of data could tell us anything about the value of a material imbalance anyway, because the statistics are computed on a highly biased sample: these are positions from games, rather than randomly generated ones.

To extract piece values, the sample would have to be corrected not only for rating difference of the players, but also for positional compensation (e.g. passer bonuses, King safety). They would have to be evaluated (through a search of reasonable depth) by an engine with known piece values, and then the piece values of the root position should be subtracted from this score to obtain the positional bias.

I only use data generated by self-play from symmetrical positions (except for the imbalance).
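
To make that subtraction concrete, a sketch (search_eval and material_score are assumed names, not any particular engine's API):

Code: Select all

/* Hypothetical hooks: a shallow search using known piece values, and
   the sum of those same piece values for the root position. */
int search_eval(const board_t *board, int depth);
int material_score(const board_t *board);

/* Positional bias of a sample position: the part of the score that is
   not explained by the material on the board. This is what would have
   to be subtracted from the game statistics before fitting. */
int positional_bias(const board_t *board)
{
   return search_eval(board, 8 /* "reasonable depth" */)
        - material_score(board);
}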
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Material Tables and Indexing

Post by mjlef »

hgm wrote:I don't believe that this kind of data could tell us anything about the value of a material imbalance anyway, because the statistics are computed on a highly biased sample: these are positions from games, rather than randomly generated ones.

To extract piece values, the sample would have to be corrected not only for rating difference of the players, but also for positional compensation (e.g. passer bonuses, King safety). They would have to be evaluated (through a search of reasonable depth) by an engine with known piece values, and then the piece values of the root position should be subtracted from this score to obtain the positional bias.

I only use data generated by self-play from symmetrical positions (except for the imbalance).
You have to think in terms of a large number of positions. When the population becomes large enough, things like king safety and passed pawns will balance out. Plus, it would be possible to toss out positions with these features. I think the idea is to measure the predictive value material imbalances have. Real-world data always has noise, but just as the average weight of a human increases with age (up to some age), you would not throw out that conclusion just because weight fluctuates a bit from day to day. The very crude material imbalance term I added to my program (which was basically taking the data and just assuming pawn = 100 Elo) worked in the tests I did. But I think it could be vastly improved.
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Material Tables and Indexing

Post by hgm »

The point is not that there is noise, but whether you can expect the noise to average to zero. I don't think you can. Especially not in positions close to a complete set of pieces: the majority of positions with only one Pawn missing will be positions where this Pawn was sacrificed intentionally, as it is very unlikely that players above 2200 will get into so much trouble this early in the game that they would lose a Pawn involuntarily.

Similarly, KQKP will see an unrealistically high number of draws, as most games would have been resigned or adjudicated long before this stage, unless the Pawn is on the 7th rank.
F. Bluemers
Posts: 868
Joined: Thu Mar 09, 2006 11:21 pm
Location: Nederland

Re: Material Tables and Indexing

Post by F. Bluemers »

Edsel Apostol wrote:Hi Harald,

I don't know if using data from Strelka is illegal. I have read somewhere that data is not under any copyright. If most people here say that it is illegal, then I will not use it anymore.

Anyway, don't worry, as I'm just using it for my little study.
The tables were probably generated with Fruit/Toga's evaluation code for material. It is in material.cpp:

Code: Select all

// material_comp_info()

static void material_comp_info(material_info_t * info, const board_t * board) {

   int wp, wn, wb, wr, wq;
   int bp, bn, bb, br, bq;
   int wt, bt;
   int wm, bm;
   int colour;
   int recog;
   int flags;
   int cflags[ColourNb];
   int mul[ColourNb];
   int phase;
   int opening, endgame;
   int owf,obf,ewf,ebf; /* Thomas */
   int WhiteMinors,BlackMinors,WhiteMajors,BlackMajors;

   ASSERT(info!=NULL);
   ASSERT(board!=NULL);

   // init

   wp = board->number[WhitePawn12];
   wn = board->number[WhiteKnight12];
   wb = board->number[WhiteBishop12];
   wr = board->number[WhiteRook12];
   wq = board->number[WhiteQueen12];

   bp = board->number[BlackPawn12];
   bn = board->number[BlackKnight12];
   bb = board->number[BlackBishop12];
   br = board->number[BlackRook12];
   bq = board->number[BlackQueen12];

   wt = wq + wr + wb + wn + wp; // no king
   bt = bq + br + bb + bn + bp; // no king

   wm = wb + wn;
   bm = bb + bn;

   // recogniser

   recog = MAT_NONE;
............
etc. etc.
The idea of using a big array just like the one in Strelka was posted long ago on Russian forums.
Best
Fonzy
F. Bluemers
Posts: 868
Joined: Thu Mar 09, 2006 11:21 pm
Location: Nederland

Re: Material Tables and Indexing

Post by F. Bluemers »

I was lucky enough to find it again :lol:
(http://kasparovchess.crestbook.com/view ... =2186&p=19)

Code: Select all

Vlad_Imir
Participant
Joined: 11/11/2006
Re: Fishing
NS wrote:

But I still don't understand the tables. Even if the assumption is true, and Vasik's table really does hold a material evaluation that depends on the mix of material on the board / stage of the game...

It is all quite simple; Rybka points straight at material_get_info (from Fruit or Toga). How is the material information used? See eval().

If you assume the board can hold no more than 8 pawns, two bishops, and so on (that is, normal positions, not exotica like 5 bishops fighting 6 knights), then the number of combinations of all pieces, black and white, is:
int size = 9 * 3 * 3 * 3 * 2 * 9 * 3 * 3 * 3 * 2;

Create an array: material_info_t *n_material = new material_info_t[size];
Loop over all the combinations, calling material_comp_info(info, board), and fill in the array.
Now you can (but do not have to) write it to disk.

When loading:

material_info_t *n_material;

int main(int argc, char *argv[]) {

   // init
   FileProcessor<material_info_t> FP;

   int size = 9 * 3 * 3 * 3 * 2 * 9 * 3 * 3 * 3 * 2;
   n_material = new material_info_t[size];

   bool success = FP.ReadFromFile("T.ini", &n_material[0], size);
   .
   .

Usage:

extern material_info_t *n_material;

void material_get_info(material_info_t *info, const board_t *board) {

   int wp, wn, wb, wr, wq;
   int bp, bn, bb, br, bq;
   int tkey;

   wp = board->number[WhitePawn12];
   wn = board->number[WhiteKnight12];
   wb = board->number[WhiteBishop12];
   wr = board->number[WhiteRook12];
   wq = board->number[WhiteQueen12];

   bp = board->number[BlackPawn12];
   bn = board->number[BlackKnight12];
   bb = board->number[BlackBishop12];
   br = board->number[BlackRook12];
   bq = board->number[BlackQueen12];

   if (!(wn > 2 || wb > 2 || wr > 2 || wq > 1 || bn > 2 || bb > 2 || br > 2 || bq > 1)) {
      tkey = bq
           + br * 2
           + bb * 2 * 3
           + bn * 2 * 3 * 3
           + bp * 2 * 3 * 3 * 3
           + wq * 2 * 3 * 3 * 3 * 9
           + wr * 2 * 3 * 3 * 3 * 9 * 2
           + wb * 2 * 3 * 3 * 3 * 9 * 2 * 3
           + wn * 2 * 3 * 3 * 3 * 9 * 2 * 3 * 3
           + wp * 2 * 3 * 3 * 3 * 9 * 2 * 3 * 3 * 3;

      *info = n_material[tkey];
      return;
   }

   // exotic material (two queens, etc.): fall back to the full computation
   material_comp_info(info, board);
}

From the material_info_t structure you can remove the uint32 lock; it is not needed, of course.

This adds about ten percent in speed.
Best
Fonzy
(don't blame me for the translation :lol: )
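
A sketch of the build loop the post only describes, assuming material_comp_info() consults nothing but board->number[], as in the Fruit excerpt above:

Code: Select all

#include <string.h>

void build_material_table(material_info_t *n_material)
{
   board_t board;
   int wp, wn, wb, wr, wq, bp, bn, bb, br, bq, tkey;

   memset(&board, 0, sizeof board);   /* only board.number[] is read here */

   for (wp = 0; wp <= 8; wp++)
   for (wn = 0; wn <= 2; wn++)
   for (wb = 0; wb <= 2; wb++)
   for (wr = 0; wr <= 2; wr++)
   for (wq = 0; wq <= 1; wq++)
   for (bp = 0; bp <= 8; bp++)
   for (bn = 0; bn <= 2; bn++)
   for (bb = 0; bb <= 2; bb++)
   for (br = 0; br <= 2; br++)
   for (bq = 0; bq <= 1; bq++) {

      board.number[WhitePawn12]   = wp;
      board.number[WhiteKnight12] = wn;
      board.number[WhiteBishop12] = wb;
      board.number[WhiteRook12]   = wr;
      board.number[WhiteQueen12]  = wq;
      board.number[BlackPawn12]   = bp;
      board.number[BlackKnight12] = bn;
      board.number[BlackBishop12] = bb;
      board.number[BlackRook12]   = br;
      board.number[BlackQueen12]  = bq;

      /* same radix index as in material_get_info() above */
      tkey = bq
           + br * 2
           + bb * 2 * 3
           + bn * 2 * 3 * 3
           + bp * 2 * 3 * 3 * 3
           + wq * 2 * 3 * 3 * 3 * 9
           + wr * 2 * 3 * 3 * 3 * 9 * 2
           + wb * 2 * 3 * 3 * 3 * 9 * 2 * 3
           + wn * 2 * 3 * 3 * 3 * 9 * 2 * 3 * 3
           + wp * 2 * 3 * 3 * 3 * 9 * 2 * 3 * 3 * 3;

      material_comp_info(&n_material[tkey], &board);
   }
}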
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Material Tables and Indexing

Post by Dann Corbit »

mjlef wrote:I think curve/formula fitting is the best approach. I actually started doing that, as follows:

Merge the data for Black and White. For example, if you have

KR vs k
and
K vs kr

just add the totals together to form one KR vs k entry. This increases the counts for that imbalance and makes it more statistically relevant.

Next, split out and count the number of each piece. You really want these numbers

P
N
B
R
Q
p
n
b
r
q

which are the counts of each piece type for each color.

Next, you need to come up with a formula for the correction:

C = +100*P - 100*p + 325*N - 325*n + 325*B - 325*b + 500*R - 500*r
+ 900*Q - 900*q (these are just whatever standard values you want to give the pieces)
+ A*p + B*P + C*p*n + D*p*N + E*p*b + F*p*B + G*p*r + H*p*R +
I*p*q + J*p*Q + K*n + L*N + ....
and so on, with a constant for each combination of piece-type counts

Then just curve fit and find the best value for each term.

I tried doing this manually, using the square of the difference between the correction and the win probabilities. You need to equate the win probabilities to material difference to do this. I think I set 1 pawn = 100 Elo... very crude.

It would be nice to find a good fit with the material in the "center", where the material imbalance is not great. Who cares how good the fit is when one side is a queen ahead, since it hardly matters. In my case I only tried the manual approach for material differences of 3 pawns or less.

The above only uses linear terms. A better system could fit most of the data, and mark some positions that just do not fit well for exception handling. For example, B vs p is most likely a draw, but trying to get that to fit well might mess up other terms, and so it could be culled from the data to get a better fit of other stuff.


Is there a good (and free) package that can do these kinds of fits? I see stuff to handle a few terms, but we need a lot: 10+9+8+7+6+5+4+3+2+1 = 55, actually, for just the linear terms. Of course, many of these might come out close to zero once the data is analyzed. And if higher-order terms are needed, there might be a lot more.

Another approach is to do a best fit of each term: see, for example, what each term shows on its own. If it comes close to zero, it could probably be eliminated.

Mark
I have code to perform a parabolic least squares fit, which is probably what is wanted. I also have code for fitting any arbitrary function (the Levenberg-Marquardt algorithm).

I have used the parabolic fit to do curves to find optimal tactical settings and it works great for that. Unfortunately, the tactical settings do not turn out to be optimal for play.

I have not run the curves on material imbalance. It will be very interesting to me, after we calculate the curves, to see whether we can reason backwards from the equation to deduce its meaning.

Here is the parabolic fit:

Code: Select all

#include <math.h>
#include <float.h>

/***************************************************
*  Parabolic Least Squares Demonstration Program   *
* ------------------------------------------------ *
* Reference: BASIC Scientific Subroutines, Vol. II *
*   by F.R. Ruckdeschel, BYTE/McGRAW-HILL, 1981.   *
*   ISBN 0-07-054202-3 (v. 2), Pages 24-31         *
*                                                  *
*              C++ version by J-P Moreau, Paris    *
* Subsequently translated to C and heavily altered *
* by Dann Corbit                                   *
* ------------------------------------------------ *
* This program calculates a parabolic least squares*
* fit to a given data set.                         *
*                                                  *
* INSTRUCTIONS                                     *
* ------------                                     *
* 1.  The number of data coordinates provided must *
*     be greater than three points.                *
* 2.  The data must not be a single point, which   *
*     is repeated.                                 *
* 3.  The data must not be perfectly colinear.     *
* If any of these 3 rules are violated, an error   *
* flag will be set.  The returned data will be     *
* invalid and must not be used.                    *
*                                                  *
* SAMPLE DATA:                                     *
*                                                  *
* Number of data points : 4                        *
*                                                  *
*   X[0]=1  Y[0]=1                                 *
*   X[1]=2  Y[1]=4                                 *
*   X[2]=3  Y[2]=9                                 *
*   X[3]=5  Y[3]=24.95                             *
*                                                  *
* Fitted equation is:                              *
*   Y = -0.017727 + 0.022045 X + 0.994318 X^2      *
*                                                  *
* Standard deviation of fit: 0.004767              *
***************************************************/
#define DATA_OK         0	// Everything is A-OK

#define NOT_ENOUGH_DATA 1	// Underdetermined system

#define SINGLE_POINT    2	// Degenerate data

#define STRAIGHT_LINE   3	// Degenerate data

 /****************************************************************
 *           Parabolic least squares estimation                  *
 * ------------------------------------------------------------- *
 * In:  unsigned n = number of points                            *
 *      n values xd[i], yd[i]                                    *
 * Out: coefficients a,b,c of fit (a+b*x+c*x^2)                  *
 *      coefs[0] = multiplier for x^0 {Constant term}            *
 *      coefs[1] = multiplier for x^1 {Linear term}              *
 *      coefs[2] = multiplier for x^2 {Quadratic term}           *
 *      coefs[3] = Standard deviation {How good is the fit?}     *
 *      coefs[4] = x location for the extrapolated minimum       *
 * Returns: The location of the minimum of the parabola at x.    *
 ****************************************************************/

double          plsqf(double *xd, double *yd, unsigned n, double *coefs, int *error)
{
    double          a0;
    double          a1;
    double          a2,
                    a3,
                    a4,
                    b0,
                    b1,
                    b2,
                    d1;
    unsigned        i;
    double          a,
                    b,
                    c,
                    d,
                    e;
    *error = DATA_OK;           // Assume that there are no problems

    // Check for not enough data...
    if (n <= 3) {               // need n > 3: see sqrt(d / (n - 3.0)) below
        *error = NOT_ENOUGH_DATA;
        return 0;
    }
    a0 = 1;
    a1 = 0;
    a2 = 0;
    a3 = 0;
    a4 = 0;
    b0 = 0;
    b1 = 0;
    b2 = 0;
    for (i = 0; i < n; i++) {
        double          xi2 = xd[i] * xd[i];
        double          xi4 = xi2 * xi2;
        double          xy = xd[i] * yd[i];
        a1 += xd[i];
        a2 += xi2;
        a3 += xi2 * xd[i];
        a4 += xi4;
        b0 += yd[i];
        b1 += xy;
        b2 += xy * xd[i];
    }
    a1 /= n;
    a2 /= n;
    a3 /= n;
    a4 /= n;
    b0 /= n;
    b1 /= n;
    b2 /= n;
    d = a0 * (a2 * a4 - a3 * a3) - a1 * (a1 * a4 - a2 * a3) + a2 * (a1 * a3 - a2 * a2);

    // Check for {near} singularity (all data is the same point)
    if (fabs(d) < (DBL_EPSILON * 10.0)) {
        *error = SINGLE_POINT;
        return 0;
    }
    a = (b0 * (a2 * a4 - a3 * a3) + b1 * (a2 * a3 - a1 * a4) + b2 * (a1 * a3 - a2 * a2)) / d;
    b = (b0 * (a2 * a3 - a1 * a4) + b1 * (a0 * a4 - a2 * a2) + b2 * (a1 * a2 - a0 * a3)) / d;
    c = (b0 * (a1 * a3 - a2 * a2) + b1 * (a2 * a1 - a0 * a3) + b2 * (a0 * a2 - a1 * a1)) / d;

    // Check for {near} singularity (the data is perfectly linear)
    if (fabs(c) < (DBL_EPSILON * 10.0)) {
        *error = STRAIGHT_LINE;
        return 0;
    }
    // Evaluation of standard deviation d
    d = 0;
    for (i = 0; i < n; i++) {
        d1 = yd[i] - a - b * xd[i] - c * xd[i] * xd[i];
        d += d1 * d1;
    }
    d = sqrt(d / (n - 3.0));

    // Calculation of the minimum/maximum for x
    e = -b / (c * 2.0);

    // Load the constants into the return array
    coefs[0] = a;
    coefs[1] = b;
    coefs[2] = c;
    coefs[3] = d;
    coefs[4] = e;
    return e;                   // Return the x location for the minimum/maximum
}
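
A small driver for the routine, using the sample data from the header comment (not part of the original program):

Code: Select all

#include <stdio.h>

int main(void)
{
    /* the sample data from the header comment above */
    double x[4] = { 1.0, 2.0, 3.0, 5.0 };
    double y[4] = { 1.0, 4.0, 9.0, 24.95 };
    double coefs[5];
    int error;
    double xext = plsqf(x, y, 4, coefs, &error);

    if (error != DATA_OK) {
        printf("fit failed, error = %d\n", error);
        return 1;
    }
    printf("Y = %f + %f X + %f X^2\n", coefs[0], coefs[1], coefs[2]);
    printf("std dev of fit = %f, extremum at x = %f\n", coefs[3], xext);
    return 0;
}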