Dann Corbit wrote:
diep wrote:
Tord Romstad wrote:
Ralph Stoesser wrote:
I've read Kaufman's paper about the evaluation of material imbalance, but I wonder what exactly Tord Romstad's polynomial function does.
OK, I'll try to explain. It's nothing very fancy, really.
A material evaluation function is a function of 10 variables: P (the number of white pawns), p (the number of black pawns), N (the number of white knights), n (the number of black knights, and by now you'll understand the meaning of the remaining variables), B, b, R, r, Q and q.
When we learned to play chess, most of us were taught a material evaluation function which is a linear polynomial in the 10 variables, something like this:
Code: Select all
f(P, p, N, n, B, b, R, r, Q, q) = 1*(P-p) + 3*(N-n) + 3*(B-b) + 4.5*(R-r) + 9*(Q-q)
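A minimal, direct transcription of the linear formula above in Python (the function name and the textbook piece values are just illustrative):

```python
# Direct transcription of the linear material formula above.
# Piece counts mirror the ten variables P..q.
def linear_material(P, p, N, n, B, b, R, r, Q, q):
    return (1.0 * (P - p)
            + 3.0 * (N - n)
            + 3.0 * (B - b)
            + 4.5 * (R - r)
            + 9.0 * (Q - q))

# The initial position is perfectly balanced:
print(linear_material(8, 8, 2, 2, 2, 2, 2, 2, 1, 1))  # 0.0
```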
Later on, we learn a few material evaluation rules which cannot be expressed by a linear function. The most obvious example is the bishop pair: two bishops are, in general, worth more than double the value of a single bishop. However, we can still use a polynomial to model the evaluation function, as long as we allow terms of the second degree. If we decide that the bishop pair should be worth half a pawn, we can include this in the above evaluation function by adding the following term:
Code: Select all
0.25 * (B*(B-1) - b*(b-1))
This works because the product B*(B-1) is 0 if there are 0 or 1 white bishops, but 2 if there are 2 bishops.
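Concretely, the symmetric bishop-pair term described above can be sketched as follows; the 0.25 coefficient is inferred from the half-pawn figure, since B*(B-1) evaluates to 2 exactly when B == 2:

```python
# Bishop-pair term: 0.25 is inferred from the half-pawn bonus above,
# because B*(B-1) is 0 for 0 or 1 bishops and 2 for the pair.
def bishop_pair_term(B, b):
    return 0.25 * (B * (B - 1) - b * (b - 1))

print(bishop_pair_term(2, 1))  # 0.5: White has the pair, Black does not
print(bishop_pair_term(1, 1))  # 0.0: neither side has the pair
```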
Similarly, other more complex material evaluation rules, like the ones found in Kaufman's paper, can also be modeled by second-degree polynomial terms. For instance, assume that we want to increase the value of a knight by 0.05 for each enemy pawn on the board (this is almost certainly not an exact rule from Kaufman's paper, but I'm too lazy to look up the paper now). This would correspond to a term like this:
Code: Select all
0.05 * (N*p - n*P)
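That knight/enemy-pawn interaction can be sketched the same way; the 0.05 figure is the hypothetical one from the paragraph above, not a tuned value:

```python
# Second-degree interaction term: each knight gains 0.05 per enemy pawn.
# The 0.05 coefficient is the illustrative figure from the text.
def knight_pawn_term(N, P, n, p):
    return 0.05 * (N * p - n * P)

print(knight_pawn_term(2, 8, 2, 8))  # 0.0: symmetric position
print(knight_pawn_term(2, 8, 0, 8))  # 0.8: only White has knights
```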
That so many material evaluation rules can be modeled by polynomials of degree 2 gave me the idea of using a completely general (apart from the obvious symmetry relations) second degree polynomial for evaluating material, and to spend lots of effort trying to tune all the coefficients (this was shortly after Joona had invented a very effective method for tuning evaluation parameters).
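A fully general second-degree polynomial over the ten counts can be written as a linear part plus a quadratic form over all pairs of counts; a minimal sketch (all coefficient values below are placeholders illustrating the shape, not anyone's tuned numbers):

```python
# General degree-2 material polynomial over the ten piece counts:
#   f(x) = sum_i c[i]*x[i] + sum_{i<=j} Q[(i,j)]*x[i]*x[j]
COUNTS = ["P", "p", "N", "n", "B", "b", "R", "r", "Q", "q"]

def quadratic_material(x, linear, quad):
    """x and linear are dicts keyed by piece letter; quad by letter pairs."""
    total = sum(linear.get(k, 0.0) * x[k] for k in COUNTS)
    for i, ki in enumerate(COUNTS):
        for kj in COUNTS[i:]:
            total += quad.get((ki, kj), 0.0) * x[ki] * x[kj]
    return total

# Recover the linear values plus the bishop-pair bonus as a special case:
# 0.25*B*(B-1) expands to 0.25*B^2 - 0.25*B, hence the adjusted linear term.
linear = {"P": 1.0, "p": -1.0, "N": 3.0, "n": -3.0,
          "B": 2.75, "b": -2.75, "R": 4.5, "r": -4.5, "Q": 9.0, "q": -9.0}
quad = {("B", "B"): 0.25, ("b", "b"): -0.25}

start = dict(P=8, p=8, N=2, n=2, B=2, b=2, R=2, r=2, Q=1, q=1)
print(quadratic_material(start, linear, quad))  # 0.0: still balanced
```

With the symmetry relations (White's coefficients mirroring Black's with opposite sign), tuning reduces to fitting the independent entries of the coefficient matrix, which is exactly the parameter set the paragraph above describes.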
We never managed to make it work as well as I hoped, though.
You are writing nonsense of the highest degree here.
If it were a simple polynomial, then with linear programming you could tune your entire program fully automatically, and in a perfect manner, within 5 minutes. In fact you could do that with a simple World War 2 algorithm from the US army, originally used for logistics.
Something like Simplex rings a bell?
I wouldn't want to claim this is first-year math student theory nowadays, but...
Tuning in computer chess is, however, a lot more complex. It also shows that none of the posters here has any clue about parameter tuning at all.
It's the NCSA that just tunes it, with incredible amounts of system time, for a big army of engines, all more or less clones of a specific codebase, usually Rybka.
It's no surprise to me, then, that you have no idea either how Stockfish got tuned, nor Marco Costalba with his crap story of playing 1000 games. An amount with which you can't even tune accurately to 1 Elo point, let alone tune Stockfish.
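Whatever one makes of the accusations, the statistical point about 1000 games is checkable with a back-of-envelope calculation; a sketch, assuming a roughly 50% score and an illustrative 50% draw rate between near-equal engines:

```python
import math

def elo(score):
    """Logistic Elo difference implied by an expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_95_halfwidth(n_games, draw_rate=0.5):
    """Half-width of a 95% confidence interval, in Elo, after n_games
    between two near-equal engines.  The draw rate is an assumption:
    draws score 0.5 and so shrink the per-game score variance."""
    # Per-game variance: decisive games deviate 0.5 from the mean score,
    # draws deviate 0, so var = (1 - draw_rate) * 0.25.
    var = (1.0 - draw_rate) * 0.25
    se = math.sqrt(var / n_games)        # standard error of the mean score
    return elo(0.5 + 1.96 * se)

# 1000 games resolves a match to roughly +/- 15 Elo, nowhere near 1 Elo:
print(round(elo_95_halfwidth(1000)))  # 15
```

Under these assumptions, reaching a +/- 1 Elo interval would take on the order of a couple of hundred thousand games, which is consistent with the complaint that 1000 games is far too few.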
We hear too much crap about tuning, which is the clearest proof that you guys have no clue about tuning at all. Those who are really forced to tune their engines themselves know a lot better.
By my calculation, as forwarded to several people, the total system time used for parameter tuning of the Rybka-type engines must be roughly around 100 million CPU node-hours, or, on the expensive government hardware, roughly a budget of $50 million.
Seemingly all that tuning gets done in the USA.
Vincent
p.s. Is that why the Russians posted the Strelka code at the time? They saw some big army budget getting spent on computer chess and thought, "What is this?", and just posted it. All top programmers were AMAZED when they saw that code from Strelka. To quote one of them, though not the only one: "Do you believe all these hundreds of parameters have been HAND TUNED?"
What Tord and Joona have done must work pretty well. Their program is the second strongest after Rybka and all her children.
I guess that the Rybka team did not spend $50 million tuning their engine either, or was that a joke?
Not a joke.
They just post some crap and disinformation here.
If you talk with all the programmers who actually tune at home, you soon figure out how the tuning process must work. You also see a combination of different forms of tuning, yet it all needs the same oracle.
Building all that is very much full-time work. Don't underestimate this, please.
You typically see that engines with more knowledge, like Shredder, which uses only 24 cores, have problems catching up.
When I wasted 60 cores or so (not sure how many Renze used as a maximum) on some initial tries, I soon learned that to get things statistically significant you really need lots of CPU time.
We also see this with Crafty, which in some sense is the only engine that's original work (no merciless cut-and-pasting from other open-source engines, or whatever you want to call them, such as some Polish and Russian programmers mentioned that Glaurung did, in version 2.2 before it was baptized Stockfish).
Also note the NPS figures have been completely optimized cycle-wise everywhere.
Wasn't Glaurung at first 400k NPS or so when I ran it on my box? Now it's 4M NPS. It's faster than any of today's other top engines, except for Rybka.
That's not easy to achieve.
As the Polish and Russian programmers already noticed, several programmers have worked in the source code of Glaurung 2.0 to 2.2.
A lot of changes were made there; none of them really followed Tord's style guide, and some were clumsy C programmers just doing cut-and-paste work. Not something Costalba, nor that Joona person who showed up later, would EVER do, even at 4 AM.
It's very unclear, but it seems it was a rather big team doing all those code changes to the Glaurung code. Definitely not Tord.
In Crafty we see the same thing. It's a math guy again there doing code changes, and even cycles get shaved off. It's unclear who is doing the code changes, except we know for sure it's not Bob, and we can see from the code that it's more than one person, whereas the claim is that it is one person.
In Rybka, well, just look at the huge differences between version 1.0 and 3.0 and you'll soon realize it's a bunch of programmers.
The total budget in system time is of course far bigger than the programmers' time, as usual. Besides, system time is just a paper figure; otherwise those supercomputers would idle anyway.
Yet to get such big budgets really requires something.
Most are simply underestimating what it takes.
I'd argue: just compare with the publications by well-known computer chess authors. In terms of testing it's not even in the same galaxy, quality-wise.
Compare the accuracy of Heinz's publications and Omid's publications with what has happened here. That requires LARGE teams in the background.
Only NCSA can deliver that.
To quote someone here: "The AIVD (Dutch intelligence agency) would NEVER allow secret tuners to be used to tune engines that get publicly spread somehow, commercially or open source or in whatever form."
Other European agencies would work the same way, I guess (I don't know; I never worked for one). So that leaves Mossad and the NCSA.
Thanks,
Vincent Diepeveen