Piece weights with regression analysis (in Russian)

Xann · Post by **Xann** » Mon May 04, 2015 6:04 pm

Hi Gerd,

Gerd Isenberg wrote:Wow, so simple is that
Slowly the dust settles. Thanks for the insight, yes, a single-neuron ANN.

I don't exactly get that with the cost function requiring gradient descent to find the optimum, i.e. the error formula with least squares as used in Texel Tuning and Deep Thought Tuning and the "wild formulas" in Buro's paper?
http://www.jair.org/media/179/live-179-1475-jair.pdf

You're right, I forgot to mention the error functions: cross entropy vs. sum of squares.

I would expect statisticians to be religious about using cross entropy when you want to interpret the output as a probability. In practice, we can just try both.

I think the practical advice that can be found on the net about ANN learning parameters is both very useful and easier to understand (e.g. the bottom of https://visualstudiomagazine.com/Articl ... spx?Page=1 though this focuses on data representation):
- if the output is continuous, use a linear output and sum of squares as error function
- if the output is binary/probability, use the logistic function + cross entropy
- if the output is discrete, use Softmax + log loss (not sure about the official name)

The binary case is just an optimisation of Softmax with two outputs (since q = 1 - p), but IMO it complicates the formulas.
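
To make the last point concrete, here is a minimal sketch (plain Python, all function names are mine and purely illustrative) showing that a two-output Softmax with the second input fixed at 0 gives the same probability as the logistic function, and that log loss then reduces to binary cross entropy:

```python
import math

def logistic(z):
    """Logistic function: maps a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax over a list of scores (shifted by the max for stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    """Binary cross entropy for target y in [0, 1] and prediction p."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def log_loss(target_index, probs):
    """Log loss for a discrete target: minus the log of its probability."""
    return -math.log(probs[target_index])

z = 0.8                            # arbitrary score, for illustration only
p_logistic = logistic(z)
p_softmax = softmax([z, 0.0])[0]   # second output carries q = 1 - p

print(p_logistic, p_softmax)
print(cross_entropy(1.0, p_logistic), log_loss(0, softmax([z, 0.0])))
```

Both lines should print two matching numbers, up to rounding.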

That was for choosing an error function though, which might not be what your question was about.

Fabien.

Xann · Post by **Xann** » Mon May 04, 2015 6:20 pm

mvk wrote:I don't know. I remember having read about the idea of assigning elo to othello pattern features. Instead of guessing, do you have links to specific papers? (both Buro, and also ANN?)

I used your post but actually meant the question to be for everyone. Gerd found the original paper by Buro (his earlier PhD is in German). Later ones are very interesting too but he switched to linear regression since Othello has "continuous" game results.

The ANN literature is ginormous ... Any resource mentioning logistic output and (if possible) cross entropy should be relevant.

Xann · Post by **Xann** » Mon May 04, 2015 6:29 pm

Gerd Isenberg wrote:I don't exactly get that with the cost function requiring gradient descent to find the optimum, i.e. the error formula with least squares as used in Texel Tuning and Deep Thought Tuning and the "wild formulas" in Buro's paper?
http://www.jair.org/media/179/live-179-1475-jair.pdf

Are you referring to the log(L(beta)) ... page 377?

I think that Buro wanted to show that by assuming that the output is a probability and then maximising the likelihood, you obtain the cross-entropy error function. He needlessly complicated the formulas by introducing n_i though.
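
A short sketch of that argument, in notation that is mine rather than Buro's: assume each result $y_i \in \{0,1\}$ is an independent draw with win probability $p_i = \sigma(\beta \cdot x_i)$, where $\sigma$ is the logistic function and $x_i$ the feature vector of position $i$. The $n_i$ from the paper is left out here.

$$L(\beta) = \prod_i p_i^{y_i}\,(1-p_i)^{1-y_i}$$

$$-\log L(\beta) = -\sum_i \left[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\right]$$

Under these assumptions, maximising the likelihood is the same as minimising the cross-entropy error.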

Xann · Post by **Xann** » Mon May 04, 2015 7:00 pm

mvk wrote:I don't know. I remember having read about the idea of assigning elo to othello pattern features. Instead of guessing, do you have links to specific papers? (both Buro, and also ANN?)

Try this one for ANN (also for Gerd): http://www.cedar.buffalo.edu/%7Esrihari ... aining.pdf

It seems to mention all the subjects I talked about!

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 04, 2015 7:17 pm

Xann wrote:
Gerd Isenberg wrote:I don't exactly get that with the cost function requiring gradient descent to find the optimum, i.e. the error formula with least squares as used in Texel Tuning and Deep Thought Tuning and the "wild formulas" in Buro's paper?
http://www.jair.org/media/179/live-179-1475-jair.pdf
Are you referring to the log(L(beta)) ... page 377?

I think that Buro wanted to show that by assuming that the output is a probability and then maximising the likelihood, you obtain the cross-entropy error function. He needlessly complicated the formulas by introducing n_i though.

Yep,
and further from Vladimir's article
https://chessprogramming.wikispaces.com ... n+Analysis
minimizing the cost function for the logistic regression
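$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$
...
where the components of the gradient of $J_{reg}$ have the form
$$(\nabla J_{reg})_0 = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$(\nabla J_{reg})_j = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} - \frac{\lambda}{m}\theta_j$$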
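
For readers who want to see the gradient in action, here is a rough sketch of plain gradient descent on the cross-entropy cost. It is not code from Vladimir's article: the function names, the learning rate and the toy data are my own assumptions, and the regularisation term is left out (lambda = 0) to keep it short.

```python
import math

def h(theta, x):
    """Logistic hypothesis h_theta(x) for weights theta and features x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, xs, ys):
    """Cross-entropy cost J(theta), without regularisation."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = h(theta, x)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(xs)

def gradient(theta, xs, ys):
    """Gradient components (1/m) * sum((h - y) * x_j), without regularisation."""
    m = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = h(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / m
    return grad

def gradient_descent(xs, ys, rate=0.5, steps=2000):
    """Fixed-step gradient descent starting from theta = 0."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(theta, xs, ys)
        theta = [t - rate * gj for t, gj in zip(theta, g)]
    return theta

# Made-up toy data: x0 = 1 (bias), x1 = material difference in pawns,
# y = 1 for a win and 0 for a loss. Purely illustrative, not real games.
XS = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 0.0],
      [1.0, 1.0], [1.0, 2.0], [1.0, 1.0], [1.0, -1.0]]
YS = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]

fitted = gradient_descent(XS, YS)
print(fitted, cost(fitted, XS, YS))
```

The weight on x1 should come out positive, and the cost should be lower than at the starting point theta = 0.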

Dann Corbit · Post by **Dann Corbit** » Mon May 04, 2015 8:56 pm

Xann wrote:Hi Marcel,

mvk wrote:Isn't the essence of this method exactly the same what everybody else has been doing now for a couple of years and which has become known as "Texel's tuning method"? (That is, a fit of the evaluation in the percentage domain against the game results?). The only real difference is that we use much larger data sets. All other differences seem not essential at first glance.
Sorry to hijack your post but this topic has been bugging me for a while: isn't "Texel's tuning method" just logistic regression that has been used in games for about twenty years (see Buro's papers for instance)?

I don't see any difference between all three apart from QS vs. eval, which seems a minor issue to me. A single-neuron ANN is also the same.

Fabien.

I think that the Chess Programming Wiki has nothing new in it.
What it does contain is simple, clear explanations about how to do something you need to do to write a good chess engine.
For instance, how do I do a population count? Probably everyone but the utter novice knows how, but the articles on it are still very welcome: they give a clear explanation for someone who wants to do it, and they cover how and why, not just what.
This sort of parameter fitting was not found in the wiki before, so I think it is a very good inclusion.
Probably, for all of the expert programmers this stuff is old hat.
But it is the clearest and simplest explanation that I have read.

Dann Corbit · Post by **Dann Corbit** » Mon May 04, 2015 8:59 pm

SuneF wrote:
Dann Corbit wrote:These weights are derived from correspondence chess:

PIECE VALUES:

Pawn: 100
Knight: 162.839
Bishop: 203.914
Rook: 306.862
Queen: 697.509
How are these derived from correspondence chess?

Granted the pawn is particularly difficult to estimate an average value of, since it could be worth anything from 0 to a queen. IMO though, in practice, a minor piece is worth more than 3 pawns "on average" based on the fact you don't see many "piece for 3 pawns" exchanges in actual games. I would consider such a trade to be more of a sacrifice.

I have a correspondence chess database with about 850K games in it.
I filtered the database for Elo of 2200+ (correspondence chess players have a lower number for the same ability compared to FIDE OTB players). After filtering the output, I ran the algorithm against it.

Not a very scientific measurement, but a fun one nevertheless.
I can send you the filtered games if you like.

Michel · Post by **Michel** » Mon May 04, 2015 8:59 pm

While the method seems to yield very reasonable results, it appears to me it should suffer from selection bias.

If a player is a pawn ahead there is a sizeable probability that his opponent has actually sacrificed the pawn in return for compensation. So the value of the pawn will come out lower than it really is.

Xann · Post by **Xann** » Mon May 04, 2015 9:16 pm

Hi Dann,

Dann Corbit wrote:I think that the Chess Programming Wiki has nothing new in it.
What it does contain is simple, clear explanations about how to do something you need to do to write a good chess engine. ...

Allow me to clarify.

I'm not after CPW here; I've already said during the Senpai release that it was a great resource (in case anybody doubted it). Anecdote: around year 2000, I spent 6 months reading CCC posts at "random". A few of them influenced the design of Fruit in 2003. Now the information one needs is only a few clicks away!

I'm complaining about the name. If it really is just logistic regression, why use another one?

Fabien.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon May 04, 2015 11:03 pm

Xann wrote: I'm complaining about the name. If it really is just logistic regression, why use another one?

Fabien.

Hi Fabien,
I was not aware. I asked Peter to incorporate his nice posting - which was a reply to Álvaro Begué's, who mentioned logistic regression - into a cpw page and used the name "Texel's Tuning Method", similar to "Stockfish's Tuning Method" and "Eval Tuning in Deep Thought" on their respective pages.

http://www.talkchess.com/forum/viewtopi ... 22&t=50823

I will mention that "Texel's Tuning Method" is a concrete instance of logistic regression, although it uses sum of squares rather than cross entropy, with the logistic sigmoid inside the difference to scale the quiescence score qi into the 0-1 probability range:

$$E = \frac{1}{N}\sum_{i=1}^{N}\left(R_i - \mathrm{Sigmoid}(q_i)\right)^2$$
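
As a rough illustration, the error formula above could be computed like this. The exact shape of the sigmoid is an assumption on my side (base 10, divided by 400, with a scaling constant k, scores in centipawns); the post itself only says "Sigmoid". The results and scores are made-up numbers.

```python
def sigmoid(q, k=1.0):
    """Map a quiescence score q (centipawns) to an expected result in 0..1.

    Assumed form: 1 / (1 + 10^(-k*q/400)); k is a hypothetical scaling constant.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * q / 400.0))

def mean_squared_error(results, scores, k=1.0):
    """E = 1/N * sum((R_i - Sigmoid(q_i))^2) over all positions."""
    n = len(results)
    return sum((r - sigmoid(q, k)) ** 2 for r, q in zip(results, scores)) / n

# Made-up example: game results (1 = win, 0.5 = draw, 0 = loss) and the
# quiescence scores of one position taken from each game.
RESULTS = [1.0, 0.5, 0.0, 1.0]
SCORES = [150, 10, -200, 40]

print(mean_squared_error(RESULTS, SCORES))
```

Tuning then means changing the evaluation weights, which changes the q_i, until E stops decreasing.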

As a statistics illiterate, I will try to make more generalizations on the parent page, "Automated Tuning", but would like to ask for expert advice now and then.

What kind of regression is Deep Thought tuning, and old Stockfish's? Linear regression?

Thanks,
Gerd
