## tuning via maximizing likelihood

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### tuning via maximizing likelihood

Why do the tuning methods minimize the mean squared error directly instead of maximizing the likelihood (or, equivalently, minimizing the negative log-likelihood)? Given r = result, the objective with maximum likelihood estimation would be

Code: Select all

``````   like = r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
objective = 1/N * Sum( -like )
``````
This should converge faster than plain least-squares regression using the mean squared error:

Code: Select all

``````   se = (r - logistic(score)) ** 2
objective = 1/N * Sum( se )
``````
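The two objectives can be written out as a small runnable sketch (illustrative only -- the `logistic` helper, the scaling constant, and the toy data are assumptions, not code from any engine):

```python
import math

def logistic(score, k=1.0 / 400.0):
    # Map an eval score (e.g. centipawns) to a win probability.
    # The scaling constant k is an arbitrary choice for illustration.
    return 1.0 / (1.0 + math.exp(-k * score))

def neg_log_likelihood(scores, results):
    # objective = 1/N * Sum( -like ), with
    # like = r*log(p) + (1 - r)*log(1 - p)
    total = 0.0
    for s, r in zip(scores, results):
        p = logistic(s)
        total += -(r * math.log(p) + (1 - r) * math.log(1 - p))
    return total / len(scores)

def mean_squared_error(scores, results):
    # objective = 1/N * Sum( (r - p)**2 )
    return sum((r - logistic(s)) ** 2
               for s, r in zip(scores, results)) / len(scores)

# Toy data: positive scores from won games (r=1), negative from lost (r=0).
scores = [120.0, -300.0, 40.0, -50.0]
results = [1.0, 0.0, 1.0, 0.0]
print(neg_log_likelihood(scores, results))
print(mean_squared_error(scores, results))
```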
Daniel

AlvaroBegue
Posts: 920
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tuning via maximizing likelihood

How do you handle draws? Or does your evaluation function return W/D/L probabilities?

But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: tuning via maximizing likelihood

How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns values strictly between 0 and 1.

Edit:
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
Maximum likelihood is the 'standard' method for logistic regression. Also, I seem to get more stable iterations with it than minimizing the MSE directly.
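One a-priori reason the iterations can be more stable: with a logistic link, the per-sample gradient of the negative log-likelihood with respect to the score simplifies to p - r, while the squared-error gradient carries an extra p*(1 - p) factor that vanishes when the logistic saturates. A minimal sketch (helper names are assumptions for illustration):

```python
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def grad_nll(s, r):
    # d/ds of -( r*log(p) + (1-r)*log(1-p) ) simplifies to p - r.
    return logistic(s) - r

def grad_mse(s, r):
    # d/ds of (r - p)**2 = 2*(p - r)*p*(1 - p); the p*(1-p) factor
    # shrinks the gradient toward 0 for confidently wrong samples.
    p = logistic(s)
    return 2.0 * (p - r) * p * (1.0 - p)

# A confidently wrong sample: large negative score, but the game was won.
s, r = -8.0, 1.0
print(grad_nll(s, r))   # close to -1: still a strong correction signal
print(grad_mse(s, r))   # nearly 0: learning stalls on this sample
```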

AlvaroBegue
Posts: 920
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tuning via maximizing likelihood

Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns values strictly between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.

I don't know why you think the convergence would be better than with mean squared error. I don't really know whether it would be, but I am curious whether you have an a-priori reason to believe it.

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: tuning via maximizing likelihood

AlvaroBegue wrote:
Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns values strictly between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.

AlvaroBegue
Posts: 920
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tuning via maximizing likelihood

Daniel Shawul wrote:
AlvaroBegue wrote:
Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns values strictly between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.
I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.

The other [more serious] objection is that the penalty imposed for getting one single sample wrong in the training set is unbounded in the case of the log-likelihood formula, while it is bounded if you use mean squared error.
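The difference between the two penalties can be made concrete (a minimal sketch; the helper names are assumptions): for a game the model calls a near-certain loss (p close to 0) that was actually won (r = 1), the squared error is capped at 1, while the negative log-likelihood is -log(p) and grows without bound:

```python
import math

def nll_penalty(p, r):
    # Negative log-likelihood for a single sample with predicted
    # win probability p and result r.
    return -(r * math.log(p) + (1 - r) * math.log(1 - p))

def mse_penalty(p, r):
    return (r - p) ** 2

for p in (1e-2, 1e-6, 1e-12):
    # r = 1: the game was won, but the model assigns it probability p.
    print(p, mse_penalty(p, 1.0), nll_penalty(p, 1.0))
# mse_penalty approaches 1; nll_penalty = -log(p) is unbounded as p -> 0.
```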

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: tuning via maximizing likelihood

I don't really get what you are missing here -- again, draws are not a problem as far as I can see.

Maximum-likelihood estimation can even be used to train neural networks (multiple layers), not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) in which backpropagation with a maximum-likelihood objective (ML-BP) is shown to outperform the least-squares objective (LS-BP).

Daniel

jdart
Posts: 3825
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

### Re: tuning via maximizing likelihood

My tuner actually has an option to do this, and in addition can do ordinal logistic regression. See the Objective enum in:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp

I have done very limited experimentation but generally have not found these options better than mean-squared error.

--Jon

AlvaroBegue
Posts: 920
Joined: Tue Mar 09, 2010 2:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

### Re: tuning via maximizing likelihood

Daniel Shawul wrote:
I don't really get what you are missing here -- again, draws are not a problem as far as I can see.
Log-likelihood is a very natural quantity to maximize if you have a probability model. So if we had some procedure that produced a probability for winning, a probability for drawing and a probability for losing, it would make sense to penalize by the -log of the probability of the outcome that really happened. But the particular penalty you are using for a draw is not well motivated. Nothing will blow up, but what you are doing is not exactly maximizing likelihood.
The maximum-likelihood estimation can even be used to train neural networks (multiple layers) instead of a single layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) where they show that backpropagation done with a maximum likelihood objective (ML-BP) is shown to be better than the least squares objective (LS-BP).
Yes, it's a very common thing to optimize if what you are maximizing over are probability models. But, as I said, that's not exactly true here.
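If the model did output separate W/D/L probabilities, the likelihood would be well defined: penalize by -log of the probability of the outcome that actually happened. A sketch under that assumption (the softmax parameterization over three raw scores is one arbitrary choice, not anything proposed in the thread):

```python
import math

def wdl_probs(zw, zd, zl):
    # Softmax over three raw scores -> (win, draw, loss) probabilities.
    m = max(zw, zd, zl)  # subtract the max for numerical stability
    ew, ed, el = (math.exp(z - m) for z in (zw, zd, zl))
    s = ew + ed + el
    return ew / s, ed / s, el / s

def wdl_nll(zw, zd, zl, outcome):
    # outcome: 0 = win, 1 = draw, 2 = loss. True negative log-likelihood:
    # -log of the probability assigned to the observed outcome.
    return -math.log(wdl_probs(zw, zd, zl)[outcome])

probs = wdl_probs(1.0, 0.5, -1.0)
print(probs)                        # the three probabilities sum to 1
print(wdl_nll(1.0, 0.5, -1.0, 1))   # penalty when the game was drawn
```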

Oh, and I wouldn't go around quoting neural-network papers from 1992.

Daniel Shawul
Posts: 3758
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: tuning via maximizing likelihood

Good to know! So far I have had better results with the ML objective function -- even though both barely improved my engine. You seem to use a 1 draw = 2 wins + 2 losses approach unless I am mistaken; is that intentional? I am only aware of Elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper) or 2 draws = 1 win + 1 loss (Davidson).

Daniel