tuning via maximizing likelihood

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 04, 2017 6:54 pm

Why do the tuning methods minimize the mean squared error directly instead of maximizing the likelihood (or minimizing the negative log-likelihood)? Given r = result, the objective for maximum likelihood estimation would be:


   like = r * log(logistic(score)) + (1 - r) * log(1 - logistic(score))
   objective = 1/N * Sum(-like)

This should converge faster than plain least-squares regression, which minimizes the mean squared error:


   se = (r - logistic(score)) ** 2
   objective = 1/N * Sum(se)
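To make the comparison concrete, here is a minimal Python sketch of both objectives and their per-sample gradients with respect to the score. It assumes the score is already in the logistic's natural units (no scaling constant), and all function names are illustrative. The gradient of the negative log-likelihood is simply p - r, while the squared-error gradient carries an extra factor p(1 - p) that shrinks toward zero when the evaluation is confidently wrong. That vanishing factor is one common argument for expecting the likelihood objective to converge faster:

```python
import math


def logistic(score: float) -> float:
    """Map an evaluation score to an expected result in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def nll_objective(scores, results):
    """Mean negative log-likelihood (cross-entropy): 1/N * Sum(-like)."""
    total = 0.0
    for s, r in zip(scores, results):
        p = logistic(s)
        total -= r * math.log(p) + (1.0 - r) * math.log(1.0 - p)
    return total / len(scores)


def mse_objective(scores, results):
    """Mean squared error: 1/N * Sum((r - logistic(score)) ** 2)."""
    return sum((r - logistic(s)) ** 2 for s, r in zip(scores, results)) / len(scores)


def nll_grad(score: float, r: float) -> float:
    """d(-like)/d(score) for one sample: simply p - r."""
    return logistic(score) - r


def mse_grad(score: float, r: float) -> float:
    """d(se)/d(score) for one sample: 2 * (p - r) * p * (1 - p)."""
    p = logistic(score)
    return 2.0 * (p - r) * p * (1.0 - p)


# A confidently wrong prediction: score = -6 (p is about 0.0025), yet the game was won.
print(nll_grad(-6.0, 1.0))  # about -0.9975: a strong push toward the right answer
print(mse_grad(-6.0, 1.0))  # about -0.0049: the p(1 - p) factor nearly cancels it
```

Whether this translates into faster tuning of a real evaluation function is an empirical question; the sketch only shows the difference in gradient size.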

Daniel

AlvaroBegue · Post by **AlvaroBegue** » Wed Oct 04, 2017 7:38 pm

How do you handle draws? Or does your evaluation function return W/D/L probabilities?

But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 04, 2017 7:45 pm

AlvaroBegue wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?

Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns a value strictly between 0 and 1.

Edit:

AlvaroBegue wrote:
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.

Maximum likelihood is the 'standard' method for logistic regression. Also, I seem to get more stable iterations with it than when minimizing the MSE directly.

AlvaroBegue · Post by **AlvaroBegue** » Wed Oct 04, 2017 7:51 pm

Daniel Shawul wrote:
Draws should be fine, I think. Unless the score is -inf or +inf, the logistic function returns a value strictly between 0 and 1.

The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.

I don't know why you think that the convergence would be better than using mean squared error. I don't really know if it would be, but I am curious if you have a reason to believe that a priori.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 04, 2017 8:00 pm

AlvaroBegue wrote:
The log-likelihood function you posted is not bounded. If my evaluation function predicts the probability of the result is 0, it gets an infinite penalty.

My engine's evaluation score lies in (-20000, 20000), and a draw corresponds to score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.
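The thread does not say how scores in (-20000, 20000) are mapped into (0, 1). The Python sketch below assumes a Texel-style base-10 logistic with a hypothetical scale K = 1/400 and checks the claim that a draw target r = 0.5 stays finite: its per-sample loss is smallest at score 0, where it equals log 2 rather than 0.

```python
import math

K = 1.0 / 400.0  # hypothetical scale: p = 1 / (1 + 10^(-K * score))


def win_prob(score_cp: float) -> float:
    """Texel-style mapping from a centipawn score to an expected result (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-K * score_cp))


def sample_loss(score_cp: float, r: float) -> float:
    """Per-sample negative log-likelihood with r in {0, 0.5, 1}."""
    p = win_prob(score_cp)
    return -(r * math.log(p) + (1.0 - r) * math.log(1.0 - p))


# A draw (r = 0.5) scored as 0 gives p = 0.5 and a finite loss of log(2), about 0.693,
# which is the smallest loss a draw can ever reach.
print(sample_loss(0.0, 0.5))
print(sample_loss(200.0, 0.5) > sample_loss(0.0, 0.5))  # True
```

Even with scaling, a mate-range score such as 20000 rounds p to exactly 1.0 in double precision, so log(1 - p) would fail; a common workaround is to clamp p or drop such positions from the training set.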

AlvaroBegue · Post by **AlvaroBegue** » Wed Oct 04, 2017 8:20 pm

Daniel Shawul wrote:
My engine's evaluation score lies in (-20000, 20000), and a draw corresponds to score = 0, i.e. logistic(score) = 0.5. So draws are no problem as far as I can see.

I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.

The other [more serious] objection is that the penalty imposed for getting one single sample wrong in the training set is unbounded in the case of the log-likelihood formula, while it is bounded if you use mean squared error.
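The boundedness point can be checked numerically. In this Python sketch (names hypothetical), a won game (r = 1) is scored with increasingly confident wrong predictions: the squared error never exceeds 1, while the negative log-likelihood grows without limit and raises a ValueError at exactly p = 0. The optional eps clamp is an assumption, a common workaround rather than anything proposed in the thread.

```python
import math


def se_penalty(p: float, r: float) -> float:
    """Squared-error penalty for one sample; never exceeds 1 when p and r are in [0, 1]."""
    return (r - p) ** 2


def nll_penalty(p: float, r: float, eps: float = 0.0) -> float:
    """Negative log-likelihood for one sample, optionally clamping p to [eps, 1 - eps]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(r * math.log(p) + (1.0 - r) * math.log(1.0 - p))


# The game was won (r = 1), but the evaluation grows ever more certain it was lost.
for p in (1e-2, 1e-4, 1e-8):
    print(p, se_penalty(p, 1.0), nll_penalty(p, 1.0))
# The squared error creeps toward 1; the log penalty climbs from about 4.6 to about 18.4.
```

With eps > 0 the log penalty is capped at -log(eps), which bounds the damage one bad sample can do, at the cost of no longer being the exact likelihood.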

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 04, 2017 10:15 pm

I don't really get what you are missing here; again, draws are not a problem as far as I can see.

Maximum-likelihood estimation can even be used to train neural networks (multiple layers), not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) showing that backpropagation with a maximum-likelihood objective (ML-BP) performs better than backpropagation with a least-squares objective (LS-BP).

Daniel

jdart · Post by **jdart** » Thu Oct 05, 2017 12:45 am

My tuner actually has an option to do this, and it can also do ordinal logistic regression. See the Objective enum in:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp

I have done very limited experimentation but generally have not found these options better than mean-squared error.

--Jon

AlvaroBegue · Post by **AlvaroBegue** » Thu Oct 05, 2017 1:18 am

Daniel Shawul wrote:
I don't really get what you are missing here; again, draws are not a problem as far as I can see.

Log-likelihood is a very natural quantity to maximize if you have a probability model. So if we had some procedure that produced a probability for winning, a probability for drawing and a probability for losing, it would make sense to penalize by the -log of the probability of the outcome that really happened. But the particular penalty you are using for a draw is not well motivated. Nothing will blow up, but what you are doing is not exactly maximizing likelihood.
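One way to get the kind of W/D/L probability model described here is an ordered (cumulative) logit model, the family behind the ordinal logistic regression Jon mentions; this sketch is not taken from his tuner. It assumes a hypothetical draw margin d > 0: P(win) = sigmoid(score - d), P(loss) = sigmoid(-score - d), and P(draw) takes the remaining mass. The penalty is then -log of the probability of the outcome that actually happened.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def wdl_probs(score: float, d: float):
    """Ordered-logit W/D/L probabilities with a hypothetical draw margin d > 0."""
    p_win = sigmoid(score - d)
    p_draw = sigmoid(score + d) - sigmoid(score - d)  # positive whenever d > 0
    p_loss = sigmoid(-score - d)
    return p_win, p_draw, p_loss


def wdl_nll(score: float, outcome: str, d: float = 0.5) -> float:
    """-log of the probability the model assigned to the actual outcome."""
    p_win, p_draw, p_loss = wdl_probs(score, d)
    return -math.log({"win": p_win, "draw": p_draw, "loss": p_loss}[outcome])


w, dr, l = wdl_probs(0.0, 0.5)
print(w, dr, l)  # symmetric at score 0: P(win) == P(loss), the draw gets the rest
print(wdl_nll(0.0, "draw"))
```

In a tuner, d could be fitted together with the evaluation parameters; every outcome then has a genuine probability, so the objective is an actual likelihood.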

Daniel Shawul wrote:
Maximum-likelihood estimation can even be used to train neural networks (multiple layers), not just the single-layer evaluation function we are talking about here. Here is an example paper (http://pubmedcentralcanada.ca/pmcc/arti ... 4-0306.pdf ) showing that backpropagation with a maximum-likelihood objective (ML-BP) performs better than backpropagation with a least-squares objective (LS-BP).

Yes, it's a very common thing to optimize if what you are maximizing over are probability models. But, as I said, that's not exactly true here.

Oh, and I wouldn't go around quoting neural-network papers from 1992.

Daniel Shawul · Post by **Daniel Shawul** » Thu Oct 05, 2017 2:47 am

Good to know! So far I have had better results with the ML objective function, even though both barely improved my engine. Unless I am mistaken, you seem to use a 1 draw = 2 wins + 2 losses approach; is that intentional? I am only aware of Elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper) and 2 draws = 1 win + 1 loss (Davidson).
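The r = 0.5 encoding from the first post can be checked against these draw weightings. With r = 0.5 the per-sample term is -0.5 * log(p) - 0.5 * log(1 - p), so two draws contribute exactly as much as one win plus one loss at the same score, which is the 2 draws = 1 win + 1 loss weighting attributed to Davidson above. A quick numerical check in Python, assuming the plain unscaled logistic:

```python
import math


def logistic(score: float) -> float:
    return 1.0 / (1.0 + math.exp(-score))


def sample_loss(score: float, r: float) -> float:
    """Per-sample -like from the first post: r = 1 win, 0.5 draw, 0 loss."""
    p = logistic(score)
    return -(r * math.log(p) + (1.0 - r) * math.log(1.0 - p))


for s in (-2.0, 0.0, 1.5):
    two_draws = 2.0 * sample_loss(s, 0.5)
    win_plus_loss = sample_loss(s, 1.0) + sample_loss(s, 0.0)
    print(s, two_draws, win_plus_loss)  # the last two columns agree up to rounding
```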

Daniel
