Discussion of chess software programming and technical issues.
Moderators: hgm, Harvey Williamson, bob

Daniel Shawul
 Posts: 3593
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia

by Daniel Shawul » Wed Oct 04, 2017 4:54 pm
Why do the tuning methods minimize the mean squared error directly instead of maximizing the likelihood (or minimizing the negative likelihood)? Given r = result, the objective with maximum likelihood estimation would be
Code:
like = r * log( logistic(score) ) + (1 - r) * log( 1 - logistic(score) )
objective = 1/N Sum( like )
This should converge faster than plain least squares regression using mean squared error
Code:
se = (r - logistic(score)) ** 2
objective = 1/N Sum( se )
Daniel
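A minimal Python sketch of the two objectives above, for anyone who wants to compare them on their own data. The `scale` parameter and the sample scores are assumptions for illustration (the post does not say how scores are scaled before the logistic), and the likelihood objective is written here as a negative mean so that both functions are minimized.

```python
import math

def logistic(score, scale=1.0):
    """Map an evaluation score to an expected result in (0, 1).
    `scale` is a hypothetical tuning constant, not taken from the post."""
    return 1.0 / (1.0 + math.exp(-score / scale))

def neg_log_likelihood(scores, results, scale=1.0):
    """Negative mean of r*log(p) + (1-r)*log(1-p); lower is better."""
    total = 0.0
    for s, r in zip(scores, results):
        p = logistic(s, scale)
        total += r * math.log(p) + (1.0 - r) * math.log(1.0 - p)
    return -total / len(scores)

def mean_squared_error(scores, results, scale=1.0):
    """Mean of (r - p)^2; lower is better."""
    return sum((r - logistic(s, scale)) ** 2
               for s, r in zip(scores, results)) / len(scores)

# Made-up sample: scores in pawns, results 1 = win, 0.5 = draw, 0 = loss.
scores = [1.5, -0.3, 0.0, 2.0]
results = [1.0, 0.0, 0.5, 1.0]
print("negative log-likelihood:", neg_log_likelihood(scores, results))
print("mean squared error:     ", mean_squared_error(scores, results))
```

Note that a drawn position with score 0 contributes nothing to the squared error, while its likelihood term is log(0.5) rather than 0.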

AlvaroBegue
 Posts: 916
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
 Full name: Álvaro Begué (RuyDos)
by AlvaroBegue » Wed Oct 04, 2017 5:38 pm
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.

by Daniel Shawul » Wed Oct 04, 2017 5:45 pm
AlvaroBegue wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
Edit:
AlvaroBegue wrote:
But the real answer is that I don't need to penalize my evaluation function infinitely for getting one case wrong, which using logs would do.
Maximum likelihood is the 'standard' method for logistic regression. Also, I seem to get more stable iterations with it than when minimizing the MSE directly.

by AlvaroBegue » Wed Oct 04, 2017 5:51 pm
Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts that the probability of the actual result is 0, it gets an infinite penalty.
I don't know why you think that the convergence would be better than using mean squared error. I don't really know if it would be, but I am curious if you have a reason to believe that a priori.
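One possible a priori argument, not made explicitly in the thread, comes from the gradients with respect to the score. With p = logistic(score), the per-sample gradient of the negative log-likelihood is p - r, while the gradient of the squared error is 2 (p - r) p (1 - p), which shrinks toward zero when the logistic saturates. The sketch below just evaluates both expressions; the function names are made up for illustration, and this does not settle whether tuning converges faster in practice.

```python
import math

def logistic(score):
    return 1.0 / (1.0 + math.exp(-score))

def grad_nll(score, r):
    """d/dscore of -(r*log(p) + (1-r)*log(1-p)), which simplifies to p - r."""
    return logistic(score) - r

def grad_se(score, r):
    """d/dscore of (r - p)^2, which is 2*(p - r)*p*(1 - p)."""
    p = logistic(score)
    return 2.0 * (p - r) * p * (1.0 - p)

# A won game (r = 1) that the evaluation scores as increasingly lost.
for score in (0.0, -2.0, -4.0, -8.0):
    print(f"score={score:5.1f}  grad_nll={grad_nll(score, 1.0):+.5f}  "
          f"grad_se={grad_se(score, 1.0):+.5f}")
```

For badly mispredicted positions the log-likelihood gradient stays close to -1, whereas the squared-error gradient becomes tiny, so a gradient-based tuner gets little signal from exactly the samples it gets most wrong.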

by Daniel Shawul » Wed Oct 04, 2017 6:00 pm
AlvaroBegue wrote: Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts that the probability of the actual result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0 or logistic(score) = 0.5. So draws are no problem as far as I can see.

by AlvaroBegue » Wed Oct 04, 2017 6:20 pm
Daniel Shawul wrote: AlvaroBegue wrote: Daniel Shawul wrote:
How do you handle draws? Or does your evaluation function return W/D/L probabilities?
Draws should be fine, I think. Unless we have a score of -inf or +inf, the logistic function should return values between 0 and 1.
The log-likelihood function you posted is not bounded. If my evaluation function predicts that the probability of the actual result is 0, it gets an infinite penalty.
The evaluation score of my engine is between (-20000, 20000), and for a draw it would be score = 0 or logistic(score) = 0.5. So draws are no problem as far as I can see.
I raised two separate objections. One of them is that, although using log-likelihood is somewhat theoretically motivated, it is not at all clear how draws should be handled.
The other [more serious] objection is that the penalty imposed for getting one single sample wrong in the training set is unbounded in the case of the log-likelihood formula, while it is bounded if you use mean squared error.
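The second objection can be illustrated numerically. The sketch below (function names are made up for illustration) compares the per-sample penalty under both objectives as the predicted probability of the actual result goes to 0. It also shows that with a finite score the log penalty stays finite but grows roughly linearly with how wrong the score is, assuming the logistic is applied to the score without further scaling.

```python
import math

def log_penalty(p_actual):
    """-log of the probability assigned to the result that happened."""
    return -math.log(p_actual)

def squared_penalty(p_actual):
    """Squared error when the actual result was a win predicted with p_actual."""
    return (1.0 - p_actual) ** 2

def log_penalty_for_win(score):
    """-log(logistic(score)) rewritten as log(1 + exp(-score))."""
    return math.log1p(math.exp(-score))

for p in (0.5, 1e-2, 1e-6, 1e-12):
    print(f"p={p:g}  log penalty={log_penalty(p):.3f}  "
          f"squared penalty={squared_penalty(p):.3f}")

# A won game scored at -10, -25 and -50: the log penalty is close to |score|.
for score in (-10.0, -25.0, -50.0):
    print(f"score={score}  log penalty={log_penalty_for_win(score):.3f}")
```

The squared penalty never exceeds 1 per sample, so a single badly mislabeled or unlucky game cannot dominate the objective the way it can under the log penalty.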

by Daniel Shawul » Wed Oct 04, 2017 8:15 pm
I don't really get what you are missing here. Again, draws are not a problem as far as I can see.
Maximum likelihood estimation can even be used to train neural networks (multiple layers) instead of the single-layer evaluation function we are talking about here. Here is an example paper ( http://pubmedcentralcanada.ca/pmcc/arti ... 40306.pdf ) where they show that backpropagation with a maximum likelihood objective (MLBP) is better than with the least squares objective (LSBP).
Daniel

by AlvaroBegue » Wed Oct 04, 2017 11:18 pm
Daniel Shawul wrote: I don't really get what you are missing here. Again, draws are not a problem as far as I can see.
Log-likelihood is a very natural quantity to maximize if you have a probability model. So if we had some procedure that produced a probability for winning, a probability for drawing and a probability for losing, it would make sense to penalize by the log of the probability of the outcome that really happened. But the particular penalty you are using for a draw is not well motivated. Nothing will blow up, but what you are doing is not exactly maximizing likelihood.
Daniel Shawul wrote: Maximum likelihood estimation can even be used to train neural networks (multiple layers) instead of the single-layer evaluation function we are talking about here. Here is an example paper ( http://pubmedcentralcanada.ca/pmcc/arti ... 40306.pdf ) where they show that backpropagation with a maximum likelihood objective (MLBP) is better than with the least squares objective (LSBP).
Yes, it's a very common thing to optimize if what you are maximizing over are probability models. But, as I said, that's not exactly true here.
Oh, and I wouldn't go around quoting neural-network papers from 1992.

by Daniel Shawul » Thu Oct 05, 2017 12:47 am
Good to know! So far I have had better results with the ML objective function, even though both barely improved my engine. You seem to use a 1 draw = 2 wins + 2 losses approach unless I am mistaken; is that intentional? I am only aware of Elo models that use 1 draw = 1 win + 1 loss (Rao-Kupper) or 2 draws = 1 win + 1 loss (Davidson).
Daniel
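For reference, a hedged sketch of the two draw models named above, in their usual textbook form with positive player strengths a and b. The parameter values for `theta` and `nu` are arbitrary illustrations, not tuned numbers. The checks at the end show where the "1 draw = 1 win + 1 loss" and "2 draws = 1 win + 1 loss" descriptions come from: the draw probability is proportional to P(win) * P(loss) under Rao-Kupper, and to its square root under Davidson.

```python
import math

def rao_kupper(a, b, theta=1.5):
    """Rao-Kupper W/D/L probabilities for strengths a, b > 0 and theta >= 1."""
    win = a / (a + theta * b)
    loss = b / (b + theta * a)
    return win, 1.0 - win - loss, loss

def davidson(a, b, nu=1.0):
    """Davidson W/D/L probabilities for strengths a, b > 0 and nu >= 0."""
    root = math.sqrt(a * b)
    denom = a + b + nu * root
    return a / denom, nu * root / denom, b / denom

w, d, l = rao_kupper(2.0, 1.0, theta=1.5)
print("Rao-Kupper:", w, d, l)
# One draw has the same likelihood shape as one win plus one loss.
print("draw / (win * loss) =", d / (w * l), "(theta^2 - 1 =", 1.5 ** 2 - 1, ")")

w, d, l = davidson(2.0, 1.0, nu=1.0)
print("Davidson:  ", w, d, l)
# Two draws have the same likelihood shape as one win plus one loss.
print("draw^2 / (win * loss) =", d * d / (w * l), "(nu^2 =", 1.0 ** 2, ")")
```

Either model gives a genuine three-outcome probability, so taking the log of the probability of the outcome that actually happened is a proper log-likelihood, which is the situation described earlier in the thread as well motivated.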