CLOP slides

Discussion of chess software programming and technical issues.


Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

CLOP slides

Post by Rémi Coulom »

There are some slides about CLOP:
http://remi.coulom.free.fr/CLOP/CLOPSlides.pdf
They contain a step-by-step animation of the algorithm that may be easier to understand than the paper.

Rémi
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: CLOP slides

Post by lucasart »

Hello Remi,

Just wondering how you calculate the MAP (red line on page 6)
- Is it a Nadaraya-Watson kernel estimator? Or something similar?
- What happens if this MAP function is not strictly increasing and then strictly decreasing? What if there are several "hills"?

Anyway, thank you very much for CLOP. I'm sure it will help the community (in fact, it already does).

Lucas
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: CLOP slides

Post by Michel »

lucasart wrote:What happens if this MAP function is not strictly increasing and then strictly decreasing? What if there are several "hills"?
I conjecture it will more or less randomly settle on one of the hills.

Once CLOP has selected a hill, the weights of the parameter values corresponding to the other hills will be low, so those parameter values will be selected with low probability. So I think they will not change the outcome of the regression computation...
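
A quick numerical illustration of that intuition (a made-up two-hill curve and an ad hoc localization weighting, not CLOP's actual scheme):

Code: Select all

import numpy as np

# Hypothetical two-hill win-rate curve (for illustration only):
# a sharp hill near x = 0.2 and a broad one near x = 0.7.
def win_prob(x):
    return 0.30 + 0.40 * np.exp(-(x - 0.2) ** 2 / 0.005) \
                + 0.35 * np.exp(-(x - 0.7) ** 2 / 0.050)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 5000)
won = (rng.random(x.size) < win_prob(x)).astype(float)

# Localized quadratic fit: re-fit a parabola while down-weighting samples
# far from the current optimum estimate. Samples on the other hill get
# tiny weights, so they barely influence the regression.
center = 0.6                                  # initial guess
for _ in range(20):
    w = np.exp(-(x - center) ** 2 / 0.02)     # localization weights
    a, b, c = np.polyfit(x, won, 2, w=w)
    if a < 0:                                 # keep the vertex a maximum
        center = np.clip(-b / (2 * a), 0.0, 1.0)
print("settled near x =", round(center, 2))   # one hill (here ~0.7), not both

Starting near 0.6, it settles on the broad hill at 0.7 and never sees the sharp one at 0.2, which is exactly the more-or-less-random settling conjectured above.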
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: CLOP slides

Post by lucasart »

In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly useful, estimating it is doomed, no matter how you do it.

So for example the value of a Knight:
- clearly the true function has one hill (unless there's a strange bug in your program)
- but the estimated function over a finite sample might not!?

I wonder how Remi handled that case in practice.

I trust a guy from INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP slides

Post by Rémi Coulom »

lucasart wrote:Hello Remi,

Just wondering how you calculate the MAP (red line on page 6)
- Is it a Nadaraya-Watson kernel estimator? Or something similar?
- What happens if this MAP function is not strictly increasing and then strictly decreasing? What if there are several "hills"?

Anyway, thank you very much for CLOP. I'm sure it will help the community (in fact, it already does).

Lucas
MAP stands for maximum a posteriori. The regression is quadratic logistic regression. It cannot have more than one hill.
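
In one dimension that means fitting P(win|x) = sigmoid(a + b*x + c*x^2); the logit is a parabola, so the fitted curve has at most one hill. A bare-bones sketch of such a fit (an illustration only, not CLOP's code; the prior is omitted, so strictly this is a maximum-likelihood rather than a MAP fit):

Code: Select all

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_quadratic_logistic(x, won, steps=5000, lr=0.5):
    """Fit P(win|x) = sigmoid(a + b*x + c*x^2) by gradient ascent."""
    X = np.column_stack([np.ones_like(x), x, x * x])
    theta = np.zeros(3)
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta += lr * X.T @ (won - p) / len(x)   # log-likelihood gradient
    return theta

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 5000)
true_p = sigmoid(0.2 + 1.0 * x - 2.0 * x * x)    # one-hill ground truth
won = (rng.random(x.size) < true_p).astype(float)

a, b, c = fit_quadratic_logistic(x, won)
# The fitted logit a + b*x + c*x^2 peaks at -b/(2c) when c < 0,
# so the estimated win-rate curve can never show two hills.
print("estimated optimum at x =", round(-b / (2 * c), 3))  # true optimum: 0.25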

CLOP only does local optimization. That is to say, it may get stuck in a local optimum. But for the situations I have met in practice, that is not a problem at all. Most of my use of CLOP was tuning coefficients of the UCT-like formula of Crazy Stone, and I don't believe there are any local optima there. Same for tuning things such as piece values or time-control coefficients. I suppose that is the case for many parameters.

Rémi
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP slides

Post by Rémi Coulom »

lucasart wrote:In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly useful, estimating it is doomed, no matter how you do it.

So for example the value of a Knight:
- clearly the true function has one hill (unless there's a strange bug in your program)
- but the estimated function over a finite sample might not!?

I wonder how Remi handled that case in practice.
Since I do quadratic regression, the estimated function cannot have more than one hill.
I trust a guy from INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem.
I did not expect this could impress people that much :-)

Rémi
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CLOP slides

Post by Laskos »

Rémi Coulom wrote:
lucasart wrote:In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly useful, estimating it is doomed, no matter how you do it.

So for example the value of a Knight:
- clearly the true function has one hill (unless there's a strange bug in your program)
- but the estimated function over a finite sample might not!?

I wonder how Remi handled that case in practice.
Since I do quadratic regression, the estimated function cannot have more than one hill.
I trust a guy from INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem.
I did not expect this could impress people that much :-)

Rémi
Sorry, I still miss something. The data we get are noisy observations. The noise can be estimated, but the quadratic regression you do is based on this noisy data, not on data from the theoretical noiseless curve. Say the "real", theoretical noiseless curve has several extrema on that interval. First, a quadratic regression even on noiseless data could fail. Second, we actually get a lot of noise, not a nicely oscillating curve. Doing the quadratic regression on this noisy data will succeed only if the noiseless curve itself has only one extremum.

I mean, in short: a broad local maximum will have a greater impact on the regression than a sharp global maximum, as far as quadratic regression on noisy data goes. Shouldn't the data collection in this case be adaptive (gather more points towards the extrema)? From the graphs it looked like a uniform collection, but maybe I'm missing something; I only looked at the slides briefly.

Kai
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CLOP slides

Post by Laskos »

Laskos wrote:[...] Shouldn't the data collection in this case be adaptive (gather more points towards the extrema)?
To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:

[image: the theoretical curve, with a sharp global maximum near x = 0 and a broad local maximum]

In practice we have noisy data: I added white Gaussian noise, and then performed the quadratic regression on this data:

[image: the noisy data and the quadratic regression fit]

The fit clearly misses the sharp peak (global maximum) of the real function close to 0, and settles for the broad local maximum.
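
A rough reconstruction of the experiment (the exact curve and noise level in the plots are not given, so f and the noise sigma below are stand-ins):

Code: Select all

import numpy as np

# Stand-in curve: a sharp global peak near x = 0
# plus a broad local hump around x = 1.5.
def f(x):
    return np.exp(-(x / 0.05) ** 2) + 0.6 * np.exp(-((x - 1.5) / 0.8) ** 2)

rng = np.random.default_rng(42)
x = np.linspace(-1.0, 3.0, 400)               # uniform data collection
y = f(x) + rng.normal(0.0, 0.2, x.size)       # added white Gaussian noise

a, b, c = np.polyfit(x, y, 2)                 # quadratic regression
# Vertex of the fitted parabola: lands near the broad hump, not near 0.
print("fitted maximum at x =", round(-b / (2 * a), 2), "(true global max at 0)")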

Maybe I'm missing something, but in many dimensions this was a huge problem for me: there is always a plethora of local extrema, and I needed adaptive techniques for my problem (Monte Carlo).

Kai
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP slides

Post by Rémi Coulom »

Laskos wrote:To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:
Yes, CLOP would probably miss the global optimum with such a function.

But I don't worry about it, because I don't expect such function shapes to occur when tuning the parameters of a game-playing program.

An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.

Rémi
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CLOP slides

Post by Laskos »

Rémi Coulom wrote:
Laskos wrote:To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:
Yes, CLOP would probably miss the global optimum with such a function.

But I don't worry about it, because I don't expect such function shapes to occur when tuning the parameters of a game-playing program.

An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.

Rémi
In several dimensions, are you doing a multi-dimensional quadratic regression? I haven't read the CLOP documentation, but if that's what it does, the result will almost certainly settle on the broadest local extremum, not on the global one. In many dimensions there will almost always be a hell of a lot of local extrema, even if each 1-dimensional function is very smooth. UCT is not exactly what I meant, since it scales badly with the number of dimensions; I was thinking rather of something like VEGAS by Lepage, an adaptive recursive method where the data points are concentrated in the regions close to the extrema. It scales extremely well with the number of dimensions, linearly or with a power close to 1, if I remember correctly.
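
For reference, a toy version of the 1-D grid-adaptation step behind VEGAS (a sketch of the idea, not Lepage's implementation): the bin edges are moved so that each bin contributes equally to the integral estimate, which automatically concentrates sample points around the peak.

Code: Select all

import numpy as np

def f(x):
    # Test function with a sharp peak at x = 0.5.
    return np.exp(-((x - 0.5) / 0.05) ** 2)

rng = np.random.default_rng(0)
edges = np.linspace(0.0, 1.0, 51)             # start from a uniform grid
for _ in range(10):
    # sample uniformly inside each bin
    u = rng.random((len(edges) - 1, 100))
    xs = edges[:-1, None] + u * np.diff(edges)[:, None]
    contrib = np.mean(f(xs), axis=1) * np.diff(edges)  # per-bin contribution
    cdf = np.concatenate([[0.0], np.cumsum(contrib)])
    cdf /= cdf[-1]
    # new edges with equal contribution per bin: bins shrink around the peak
    edges = np.interp(np.linspace(0.0, 1.0, len(edges)), cdf, edges)

print("narrowest bins sit near x =", round(edges[np.argmin(np.diff(edges))], 3))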

Kai