CLOP slides

Rémi Coulom
Posts: 426
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

CLOP slides

Post by Rémi Coulom » Thu Nov 03, 2011 6:42 pm

Here are some slides about CLOP:
http://remi.coulom.free.fr/CLOP/CLOPSlides.pdf
They contain a step-by-step animation of the algorithm that may be easier to understand than the paper.

Rémi

lucasart
Posts: 3031
Joined: Mon May 31, 2010 11:29 am
Full name: lucasart
Contact:

Re: CLOP slides

Post by lucasart » Fri Nov 04, 2011 6:02 pm

Hello Remi,

Just wondering how you calculate the MAP (red line on page 6):
- Is it a Nadaraya-Watson kernel estimator? Something similar?
- What happens if this MAP function is not strictly increasing then strictly decreasing? What if there are several "hills"?

Anyway, thank you very much for CLOP. I'm sure it will help the community (in fact it already does).

Lucas
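For readers unfamiliar with the estimator Lucas asks about: a Nadaraya-Watson regressor estimates E[y|x] as a kernel-weighted average of the observations near x. A minimal sketch on made-up data (CLOP does not actually use this; Rémi explains further down that CLOP uses quadratic logistic regression):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.1):
    """Estimate E[y|x] at each query point as a kernel-weighted average."""
    diff = x_query[:, None] - x_train[None, :]
    weights = np.exp(-0.5 * (diff / bandwidth) ** 2)  # Gaussian kernel
    return (weights @ y_train) / weights.sum(axis=1)

# Made-up data: noisy observations of sin(x)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, np.pi, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, 200)

grid = np.linspace(0.2, np.pi - 0.2, 50)
est = nadaraya_watson(x, y, grid)
# est tracks sin(grid), smoothing out the noise
```

The bandwidth controls the bias-variance trade-off: smaller values track sharper features but give a noisier estimate.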

Michel
Posts: 2003
Joined: Sun Sep 28, 2008 11:50 pm

Re: CLOP slides

Post by Michel » Fri Nov 04, 2011 6:58 pm

lucasart wrote:What happens if this MAP function is not strictly increasing then strictly decreasing? What if there are several "hills"?
I conjecture it will more or less randomly settle on one of the hills.

Once CLOP has selected a hill, the weights of the parameter values corresponding to the other hills will be low, so those parameter values will be selected with low probability. So I think they will not change the outcome of the regression computation.
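Michel's conjecture can be illustrated with a toy weighted regression (the function and weights below are invented for the example): fit a quadratic by weighted least squares to a bimodal curve, with the weights concentrated around one hill; the low-weight hill barely influences the fit.

```python
import numpy as np

# Invented bimodal "true" function: hills near x = -1 and x = +1
def f(x):
    return np.exp(-8.0 * (x - 1.0) ** 2) + 0.8 * np.exp(-8.0 * (x + 1.0) ** 2)

x = np.linspace(-2.0, 2.0, 401)
y = f(x)

# Weights concentrated around the "selected" hill at x = +1
w = np.exp(-2.0 * (x - 1.0) ** 2)

# Weighted least-squares quadratic fit: scale both sides by the weights
A = np.vander(x, 3)                      # columns: x^2, x, 1
coef, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)

peak = -coef[1] / (2.0 * coef[0])        # vertex of the fitted parabola
# coef[0] < 0 (a hill), and the vertex lands near x = +1, not the other hill
```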


Re: CLOP slides

Post by lucasart » Fri Nov 04, 2011 7:03 pm

In practice, we're talking about weights of features that are supposedly useful. If a feature is not significantly useful, estimating its weight is doomed, no matter how you do it.

So take, for example, the value of a knight:
- clearly the true function has one hill (unless there's a strange bug in your program)
- but the estimated function over a finite sample might not!?

I wonder how Remi handled that case in practice.

I trust a guy from INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem.


Re: CLOP slides

Post by Rémi Coulom » Fri Nov 04, 2011 8:27 pm

lucasart wrote:Just wondering how you calculate the MAP (red line on page 6)
- Is it a Nadaraya-Watson kernel estimator? Something similar?
- What happens if this MAP function is not strictly increasing then strictly decreasing? What if there are several "hills"?
MAP stands for maximum a posteriori. The regression is quadratic logistic regression. It cannot have more than one hill.

CLOP only does local optimization, that is to say it may get stuck in a local optimum. But in the situations I have met in practice, this is not a problem at all. Most of my use of CLOP was tuning coefficients of the UCT-like formula of Crazy Stone, and I don't believe there are any local optima there. The same goes for tuning things such as piece values or time-control coefficients. I suppose it is the case for many parameters.

Rémi
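A minimal sketch of quadratic logistic regression on simulated game results (the "true" winning-rate curve and all constants here are invented; this is not CLOP's actual code): the logit of the winning probability is modeled as a quadratic in the parameter, so a negative leading coefficient forces exactly one hill.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented "true" winning rate: its logit is a concave quadratic, peak at x = 0.3
def p_win(x):
    return 1.0 / (1.0 + np.exp(-(-2.0 * (x - 0.3) ** 2 + 0.5)))

# Simulate 5000 games at uniformly sampled parameter values
x = rng.uniform(-1.0, 1.0, 5000)
wins = (rng.random(5000) < p_win(x)).astype(float)

# Quadratic logistic regression: logit p = c0*x^2 + c1*x + c2,
# fitted by Newton's method on the log-likelihood
X = np.column_stack([x ** 2, x, np.ones_like(x)])
coef = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ coef))
    H = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information matrix
    coef += np.linalg.solve(H, X.T @ (wins - p))

# A concave fitted logit (coef[0] < 0) has exactly one maximum:
x_star = -coef[1] / (2.0 * coef[0])
```

Whenever the fitted leading coefficient is negative, the estimated surface is a single parabolic hill regardless of how noisy the samples are, which is the point Rémi makes above.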


Re: CLOP slides

Post by Rémi Coulom » Fri Nov 04, 2011 8:33 pm

lucasart wrote: [...] clearly the true function has one hill (unless there's a strange bug in your program), but the estimated function over a finite sample might not!?

I wonder how Remi handled that case in practice.
Since I do quadratic regression, the estimated function cannot have more than one hill.
I trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem
I did not expect this could impress people that much :-)

Rémi

Laskos
Posts: 8967
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: CLOP slides

Post by Laskos » Sun Nov 06, 2011 10:44 pm

Rémi Coulom wrote: [...] Since I do quadratic regression, the estimated function cannot have more than one hill.
Sorry, I still miss something. The data we get are noisy observations. The noise can be estimated, but the quadratic regression you do is based on this noisy data, not on data from the theoretical noiseless curve. Say the "real", theoretical noiseless curve has several extrema on the interval. First, a quadratic regression even on noiseless data could fail. Second, we actually get a lot of noise, not a nicely oscillating curve. Doing the quadratic regression on this noisy data will succeed only if the noiseless curve itself has only one extremum.

In short, a broad local maximum will have a greater impact on the regression than a sharp global maximum, as far as quadratic regression on noisy data goes. Shouldn't, in this case, the data collection be adaptive (gather more points towards the extrema)? From the graphs it looked to me like a uniform collection, but maybe I miss something; I only looked at the slides briefly.

Kai


Re: CLOP slides

Post by Laskos » Mon Nov 07, 2011 5:33 am

Laskos wrote: [...] Doing the quadratic regression on this noisy data will succeed only if the noiseless curve itself has only one extremum.
To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:

[Figure: the theoretical curve, with a sharp global maximum near 0 and a broad local maximum]

In practice we have noisy data: I added white Gaussian noise, then performed the quadratic regression on this data:

[Figure: the noisy data and the quadratic regression fit]

The fit clearly misses the sharp peak (global maximum) of the real function close to 0, and settles for the broad local maximum.

Maybe I miss something, but in many dimensions this was a huge problem for me: there is always a plethora of local extrema, and I needed adaptive techniques for my problem (Monte Carlo).

Kai
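Kai's experiment is easy to reproduce numerically (the curve below is an invented stand-in for the one in his figures): a function with a sharp global peak near 0 and a broad hill near 2, sampled with Gaussian noise, then fitted with an ordinary quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-in: sharp global peak near x = 0, broad local hill near x = 2
def f(x):
    return 1.5 * np.exp(-50.0 * x ** 2) + np.exp(-0.5 * (x - 2.0) ** 2)

x = np.linspace(-1.0, 4.0, 400)
y = f(x) + rng.normal(0.0, 0.1, x.size)   # white Gaussian noise

# Ordinary (unweighted) quadratic regression on the noisy data
c2, c1, c0 = np.polyfit(x, y, 2)
vertex = -c1 / (2.0 * c2)
# The parabola's maximum lands near the broad hill at x = 2,
# missing the sharp global peak at x = 0
```

The sharp peak contributes little to the squared-error objective because it is so narrow, so the least-squares parabola is dominated by the broad hill, exactly the failure mode Kai describes.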


Re: CLOP slides

Post by Rémi Coulom » Mon Nov 07, 2011 10:29 am

Laskos wrote:To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:
Yes, CLOP would probably miss the global optimum with such a function.

But I don't worry about it, because I don't expect such function shapes to occur when tuning the parameters of a game-playing program.

An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.

Rémi


Re: CLOP slides

Post by Laskos » Mon Nov 07, 2011 5:23 pm

Rémi Coulom wrote: [...] An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.
In several dimensions, are you doing a multi-dimensional quadratic regression? I haven't read the CLOP documentation, but if that's what it does, the result will almost certainly settle on the broadest local extremum, not the global one. In many dimensions there will almost always be a hell of a lot of local extrema, even if each 1-dimensional function is very smooth. UCT is not exactly what I meant; it scales badly with the number of dimensions. I was thinking rather of something like VEGAS by Lepage, an adaptive recursive method where the data points are concentrated in the regions close to extrema. It scales extremely well as the number of dimensions grows, linearly or with a power close to 1, if I remember well.

Kai
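VEGAS itself is an adaptive importance-sampling integrator, but the kind of adaptivity Kai describes for optimization, concentrating samples near the extremum, can be sketched as a simple zoom-in loop (a toy illustration with invented constants, not Lepage's algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# Same kind of target: sharp global peak near x = 0, broad hill near x = 2
def f(x):
    return 1.5 * np.exp(-50.0 * x ** 2) + np.exp(-0.5 * (x - 2.0) ** 2)

def noisy_mean(x, n_games=200):
    # Model the average of n_games noisy evaluations at each point
    return f(x) + rng.normal(0.0, 0.1 / np.sqrt(n_games), x.shape)

lo, hi = -1.0, 4.0
for _ in range(8):
    grid = np.linspace(lo, hi, 20)
    best = grid[np.argmax(noisy_mean(grid))]
    half = (hi - lo) / 4.0        # halve the search window around the best point
    lo, hi = best - half, best + half

x_best = 0.5 * (lo + hi)
# x_best converges to the sharp global peak near 0, because each round
# concentrates samples where the observed values are highest
```

Unlike the unweighted quadratic fit above, this finds the sharp peak, but a naive zoom-in loop can still lock onto the wrong hill if an early round's noise favors it; that is the kind of exploration/exploitation trade-off global methods have to manage.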
