CLOP slides
Moderators: Harvey Williamson, bob, hgm
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 436
 Joined: Mon Apr 24, 2006 6:06 pm
 Contact:
CLOP slides
There are some slides about CLOP:
http://remi.coulom.free.fr/CLOP/CLOPSlides.pdf
They contain a stepbystep animation of the algorithm that may be easier to understand than the paper.
Rémi
http://remi.coulom.free.fr/CLOP/CLOPSlides.pdf
They contain a stepbystep animation of the algorithm that may be easier to understand than the paper.
Rémi
Re: CLOP slides
Hello Remi,
Just wondering how you calculate the MAP (red line on page 6)
 Is it a NadarayaWatson kernel estimator ? Something similar ?
 What happens if this MAP function is not strictly increasing then strictly decreasing ? What if there are several "hills" ?
Anyway thank you very much for CLOP, I'm sure it will help the community (in fact it already does)
Lucas
Just wondering how you calculate the MAP (red line on page 6)
 Is it a NadarayaWatson kernel estimator ? Something similar ?
 What happens if this MAP function is not strictly increasing then strictly decreasing ? What if there are several "hills" ?
Anyway thank you very much for CLOP, I'm sure it will help the community (in fact it already does)
Lucas
Re: CLOP slides
I conjecture it will more or less randomly settle on one of the hills.What happens if this MAP function is not strictly increasing then strictly decreasing ? What if there are several "hills" ?
Once CLOP has selected a hill, the weights of the parameter values corresponding to the other hills will be low so these parameter values will be selected with low probability. So I think they will not change the outcome of the regression computation....
Re: CLOP slides
In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly usefull, estimating it is doomed, no matter how you do it.
So for example the value of a Knight:
 clearly the true function has one hill (unless there's a strange bug in your program)
 but the estimated function over a finite sample might not !?
I wonder how Remi handled that case in practice.
I trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem
So for example the value of a Knight:
 clearly the true function has one hill (unless there's a strange bug in your program)
 but the estimated function over a finite sample might not !?
I wonder how Remi handled that case in practice.
I trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem

 Posts: 436
 Joined: Mon Apr 24, 2006 6:06 pm
 Contact:
Re: CLOP slides
MAP stands for maximum a posteriori. The regression is quadratic logistic regression. It cannot have more than one hill.lucasart wrote:Hello Remi,
Just wondering how you calculate the MAP (red line on page 6)
 Is it a NadarayaWatson kernel estimator ? Something similar ?
 What happens if this MAP function is not strictly increasing then strictly decreasing ? What if there are several "hills" ?
Anyway thank you very much for CLOP, I'm sure it will help the community (in fact it already does)
Lucas
CLOP only does local optimization. That is to say it may get stuck in a local optimum. But for the situations I met in practice, it is not a problem at all. Most of my use for CLOP was tuning coefficients of the UCTlike formula of Crazy Stone, and I don't believe there is any local optima there. Same for tuning things such as piece values or timecontrol coefficients. I suppose it is the case for many parameters.
Rémi

 Posts: 436
 Joined: Mon Apr 24, 2006 6:06 pm
 Contact:
Re: CLOP slides
Since I do quadratic regression, the estimated function cannot have more than one hill.lucasart wrote:In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly usefull, estimating it is doomed, no matter how you do it.
So for example the value of a Knight:
 clearly the true function has one hill (unless there's a strange bug in your program)
 but the estimated function over a finite sample might not !?
I wonder how Remi handled that case in practice.
I did not expect this could impress people that muchI trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem
Rémi
Re: CLOP slides
Sorry, I still miss something. The data we get are for noisy observation. The noise can be estimated, but the quadratic regression you do is based on this noisy data, not on the data from the theoretical noiseless curve. Say, the "real" or theoretical noiseless curve has several extrema on that interval. First, a quadratic regression even on noiseless data would possibly fail. Second, we actually get a lot of noise, not this nicely oscillating curve. Doing the quadratic regression on this noisy data will succeed only if the noiseless curve itself has only one extremum.Rémi Coulom wrote:Since I do quadratic regression, the estimated function cannot have more than one hill.lucasart wrote:In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly usefull, estimating it is doomed, no matter how you do it.
So for example the value of a Knight:
 clearly the true function has one hill (unless there's a strange bug in your program)
 but the estimated function over a finite sample might not !?
I wonder how Remi handled that case in practice.
I did not expect this could impress people that muchI trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem
Rémi
I mean, in short, a broad local maximum would have a greater impact on the regression than a global sharp maximum, as far as the quadratic regression goes on noisy data. Shouldn't, in this case, the data collection be adaptive (gather more points towards the extremes)? From the graphs, it seemed to me as an uniform collection, but maybe I miss something, I just saw briefly the slides.
Kai
Re: CLOP slides
To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:Laskos wrote:Sorry, I still miss something. The data we get are for noisy observation. The noise can be estimated, but the quadratic regression you do is based on this noisy data, not on the data from the theoretical noiseless curve. Say, the "real" or theoretical noiseless curve has several extrema on that interval. First, a quadratic regression even on noiseless data would possibly fail. Second, we actually get a lot of noise, not this nicely oscillating curve. Doing the quadratic regression on this noisy data will succeed only if the noiseless curve itself has only one extremum.Rémi Coulom wrote:Since I do quadratic regression, the estimated function cannot have more than one hill.lucasart wrote:In practice, we're talking about weights of features that are supposedly useful. If the feature is not significantly usefull, estimating it is doomed, no matter how you do it.
So for example the value of a Knight:
 clearly the true function has one hill (unless there's a strange bug in your program)
 but the estimated function over a finite sample might not !?
I wonder how Remi handled that case in practice.
I did not expect this could impress people that muchI trust a guy from the INRIA and the CNRS to do pretty smart things and to have found a clever solution to that problem
Rémi
I mean, in short, a broad local maximum would have a greater impact on the regression than a global sharp maximum, as far as the quadratic regression goes on noisy data. Shouldn't, in this case, the data collection be adaptive (gather more points towards the extremes)? From the graphs, it seemed to me as an uniform collection, but maybe I miss something, I just saw briefly the slides.
Kai
Practically, we have noisy data, I added white Gaussian noise, and then performed the quadratic regression on this data:
The fit clearly misses the sharp peak (global maximum) of the real function close to 0, and settles for the broad local maximum.
Maybe I miss something, but in many dimensions this was a huge problem to me, there are always a plethora of local extrema, I needed adaptive techniques for my problem (Monte Carlo).
Kai

 Posts: 436
 Joined: Mon Apr 24, 2006 6:06 pm
 Contact:
Re: CLOP slides
Yes, CLOP would probably miss the global optimum with such a function.Laskos wrote:To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:
But I don't worry about it, because I don't expect such function shapes will occur when tuning parameters of a gameplaying program.
An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.
Rémi
Re: CLOP slides
In several dimensions, are you doing the multidimensional quadratic regression? I don't read the CLOP documentation, but if it's doing this, the result will almost certainly settle on the broadest local extremum, not on the global one. In many dimensions there will almost always be a hell lot of local extrema, even if each 1dimensional function is very smooth. UCT is not exactly what I meant, it scales badly with the number of dimensions, rather on something like VEGAS by Lepage, adaptive recursive method where the data points are concentrated in the regions close to exrtrema. It scales extremely well as the number of dimensions go, linearly or a power close to 1, if I remember well.Rémi Coulom wrote:Yes, CLOP would probably miss the global optimum with such a function.Laskos wrote:To exemplify quickly, we have a theoretical curve to play with for finding a global maximum:
But I don't worry about it, because I don't expect such function shapes will occur when tuning parameters of a gameplaying program.
An algorithm such as UCT would find the global optimum. But such global optimization methods don't scale well to more than a few dimensions.
Rémi
Kai