Hi Roberto,

IQ wrote:
> This is the point I was talking about. You already use MCMC, but only as a second step, after you approximate with a Gaussian. I suggested skipping this approximation step and integrating the step-1 maximization into a fully Bayesian framework. The fact that the quadratic forms sometimes do not have a maximum would make it even more suitable, since, e.g., a Metropolis-Hastings algorithm has a higher acceptance probability in regions with more probability mass. Usually this approximates the posterior and makes it easy to calculate expectations; as n gets larger, it converges to maximum likelihood. This would get rid of the intermediate maximization step in a robust way. And if n is not large enough, you could just replicate the data. But maybe it's difficult to find a useful prior for the quadratic forms?
>
> Rémi Coulom wrote:
> > My method computes the maximum of the posterior with gradient descent (conjugate gradient, in fact), and it works very well. QLR also has an MCMC sampler for the posterior. The efficiency of MCMC sampling algorithms depends on having a reasonable approximation of the posterior distribution. By computing the maximum and the Hessian at the maximum, I can get a good Gaussian approximation that is then used by MCMC. In fact, the posterior samples I plotted in those slides were obtained by MCMC, IIRC.
Thanks for your advice, but I am afraid what you mention does not solve any of the difficult problems. I repeat that I have no difficulty estimating the posterior and taking samples from it. The fact that some samples are not negative definite is not a major problem. I tried for a while to use a prior that ensures they are negative definite, but it does not really help, and it is not very important.
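For concreteness, here is a minimal sketch of the approximate-then-sample scheme quoted above: maximize the posterior, build a Gaussian (Laplace) approximation from the Hessian at the maximum, and use its covariance to scale a Metropolis-Hastings sampler. The log_posterior below is a toy stand-in, not QLR's actual model.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the log-posterior of the regression coefficients.
def log_posterior(theta):
    return -0.5 * np.sum((theta - 1.0) ** 2)

dim = 3

# Step 1: maximum of the posterior by conjugate gradient.
theta_map = minimize(lambda t: -log_posterior(t), np.zeros(dim), method="CG").x

# Step 2: finite-difference Hessian at the maximum; the Gaussian (Laplace)
# approximation has covariance inv(-H).
eps = 1e-4
H = np.zeros((dim, dim))
for i in range(dim):
    for j in range(dim):
        ei = eps * np.eye(dim)[i]
        ej = eps * np.eye(dim)[j]
        H[i, j] = (log_posterior(theta_map + ei + ej)
                   - log_posterior(theta_map + ei - ej)
                   - log_posterior(theta_map - ei + ej)
                   + log_posterior(theta_map - ei - ej)) / (4 * eps ** 2)
cov = np.linalg.inv(-H)

# Step 3: random-walk Metropolis-Hastings, with the Laplace covariance
# setting the proposal scale (the proposal is symmetric, so the acceptance
# ratio reduces to the posterior ratio).
rng = np.random.default_rng(0)
theta, samples = theta_map.copy(), []
for _ in range(10_000):
    proposal = rng.multivariate_normal(theta, cov)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)
```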
The main difficulty for the stability and convergence of the algorithm is making the regression local. Since the function to be maximized is never quadratic in practice, quadratic regression has a bias. So, in order to converge to the maximum, it is necessary to progressively remove samples that are far from the maximum. This is difficult to do: if remote samples are removed too fast, the algorithm may become unstable, and if they are not removed fast enough, the algorithm is inefficient.
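To illustrate this tension, here is a minimal 1-D sketch of a localized quadratic fit: samples are down-weighted with distance from the current estimate of the maximum, and the weighting radius shrinks over time. The Gaussian weighting, step cap, and shrink rate are illustrative assumptions, not QLR's actual scheme.

```python
import numpy as np

def local_quadratic_fit(x, y, center, radius):
    # Gaussian weights: samples far from the current maximum contribute less.
    w = np.exp(-0.5 * ((x - center) / radius) ** 2)
    X = np.column_stack([np.ones_like(x), x - center, (x - center) ** 2])
    WX = X * w[:, None]
    # Weighted least squares: solve (X^T W X) beta = X^T W y.
    return np.linalg.solve(X.T @ WX, WX.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 500)
y = np.sin(x) + 0.1 * rng.normal(size=500)        # non-quadratic target

center, radius = 1.0, 1.5
for _ in range(30):
    b0, b1, b2 = local_quadratic_fit(x, y, center, radius)
    if b2 < 0:                                    # the local fit has a maximum
        step = -b1 / (2 * b2)
        center += np.clip(step, -radius, radius)  # cap the move for stability
    # Shrinking too fast risks instability; too slow wastes samples.
    radius = max(0.9 * radius, 0.3)
print(center)   # should approach pi/2, the maximum of sin
```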
After a lot of complicated theoretical and empirical study, I have come up with a satisfactory solution now. So I don't really need any advice, since I know how to make everything work efficiently and robustly. The last version of QLR I made public did not have all the problems solved (but almost). The current private one is OK.
There are other issues, such as "how do I estimate the location of the maximum?" and "with which parameters do I play the next game?". I also found some good solutions for those.
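For the second question, one plausible heuristic (a Thompson-sampling-style rule, explicitly not a description of what QLR does) is to draw one posterior sample of the quadratic's coefficients and play the next game at that sample's maximum, falling back to a random probe when the sampled quadratic has no maximum. The posterior mean and covariance below are hypothetical placeholders.

```python
import numpy as np

def next_point(beta_mean, beta_cov, lo, hi, rng):
    # Draw one posterior sample of the 1-D quadratic b0 + b1*x + b2*x^2.
    b0, b1, b2 = rng.multivariate_normal(beta_mean, beta_cov)
    if b2 < 0:                      # the sampled quadratic has a maximum
        x = -b1 / (2 * b2)
    else:                           # no maximum: fall back to a random probe
        x = rng.uniform(lo, hi)
    return float(np.clip(x, lo, hi))

rng = np.random.default_rng(0)
beta_mean = np.array([0.0, 1.0, -0.5])   # hypothetical posterior mean
beta_cov = 0.01 * np.eye(3)              # hypothetical posterior covariance
print(next_point(beta_mean, beta_cov, -3.0, 3.0, rng))
```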
All these problems are very subtle and difficult. Even if I don't release the software to the public, I may write a paper about it some day.
Rémi