[R-sig-ME] Numerical methods used to compute correlation coefficients

Thu Apr 16 23:42:26 CEST 2009

On Thu, Apr 16, 2009 at 3:33 PM, H c <harlancampbell at gmail.com> wrote:
> Thank you for the detailed answer.  The reason I ask, is that I am
> attempting to fit a mixture of mixed models to a dataset using the standard
> EM approach.  Exceptionally, the mixed models have AR(1) correlation.  I
> have been able to do the math and arrive at maximum-likelihood solutions
> using an R implementation I wrote.  However, the step that numerically
> calculates the \phi correlation coefficient(currently using optimize()) is
> extremely computationally expensive and thus makes any implementation of the
> algorithm on a large dataset impossible.
> I have attempted a few things in order to speed up estimation including
> transforming the data and maximizing the profiled likelihood rather then the
> likelihood.  Unfortunately, no success.  I have also tried using nlminb()
> rather then optimize.  Still incredibly slow.
> Would you have any suggestions as to how one could numerically obtain
> maximum-weighted-likelihood estimates for correlation parameters in a
> reasonable amount of time?
> Any suggestions would be greatly appreciated.

I'm afraid I don't have any immediate suggestions.  As you have seen,
there is a great deal of difference between writing some equations and
getting a working algorithm that is both robust and effective.  I
spent a considerable amount of time working with the EM algorithm and
variations like the ECME algorithm or the calculation of gradients of
the log-likelihood function with respect to some of the parameters
(these calculations are related to the EM algorithm).  In fact, I
would say that some of the best mathematics I have done was in
obtaining "simple" expressions for these quantities.

Unfortunately I found that in the implementation the use of ECME or
the analytic gradient just slowed down the convergence on large
complex problems.  Once I discovered that I could work out why it
happened but it was a disappointment to do such wonderful mathematics
and discover that it was for naught.

>
>  Wed, Apr 15, 2009 at 3:30 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
>>
>> On Wed, Apr 15, 2009 at 1:14 PM, H c <harlancampbell at gmail.com> wrote:
>>
>> > Thank you.  This  is exactly what I wanted to know.  Perhaps one last
>> > question.  Is there any particular reason to prefer  this optimization
>> > routine over another, such as "optimize()"?
>>
>> optimize() only handles scalar problems and you must define an
>> interval in which the optimum is to be found.
>>
>> I have used other unconstrained optimizers such as nlm() and optim()
>> but had trouble with both of those.  The original version of the nlme
>> package for S and S-PLUS used the ms() optimizer in S which was based
>> on some Fortran code by David Gay.  Initially that code could not be
>> used in R because it was not released under an open source license.
>> Eventually it was released to the public domain and I wrote the nlminb
>> interface code for R, primarily so I could use it as the optimizer in
>> the nlme and lme4 packages.
>>
>> Even this code by David Gay is far from ideal.  The development
>> version of the lme4 package factors out the evaluation of the deviance
>> function or the REML criterion from the optimization of these, so that
>> those who are interested can experiment with different optimizers and
>> see what works best in their problems.
>>
>> > On Wed, Apr 15, 2009 at 12:59 PM, Douglas Bates <bates at stat.wisc.edu>
>> > wrote:
>> >>
>> >> On Tue, Apr 14, 2009 at 12:02 PM, H c <harlancampbell at gmail.com> wrote:
>> >> > Thank you for the quick response.
>> >> > when I refer to "correlation parameters", I mean the "generally small
>> >> > set of
>> >> > parameters \lambda" that parametrize the \Lambda_{i}
>> >> > Variance-Covariance
>> >> > matrix.
>> >> > For example, one has time series data such that every subject has
>> >> > been
>> >> > observed at 4 time points.  One wishes to model this using an AR(1)
>> >> > correlation structure within the mixed model.
>> >> > the AR(1) is parametrized by a fixed parameter, \phi :
>> >> > lme(y~X, random=~1|ID, method="ML", data=data,
>> >> > correlation=corAR1(0.5,form=~X,fixed=FALSE))
>> >>
>> >> > Since there is no closed form solution for the maximum-likelihood
>> >> > estimate
>> >> > of \phi.  what numerical methods are used to arrive at the given
>> >> > estimate?
>> >> > Hopefully this has clarified my question.
>> >>
>> >> In the sense that I know what you mean by the correlation parameters.
>> >> I'm not sure how to characterize the numerical methods other than to
>> >> say that the deviance (negative twice the log-likelihood) or the
>> >> corresponding version of the REML criterion is expressed with respect
>> >> to an unconstrained parameter and the resulting function optimized
>> >> using optimization software within R (nlminb).  You can check the
>> >> definition of the corAR family to determine exactly how the
>> >> parameterization is defined.
>> >>
>> >> > Thanks again,
>> >> > Harlan
>> >> >
>> >> > On Tue, Apr 14, 2009 at 12:37 PM, Douglas Bates <bates at stat.wisc.edu>
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Apr 14, 2009 at 9:20 AM, H c <harlancampbell at gmail.com>
>> >> >> wrote:
>> >> >> > I have have already posted the following question:  What numerical
>> >> >> > methods
>> >> >> > are used in nlme to estimate correlation parameters?
>> >> >> > I was referred to the Pinheiro and Bates book.  Unfortunately, on
>> >> >> > p.
>> >> >> > 202,
>> >> >> > section 5.1.1, under the title "Estimation and Computational
>> >> >> > Methods",
>> >> >> > no
>> >> >> > description on a numerical method is provided.  (When the data is
>> >> >> > transformed to work with the profiled likelihood(y->ystar), one
>> >> >> > needs
>> >> >> > the
>> >> >> > parameters that define Lambda.  How are these parameters
>> >> >> > estimated?)
>> >> >>
>> >> >> I'm not sure what you mean by "correlation parameters".  If you mean
>> >> >> the correlation parameters in the unconditional distribution of the
>> >> >> random effects then those are estimated by maximum likelihood (ML)
>> >> >> or
>> >> >> residual maximum likelihood (REML).  The profiled deviance or the
>> >> >> profiled REML criterion is evaluated with respect to a transformed
>> >> >> set
>> >> >> of parameters and this value is optimized.
>> >> >
>> >> >
>> >
>> >
>
>