[R-sig-ME] nlme and NONMEM

Thu Nov 1 22:42:29 CET 2007

On 11/1/07, Rob Forsyth <r.j.forsyth at newcastle.ac.uk> wrote:
> I'd appreciate hearing from anyone (off list if you think it more
> appropriate) who can share their comparative experiences of non-
> linear mixed effects modelling with both nlme and NONMEM. The latter
> appears the traditional tool of choice particularly in pharmacology.
> Having built up some familiarity with nlme I am now collaborating (on
> a non-pharmacological project) with someone strongly encouraging me
> to move to NONMEM, although that clearly represents another
> considerable learning curve. The main argument in favour is the
> relative difficulty I have had in getting convergence with nlme
> models in my relatively sparse datasets particularly when (as in my
> case) I am interested in the random effects covariance matrix and
> wish to avoid having to coerce it using pdDiag().

> I note the following comment from Douglas Bates on the R-help archive

> > The nonlinear optimization codes used by S-PLUS and R are different.
> > There are advantages to the code used in R relative to the code used
> > in S-PLUS but there are also disadvantages. One of the disadvantages
> > is that the code in R will try very large steps during its initial
> > exploration phase then it gets trapped in remote regions of the
> > parameter space. For nlme this means that the estimate of the
> > variance-covariance matrix of the random effects becomes singular.

> > Recent versions of the nlme library for R have a subdirectory called
> > scripts that contains R scripts for the examples from each of the
> > chapters in our book. If you check them you will see that not all of
> > the nonlinear examples work in the R version of nlme. We plan to
> > modify the choice of starting estimates and the internal algorithms to
> > improve this but it is a long and laborious process. I ask for your
> > patience.

> Can Doug or anyone comment on whether the development work on
> lme4:::nlmer has included any steps in this direction or not?

Yes.

The algorithm in nlme alternates between solving a linear
mixed-effects problem to update estimates of the variance components
and solving a penalized nonlinear least squares problem to update
estimates of the fixed-effects parameters and our approximation to the
conditional distribution of the random effects.  This type of
algorithm that alternates between two conditional optimizations is
appealing because each of the sub-problems is much simpler than the
general problem.  However it may have poor convergence properties.  In
particular it may end up bouncing back and forth between two different
conditional optima.

Also, at the time we wrote nlme we tried to remove the constraints on
the variance components by transforming them away (In simple
situations we iterate on the logarithm of the relative variances of
the random effects.)  This works well except when the estimate of the
variance component is zero.  Trying to reach zero when iterating on
the logarithm scale can lead to very flat likelihood surfaces.

In the nlmer function I use the same parameterization of the
variance-covariance of the random effects as in lmer and use the
Laplace approximation to the log-likelihood.  Both of these changes
should provide more reliable convergence, although the nlmer code has
not been vetted to nearly the same extent as has the nlme code.  In
other words, I am confident that the algorithm is superior but the
implementation may still need some work.

Regarding NONMEM, I think the work Jose Pinheiro and I did on nlme and
my current work on lme4 is based on a different philosophy than is the
basis of NONMEM.  As I have mentioned on this and other forums (fora?)
I want to be confident that the results from the code that I write
actually do represent an optimum of the objective function (such as
the likelihood or log-likelihood).  Nonlinear mixed-effects models for
sparse data frequently end up being over-parameterized. In such cases
I view it as a feature and not a bug that nlme or nlmer will indicate
failure to converge.  They may also fail to converge when there is a
well-defined optimum.  That behavior is not a feature.

As I understand it from people who have used NONMEM (I once had access
to a copy of NONMEM but was never successful in getting it to run and
haven't tried since then) it will produce estimates just about every
time it is run.  Considering how ill-defined the parameter estimates
in some nonlinear mixed-effects model fits can be, I don't view this
as a feature.

Many people feel that statistical techniques and statistical software
are some sort of magic that can extract information from data, even
when the information is not there.  As I understand it from
conversations many years ago with Lewis Sheiner, his motivation in
developing NONMEM (with Stu Beal) was to be able to use routine
clinical data (such as the Quinidine data in the nlme package) to
estimate population pharmacokinetic parameters.

Routine clinical data like these are very sparse. In the Quinidine
example the majority of subjects have 1, 2 or 3 concentration
measurements

> table(table(subset(Quinidine, !is.na(Quinidine$conc))$Subject))

 1  2  3  4  5  6  7 10 11
46 33 31  9  3  8  2  1  3

and frequently these measurements are at widely spaced time points
relative to the dosing schedule.  Such cases contribute almost no
information to the parameter estimates, yet I have had pharmacologists
suggest to me that it would be wonderful to use study designs in which
each patient has only one concentration measurement and somehow the
magic of nonlinear mixed effects will conjure estimates from such
data.

The real world doesn't work like that.  If you have only one
observation per person it should make sense that no amount of
statistical magic will be able to separate the per-observation noise
from the per-person variability.

So when I am told that NONMEM converged to parameter estimates on a
problem where nlme or nlmer failed to converge I think (and sometimes
say) "You mean NONMEM *declared* convergence to a set of estimates".
Declaring convergence and converging can be different.