[R-sig-ME] Gastric emptying data: nlmer vs. nlme

Douglas Bates bates at stat.wisc.edu
Fri Oct 5 15:29:25 CEST 2007


On 10/5/07, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
> Douglas,
>
> thanks a lot for your explanations.
>
> > The reason that the first model is so difficult to fit is because
> > there are 6 variance-covariance parameters and only 8 levels of
> > subj:treat.  You are trying to estimate too many variance-covariance
> > parameters from too few groups.  The likelihood surface will be very
> > flat and the parameter estimates will be ill-defined.
>
> I realized that the demo example was badly stripped down, but as you
> noted it was the idea to compare nlmer and nlme.
>
> >> This means that the function value being optimized is not
> >> reproducible and that causes a lot of problems in a
> >> derivative-free optimization.
>
> Here I am confused. I had first explicitly added the gradient, until I
> realized that for this simple function selfStart() had produced it
> automatically. Or did I misunderstand "derivative-free"?

I may have been too terse in my explanation and left you confused
about which derivative I was referring to.  We want the maximum
likelihood estimates for the model, but we can't evaluate the
log-likelihood analytically.  In nlmer it is the Laplace approximation
to the log-likelihood that is optimized (nlme uses a different
optimization method that Mary Lindstrom and I suggested, which is
closely related to the penalized quasi-likelihood, or PQL, method for
generalized linear mixed models).  This approximation is a function of
the fixed-effects parameters and of the parameters that define the
variance-covariance of the random effects.

Given values of these parameters we determine the conditional modes of
the random effects by optimizing a penalized nonlinear least squares
objective with respect to the random effects. For this "inner
optimization" we use the gradient of the nonlinear model with respect
to the parameters - the "gradient" attribute of the model function
evaluation.
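
To make that concrete, here is a rough sketch of how the "gradient"
attribute enters one Gauss-Newton step of the penalized nonlinear least
squares problem for one group's random effects b.  This is not the
nlmer code; the names (phi, Lambda), the identity penalty and the
single-group setup are simplifying assumptions for illustration only.

## Sketch only -- not lme4 internals.  'phi' are the fixed effects,
## 'Lambda' a relative covariance factor, 'b' one group's random effects.
linexp <- deriv(~ v0 * (1 + kappa * t/tempt) * exp(-t/tempt),
                c("v0", "kappa", "tempt"),
                function(t, v0, kappa, tempt) {})

inner_step <- function(b, y, t, phi, Lambda) {
    pars <- as.vector(phi + Lambda %*% b)      # group-specific parameters
    mu   <- linexp(t, pars[1], pars[2], pars[3])
    G    <- attr(mu, "gradient") %*% Lambda    # d mu / d b by the chain rule
    r    <- y - as.vector(mu)
    ## Gauss-Newton update for the penalized objective ||r||^2 + ||b||^2
    db   <- solve(crossprod(G) + diag(length(b)), crossprod(G, r) - b)
    b + as.vector(db)
}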

Once we have the conditional modes we evaluate the Laplace
approximation and return that to the function that is doing the "outer
optimization".  We don't have a gradient for the log-likelihood so the
log-likelihood itself is optimized with a derivative-free method
subject to constraints on some of the parameters (the variances or
relative variances must be non-negative).

What I found was that I needed to restart the penalized nonlinear
least squares optimization with respect to the random effects (i.e.
the "inner optimization") from the same starting estimates for the
random effects - the zero vector - every time.  Otherwise the value of
the Laplace approximation to the log-likelihood is somewhat stochastic
and that throws off the "outer optimization".  It's a general problem
with optimizers - if you do not get a reproducible evaluation of the
objective function - the log-likelihood in this case - the optimizer
gets very confused.
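
The pattern of the outer loop, then, looks something like the sketch
below.  Again this is only an illustration, not what nlmer does
internally: laplace_dev is a trivial stand-in for the evaluation of
the Laplace approximation at the conditional modes (it is a smooth
function of arbitrary placeholder values so the example runs), and the
log parameterization is just one illustrative way of keeping a
variance non-negative.  The point is that the inner solve is restarted
from the zero vector on every call, so the objective is a
deterministic function of the outer parameters.

## Sketch of the outer loop only; 'laplace_dev' is a placeholder.
laplace_dev <- function(beta, sdev, b.start) {
    sum((beta - c(400, 0.5, 100))^2) + log(sdev)^2 + sum(b.start^2)
}

outer_obj <- function(par) {
    beta <- par[1:3]            # fixed effects (v0, kappa, tempt)
    sdev <- exp(par[4])         # random-effects scale, kept positive
    b0   <- rep(0, 3)           # always restart the inner solve from zero
    laplace_dev(beta, sdev, b.start = b0)
}

## derivative-free outer optimization of the (stand-in) objective
fit <- optim(c(380, 0.6, 90, 0), outer_obj, method = "Nelder-Mead")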

While we are on the topic of the outer and inner optimizations -
Spencer Graves raised the issue several months ago of why we go to all
this trouble with the inner optimization.  That is, why find the
conditional modes of the random effects at all?  The answer is that
the inner optimization is easy and fast.  It can have a large number
of parameters (3 * n, where n is the number of levels of the grouping
factor), but the penalized least squares objective is separable (each
group of 3 random effects determines only the predictions for the
observations at one subj:treat level) and the penalty part of the
objective "regularizes" the optimization.
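
In other words the penalized objective decomposes into a sum of small,
independent per-group problems, so the 3 * n dimensional inner problem
splits into n three-dimensional ones.  The sketch below shows only that
decomposition; the names are illustrative, b.list is assumed to hold
one 3-vector of random effects per group, and the identity penalty is
the same simplification as above.

## Each group's random effects only enter its own residuals, so the
## total objective is a sum of per-group terms.
penalized_ss <- function(b.list, group, y, t, phi, Lambda, model) {
    per.group <- vapply(seq_along(b.list), function(i) {
        idx  <- group == levels(group)[i]
        pars <- as.vector(phi + Lambda %*% b.list[[i]])
        mu   <- model(t[idx], pars[1], pars[2], pars[3])
        sum((y[idx] - as.vector(mu))^2) + sum(b.list[[i]]^2)  # SS + penalty
    }, numeric(1))
    sum(per.group)
}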

I hope this helps.

> ----
> EmptInit= function(mCall,LHS,data){ # dummy, not explicitly used
>    stop("Should not be called")
> }
> # Standard LinExp model for gastric emptying
> SSEmptLinExp=selfStart(~v0*(1+kappa*t/tempt)*exp(-t/tempt),
>    initial=EmptInit, parameters= c("v0","kappa","tempt"))
> SSEmptLinExp
> ----
>
> function (t, v0, kappa, tempt)
> {
>     .expr1 <- kappa * t
>     .expr3 <- 1 + .expr1/tempt
>     .expr4 <- v0 * .expr3
>     .expr7 <- exp(-t/tempt)
>     .expr13 <- tempt^2
>     .value <- .expr4 * .expr7
>     .grad <- array(0, c(length(.value), 3L), list(NULL, c("v0",
>         "kappa", "tempt")))
>     .grad[, "v0"] <- .expr3 * .expr7
>     .grad[, "kappa"] <- v0 * (t/tempt) * .expr7
>     .grad[, "tempt"] <- .expr4 * (.expr7 * (t/.expr13)) - v0 *
>         (.expr1/.expr13) * .expr7
>     attr(.value, "gradient") <- .grad
>     .value
> }
> <environment: 0x0383825c>
> attr(,"initial")
> function(mCall,LHS,data){ # dummy, not explicitly used
>   stop("Should not be called")
> }
> attr(,"pnames")
> [1] "v0"    "kappa" "tempt"
> attr(,"class")
> [1] "selfStart"
> >
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



