[R-sig-ME] optimizers for mixed models
Ben Bolker
bbolker at gmail.com
Thu Mar 14 18:08:18 CET 2013
Ross Boylan <ross at ...> writes:
>
> lme4 appears to use Nelder Mead for the final stage of its
> optimization. The archives show various concerns about stability, but
> it looks as if even the prior optimizer was in the simplex class.
Depends how far you go back. nlme and older versions of lme4 used
nlminb. This is derivative-based, but you'd have to look at the guts
of nlminb and at the PORT documentation for a detailed description ...
http://netlib.bell-labs.com/cm/cs/cstr/153.pdf
I think it's quasi-Newton ... digging around more in the R code
guts shows it's NL2SOL -- you could dig some more on Netlib to
find the details
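For reference, nlminb is still available in base R. A minimal sketch of its interface, on a toy quadratic rather than a mixed-model deviance:

```r
# nlminb: the PORT-based, derivative-aware optimizer that nlme
# and older lme4 versions used internally.
# Toy objective: a shifted quadratic with its minimum at c(1, 2).
obj <- function(par) (par[1] - 1)^2 + (par[2] - 2)^2

fit <- nlminb(start = c(0, 0), objective = obj)
fit$par          # should be very close to c(1, 2)
fit$convergence  # 0 indicates successful convergence
```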
> This surprised me, since I have found Nelder Mead to be relatively slow
> and imprecise, and in a more complex mixed model with sampling I've been
> using optim with L-BFGS-B, which is quasi-newton. One of the parameter
> estimates kept creeping up over iterations; the bounds were necessary to
> cut it off.
Not quite sure what you mean here. I have found L-BFGS-B
to be somewhat unstable in my experience ... The optimplus package
on R-forge gives a lot more optimizer choices ...
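To make the speed comparison concrete, here is a quick side-by-side of Nelder-Mead and L-BFGS-B through base R's optim(), on the Rosenbrock function (a standard difficult test case, not a mixed-model deviance):

```r
# Rosenbrock "banana" function, minimized at c(1, 1)
rosen <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

nm <- optim(c(-1.2, 1), rosen, method = "Nelder-Mead")
lb <- optim(c(-1.2, 1), rosen, method = "L-BFGS-B",
            lower = c(-5, -5), upper = c(5, 5))

# Both should land near c(1, 1); $counts shows how many
# function/gradient evaluations each method needed.
nm$par; nm$counts
lb$par; lb$counts
```

(L-BFGS-B is estimating the gradient by finite differences here, since no gradient function is supplied; on noisy or non-smooth objectives that is exactly where it can get into trouble.)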
> If anyone can give me more insight into why Nelder Mead is in use, and
> perhaps whether I should be using it myself, I'd appreciate it.
Nelder-Mead *is* fairly slow, but it tends to be slightly more robust
over a broad class of problems. The current development version of lme4
lets you pick your own optimizer, so you can make the speed vs
robustness tradeoff however you like ...
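In released versions of lme4 this ended up as the optimizer argument of lmerControl(); the exact interface in the development version at the time may have differed slightly. A sketch using the built-in sleepstudy data:

```r
library(lme4)

# Fit the same model with two different optimizers:
# the default Nelder-Mead implementation vs. "bobyqa"
# (derivative-free trust-region method from the minqa package).
fm_nm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              control = lmerControl(optimizer = "Nelder_Mead"))
fm_bq <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              control = lmerControl(optimizer = "bobyqa"))

# Fixed-effect estimates should agree closely between optimizers
fixef(fm_nm)
fixef(fm_bq)
```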
> The current behavior, in which about 10% of the simulated datasets have
> convergence problems, with one of the parameters heading toward an
> implausible value (a correlation of 1.0, where atanh(rho) is the actual
> parameter being estimated) is certainly not ideal. I tried simulated
> annealing to see if the algorithm had wandered mistakenly toward a local
> rather than global optimum; there was no evidence that it had (the SA
> ended up in the same neighborhood as BFGS, and when that point was the
> starting value for BFGS it ended on essentially the same point as before).
Can you send me a simulated example that experiences this problem?
I can't tell at the moment whether you are using lme4 or your
own hand-rolled solution to the problem.
There are two possibilities here: (1) the best-fit solution really
is at the boundary, i.e. in the limit as atanh(rho) -> Inf
(correlation -> 1.0); this is similar to the case where the MLE of
a random-effect variance is zero even when the true model (from
which the data were simulated) has a non-zero variance; (2) the
best-fit solution is in the interior but very close to +/- 1.0.
It's going to be tough either way, but in case #1 you would
definitely be better off with a parameter space in which the
singular cases corresponded to _simple_ box constraints on the
parameters. For the Cholesky parameterization this is true
for the variances (for a 2-by-2 matrix, theta_1=0 and theta_2=theta_3=0
are the conditions for the first and second variances, respectively,
to be 0), and (mostly) for the correlations -- the correlation goes
to +/- 1 when theta_3=0 *or* when theta_2 -> infinity, but in the
latter case the second variance will blow up to infinity as well
(so probably not a case worth worrying about).
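The 2-by-2 case above can be checked numerically. A sketch, taking L as the lower-triangular Cholesky factor with entries theta_1, theta_2, theta_3 and Sigma = L L' (so the variances are theta_1^2 and theta_2^2 + theta_3^2, and the correlation is theta_2 / sqrt(theta_2^2 + theta_3^2)):

```r
# Build the 2x2 covariance matrix from Cholesky parameters
# L = | theta1   0      |
#     | theta2   theta3 |
cov2 <- function(theta) {
  L <- matrix(c(theta[1], theta[2], 0, theta[3]), 2, 2)
  L %*% t(L)
}
corr <- function(theta) cov2cor(cov2(theta))[1, 2]

corr(c(1, 0.5, 0.5))  # interior point: correlation strictly inside (-1, 1)
corr(c(1, 0.5, 0))    # theta3 = 0: correlation exactly 1
corr(c(1, 1e6, 0.5))  # theta2 large: correlation -> 1, second variance blows up
```

So the singular-correlation case theta_3 = 0 is a simple box constraint, while the theta_2 -> infinity route to correlation 1 also drives the second variance to infinity.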