[R-sig-ME] optimizers for mixed models

Ross Boylan ross at biostat.ucsf.edu
Thu Mar 14 18:55:37 CET 2013


On 3/14/2013 10:08 AM, Ben Bolker wrote:
> Ross Boylan <ross at ...> writes:
>
>> lme4 appears to use Nelder-Mead for the final stage of its
>> optimization.  The archives show various concerns about stability, but
>> it looks as if even the prior optimizer was in the simplex class.
>    Depends how far you go back.  nlme and older versions of lme4 used
> nlminb.  This is derivative-based, but you'd have to look at the guts
> of nlminb and at the PORT documentation for a detailed description ...
>
>   http://netlib.bell-labs.com/cm/cs/cstr/153.pdf
>
> I think it's quasi-Newton ... digging around more in the R code
> guts shows it's NL2SOL -- you could dig some more on Netlib to
> find the details
>
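(For my own reference, nlminb is still in base R; a quick toy check of
the interface, with negll just a stand-in objective:)

    negll <- function(p) sum((p - c(1, -2))^2)  # stand-in objective
    nlminb(start = c(0, 0), objective = negll)  # PORT-based; bounds go in lower/upper
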
>> This surprised me, since I have found Nelder-Mead to be relatively slow
>> and imprecise, and in a more complex mixed model with sampling I've been
>> using optim with L-BFGS-B, which is quasi-Newton.  One of the parameter
>> estimates kept creeping up over iterations; the bounds were necessary to
>> cut it off.
>    Not quite sure what you mean here.  I have found L-BFGS-B
> to be somewhat unstable in my experience ...  The optimplus package
> on R-forge gives a lot more optimizer choices ...
>    
I'll give that a try.
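Something along these lines, I take it (a sketch on a toy objective,
assuming the optimx() interface; negll is just a stand-in):

    library(optimx)
    negll <- function(p) sum((p - c(1, -2))^2)  # stand-in objective
    optimx(par = c(0, 0), fn = negll,
           method = c("Nelder-Mead", "BFGS", "L-BFGS-B", "nlminb"))

One call compares several optimizers on the same problem, which should
make the speed/robustness comparison easy to run across our simulated
datasets.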
>> If anyone can give me more insight into why Nelder Mead is in use, and
>> perhaps whether I should be using it myself, I'd appreciate it.
>    Nelder-Mead *is* fairly slow, but it tends to be slightly more robust
> over a broad class of problems.  The current development version of lme4
> lets you pick your own optimizer, so you can make the speed vs
> robustness tradeoff however you like ...
>   
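(For the record, my understanding is that the development lme4 exposes
this roughly as below -- a sketch, assuming the lmerControl() interface:)

    library(lme4)
    ## same model fit with two different optimizers
    fm_nm  <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
                   control = lmerControl(optimizer = "Nelder_Mead"))
    fm_bob <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
                   control = lmerControl(optimizer = "bobyqa"))
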
>> The current behavior, in which about 10% of the simulated datasets have
>> convergence problems, with one of the parameters heading toward an
>> implausible value (a correlation of 1.0, where atanh(rho) is the actual
>> parameter being estimated), is certainly not ideal.  I tried simulated
>> annealing to see if the algorithm had wandered mistakenly toward a local
>> rather than a global optimum; there was no evidence that it had (the SA
>> run ended up in the same neighborhood as BFGS, and when that point was the
>> starting value for BFGS, it ended at essentially the same point as before).
>     Can you send me a simulated example that experiences this problem?
>
>    I can't tell at the moment whether you are using lme4 or your
> own hand-rolled solution to the problem.
I'm not sure you want it, since it does involve our hand-rolled 
solution.  You'd have to deal with the code as well as the data.
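For what it's worth, the SA check amounted to this pattern (a sketch;
negll stands in for our actual objective):

    negll <- function(p) sum((p - c(1, -2))^2)       # stand-in objective
    p0    <- c(0, 0)
    sann  <- optim(p0, negll, method = "SANN")       # stochastic global search
    bfgs  <- optim(sann$par, negll, method = "BFGS") # restart BFGS from the SA endpoint
    bfgs$par  # in our runs, essentially the same point as plain BFGS
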
>
>    There are two possibilities here: (1) the best-fit solution really
> is in the limit as atanh(rho) -> Inf, i.e. correlation -> 1.0; this is
> similar to the case where the MLE of a random-effect variance is zero,
> even when the true model (from which the data were simulated) has a
> non-zero variance; (2) the best-fit solution is very close to, but not
> at, +/- 1.0.
It got really, really close (e.g., tanh(7) -> rho = 0.9999983), so it's
probably (1), even though that's hard to believe.  I wish I could
identify exactly what about the data and the model is driving that.
The data gave some of the standard mixed-effects procedures (SAS glimmix)
some trouble, but others (SAS nlmixed) were OK.
>
>    It's going to be tough either way, but in case #1 you would
> definitely be better off with a parameter space in which the
> singular cases corresponded to _simple_ box constraints on the
> parameters.
Optimization was originally via a custom optimizer working directly on
rho.  That optimizer did not incorporate bounds, and blew up when it
stepped outside [-1, 1].  So we switched to atanh(rho) as the target of
optimization.  However, some simulated datasets then failed to converge,
with atanh(rho) marching slowly off toward infinity.  We switched to
optim with bounds to cut that process off.

So perhaps we should go back to rho, but using optim or the other 
bounded optimizers you suggested.
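Concretely, something like this (a sketch; negll_rho stands in for our
likelihood as a function of rho):

    negll_rho <- function(rho) (rho - 0.95)^2   # stand-in objective in rho
    optim(par = 0, fn = negll_rho, method = "L-BFGS-B",
          lower = -1 + 1e-6, upper = 1 - 1e-6)  # keep rho strictly inside (-1, 1)
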

In other words, the unboundedness of atanh(rho) is a feature from some
perspectives, but a bug from others.
>    For the Cholesky parameterization this is true
> for the variances (for a 2-by-2 matrix, theta_1 = 0 makes the first
> variance zero, and theta_2 = theta_3 = 0 makes the second variance zero),
> and (mostly) for the correlations -- the correlation goes to +/- 1 when
> theta_3 = 0 *or* when theta_2 -> infinity, but in the latter case the
> variance will blow up to infinity as well (so probably not a case worth
> worrying about).
>
Variances going to zero could also screw up estimates of the 
correlation, but the problem cases I've examined don't seem to have that 
particular problem.
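For my own notes, the mapping Ben describes in the 2-by-2 case (a sketch,
with theta ordered as the lower-triangle entries of the Cholesky factor):

    theta_to_vcov <- function(th) {
      L <- matrix(c(th[1], th[2], 0, th[3]), 2, 2)  # lower Cholesky factor
      S <- L %*% t(L)                               # implied covariance matrix
      c(var1 = S[1, 1], var2 = S[2, 2],
        cor = S[1, 2] / sqrt(S[1, 1] * S[2, 2]))
    }
    theta_to_vcov(c(1, 0.5, 0))  # theta_3 = 0: correlation exactly +1
    theta_to_vcov(c(1, 100, 1))  # theta_2 large: cor near 1, but var2 blows up too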


