[R-sig-ME] glmer takes long time even after restricting iterations

Ben Bolker bbolker at gmail.com
Fri Sep 5 23:34:11 CEST 2014


On 14-09-05 05:05 PM, Douglas Bates wrote:
> I will take it as a compliment that you have sufficient confidence in our
> software to try to fit such a model.  :-)
> 
> Sadly, even with 400,000 observations it is highly unlikely that you would
> be able to converge to parameter estimates for these models, and even more
> unlikely that the estimates would be meaningful.
> 
> The optimization in glmer is different from the optimization in lmer.  For
> a linear mixed model the optimization is over the parameters in the
> relative covariance matrix only.  In this case it looks like there would be
> 15 such parameters.  The optimization problem involving even these
> parameters would be difficult, as it is likely that the solution will be on
> the boundary of the feasible region, representing a singular covariance
> matrix.  For glmer the optimization is much more difficult because it is
> over the concatenation of the fixed-effects parameters and the covariance
> parameters.  I lost track of the number of fixed-effects parameters, but
> that number is large.  As you have seen, the first model failed to
> converge in 100,000 iterations.  That is not encouraging.
> 
> Regarding the warning messages I will let Ben or Steve respond as they know
> more about the convergence checks than I do.  I believe those diagnostics
> involve creating a finite-difference approximation to the gradient vector
> and the Hessian matrix.  The approximation of the Hessian matrix at the
> optimum is probably where the time is being spent.
> 

  To speed things up I would try setting nAGQ=0, and setting

control=glmerControl(check.conv.grad="ignore",check.conv.singular="ignore",
                     check.conv.hess="ignore")

-- this should deactivate the Hessian and gradient computations
(although at some point you will probably want to go back to testing these!)
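
  Putting these together (a sketch in your notation, untested on these
data; nAGQ=0 estimates the fixed effects within the penalized
iteratively reweighted least squares step instead of including them in
the nonlinear optimization, which is cruder but much faster):

ctrl <- glmerControl(check.conv.grad="ignore",
                     check.conv.singular="ignore",
                     check.conv.hess="ignore")
fit <- glmer(ALS ~ -1 + Year06 + Year07 + Year08 + Year09 + Year10 +
                 Metro + AMI + (injury + stroke + resp)*(FEMALE + AGE +
                 MTUS_CNT + Asian + Black + Hispanic + Other +
                 Custodial + Nursing + Scene + WhiteHigh + BlackHigh +
                 BlackLow + IntegratedHigh + IntegratedLow +
                 combinedscore) +
                 (-1 + injury + AMI + stroke + resp | fullcounty),
             family=binomial, data=rbind(IARS,IARS2),
             nAGQ=0, control=ctrl)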

  It looks like you have 79 fixed-effect parameters, plus what looks
like 10 random-effect parameters (a quick count, assuming all your
variables are numeric) -- so with n = 89 parameters in total, the
finite-difference Hessian computation will have to do approximately
n*(n+1)/2, i.e. about 4000, function evaluations ...
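
  Spelled out as a quick check (the parameter tallies are my rough
count above, not exact):

n_fixed <- 79        # fixed-effect coefficients (approximate count)
n_theta <- 10        # 4x4 covariance of (injury + AMI + stroke + resp
                     #  | fullcounty): 4*5/2 = 10 parameters
n <- n_fixed + n_theta
n * (n + 1) / 2      # = 4005 function evaluations for the Hessian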

   You can also try using the bobyqa implementation from nloptr, with
appropriate convergence settings, as described here:

https://github.com/lme4/lme4/issues/150#issuecomment-45813306

I believe these are the same settings that are implemented in ?nloptwrap.
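
  With a version of lme4 recent enough to name nloptwrap directly as
the optimizer, something like this should work (a sketch; the
tolerances below are assumptions, to be adjusted as in the linked
issue):

ctrl2 <- glmerControl(optimizer="nloptwrap",
                      optCtrl=list(xtol_abs=1e-6, ftol_abs=1e-6,
                                   maxeval=1e5))
## then refit the model with control=ctrl2 (and nAGQ=0 while
## experimenting)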


> The best advice is to simplify the model.  You say that ALS is a binary
> variable, which means that even with 400,000 observations you have only
> 400,000 bits of information with which to fit the model.  That's not a lot.
>  A continuous response provides much more information per observation than
> a binary response.
> 
> Try fitting the fixed effects only, using glm.  I'm confident that most of
> the coefficients will not be significant.
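
  A fixed-effects-only version of the Modified Model quoted below would
look like this (a sketch in the poster's notation, untested):

fixed_only <- glm(ALS ~ -1 + Year06 + Year07 + Year08 + Year09 +
                      Year10 + Metro + AMI +
                      (injury + stroke + resp)*(FEMALE + AGE +
                      MTUS_CNT + Asian + Black + Hispanic + Other +
                      Custodial + Nursing + Scene + WhiteHigh +
                      BlackHigh + BlackLow + IntegratedHigh +
                      IntegratedLow + combinedscore),
                  family=binomial, data=rbind(IARS,IARS2))
summary(fixed_only)  # see which coefficients are actually supported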
> 
> On Fri, Sep 5, 2014 at 1:19 PM, Prachi Sanghavi <prachi.sanghavi at gmail.com>
> wrote:
> 
>> Hello!
>>
>> I have a fairly complex multilevel, multivariate logistic model that I am
>> trying to fit.  In both models below, the variables injury, AMI, stroke,
>> and resp are binary, as are ALS and most other variables.  There are a
>> total of about 400,000 observations.  When I try to fit the model (Original
>> Model), I get several warnings, and I have pasted these below.  I am
>> mainly concerned about number 4.  I think this problem is due to having
>> too many parameters in the model, and so I removed several interactions
>> that were unnecessary anyway (Modified Model).  I ran the Modified Model
>> with a fixed number of iterations, and it finished these quickly enough
>> (maybe 20 minutes?).  But then it took another 19 hours to actually stop
>> running, during which time I suspect R was doing various checks that led to
>> the warnings.  I'm not sure.  When the Modified Model finished, it produced
>> the warnings below.
>>
>> My biggest problem right now is the amount of time it takes for R to stop
>> running, even after restricting the number of iterations to 100.  Because
>> of this problem, it is impractical to try to figure out how to address the
>> warnings.
>>
>> Can somebody please help me figure out why R is taking so long, even after
>> it has finished the 100 iterations?  And what can I do about it?
>>
>> Thank you!!
>>
>> Prachi Sanghavi
>> Harvard University
>>
>>
>> Original Model and Warnings:
>>
>> AMI_county_final_2 <- glmer(ALS ~ -1 + AMI + (injury + stroke +
>> resp)*(FEMALE + AGE + MTUS_CNT + Asian + Black + Hispanic + Other +
>> Custodial + Nursing + Scene + WhiteHigh + BlackHigh + BlackLow +
>> IntegratedHigh + IntegratedLow + combinedscore + Year06 + Year07 + Year08 +
>> Year09 + Year10 + Metro + Per_College_Plus + Per_Gen_Prac + Any_MedSchlAff
>> + Any_Trauma) + (-1 + injury + AMI + stroke + resp | fullcounty),
>> family=binomial, data=rbind(IARS,IARS2), verbose=2,
>> control=glmerControl(optCtrl=list(maxfun=100)))
>>
>> Warning messages:
>> 1: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
>>   failure to converge in 10000 evaluations
>> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   Model failed to converge with max|grad| = 480.605 (tol = 0.001)
>> 3: In if (resHess$code != 0) { :
>>   the condition has length > 1 and only the first element will be used
>> 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   Model is nearly unidentifiable: very large eigenvalue
>>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
>>  - Rescale variables?
>>
>> Modified Model and Warnings:
>>
>> AMI_county_final_2 <- glmer(ALS ~ -1 + Year06 + Year07 + Year08 + Year09 +
>> Year10 + Metro + AMI + (injury + stroke + resp)*(FEMALE + AGE + MTUS_CNT +
>> Asian + Black + Hispanic + Other + Custodial + Nursing + Scene + WhiteHigh
>> + BlackHigh + BlackLow + IntegratedHigh + IntegratedLow + combinedscore) +
>> (-1 + injury + AMI + stroke + resp | fullcounty), family=binomial,
>> data=rbind(IARS,IARS2), verbose=2,
>> control=glmerControl(optCtrl=list(maxfun=100)))
>>
>> Warning messages:
>> 1: In commonArgs(par, fn, control, environment()) :
>>   maxfun < 10 * length(par)^2 is not recommended.
>> 2: In optwrap(optimizer, devfun, start, rho$lower, control = control,  :
>>   convergence code 1 from bobyqa: bobyqa -- maximum number of function
>> evaluations exceeded
>> 3: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
>>   failure to converge in 100 evaluations
>> 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   Model failed to converge with max|grad| = 15923.5 (tol = 0.001)
>> 5: In if (resHess$code != 0) { :
>>   the condition has length > 1 and only the first element will be used
>> 6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
>>   Model is nearly unidentifiable: very large eigenvalue
>>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
>>  - Rescale variables?
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


