[R-sig-ME] lme4 convergence/numerical issue with large sample 2 level logit

Fri May 27 21:35:16 CEST 2011

doug,
thanks for your reply. i do tons of mixed modeling, so obviously am a
big fan of / highly dependent on your work.

results from:
>sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] MCMCglmm_2.12      corpcor_1.5.7      ape_2.7-1          coda_0.14-4
 [5] tensorA_0.36       lme4_0.999375-39   Matrix_0.999375-50 lattice_0.19-23
 [9] nlme_3.1-100       foreign_0.8-43

loaded via a namespace (and not attached):
[1] gee_4.13-16   grid_2.13.0   stats4_2.13.0 tools_2.13.0

i apologize for trying MCMCglmm, feel like i have lipstick on my collar (ha!).

Please do tell me how to install "lme4a", I have downloaded the zip
file for windows
(https://r-forge.r-project.org/bin/windows/contrib/latest/lme4a_0.999375-66.zip),
but haven't figured out how to load it into working session of R.

also, could you point me in the right direction for start values?
would it appreciably help optimization to use results from random
intercept only model as starts for fixed effects and random intercept
variance estimate?

also, what do you think of "glmmPQL"? does that yield decent estimates
for models of my level of size/complexity?

Thx for any help.
D

On Fri, May 27, 2011 at 2:14 PM,  <dmbates at gmail.com> wrote:
> On Thu, May 26, 2011 at 8:53 PM, Daniel Adkins <deadkins at vcu.edu> wrote:
>> hi all,
>> trying to fit either this model:
>>
>> proto1 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
>> female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
>>        +(1|hhidpn) +(0 + age | hhidpn) ,  nAGQ =300, family=binomial,
>> data=hrs_data, na.action =na.omit,  verbose=TRUE)
>>
>> or the same model allowing correlation btwn the REs:
>>
>> proto2 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
>> female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
>>        + (age | hhidpn) ,  nAGQ =300, family=binomial, data=hrs_data,
>> na.action =na.omit,  verbose=TRUE)
>>
>> it is big data: ~50K obs & ~8400 clusters (subjects, in this case).
>>
>> when i try to fit these models they run ~forever (>72 hours) with no
>> convergence. when i try a laplace approx for the quadrature (i.e.,
>> nAGQ=1) I get false convergence.
>>
>> I can successfully fit a slightly less complex model in which
>> everything is the same except dropping the age RE:
>>
>> proto2 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
>> female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
>>        + (1 | hhidpn) ,  nAGQ =300, family=binomial, data=hrs_data,
>> na.action =na.omit,  verbose=TRUE)
>>
>> this takes ~25 minutes to converge, and it gives an accurate solution
>> (cross-validated in stata via xtlogit and (for fixed effects) using
>> simple, single level logit). notably, i only get global optimum
>> convergence once I set nAGQ>200.
>
> nAGQ > 200 ?  That seems rather drastic when you have an average of 6
> observations per subject.
>
> The problem may be related to the optimizer.  The development version
> of the package, called lme4a, uses another optimizer that has, in most
> but not all cases, been faster and more reliable.  You may want to try
> that version of the package instead.  If you can tell us what platform
> you are using, say by providing the output of
>
> sessionInfo()
>
> we should be able to provide you with installation instructions for
> lme4a (although I think that installation on Mac OS X is still not
> straightforward).
>
>
>> Obviously, the estimation of the 2nd RE is the issue and a very
>> difficult numerical problem given data size/structure. Stata
>> (xtmelogit) just conks out almost immediately and says that it doesn't
>> have the memory to handle the quadrature optimization (not in so many
>> words, but still), even with memory maxed out for the
>> program/hardware.
>>
>> Any suggestions for shortcuts to make 'proto1' or 'proto2' estimate? i
>> am about to have to resort to BUGS, which will take eons, but at least
>> will eventually give decent estimates. Would strongly prefer sticking
>> with R though. Start values maybe? Would that help? Any code for
>> inputting these (i have never done it before in R)? nlme? hglm? I am
>> desperate/open to any suggestions.
>>
>> Thx,
>> D
>>
>>
>> --
>> Daniel E. Adkins, PhD
>> Assistant Professor
>> Center for Biomarker Research and Personalized Medicine
>> School of Pharmacy
>> Virginia Commonwealth University
>> McGuire Hall, Room 216B
>> 1112 East Clay Street
>> Richmond, VA 23298
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>

-- 
Daniel E. Adkins, PhD
Assistant Professor
Center for Biomarker Research and Personalized Medicine
School of Pharmacy
Virginia Commonwealth University
McGuire Hall, Room 216B
1112 East Clay Street
Richmond, VA 23298