[R-sig-ME] lme4 convergence/numerical issue with large sample 2 level logit

Fri May 27 03:53:40 CEST 2011

hi all,
trying to fit either this model:

proto1 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
	+(1|hhidpn) +(0 + age | hhidpn) ,  nAGQ =300, family=binomial,
data=hrs_data, na.action =na.omit,  verbose=TRUE)

or the same model allowing correlation btwn the REs:

proto2 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
	+ (age | hhidpn) ,  nAGQ =300, family=binomial, data=hrs_data,
na.action =na.omit,  verbose=TRUE)

it is big data: ~50K obs & ~8400 clusters (subjects, in this case).

when i try to fit these models they run ~forever (>72 hours) with no
convergence. when i try a laplace approx for the quadrature (i.e.,
nAGQ=1) I get false convergence.

I can successfully fit a slightly less complex model in which
everything is the same except dropping the age RE:

proto2 <- lmer(hibpe ~ age + b + b_age + h + h_age + female +
female_age + bxf + hxf + numwaves + dead + nodoctor + nohosp
	+ (1 | hhidpn) ,  nAGQ =300, family=binomial, data=hrs_data,
na.action =na.omit,  verbose=TRUE)

this takes ~25 minutes to converge, and it gives an accurate solution
(cross-validated in stata via xtlogit and (for fixed effects) using
simple, single level logit). notably, i only get global optimum
convergence once I set nAGQ>200.

Obviously, the estimation of the 2nd RE is the issue and a very
difficult numerical problem given data size/structure. Stata
(xtmelogit) just conks out almost immediately and says that it doesn't
have the memory to handle the quadrature optimization (not in so many
words, but still), even with memory maxed out for the
program/hardware.

Any suggestions for shortcuts to make 'proto1' or 'proto2' estimate? i
am about to have to resort to BUGS, which will take eons, but at least
will eventually give decent estimates. Would strongly prefer sticking
with R though. Start values maybe? Would that help? Any code for
inputting these (i have never done it before in R)? nlme? hglm? I am
desperate/open to any suggestions.

Thx,
D

-- 
Daniel E. Adkins, PhD
Assistant Professor
Center for Biomarker Research and Personalized Medicine
School of Pharmacy
Virginia Commonwealth University
McGuire Hall, Room 216B
1112 East Clay Street
Richmond, VA 23298