[R-sig-ME] Convergence error for longitudinal glmm with simulated data

Fri Apr 23 05:50:54 CEST 2010

Hello R-ers,

I want to do power analysis via fake data simulation. Unfortunately, glmer returns convergence errors with a significant percentage of the data sets I generate. Unless, of course, I center one of my predictors, in which case everything works great. Except, I'd prefer not.

Let's get to the data:
N simulated subjects hypothetically quit smoking at week 0, split evenly into treatment and control groups. They report at which week they relapse and this is converted into a sequence of 0's and 1's (0 for every week they abstain and 1 for every week after they relapse) of length W, the length of the experiment. So the data frame looks like this (N=2, W=3):

ID Trt Week Smoke
1    0     0          0
1    0     1          0
1    0     2          1
1    0     3          1
2    1     0          0
2    1     1          0
2    1     2          0
2    1     3          0

A reasonable approximation of "time to relapse" is given by a Weibull distribution with scale 0.7, and shape 5 for the control group and 11.5-ish for the treatment group. Just for fun, I've added a little bit of individual variation to these. Note that this model doesn't allow people to "re-quit". 
I should also note that I've checked the data with large N, and it looks how it is supposed to.

The model we'd like to test is something like

glmer(Smoke ~ (Week-W)*Trt + (Week-W)^2 + ((Week-W) | ID), family=binomial)
(So we 'centered' Week at the final week, because I'm most interested in the treatment effect at the end of the experiment)

Enough chit-chat, here's my code: 

Using R 2.10.1 for OSX

library(lme4) # version 0.999375-32
library(boot) # I use the inv.logit function

time.quit<-function(P,E=0) { 				# returns number of weeks until relapse, E is an individual variation, sampled from a normal distribution
	return(floor(rweibull(1,0.7,(inv.logit(logit(P)+E))^-1)))
}

fake.data<-function(N,W,P,Q,S){ # generates the data, P = shape parameter for control group, Q = shape parameter for treatment group, S = standard error for individual variation
	interval<-0
	smoke<-matrix(nrow = N*(W+1), ncol=1)
	for(k in 0:1){
		for(i in 1:I(N/2)){
		error<-rnorm(1,0,S)		#ind variation
		time<-min(time.quit(ifelse(k==0,P,Q),error),W)		# returns number of weeks until relapse or study ends
		smoke[(1+interval):(W+1+interval),1]<-rep(0:1,c(1+time,W-time))               # turns time into sequence of 0's and 1's
		interval<-interval+W+1
		}
	}
	Data<-data.frame(cbind(rep(1:N,each=W+1),rep(0:1,each=I(N*(W+1)/2)),rep(0:W,times=N),smoke))  # creates most of the data frame
	colnames(Data)<-c("ID","Trt","Week","Smoke")
	rownames(Data)<-c(1:I(N*(W+1)))
	cWeek<-(Data$Week-mean(Data$Week))                     # creates a centered Week column
	eWeek<-(Data$Week-W)							# creates an end-centered Week column
	Data<-cbind(Data,cWeek,eWeek)
	return(Data)
}

fake<-fake.data(60,10,0.20,0.85,0.1); # 60 and 10 are chosen as they are neither too large nor too small
glmer(Smoke~cWeek*Trt + I(cWeek^2) + (0 + cWeek | ID),family=binomial,data=fake) # this works great
glmer(Smoke~cWeek*Trt + I(cWeek^2) + (cWeek | ID),family=binomial,data=fake) # this produces "In mer_finalize(ans) : false convergence (8)" much of the time
glmer(Smoke~eWeek*Trt + I(eWeek^2) + (0 + eWeek | ID),family=binomial,data=fake) # this produces "In mer_finalize(ans) : false convergence (8)" much of the time

For different models, as long as you have eWeek as a fixed effect or the intercept as a random effect, you see similar results.

So, what's up? 
In particular:
a) Why does a random intercept cause problems?
b) Why do I need to center the model? (I'm actually rather interested in this, as I assume it has something to do with the inner workings of lmer)
c) Am I doing anything screwy? (This is primarily an educational project for me; I'm newish to stats and I'm newer to glm's (even without the extra m))

Ben