[R-sig-ME] Convergence error for longitudinal glmm with simulated data
Ben Rosenfield
benjamin.rosenfield at gmail.com
Fri Apr 23 05:50:54 CEST 2010
Hello R-ers,
I want to do power analysis via fake data simulation. Unfortunately, glmer returns convergence errors with a significant percentage of the data sets I generate. Unless, of course, I center one of my predictors, in which case everything works great. Except, I'd prefer not.
Let's get to the data:
N simulated subjects hypothetically quit smoking at week 0, split evenly into treatment and control groups. They report at which week they relapse and this is converted into a sequence of 0's and 1's (0 for every week they abstain and 1 for every week after they relapse) of length W, the length of the experiment. So the data frame looks like this (N=2, W=3):
ID Trt Week Smoke
1 0 0 0
1 0 1 0
1 0 2 1
1 0 3 1
2 1 0 0
2 1 1 0
2 1 2 0
2 1 3 0
A reasonable approximation of "time to relapse" is given by a Weibull distribution with scale 0.7, and shape 5 for the control group and 11.5-ish for the treatment group. Just for fun, I've added a little bit of individual variation to these. Note that this model doesn't allow people to "re-quit".
I should also note that I've checked the data with large N, and it looks how it is supposed to.
The model we'd like to test is something like
glmer(Smoke ~ (Week-W)*Trt + (Week-W)^2 + ((Week-W) | ID), family=binomial)
(So we 'centered' Week at the final week, because I'm most interested in the treatment effect at the end of the experiment)
Enough chit-chat, here's my code:
Using R 2.10.1 for OSX
library(lme4) # version 0.999375-32
library(boot) # I use the inv.logit function
time.quit<-function(P,E=0) { # returns number of weeks until relapse, E is an individual variation, sampled from a normal distribution
return(floor(rweibull(1,0.7,(inv.logit(logit(P)+E))^-1)))
}
fake.data<-function(N,W,P,Q,S){ # generates the data, P = shape parameter for control group, Q = shape parameter for treatment group, S = standard error for individual variation
interval<-0
smoke<-matrix(nrow = N*(W+1), ncol=1)
for(k in 0:1){
for(i in 1:I(N/2)){
error<-rnorm(1,0,S) #ind variation
time<-min(time.quit(ifelse(k==0,P,Q),error),W) # returns number of weeks until relapse or study ends
smoke[(1+interval):(W+1+interval),1]<-rep(0:1,c(1+time,W-time)) # turns time into sequence of 0's and 1's
interval<-interval+W+1
}
}
Data<-data.frame(cbind(rep(1:N,each=W+1),rep(0:1,each=I(N*(W+1)/2)),rep(0:W,times=N),smoke)) # creates most of the data frame
colnames(Data)<-c("ID","Trt","Week","Smoke")
rownames(Data)<-c(1:I(N*(W+1)))
cWeek<-(Data$Week-mean(Data$Week)) # creates a centered Week column
eWeek<-(Data$Week-W) # creates an end-centered Week column
Data<-cbind(Data,cWeek,eWeek)
return(Data)
}
fake<-fake.data(60,10,0.20,0.85,0.1); # 60 and 10 are chosen as they are neither too large nor too small
glmer(Smoke~cWeek*Trt + I(cWeek^2) + (0 + cWeek | ID),family=binomial,data=fake) # this works great
glmer(Smoke~cWeek*Trt + I(cWeek^2) + (cWeek | ID),family=binomial,data=fake) # this produces "In mer_finalize(ans) : false convergence (8)" much of the time
glmer(Smoke~eWeek*Trt + I(eWeek^2) + (0 + eWeek | ID),family=binomial,data=fake) # this produces "In mer_finalize(ans) : false convergence (8)" much of the time
For different models, as long as you have eWeek as a fixed effect or the intercept as a random effect, you see similar results.
So, what's up?
In particular:
a) Why does a random intercept cause problems?
b) Why do I need to center the model? (I'm actually rather interested in this, as I assume it has something to do with the inner workings of lmer)
c) Am I doing anything screwy? (This is primarily an educational project for me; I'm newish to stats and I'm newer to glm's (even without the extra m))
Ben
More information about the R-sig-mixed-models
mailing list