[R-sig-ME] Problems with parametric bootstrap, glmer

Fri Jan 27 12:58:55 CET 2012

Hi all,
A few times a have encountered a problem when I try to perform a 
parametric bootstrap to obtain a "corrected" p-value for a fixed effect 
using glmer, like (lmer(y~x+x1+x2+(1|ran)+(1|id),data,family="poisson"). 
I can run the model without errors but when a use simulate(model) as 
response (y) I sometimes get the classic " Cholmod warning 'not positive 
definite'". Even more peculiar is that when I use refit() (e.g. as 
described in the lme4 documentation under 'simulate-mer'), then this 
happens more often compared to if I use the update() function.

So I can get a p-value based on the LR  (anova(model1,model2)) but if I 
run a parametric bootstrap (n=1000), maybe only 600-700 runs converge. 
My models are not particularly overparametrized (maximum 16 fix coef, 2 
random, N=500), random effects are not close to zero and centering data 
does not help (actually, it made it worse in some cases...?). Has anyone 
else encountered this problem? I assume that it is wrong to use the 
600-700 successful bootstrap runs to calculate a parametric bootstrap 
P-value. Any alternative ideas how to proceed?

I tried to put together test code (see below), I hope it works. It 
should show that update() works better than refit() but I wasnt able to 
reproduce that update() can fail as well (maybe I didnt run it long 
enough though).

Cheers,

Gustaf Granath (PhD student)

##############CODE
set.seed(100)
dat<-expand.grid(y=c(1:5),site=as.factor(c(1:10)),tree=c("spruce","pine"))
i=9
for (i in 0:i) {
     dat$y[(1+i*5):(5+5*i)]<- rpois(5,(5+1*i))+round(rnorm(5,0,2))
     dat$y[(51+i*5):(50+(5+5*i))]<- rpois(5,(9+1*i))+round(rnorm(5,0,2))
}
dat$cov<-rnorm(100,100,10)
dat$id<- 1:nrow(dat)

#run models
m1<-lmer(y~tree+cov+(1|site)+(1|id),dat,family="poisson")
m0<-lmer(y~cov+(1|site)+(1|id),dat,family="poisson")

#function for refit()
pboot2 <- function(m0,m1) {
     s <- simulate(m0)
     L0 <- logLik(refit(m0,s),REML=F)
     L1 <- logLik(refit(m1,s),REML=F)
     c(2*(L1-L0))
}
dist1<-replicate(10,pboot2(m0,m1))

#use update() instead (no errors now)
dist2=numeric(10)
for (i in 1:10) {
     dat$s <- simulate(m0)$sim_1
     m0.refit <-    update(m0,s~.,data=dat)
     L0 <- logLik(m0.refit,REML=F)
     m1.refit <-    update(m1,s~.,data=dat)
     L1 <- logLik(m1.refit,REML=F)
     dist2[i]<- c(2*(L1-L0))
}

#compare refit() and update()
dist1 #with refit()
dist2 #with update()