[R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Tue Aug 27 17:06:02 CEST 2013

Tom Wenseleers <Tom.Wenseleers at ...> writes:

> 
> Dear all,
> Further to my previous message it appears that there might be a
>  problem with the case where you set use.u=T
> (to also simulate random effects) in bootMer.
> This appears to be the case judging from the fact that when I try
> 
> boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")    
>       # parametric bootstrapping
> mean(boo2$t) # mean from bootstrap runs = 0.03165166
> 
> the mean from the parametric bootrap runs (0.03165166) 
> is much lower than that obtained from the original
> model (0.095):
> ICC1 <- ICCfun(gmm) 
> ICC1				# intraclass correlation=0.095
> 

> I then tried coding colony as a fixed factor, but that fails
> altogether, irrespective of whether one sets use.u to TRUE or FALSE:

  This failure occurs because when colony is a fixed factor, there
is no "colony" component, and no "patriline:colony" component.  See
below where I get approximately the same estimate of ICC by
computing the mean square difference among colonies from the fixed-effect
parameter.

   I don't know exactly what's going on with the bootstrap
distributions of ICC, but my guess is that it's *not* a bug in
the implementation of bootMer (although I could always be wrong).
I think it's some combination of bias in the estimate and
possibly Jensen's inequality in the computation of the ICC (i.e.
even if the estimates of the parameters are unbiased, the estimate
of the ICC won't necessarily be ...)

  I would be curious how the results from MCMCglmm, which should
have a different set of biases, or the results from a simulation
study with known ICC, would line up with this example ...

gmm <- glmer(trait~(1|colony/patriline), 
    family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline), 
    family = binomial(link="logit"), data = dataset )

## And estimated the intraclass correlation using:
ICCfun <- function(.,type=1) {
    vcor=VarCorr(.)  # variance components
    if (type==1) {
        bpatrvar=vcor[["patriline:colony"]]  #  among patriline(colony) variance
        bcolvar=vcor[["colony"]]  # among colony variance
    } else {
        bpatrvar=vcor[["patriline"]]
        ## difference between colonies, squared, / (n-1) [n=2]
        bcolvar <- (fixef(.)["colony"])^2  
    }
    if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3
    ## # error variance is (pi^2)/3 for
    ## binomial logit model, and 1 for binomial probit model
    ##  (Snijders & Bosker 1999)
    c(bpatrvar/(bpatrvar+bcolvar+errvar)) # intraclass correlation
}

## wrapper
ICCfun2 <- function(.) ICCfun(.,type=2)

(ICC1 <- ICCfun(gmm))

set.seed(101)
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=TRUE,
                .progress="txt",PBargs=list(style=3))
mean(boo2$t) # mean from bootstrap runs = 0.03165166

head(boo2d <- as.data.frame(boo2))
hist(boo2d[[1]],breaks=50,col="gray")

(ICC1 <- ICCfun(gmm) )
(ICC2 <- ICCfun2(gmm2))
boo3A <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=TRUE)
hist(as.data.frame(boo3A)[[1]],breaks=50,col="gray")

boo3B <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=FALSE)
hist(as.data.frame(boo3B)[[1]],breaks=50,col="gray")

dotplot(ranef(gmm2))
qqnorm(ranef(gmm2)[[1]][,1])

library(ggplot2)
d <- rbind(data.frame(w="use.u_TRUE",x=as.data.frame(boo3A)[[1]]),
           data.frame(w="use.u_FALSE",x=as.data.frame(boo3B)[[1]]))
ggplot(d,aes(x,fill=w))+geom_histogram(position="identity",alpha=0.4)+
    theme_set(theme_bw())