[R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Fri Aug 30 17:29:50 CEST 2013

Dear Ben,
Many thanks for this - that's great! 
With the code you mention I get an NA though in the section that extracts the variance 
bcolvar <- (fixef(.)["colony"])^2  :

If I try
gmm2<- glmer(trait~colony+(1|patriline), 
    family = binomial(link="logit"), data = dataset )
vcor=VarCorr(gmm2)  # variance components
bpatrvar=vcor[["patriline"]]
bcolvar <- (fixef(gmm2)["colony"])^2    ## difference between colonies, squared, / (n-1) [n=2]

I get an NA for the bcolvar, since fixef(gmm2) returns me
> fixef(gmm2)
(Intercept)     colony2 
0.013433529 0.004875263

Does that mean that I would have to use bcolvar <- (fixef(gmm2)["colony2"])^2 ? Or did you use other types of contrasts or something like that?

Also a couple of remaining questions:
1) Is there any way / any model / other approach (eg using a GEE?) that would allow for negative intraclass correlations? (right now ICCs are always positive, but in rare cases one might expect negative intraclass correlations)

2) If I wanted to allow patriline effects to be different for different colonies, would the appropriate model then be
gmm<- glmer(trait~colony+(colony|patriline), family = binomial(link="logit"), data = dataset ) or would it be
gmm<- glmer(trait~colony+(0+colony|patriline), family = binomial(link="logit"), data = dataset )?

3) If I want to treat all my factors as fixed, ie using a nested model as in
gm <- glm(trait~colony+patriline, family = binomial(link="logit"), data = dataset )                      # patriline is in fact colony:patriline since all patrilines have unique codes
how would I extract the variance components then?
coef(gm) gives me
(Intercept)     colony2  patriline2  patriline3  patriline4  patriline5  patriline6  patriline7  patriline8  patriline9 patriline10 patriline11 
  0.4054651 -16.9715335  -0.3001046  -0.5007753  -1.7917595  -0.4054651   0.6931472  -0.6931472 -16.9715335   1.2039728  -1.5040774  -0.8109302 
patriline12 patriline13 patriline14 patriline15 patriline16 patriline17 patriline18 patriline19 patriline20 patriline21 patriline22 
  1.3862944  16.2295962  15.7776111  17.4823591  17.7700412  16.3429249  17.0768940  17.6646807  16.4860257  16.5660684          NA
But how could I get variance components from these?
And is there any analogous function as bootMer for glm's? I guess it is not possible to analyse fixed effect models with lme4 by any chance? (if so I could still use bootMer perhaps)
Or would I have to do parametric bootstrapping using simulate() myself then?

I'll also try with GenABEL::polygenic_hglm, as suggested, and with MCMCglmm. Also trying with nonparametric bootstrapping to get confidence limits (by resampling both colonies and patrilines - resampling individuals within patrilines doesn't seem to work though as that messes up the intraclass correlation), and have been simulating datasets with an ICC of 1, to see how the different methods perform - I'll report back here as soon as I have the results...

Cheers,
Tom

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: 27 August 2013 17:06
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Tom Wenseleers <Tom.Wenseleers at ...> writes:

> 
> Dear all,
> Further to my previous message it appears that there might be a  
> problem with the case where you set use.u=T (to also simulate random 
> effects) in bootMer.
> This appears to be the case judging from the fact that when I try
> 
> boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")    
>       # parametric bootstrapping
> mean(boo2$t) # mean from bootstrap runs = 0.03165166
> 
> the mean from the parametric bootrap runs (0.03165166) is much lower 
> than that obtained from the original model (0.095):
> ICC1 <- ICCfun(gmm) 
> ICC1				# intraclass correlation=0.095
> 

> I then tried coding colony as a fixed factor, but that fails 
> altogether, irrespective of whether one sets use.u to TRUE or FALSE:

  This failure occurs because when colony is a fixed factor, there is no "colony" component, and no "patriline:colony" component.  See below where I get approximately the same estimate of ICC by computing the mean square difference among colonies from the fixed-effect parameter.

   I don't know exactly what's going on with the bootstrap distributions of ICC, but my guess is that it's *not* a bug in the implementation of bootMer (although I could always be wrong).
I think it's some combination of bias in the estimate and possibly Jensen's inequality in the computation of the ICC (i.e.
even if the estimates of the parameters are unbiased, the estimate of the ICC won't necessarily be ...)

  I would be curious how the results from MCMCglmm, which should have a different set of biases, or the results from a simulation study with known ICC, would line up with this example ...

gmm <- glmer(trait~(1|colony/patriline), 
    family = binomial(link="logit"), data = dataset )
gmm2<- glmer(trait~colony+(1|patriline), 
    family = binomial(link="logit"), data = dataset )

## And estimated the intraclass correlation using:
ICCfun <- function(.,type=1) {
    vcor=VarCorr(.)  # variance components
    if (type==1) {
        bpatrvar=vcor[["patriline:colony"]]  #  among patriline(colony) variance
        bcolvar=vcor[["colony"]]  # among colony variance
    } else {
        bpatrvar=vcor[["patriline"]]
        ## difference between colonies, squared, / (n-1) [n=2]
        bcolvar <- (fixef(.)["colony"])^2  
    }
    if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3
    ## # error variance is (pi^2)/3 for
    ## binomial logit model, and 1 for binomial probit model
    ##  (Snijders & Bosker 1999)
    c(bpatrvar/(bpatrvar+bcolvar+errvar)) # intraclass correlation }

## wrapper
ICCfun2 <- function(.) ICCfun(.,type=2)

(ICC1 <- ICCfun(gmm))

set.seed(101)
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=TRUE,
                .progress="txt",PBargs=list(style=3))
mean(boo2$t) # mean from bootstrap runs = 0.03165166

head(boo2d <- as.data.frame(boo2))
hist(boo2d[[1]],breaks=50,col="gray")

(ICC1 <- ICCfun(gmm) )
(ICC2 <- ICCfun2(gmm2))
boo3A <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=TRUE)
hist(as.data.frame(boo3A)[[1]],breaks=50,col="gray")

boo3B <- bootMer(gmm2, ICCfun2, nsim = 500,use.u=FALSE)
hist(as.data.frame(boo3B)[[1]],breaks=50,col="gray")

dotplot(ranef(gmm2))
qqnorm(ranef(gmm2)[[1]][,1])

library(ggplot2)
d <- rbind(data.frame(w="use.u_TRUE",x=as.data.frame(boo3A)[[1]]),
           data.frame(w="use.u_FALSE",x=as.data.frame(boo3B)[[1]]))
ggplot(d,aes(x,fill=w))+geom_histogram(position="identity",alpha=0.4)+
    theme_set(theme_bw())

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models