[R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Tue Aug 27 11:40:07 CEST 2013

Dear all,
Further to my previous message it appears that there might be a problem with the case where you set use.u=T (to also simulate random effects) in bootMer.
This appears to be the case judging from the fact that when I try

boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")           # parametric bootstrapping
mean(boo2$t) # mean from bootstrap runs = 0.03165166

the mean from the parametric bootrap runs (0.03165166) is much lower than that obtained from the original model (0.095):
ICC1 <- ICCfun(gmm) 
ICC1				# intraclass correlation=0.095

I then tried coding colony as a fixed factor, but that fails altogether, irrespective of whether one sets use.u to TRUE or FALSE:
gmm<-glmer(trait~colony+(1|patriline), family = binomial(link="logit"), data = dataset )   
boo <- bootMer(gmm, ICCfun, nsim = 1000,use.u=T,type="parametric") 	# parametric bootstrapping
#Warning message:
#In bootMer(gmm, ICCfun, nsim = 1000, use.u = T, type = "parametric") :
#some bootstrap runs failed (1000/1000)

gmm<-glmer(trait~colony+(1|patriline), family = binomial(link="logit"), data = dataset )   
boo <- bootMer(gmm, ICCfun, nsim = 1000,use.u=F,type="parametric") 	# parametric bootstrapping
#Warning message:
#In bootMer(gmm, ICCfun, nsim = 1000, use.u = F, type = "parametric") :
#some bootstrap runs failed (1000/1000)

Has anyone else noticed any odd behaviour with bootMer, particularly for the case where use.u is set to TRUE?
Or any thoughts what I might be doing wrong?

Cheers,
Tom

-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Tom Wenseleers
Sent: 26 August 2013 19:03
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Dear all,
I have a hierarchically nested dataset with individuals nested in patrilines (genetic lineages derived from the same father) and patrilines nested within colonies (family groups) and a binary dependent variable what was measured for all individuals. I am now interested in calculating the within-patriline intraclass correlation (=among patriline variance / (among patriline+among colony variance+error variance) as a way to calculate heritability (since I am working with haplodiploid organisms, the heritability of my trait is given as twice this intraclass correlation, Liu and Smith 2000).

My data is
install_github("lme4",user="lme4")  # latest development version
library(lme4)
library(boot)
library(pbkrtest)
dataset<-read.table("http://www.kuleuven.be/bio/ento/temp/data.txt",header=TRUE)
dataset$colony=as.factor(dataset$colony) # factor colony with 2 levels
dataset$patriline=as.factor(dataset$patriline) # factor patriline with ca 10 levels per colony; patrilines all have unique numbers

I then fitted the hierarchically nested binomial mixed model:
gmm<-glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset )

And estimated the intraclass correlation using:
ICCfun <- function(.) { vcor=VarCorr(.)  # variance components (standard deviations and correlations)
                            bpatrvar=attr(vcor$"patriline:colony","stddev")^2  #  among patriline(colony) variance
                            bcolvar=attr(vcor$colony,"stddev")^2  # among colony variance
                            if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3 # error variance is (pi^2)/3 for binomial logit model, and 1 for binomial probit model (Snijders & Bosker 1999)
                            bpatrvar/(bpatrvar+bcolvar+errvar) # intraclass correlation }

ICC1 <- ICCfun(gmm)
ICC1
# intraclass correlation=0.095

My first question is whether this model would be appropriate for this dataset? Ideally I would like to have a model that allowed for the within-patriline correlations to be different for the two different colonies, but the model above would not allow this, right? (The intraclass correlation in my case could potentially be different depending on whether an allelic variant that caused variation in the trait was inherited from the mothers' side or from the fathers' side, as in the first case the intraclass correlation would be zero whereas in the second case it would be 1 if heritability was 1).
If I wanted to allow different intraclass correlations per colony, would the correct model and code be the following? :
gmm<-glmer(trait~(1|colony)+(0+1|colony/patriline), family = binomial(link="logit"), data = dataset )

With the average intraclass correlation then being estimated using:
ICCfun2 <- function(.) { vcor=VarCorr(.)  # variance components (standard deviations and correlations)
                            bpatrvar=attr(vcor$patriline,"stddev")^2  #  among patriline variances for the different colonies
                            bcolvar=attr(vcor$colony,"stddev")^2  # among colony variance
                            if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3 # error variance is (pi^2)/3 for binomial logit model, and 1 for binomial probit model (Snijders & Bosker 1999)
                            mean(bpatrvar/(bpatrvar+bcolvar+errvar)) # average intraclass correlation }

ICC2 <- ICCfun2(gmm)
ICC2
# intraclass correlation=0.095

Or would I need a generalized estimating equation for this, or nlme? If so, what would be the correct specification for such a model?

Also, am I correct that the significance of among-patriline variation could be tested using parametric boostrapping using PBmodcomp, and that if I wanted to treat colony and patriline as fixed that this would be done by
largem <- glm(trait~colony+patriline, family = binomial(link="logit"), data = dataset )     # note: patriline is in fact colony:patriline since I used unique codes for the patriline  levels
smallm <- glm(trait~colony, family = binomial(link="logit"), data = dataset ) PBmodcomp(largem, smallm,nsim = 1000) #Parametric bootstrap test; time: 111.09 sec; samples: 1000 extremes: 1; #large : trait ~ colony + patriline #small : trait ~ colony
#                            stat        df            p.value
#             LRT         46.531 20            0.0006811 ***
#             PBtest    46.531                  0.0019980 **
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (this p value I think is correct, given that it closely approximates the one I get based on a permutation test in which I randomly permute data within colonies)

Whereas if I wanted to treat colony and patriline as nested random factors, that this would be done by largem <- glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset ) smallm <- glmer(trait~(1|colony), family = binomial(link="logit"), data = dataset ) PBmodcomp(largem, smallm,nsim = 1000) #Parametric bootstrap test; time: 403.79 sec; samples: 1000 extremes: 17; #Requested samples: 1000 Used samples: 599 Extremes: 17 #large : trait ~ (1 | colony/patriline) #small : trait ~ (1 | colony)
#             stat        df            p.value
#LRT       3.354     1             0.06704 .
#PBtest 3.354                     0.03000 *   #---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Am I correct to assume that the reduced significance here is caused here by considering the extra uncertainty of which colonies and patrilines were sampled?

Finally, if I wanted to report 95% confidence bounds (1 sided confidence intervals) on the intraclass correlation, would it be OK to do this using parametric bootstrapping using gmm<-glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset )
boo <- bootMer(gmm, ICCfun, nsim = 500,use.u=F,type="parametric")             # parametric bootstrapping
quantile(boo$t,c(0.05,0.95)) # 95% lower and upper confidence bounds (i.e. 1 sided confidence limits) calculated using percentile bootstrap
#             5%                         95%
#             1.395817e-10      3.602247e-01
In which case the p value (chance of obtaining a zero intraclass correlation by chance) would be given by
mean(boo$t<=0) # p level = probability of obtaining a zero intraclass correlation by chance # 0.03264515

Using the argument use.u=F this p-level of 0.03 seems to match the one I obtained above using PBmodcomp and the comparison of the two mixed models with or without patriline included. However, if I use use.u=T (ie also simulating the random effects) I get gmm<-glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset )
boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")           # parametric bootstrapping
quantile(boo2$t,c(0.05,0.95)) # 95% lower and upper confidence bounds (i.e. 1 sided confidence limits) calculated using percentile bootstrap
#             5%                         95%
#             0.0000000           0.1074677
mean(boo2$t<=0) # p level
# 0.13
And the intraclass correlation would be no longer significantly larger than zero. Does anyone happen to know why I get this discrepancy with the approach used above using PBmodcomp and comparing the two mixed models, and which would be most correct?

Also, would anyone happen to know how I could calculate 95% confidence limits on the ICC if I wanted to consider colony and patriline both as fixed factors?

Finally, in all of the analyses above I noticed that I can only obtain positive intraclass correlations. In other datasets though I would also be interested in considering possible negative intraclass correlations (which could in rare circumstances occur). If one estimates intraclass correlations based on variance components one can never get negative intraclass correlations though (since variances are always possible. Does anyone happen to know how one could perhaps estimate intraclass correlations directly (maybe using a GEE??), as opposed to from variance comonents, thereby also allowing negative ICCs?

Any advice would be much appreciated!

Cheers & many thanks in advance for your help!
Tom

_______________________________________________________________________________________

Prof. Tom Wenseleers
*      Lab. of Socioecology and Social Evolution
           Dept. of Biology
           Zoological Institute
           K.U.Leuven
           Naamsestraat 59, box 2466
           B-3000 Leuven
           Belgium
* +32 (0)16 32 39 64 / +32 (0)472 40 45 96
* tom.wenseleers at bio.kuleuven.be
http://bio.kuleuven.be/ento/wenseleers/twenseleers.htm

	[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models