[R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Tue Aug 27 11:41:31 CEST 2013

Dear Ian,
Maybe not - but I also have problems when I code it as a fixed factor - see my other message...

Cheers,
Tom

-----Original Message-----
From: ian m s white [mailto:i.m.s.white at ed.ac.uk] 
Sent: 27 August 2013 10:50
To: Tom Wenseleers
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] Question on estimating intraclass correlation (and significance and confidence intervals) with binomial data using mixed model and/or GEE

Is it sensible to fit colony as a random effect when it has only two levels?

On 26 Aug 2013, at 18:02, Tom Wenseleers <Tom.Wenseleers at bio.kuleuven.be> wrote:

> Dear all,
> I have a hierarchically nested dataset with individuals nested in patrilines (genetic lineages derived from the same father) and patrilines nested within colonies (family groups) and a binary dependent variable what was measured for all individuals. I am now interested in calculating the within-patriline intraclass correlation (=among patriline variance / (among patriline+among colony variance+error variance) as a way to calculate heritability (since I am working with haplodiploid organisms, the heritability of my trait is given as twice this intraclass correlation, Liu and Smith 2000).
> 
> My data is
> install_github("lme4",user="lme4")  # latest development version
> library(lme4)
> library(boot)
> library(pbkrtest)
> dataset<-read.table("http://www.kuleuven.be/bio/ento/temp/data.txt",he
> ader=TRUE)
> dataset$colony=as.factor(dataset$colony) # factor colony with 2 levels
> dataset$patriline=as.factor(dataset$patriline) # factor patriline with 
> ca 10 levels per colony; patrilines all have unique numbers
> 
> I then fitted the hierarchically nested binomial mixed model:
> gmm<-glmer(trait~(1|colony/patriline), family = 
> binomial(link="logit"), data = dataset )
> 
> And estimated the intraclass correlation using:
> ICCfun <- function(.) { vcor=VarCorr(.)  # variance components (standard deviations and correlations)
>                            bpatrvar=attr(vcor$"patriline:colony","stddev")^2  #  among patriline(colony) variance
>                            bcolvar=attr(vcor$colony,"stddev")^2  # among colony variance
>                            if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3 # error variance is (pi^2)/3 for binomial logit model, and 1 for binomial probit model (Snijders & Bosker 1999)
>                            bpatrvar/(bpatrvar+bcolvar+errvar) # 
> intraclass correlation }
> 
> ICC1 <- ICCfun(gmm)
> ICC1
> # intraclass correlation=0.095
> 
> My first question is whether this model would be appropriate for this dataset? Ideally I would like to have a model that allowed for the within-patriline correlations to be different for the two different colonies, but the model above would not allow this, right? (The intraclass correlation in my case could potentially be different depending on whether an allelic variant that caused variation in the trait was inherited from the mothers' side or from the fathers' side, as in the first case the intraclass correlation would be zero whereas in the second case it would be 1 if heritability was 1).
> If I wanted to allow different intraclass correlations per colony, would the correct model and code be the following? :
> gmm<-glmer(trait~(1|colony)+(0+1|colony/patriline), family = 
> binomial(link="logit"), data = dataset )
> 
> With the average intraclass correlation then being estimated using:
> ICCfun2 <- function(.) { vcor=VarCorr(.)  # variance components (standard deviations and correlations)
>                            bpatrvar=attr(vcor$patriline,"stddev")^2  #  among patriline variances for the different colonies
>                            bcolvar=attr(vcor$colony,"stddev")^2  # among colony variance
>                            if (family(gmm)$link=="probit") errvar=1 else errvar=(pi^2)/3 # error variance is (pi^2)/3 for binomial logit model, and 1 for binomial probit model (Snijders & Bosker 1999)
>                            mean(bpatrvar/(bpatrvar+bcolvar+errvar)) # 
> average intraclass correlation }
> 
> ICC2 <- ICCfun2(gmm)
> ICC2
> # intraclass correlation=0.095
> 
> Or would I need a generalized estimating equation for this, or nlme? If so, what would be the correct specification for such a model?
> 
> Also, am I correct that the significance of among-patriline variation could be tested using parametric boostrapping using PBmodcomp, and that if I wanted to treat colony and patriline as fixed that this would be done by
> largem <- glm(trait~colony+patriline, family = binomial(link="logit"), data = dataset )     # note: patriline is in fact colony:patriline since I used unique codes for the patriline  levels
> smallm <- glm(trait~colony, family = binomial(link="logit"), data = 
> dataset ) PBmodcomp(largem, smallm,nsim = 1000) #Parametric bootstrap 
> test; time: 111.09 sec; samples: 1000 extremes: 1; #large : trait ~ 
> colony + patriline #small : trait ~ colony
> #                            stat        df            p.value
> #             LRT         46.531 20            0.0006811 ***
> #             PBtest    46.531                  0.0019980 **
> # ---
> # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (this 
> p value I think is correct, given that it closely approximates the one 
> I get based on a permutation test in which I randomly permute data 
> within colonies)
> 
> Whereas if I wanted to treat colony and patriline as nested random 
> factors, that this would be done by largem <- 
> glmer(trait~(1|colony/patriline), family = binomial(link="logit"), 
> data = dataset ) smallm <- glmer(trait~(1|colony), family = 
> binomial(link="logit"), data = dataset ) PBmodcomp(largem, smallm,nsim 
> = 1000) #Parametric bootstrap test; time: 403.79 sec; samples: 1000 
> extremes: 17; #Requested samples: 1000 Used samples: 599 Extremes: 17 
> #large : trait ~ (1 | colony/patriline) #small : trait ~ (1 | colony)
> #             stat        df            p.value
> #LRT       3.354     1             0.06704 .
> #PBtest 3.354                     0.03000 *   #---
> #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Am I correct to assume that the reduced significance here is caused here by considering the extra uncertainty of which colonies and patrilines were sampled?
> 
> Finally, if I wanted to report 95% confidence bounds (1 sided 
> confidence intervals) on the intraclass correlation, would it be OK to do this using parametric bootstrapping using gmm<-glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset )
> boo <- bootMer(gmm, ICCfun, nsim = 500,use.u=F,type="parametric")             # parametric bootstrapping
> quantile(boo$t,c(0.05,0.95)) # 95% lower and upper confidence bounds (i.e. 1 sided confidence limits) calculated using percentile bootstrap
> #             5%                         95%
> #             1.395817e-10      3.602247e-01
> In which case the p value (chance of obtaining a zero intraclass 
> correlation by chance) would be given by
> mean(boo$t<=0) # p level = probability of obtaining a zero intraclass 
> correlation by chance # 0.03264515
> 
> Using the argument use.u=F this p-level of 0.03 seems to match the one 
> I obtained above using PBmodcomp and the comparison of the two mixed models with or without patriline included. However, if I use use.u=T (ie also simulating the random effects) I get gmm<-glmer(trait~(1|colony/patriline), family = binomial(link="logit"), data = dataset )
> boo2 <- bootMer(gmm, ICCfun, nsim = 500,use.u=T,type="parametric")           # parametric bootstrapping
> quantile(boo2$t,c(0.05,0.95)) # 95% lower and upper confidence bounds (i.e. 1 sided confidence limits) calculated using percentile bootstrap
> #             5%                         95%
> #             0.0000000           0.1074677
> mean(boo2$t<=0) # p level
> # 0.13
> And the intraclass correlation would be no longer significantly larger than zero. Does anyone happen to know why I get this discrepancy with the approach used above using PBmodcomp and comparing the two mixed models, and which would be most correct?
> 
> Also, would anyone happen to know how I could calculate 95% confidence limits on the ICC if I wanted to consider colony and patriline both as fixed factors?
> 
> Finally, in all of the analyses above I noticed that I can only obtain positive intraclass correlations. In other datasets though I would also be interested in considering possible negative intraclass correlations (which could in rare circumstances occur). If one estimates intraclass correlations based on variance components one can never get negative intraclass correlations though (since variances are always possible. Does anyone happen to know how one could perhaps estimate intraclass correlations directly (maybe using a GEE??), as opposed to from variance comonents, thereby also allowing negative ICCs?
> 
> Any advice would be much appreciated!
> 
> Cheers & many thanks in advance for your help!
> Tom
> 
> 
> ______________________________________________________________________
> _________________
> 
> Prof. Tom Wenseleers
> *      Lab. of Socioecology and Social Evolution
>           Dept. of Biology
>           Zoological Institute
>           K.U.Leuven
>           Naamsestraat 59, box 2466
>           B-3000 Leuven
>           Belgium
> * +32 (0)16 32 39 64 / +32 (0)472 40 45 96
> * tom.wenseleers at bio.kuleuven.be
> http://bio.kuleuven.be/ento/wenseleers/twenseleers.htm
> 
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.