[R-meta] Meta-analysis of intra class correlation coefficients

Wed Oct 16 10:59:15 CEST 2024

Dear Wolfgang,
Many thanks for the suggestions!
The r-Z transformation was easily applied, as suggested.

data$zi <- FisherZ(data$r)
data$vzi <- 1/(data$n-3/2)
egger_seMZ <- regtest(MZdata$zi, MZdata$vzi)
egger_seDZ <- regtest(DZdata$zi, DZdata$vzi)

...and improved the Egger regression
zrMZ_se_z	zrDZ_se_z
-3.968 		-5.519

Also, the funnel plots look better (see attached). We subsequently carried out the rma.mv as the ICC estimates partially represent related traits estimated within the same study (i.e. the estimates are related within rMZ and within rDZ), and of course rMZ and rDZ are nested within the same study and also sometimes the same cohort (for code see below). 

For the subsequent analysis, we back-transformed the Z-score intercept from rma.mv with

rMZ <- FisherZInv(model2$b[1])
rDZ <- FisherZInv(model1$b[1])

However, we are unsure as to how to best backtransform model2$se[1] and model1$se[1], i.e. the SEs of the rZ intercepts from rma.mv using the delta method as you recommended. Could you please advise? This will create the input for the subsequent heritability analyses.

Best wishes,
Beate

#################################################################
Study_ID - Cohort ID
ESID - actual Study ID 
MZDZ_factor - (MZ=1, DZ=0)
DZMZ_factor - (DZ=1,MZ=0)

For completeness, I add the code below that was adapted from the Austerberry study for z values:
model1 <- rma.mv(
  zi,
  vzi,
  random = list(~MZDZ_factor|Study_ID, ~ MZDZ_factor | ESID),
  data = data,
  mods = ~ MZDZ_factor,
  method = "REML",
  tdist = TRUE
)

model2 <- rma.mv(
  zi,
  vzi,
  random = list(~DZMZ_factor|Study_ID, ~ DZMZ_factor | ESID),
  data = data,
  mods = ~ DZMZ_factor,
  method = "REML",
  tdist = TRUE
)

Beate St Pourcain, PhD
Senior Investigator & Group Leader
Room A207
Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD Nijmegen | The Netherlands

@bstpourcain
Tel: +31 24 3521964
Fax: +31 24 3521213
ORCID: https://orcid.org/0000-0002-4680-3517
Web: https://www.mpi.nl/departments/language-and-genetics/projects/population-variation-and-human-communication/
Further affiliations with:
MRC Integrative Epidemiology Unit | University of Bristol | UK
Donders Institute for Brain, Cognition and Behaviour | Radboud University | The Netherlands

My working hours may not be your working hours. Please do not feel obligated to reply outside of your normal working schedule.

-----Original Message-----
From: Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl> 
Sent: Monday, October 14, 2024 3:32 PM
To: St Pourcain, Beate <Beate.StPourcain using mpi.nl>; R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
Subject: RE: Meta-analysis of intra class correlation coefficients

It is not ideal to meta-analyze raw correlations (or raw ICC values) if they are so large. In this case, I think the r-to-z transformation is highly advisable.

What you observe is the fact that the variance of a raw correlation coefficient depends on the true correlation. In particular, the large-sample estimate of the variance of a raw correlation coefficient (or ICC based on pairs) is (1-r^2)^2 / (n-1), where r is the observed correlation and n the sample size. Therefore, as r gets close to 1, the variance will get small, as you noted.

It is therefore no surprise that the Egger regression test is highly significant. Of course, this then says nothing about potential publication bias. There is an inherent link between the correlation and its varianace (and hence standard error). The same issue also arises with other outcome measures / effect sizes (e.g., standardized mean differences).

In the present case, the r-to-z transformation 'solves' this issue, since Var[z_r] =~ 1/(n-3) for Pearson correlations and Var[z_ICC] =~ 1/(n-3/2) for ICC(1) values.

I would then consider doing a bivariate meta-analysis of the z_ICC_mz and z_ICC_dz values. Since they are based on independent samples, their sampling errors are uncorrelated, but the random effects of the bivariate model then account for potential correlation in the underlying true (transformed) ICC values. This is analogous to what people do when pooling sensitivity and specificity values in a diagnostic test meta-analysis and also directly relates to the bivariate model discussed by van Houwelingen et al. (2002):

https://www.metafor-project.org/doku.php/analyses:vanhouwelingen2002

You can then estimate the heritability from this model by back-transforming the pooled estimate for the MZ twins and the pooled estimate from the DZ twins and taking twice the difference. The SE and hence CI for this can then be obtained via the delta method.

This raises interesting questions about the difference between between 'pooled(x) - pooled(y)' versus 'pooled(x - y)' -- there are papers in the literature that discuss this issue (not in the present context) -- but the latter option doesn't appear sensible to me here anyway.

Best,
Wolfgang

> -----Original Message-----
> From: St Pourcain, Beate <Beate.StPourcain using mpi.nl>
> Sent: Monday, October 14, 2024 13:49
> To: Viechtbauer, Wolfgang (NP) 
> <wolfgang.viechtbauer using maastrichtuniversity.nl>; R Special Interest 
> Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
> Subject: RE: Meta-analysis of intra class correlation coefficients
>
> ATTACHMENT(S) REMOVED: Funnel_DZ_ICC_simple_withN.png | 
> Funnel_MZ_ICC_simple_withN.png
>
> Dear Wolfgang,
> Many thanks, very helpful!
>
> We apply the escalc "COR" option to derive the variance for ICCs, in 
> this case twin correlations, where monozygotic twins (MZ, genetically 
> identical
> individuals) will always have higher correlations than dizygotic twins 
> (DZ, genetically related like typical siblings).
>
> Following the code from a published study (Austerberry, 
> https://pubmed.ncbi.nlm.nih.gov/35994288/), we report ICCs for MZ and 
> DZ twins and their variance v, derived with escalc "COR", for studies with different N:
>
> e.g. for MZ twins
> Study A: ICC=0.96, v= 3.2975E-06, N=1865 Study B: ICC=0.96, 
> v=3.86576E-05, N=160
>
> e.g. for DZ twins
> Study A: ICC=0.85, v=4.06365E-05, N=1896 Study C: ICC=0.85, 
> v=0.000496815, N=156
>
> This suggests (to us) that study size is less relevant than ICC 
> magnitude for variance estimations. In other words, irrespective of 
> study size, when meta- analysing MZ and DZ correlations separately, MZ 
> twins will have a smaller SE (higher Z score) than DZ twins. Following 
> the code from (Austerberry), running an Egger regression, we see that (across multiple studies):
> egger_seMZ <- regtest(MZdata$yi, MZdata$vi) egger_seDZ <- 
> regtest(DZdata$yi, DZdata$vi)
>
> rMZ_se_z        rDZ_se_z
> -22.497 0       -15.359
>
> I also attach the funnel plots, for MZ and DZ twins suggesting the 
> same, with a near linear relationship between ICC magnitude and 
> variance (code is below).  I indicated the sample size of each study 
> through the lightblue pallet ranging between N~70 (lightlblue) to 
> N~1800 (darkblue). I labelled each time the 10 smallest studies 
> indicating that small studies can be found across the entire ICC range 
> of 0-1, whereas the large studies are only found for ICC values of ~ 0.8-1.
>
> It is important for us to understand these variances of the ICC 
> values/ twin correlations, as (twice) the difference of rMZ - rDZ, 
> here based on meta- analysed ICCs, can be used to estimate the 
> heritability of a trait (see also Austerberry).
>
> I am not aware of that variance-stabilizing transformations were 
> applied, but thank you for pointing this out.
>
> Many of the ICC estimates reflect multiple traits from the same 
> studies. Should this be already taken into account during the variance 
> calculation, or is it sufficient to indicate this during the meta-analysis of ICCs values?
>
> Thanks a lot for your guidance,
> Beate
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~
> ~~~
> resMZ <- rma(yi, vi, data=MZdata)
>
> funnel(resMZ,
>        xlab = "rMZ",
>        digits = c(2,2),
>        col = MZdata$n.col,
>        slab = MZdata$slab,
>        cex=0.4,
>        label = 10,
>        main = "")
>
> resDZ <- rma(yi, vi, data=DZdata)
> funnel(resDZ,
>        xlab = "rDZ",
>        digits = c(2,2),
>        col = DZdata$n.col,
>        slab = DZdata$slab,
>        cex=0.4,
>        label = 10,
>        main = "")
>
> Beate St Pourcain, PhD
> Senior Investigator & Group Leader
> Room A207
> Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD 
> Nijmegen | The Netherlands
>
> @bstpourcain
> Tel: +31 24 3521964
> Fax: +31 24 3521213
> ORCID: https://orcid.org/0000-0002-4680-3517
> Web: 
> https://www.mpi.nl/departments/language-and-genetics/projects/populati
> on-
> variation-and-human-communication/
> Further affiliations with:
> MRC Integrative Epidemiology Unit | University of Bristol | UK Donders 
> Institute for Brain, Cognition and Behaviour | Radboud University | 
> The Netherlands
>
> -----Original Message-----
> From: Viechtbauer, Wolfgang (NP)
> <mailto:wolfgang.viechtbauer using maastrichtuniversity.nl>
> Sent: Saturday, October 12, 2024 12:40 PM
> To: R Special Interest Group for Meta-Analysis 
> <mailto:r-sig-meta-analysis using r- project.org>
> Cc: St Pourcain, Beate <mailto:Beate.StPourcain using mpi.nl>
> Subject: RE: Meta-analysis of intra class correlation coefficients
>
> Dear Beate,
>
> I assume you are talking about ICC(1) values. We did a meta-analysis 
> of ICC(1) values here:
>
> Nicolaï, S. P. A., Viechtbauer, W., Kruidenier, L. M., Candel, M. J. 
> J. M., Prins, M. H., & Teijink, J. A. W. (2009). Reliability of 
> treadmill testing in peripheral arterial disease: A meta-regression 
> analysis. Journal of Vascular Surgery, 50(2), 322-329. 
> https://doi.org/10.1016/j.jvs.2009.01.042
>
> For ICC(1) values, one can apply a variance-stabilizing transformation with:
>
> y = 1/2 * log((1 + (m-1)*icc) / (1 - icc))
>
> where 'm' is the number of measurement occasions and 'n' is the number 
> of participants. The large-sample variance is then:
>
> Var[y] = m / (2*(m-1)*(n-2)).
>
> This goes back to Fisher (1925; Statistical methods for research workers).
>
> In your application (where you dealing with pairs), n is the number of 
> pairs and m is 2. In that case, you can treat ICC(1) values like regular correlations.
> However, if you do apply the r-to-z transformation, then Fisher 
> suggests to use
> 1/(n-3/2) as the variance (instead of 1/(n-3) as we typically use for 
> r-to-z transformed Pearson product-moment correlation coefficients) 
> and simulation studies I have done confirm this.
>
> Best,
> Wolfgang
>
> --
> Wolfgang Viechtbauer, PhD, Statistician | Department of Psychiatry and 
> Neuropsychology | Maastricht University | PO Box 616 (VIJV1) | 6200 MD 
> Maastricht, The Netherlands | +31(43)3884170 | https://www.wvbauer.com
>
> > -----Original Message-----
> > From: R-sig-meta-analysis 
> > <mailto:r-sig-meta-analysis-bounces using r-project.org>
> > On Behalf Of St Pourcain, Beate via R-sig-meta-analysis
> > Sent: Saturday, October 12, 2024 10:11
> > To: mailto:r-sig-meta-analysis using r-project.org
> > Cc: St Pourcain, Beate <mailto:Beate.StPourcain using mpi.nl>
> > Subject: [R-meta] Meta-analysis of intra class correlation 
> > coefficients
> >
> > Dear R-sig group,
> >
> > We would like to carry out a meta-analysis of intra class 
> > correlation
> > (ICC) coefficients (i.e. the correlation between pairs of 
> > individuals, for the same trait). For this reason, we aim to derive 
> > the variance of ICCs from published studies, without access to 
> > individual data, thus,
> preventing bootstrapping.
> > Could anyone advise us what the options are to derive the variance 
> > for ICCs using existing R packages?
> >
> > We were looking into the escalc function from the metafor package, 
> > but could not find options for ICCs, but options for Pearson 
> > product-moment correlations ("COR" option).
> >
> > We have seen research proposing to apply the escalc "COR" option 
> > also to derive the variance for ICCs. However, we are unsure about 
> > this, as we may get undesirable properties for the derived variance, 
> > given that the definition of the correlations is different.
> >
> > Any advice is welcome!
> >
> > Best wishes,
> > Beate
> >
> > Beate St Pourcain, PhD
> > Senior Investigator & Group Leader
> > Room A207
> > Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD 
> > Nijmegen | The Netherlands
> >
> > @bstpourcain
> > Tel: +31 24 3521964
> > Fax: +31 24 3521213
> > ORCID: https://orcid.org/0000-0002-4680-3517
> > Web:
> > https://www.mpi.nl/departments/language-and-genetics/projects/popula
> > ti
> > on-
> > variation-and-human-communication/
> > Further affiliations with:
> > MRC Integrative Epidemiology Unit | University of Bristol | UK 
> > Donders Institute for Brain, Cognition and Behaviour | Radboud 
> > University | The Netherlands

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Funnel_MZ_ICC_simple_withN_rZ.png
Type: image/png
Size: 23528 bytes
Desc: Funnel_MZ_ICC_simple_withN_rZ.png
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20241016/3d61bf79/attachment-0002.png>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Funnel_DZ_ICC_simple_withN_rZ.png
Type: image/png
Size: 23678 bytes
Desc: Funnel_DZ_ICC_simple_withN_rZ.png
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20241016/3d61bf79/attachment-0003.png>