[R-meta] Meta-analysis of intra class correlation coefficients

Sat Oct 19 16:16:22 CEST 2024

Dear Beate

Somewhere buried in this thread Wolfgang said

==========================
This goes back to Fisher (1925; Statistical methods for research workers).

In your application (where you dealing with pairs), n is the number of 
pairs and m is 2. In that case, you can treat ICC(1) values like regular 
correlations. However, if you do apply the r-to-z transformation, then 
Fisher suggests to use 1/(n-3/2) as the variance (instead of 1/(n-3) as 
we typically use for r-to-z transformed Pearson product-moment 
correlation coefficients) and simulation studies I have done confirm this.
======================

Michael

On 18/10/2024 19:43, St Pourcain, Beate via R-sig-meta-analysis wrote:
> Dear Wolfgang,
> No worries, I am aware of the difference and fully agree with your comments. I was just surprised by the similarity in estimates and had hoped for an approximation that  might provide a quick workaround in the current situation. Thanks again for all your help, we will take it from here.
> 
> In case you would have (at some point) a reference for the variance of Z scores for ICCs as
> 
> 1/( n-3/2)
> 
> that would be great, no rush!
> 
> Enjoy your weekend,
> Beate
> 
> Beate St Pourcain, PhD
> Senior Investigator & Group Leader
> Room A207
> Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD Nijmegen | The Netherlands
> 
> 
> @bstpourcain
> Tel: +31 24 3521964
> Fax: +31 24 3521213
> ORCID: https://orcid.org/0000-0002-4680-3517
> Web: https://www.mpi.nl/departments/language-and-genetics/projects/population-variation-and-human-communication/
> Further affiliations with:
> MRC Integrative Epidemiology Unit | University of Bristol | UK
> Donders Institute for Brain, Cognition and Behaviour | Radboud University | The Netherlands
> 
> My working hours may not be your working hours. Please do not feel obligated to reply outside of your normal working schedule.
> 
> -----Original Message-----
> From: Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
> Sent: Friday, October 18, 2024 5:23 PM
> To: St Pourcain, Beate <Beate.StPourcain using mpi.nl>; R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
> Subject: RE: Meta-analysis of intra class correlation coefficients
> 
> Dear Beate,
> 
> Just briefly before I start my weekend: You cannot apply the inverse Fisher transformation to vcov(model). To give a simple example to show that var(f(x)) != f(var(x)):
> 
> x <- rnorm(100, mean=100, sd=15)
> var(x)
> log(var(x))
> var(log(x))
> 
> The same then of course also applies to the SD:
> 
> sd(log(x))
> sqrt(log(var(x)))
> 
> Totally different. But the delta method gives:
> 
> msm::deltamethod(~ log(x1), mean=mean(x), cov=var(x))
> 
> which is quite close.
> 
> This aside, the reason why FisherZInv(vcov(model)) and vcov(model) are essentially the same is because for values close to 0, FisherZInv() essentially changes very little. But this is besides the point, since FisherZInv(vcov(model)) isn't appropriate anyway.
> 
> Best,
> Wolfgang
> 
>> -----Original Message-----
>> From: St Pourcain, Beate <Beate.StPourcain using mpi.nl>
>> Sent: Friday, October 18, 2024 16:34
>> To: Viechtbauer, Wolfgang (NP)
>> <wolfgang.viechtbauer using maastrichtuniversity.nl>; R Special Interest
>> Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
>> Subject: RE: Meta-analysis of intra class correlation coefficients
>>
>> Dear Wolfgang,
>>
>> Many thanks! This is really helpful, indeed! For clarification,
>> related traits refer to traits that are likely to reflect a single
>> shared underlying domain (e.g. assuming a musicality phenotype: singing, pitch, tapping a beat etc).
>> These scores are available in MZ and DZ twins in the same cohort, and
>> multiple analyses using these related phenotypes have been carried out
>> with the same study, and multiple studies have analysed the same
>> cohort, and multiple cohorts are available for analysis.
>>
>> I adjusted the code as you suggested. I excluded the DZ_factor as it
>> is redundant and just an opposite coding of the MZ_factor.
>>
>>> model <- rma.mv(
>> +   zi,
>> +   vzi,
>> +   random = list(~ MZDZ_factor | Study_ID, ~ MZDZ_factor | ESID), struct="UN",
>> +   data = data,
>> +   mods = ~ 0 + MZDZ_factor,
>> +   method = "REML",
>> +   tdist = TRUE)
>>
>> This provides now estimates for MZDZ_factor0 and MZDZ_factor1.
>> As heritability can only be estimated with raw ICC (twin
>> correlations), I reverted back:
>>
>> #i.e. approximating heritability (which is usually carried out with
>> established twin analysis methodology using ML) #Reverting back to r
>> from Z scores rMZ <- FisherZInv(model$b[2]) rDZ <-
>> FisherZInv(model$b[1])
>> h2 <- 2*(rMZ - rDZ)
>>
>> #or equivalently
>> h2_v2 <- 2*((exp(2*model$b[2]) - 1) / (exp(2*model$b[2]) + 1) -
>> (exp(2*model$b[1]) - 1) / (exp(2*model$b[1]) + 1))
>>
>> #with SE:
>> h2_se <- deltamethod(~ 2*((exp(2*x2) - 1) / (exp(2*x2) + 1) -
>> (exp(2*x1) - 1) /
>> (exp(2*x1) + 1)), coef(model), vcov(model))
>>
>> #This is now the SE for the h2 reverting back from Z scores, and I
>> slightly adjusted for the order of the coefficients.
>>
>> There is indeed covariance between zrMZ and zrDZ, which may arise due
>> to the meta-analytic character of the estimates (study effects?). The
>> covariance between rMZ and rDZ is assumed to be absent within a single
>> study, but here, indeed, it is not. We will now look into
>> possibilities to derive more sophisticated twin estimates using specialised software and ML estimations.
>> Ideally, these analyses are carried out on the raw scale.
>>
>> To simplify the calculation (avoiding the Fisher transformation), do
>> you think we could transform rma estimates and vcov back from Z to the
>> raw scale (still using the delta method)? There is a remarkable
>> stability in the vcov, for z and raw r estimates:
>>
>>> FisherZInv(vcov(model))
>>               MZDZ_factor0 MZDZ_factor1
>> MZDZ_factor0   0.01522759   0.03257534
>> MZDZ_factor1   0.03257534   0.07674611
>>> vcov(model)
>>               MZDZ_factor0 MZDZ_factor1
>> MZDZ_factor0   0.01522877   0.03258687
>> MZDZ_factor1   0.03258687   0.07689732
>>
>> Thanks again for your insight and help, Beate
>>
>> -----Original Message-----
>> From: Viechtbauer, Wolfgang (NP)
>> <wolfgang.viechtbauer using maastrichtuniversity.nl>
>> Sent: Wednesday, October 16, 2024 10:21 PM
>> To: St Pourcain, Beate <Beate.StPourcain using mpi.nl>; R Special Interest
>> Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
>> Subject: RE: Meta-analysis of intra class correlation coefficients
>>
>> I don't understand what you mean by 'related traits'. Also, if
>> multiple estimates are extracted from the same cohort, you may need to
>> account for this as well. But leaving this aside, you should fit a
>> single bivariate model to the
>> z(rMZ) and z(rDZ) values. Maybe something like this:
>>
>> model <- rma.mv(
>>    zi,
>>    vzi,
>>    random = list(~ MZDZ_factor | Study_ID, ~ MZDZ_factor | ESID), struct="UN",
>>    data = data,
>>    mods = ~ 0 + MZDZ_factor + DZMZ_factor,
>>    method = "REML",
>>    tdist = TRUE)
>>
>> although not sure whether this will converge.
>>
>> coef(model) then gives you the two coefficients and vcov(model) the
>> corresponding var-cov matrix thereof. The multivariate delta method is
>> described
>> here:
>>
>> https://en.wikipedia.org/wiki/Delta_method#Multivariate_delta_method
>>
>> This is implemented, for example in msm::deltamethod(). Double-check this, but:
>>
>> deltamethod(~ 2*((exp(2*x1) - 1) / (exp(2*x1) + 1) - (exp(2*x2) - 1) /
>> (exp(2*x2) + 1)), coef(model), vcov(model))
>>
>> should then give you the SE of 2*(rMZ - rDZ).
>>
>> Best,
>> Wolfgang
>>
>>> -----Original Message-----
>>> From: St Pourcain, Beate <Beate.StPourcain using mpi.nl>
>>> Sent: Wednesday, October 16, 2024 10:59
>>> To: Viechtbauer, Wolfgang (NP)
>>> <wolfgang.viechtbauer using maastrichtuniversity.nl>; R Special Interest
>>> Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
>>> Subject: RE: Meta-analysis of intra class correlation coefficients
>>>
>>> ATTACHMENT(S) REMOVED: Funnel_DZ_ICC_simple_withN_rZ.png |
>>> Funnel_MZ_ICC_simple_withN_rZ.png
>>>
>>> Dear Wolfgang,
>>> Many thanks for the suggestions!
>>> The r-Z transformation was easily applied, as suggested.
>>>
>>> data$zi <- FisherZ(data$r)
>>> data$vzi <- 1/(data$n-3/2)
>>> egger_seMZ <- regtest(MZdata$zi, MZdata$vzi) egger_seDZ <-
>>> regtest(DZdata$zi, DZdata$vzi)
>>>
>>> ...and improved the Egger regression
>>> zrMZ_se_z       zrDZ_se_z
>>> -3.968          -5.519
>>>
>>> Also, the funnel plots look better (see attached). We subsequently
>>> carried out the rma.mv as the ICC estimates partially represent
>>> related traits estimated within the same study (i.e. the estimates
>>> are related within rMZ and within rDZ), and of course rMZ and rDZ
>>> are nested within the same study and also sometimes the same cohort
>>> (for code see
>> below).
>>>
>>> For the subsequent analysis, we back-transformed the Z-score
>>> intercept from rma.mv with
>>>
>>> rMZ <- FisherZInv(model2$b[1])
>>> rDZ <- FisherZInv(model1$b[1])
>>>
>>> However, we are unsure as to how to best backtransform model2$se[1]
>>> and model1$se[1], i.e. the SEs of the rZ intercepts from rma.mv
>>> using the delta method as you recommended. Could you please advise?
>>> This will create the input for the subsequent heritability analyses.
>>>
>>> Best wishes,
>>> Beate
>>>
>>> #################################################################
>>> Study_ID - Cohort ID
>>> ESID - actual Study ID
>>> MZDZ_factor - (MZ=1, DZ=0)
>>> DZMZ_factor - (DZ=1,MZ=0)
>>>
>>> For completeness, I add the code below that was adapted from the
>>> Austerberry study for z values:
>>> model1 <- rma.mv(
>>>    zi,
>>>    vzi,
>>>    random = list(~MZDZ_factor|Study_ID, ~ MZDZ_factor | ESID),
>>>    data = data,
>>>    mods = ~ MZDZ_factor,
>>>    method = "REML",
>>>    tdist = TRUE
>>> )
>>>
>>> model2 <- rma.mv(
>>>    zi,
>>>    vzi,
>>>    random = list(~DZMZ_factor|Study_ID, ~ DZMZ_factor | ESID),
>>>    data = data,
>>>    mods = ~ DZMZ_factor,
>>>    method = "REML",
>>>    tdist = TRUE
>>> )
>>>
>>> Beate St Pourcain, PhD
>>> Senior Investigator & Group Leader
>>> Room A207
>>> Max Planck Institute for Psycholinguistics | Wundtlaan 1 | 6525 XD
>>> Nijmegen | The Netherlands
>>>
>>> @bstpourcain
>>> Tel: +31 24 3521964
>>> Fax: +31 24 3521213
>>> ORCID: https://orcid.org/0000-0002-4680-3517
>>> Web:
>>> https://www.mpi.nl/departments/language-and-genetics/projects/popula
>>> ti
>>> on-
>>> variation-and-human-communication/
>>> Further affiliations with:
>>> MRC Integrative Epidemiology Unit | University of Bristol | UK
>>> Donders Institute for Brain, Cognition and Behaviour | Radboud
>>> University | The Netherlands
>>>
>>> -----Original Message-----
>>> From: Viechtbauer, Wolfgang (NP)
>>> <mailto:wolfgang.viechtbauer using maastrichtuniversity.nl>
>>> Sent: Monday, October 14, 2024 3:32 PM
>>> To: St Pourcain, Beate <mailto:Beate.StPourcain using mpi.nl>; R Special
>>> Interest Group for Meta-Analysis
>>> <mailto:r-sig-meta-analysis using r-project.org>
>>> Subject: RE: Meta-analysis of intra class correlation coefficients
>>>
>>> It is not ideal to meta-analyze raw correlations (or raw ICC values)
>>> if they are so large. In this case, I think the r-to-z
>>> transformation is
>> highly advisable.
>>>
>>> What you observe is the fact that the variance of a raw correlation
>>> coefficient depends on the true correlation. In particular, the
>>> large-sample estimate of the variance of a raw correlation
>>> coefficient (or ICC based on pairs) is (1-r^2)^2 / (n-1), where r is
>>> the observed correlation and n the sample size. Therefore, as r gets
>>> close to 1, the
>> variance will get small, as you noted.
>>>
>>> It is therefore no surprise that the Egger regression test is highly
>>> significant. Of course, this then says nothing about potential
>>> publication
>> bias.
>>> There is an inherent link between the correlation and its varianace
>>> (and hence standard error). The same issue also arises with other
>>> outcome measures / effect sizes (e.g., standardized mean differences).
>>>
>>> In the present case, the r-to-z transformation 'solves' this issue,
>>> since Var[z_r] =~ 1/(n-3) for Pearson correlations and Var[z_ICC] =~
>>> 1/(n-3/2) for
>>> ICC(1) values.
>>>
>>> I would then consider doing a bivariate meta-analysis of the
>>> z_ICC_mz and z_ICC_dz values. Since they are based on independent
>>> samples, their sampling errors are uncorrelated, but the random
>>> effects of the bivariate model then account for potential
>>> correlation in the underlying true (transformed) ICC values. This is
>>> analogous to what people do when pooling sensitivity and specificity
>>> values in a diagnostic test meta-analysis and also directly relates
>>> to the bivariate model
>> discussed by van Houwelingen et al. (2002):
>>>
>>> https://www.metafor-project.org/doku.php/analyses:vanhouwelingen2002
>>>
>>> You can then estimate the heritability from this model by
>>> back-transforming the pooled estimate for the MZ twins and the
>>> pooled estimate from the DZ twins and taking twice the difference.
>>> The SE and hence CI for this can then be obtained via the delta method.
>>>
>>> This raises interesting questions about the difference between
>>> between
>>> 'pooled(x) - pooled(y)' versus 'pooled(x - y)' -- there are papers
>>> in the literature that discuss this issue (not in the present
>>> context) -- but the latter option doesn't appear sensible to me here anyway.
>>>
>>> Best,
>>> Wolfgang
> 
> _______________________________________________
> R-sig-meta-analysis mailing list @ R-sig-meta-analysis using r-project.org
> To manage your subscription to this mailing list, go to:
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> 

-- 
Michael