[R-meta] confusion point: the various 'correlation' (rho, ρ) in multivariate meta-analytic model

Wed May 4 12:52:00 CEST 2022

Dear Yefeng,

Please be careful with using specialized symbols/formatting in your text, since this is a plain-text mailing list and such symbols/formatting might not display correctly for the recipients (note the '?' symbols below and how this ends up looking in the archives: https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2022-May/004026.html).

Also, when I get questions via email, I typically redirect them here or places like StackOverflow (https://stackoverflow.com) or CrossValidated (https://stats.stackexchange.com) because the answer I or others provide might be beneficial to more people, not just the person asking. And yes, sometimes I do not have the time to answer.

So, to your question(s):

The V matrix (which we can construct/approximate with vcalc()) is the variance-covariance matrix of the sampling errors of the effect size estimates within each study. To keep things simple, say our 'effect size measure' is simply a raw mean, and we have measured two different things (like cognition and anxiety) in a single group of subjects. So, the raw data in a study would simply be two columns, one for each variable, for the n subjects. Now imagine we repeated this study over and over, each time computing the means of the two variables based on a new sample of subjects drawn from the same population, and then correlated these pairs of means -- that is the within-study correlation. But we have run the study only once, so we cannot correlate pairs of means: a correlation cannot be computed from a single pair of numbers. However, it turns out that the correlation between the raw data (say, r) is an estimate of the correlation between the two means!
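A small simulation can make this concrete. The following is only a sketch under assumed values (n = 50 subjects, a correlation of 0.6 between the two raw variables, standard normal data, 5000 imagined replications); none of these numbers come from a real study.

```r
set.seed(1234)

n    <- 50     # subjects per (hypothetical) study
r    <- 0.6    # assumed correlation between the two raw variables
nrep <- 5000   # number of imagined replications of the same study

means <- matrix(NA_real_, nrow = nrep, ncol = 2)

for (i in seq_len(nrep)) {
   # draw n new subjects with two correlated variables
   z1 <- rnorm(n)
   z2 <- rnorm(n)
   cognition <- z1
   anxiety   <- r * z1 + sqrt(1 - r^2) * z2
   # store the pair of means from this replication
   means[i, ] <- c(mean(cognition), mean(anxiety))
}

# correlation between the pairs of means across replications
# (this is the within-study correlation; it should be close to r)
cor_means <- cor(means[, 1], means[, 2])
cor_means

# what we can actually compute from a single study: the correlation
# between the raw data (here taken from the last replication)
r_single <- cor(cognition, anxiety)
r_single
```

In practice only r_single is available, and it serves as the estimate of the quantity that cor_means approximates here.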

This is essentially the same principle as what is used in computing (or rather: estimating) the sampling variance of an effect size measure. In theory, the sampling variance is the variance in the effect size estimates we would obtain if we were to repeat the same study over and over under identical circumstances. But we don't do that, since the study was run only once. We have our effect size estimate from that study and now we want to know what its variance would have been if we had repeated the study over and over. For a mean, we can estimate its sampling variance by dividing the variance of the raw data by n. So, we can do this for the cognition mean (variance of the raw cognition values divided by n) and the anxiety mean (variance of the raw anxiety values divided by n). Those are the sampling variances. And the covariance between the two means is estimated by taking the correlation between the raw data and multiplying that by the square root of the product of the two sampling variances (i.e., cov = r * sqrt(sampling_var_1 * sampling_var_2)).
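As a minimal sketch of these computations for one study, assuming made-up raw data (the sample size, means, and SDs below are purely illustrative):

```r
set.seed(42)

n <- 40   # hypothetical number of subjects

# hypothetical raw data: two variables measured in the same subjects
cognition <- rnorm(n, mean = 100, sd = 15)
anxiety   <- 20 - 0.2 * (cognition - 100) + rnorm(n, sd = 4)

# sampling variances of the two means
v1 <- var(cognition) / n
v2 <- var(anxiety) / n

# correlation between the raw data and the implied covariance of the means
r     <- cor(cognition, anxiety)
cov12 <- r * sqrt(v1 * v2)

# the 2x2 var-cov matrix of the sampling errors for this study
V_study <- matrix(c(v1, cov12, cov12, v2), nrow = 2,
                  dimnames = list(c("cognition", "anxiety"),
                                  c("cognition", "anxiety")))
V_study

# the same matrix in one step: var-cov matrix of the raw data divided by n
V_direct <- cov(cbind(cognition, anxiety)) / n
all.equal(V_study, V_direct)
```

The last line just confirms that r * sqrt(v1 * v2) is the same thing as the covariance of the raw data divided by n.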

This gives us the 2x2 block that goes into the V matrix for this particular study, which reflects the dependency between the two estimates within this study. So, the 'rho' for vcalc() is about this within-study correlation ('r' above).

Sidenote: For other effect size measures that are more commonly used in meta-analyses (instead of simply means) like standardized mean differences, (log) odds/risk ratios, and so on, the equations that need to be used to compute/estimate their sampling variances and the correlation between two estimates computed based on the same sample of subjects are of course different than those used for means.

But in a meta-analysis, we have multiple studies. And for each study, there is a pair of means (or whatever the effect size measure is). Another type of correlation we can ask about is the correlation in the underlying *true* means (assuming that the true means of the two variables are not constant across studies). This is what we can estimate by fitting a model like:

rma.mv(yi, V, mods = ~ outcome, random = ~ outcome | study, struct="UN", data=mydata)

So, yi is the vector with the means for the two outcomes for all of the studies, V is the var-cov matrix we constructed above (block-diagonal with 2x2 blocks), outcome distinguishes if a value in yi is a mean for the first or the second outcome, and 'study' is a study identifier.

Sidenote: If a study did not measure both outcomes, then this is no problem -- it just provides one value to 'yi' and its part in the V matrix is just a 1x1 block.
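To illustrate the data layout and the resulting block-diagonal V matrix, here is a small sketch using vcalc() from the metafor package. The dataset is entirely made up (three studies, the third of which measured only cognition), and the within-study correlation of 0.5 is an assumed value, not something taken from real data.

```r
library(metafor)

# hypothetical long-format data: one row per mean; study 3 measured only cognition
mydata <- data.frame(
   study   = c(1, 1, 2, 2, 3),
   outcome = c("cognition", "anxiety", "cognition", "anxiety", "cognition"),
   yi      = c(0.52, 0.31, 0.40, 0.12, 0.65),
   vi      = c(0.040, 0.050, 0.020, 0.030, 0.060)
)

# assumed within-study correlation of 0.5 between the two outcomes
V <- vcalc(vi, cluster = study, type = outcome, rho = 0.5, data = mydata)
round(V, 4)
```

The printed matrix should show two 2x2 blocks (studies 1 and 2) and a single 1x1 block (study 3), with zeros everywhere else, since estimates from different studies are assumed to be independent.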

By adding a random effect for 'outcome within study' (random = ~ outcome | study) with an 'unstructured variance-covariance matrix' (struct="UN"), we estimate the variance in the *true* means for the first and second outcome (and we allow those two variances to be different) and we estimate the covariance/correlation between the true means. This is the 'rho' for the correlation of the true means.

Note that it is important above that we use rma.mv(yi, V, ...) and not just rma.mv(yi, vi, ...), since the latter would assume that the within-study correlations are 0. The effect of this is (typically) that we would overestimate the correlation of the true means.
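The difference between the two calls can be explored with simulated data. The sketch below is only an illustration under assumed values (100 studies, equal and known sampling variances of 0.04, a within-study correlation of 0.5, a correlation of 0.3 between the true means, and a between-study SD of 0.3 for both outcomes); all variable names are hypothetical.

```r
library(metafor)
set.seed(2022)

k        <- 100    # number of studies
v_samp   <- 0.04   # sampling variance of each mean (assumed known and equal)
r_within <- 0.5    # assumed within-study correlation of the sampling errors
rho_true <- 0.3    # assumed correlation of the true means
tau      <- 0.3    # assumed SD of the true means (both outcomes)

# true means of the two outcomes, correlated across studies
z1 <- rnorm(k)
z2 <- rnorm(k)
theta_cog <- 0 + tau * z1
theta_anx <- 1 + tau * (rho_true * z1 + sqrt(1 - rho_true^2) * z2)

# sampling errors, correlated within studies
u1 <- rnorm(k)
u2 <- rnorm(k)
e_cog <- sqrt(v_samp) * u1
e_anx <- sqrt(v_samp) * (r_within * u1 + sqrt(1 - r_within^2) * u2)

# long format: two rows (one per outcome) for each study
mydata <- data.frame(
   study   = rep(seq_len(k), each = 2),
   outcome = rep(c("cognition", "anxiety"), times = k),
   yi      = c(rbind(theta_cog + e_cog, theta_anx + e_anx)),
   vi      = v_samp
)

# V matrix that accounts for the within-study correlation
V <- vcalc(vi, cluster = study, type = outcome, rho = r_within, data = mydata)

# model using the full V matrix
res_V <- rma.mv(yi, V, mods = ~ outcome, random = ~ outcome | study,
                struct = "UN", data = mydata)

# model that ignores the within-study correlation
res_vi <- rma.mv(yi, vi, mods = ~ outcome, random = ~ outcome | study,
                 struct = "UN", data = mydata)

# estimated correlation of the true means under each approach
c(rho_with_V = res_V$rho, rho_with_vi = res_vi$rho)
```

In this setup, the within-study covariance that is ignored in the second model gets absorbed into the estimated covariance of the true means, so rho_with_vi should come out larger than rho_with_V.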

And finally, yes, there can be other 'rho' types (or more generally, correlations, because what we call these correlations is completely arbitrary). For example, there can be autocorrelations when the same effect size is repeatedly measured over time in a group of subjects (then we can have within-study autocorrelation and also the autocorrelation in the true means).

I hope this clarifies things a bit.

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Behalf Of Yefeng Yang
>Sent: Tuesday, 03 May, 2022 16:02
>To: r-sig-meta-analysis using r-project.org
>Subject: [R-meta] confusion point: the various 'correlation' (rho, ρ) in
>multivariate meta-analytic model
>
>Hi subscribers
>
>I am writing to ask one question regarding the various 'correlation' (rho) in
>multivariate models (implemented in rma.mv() function).  I would be grateful if
>would like to clarify my confusion.  I am hoping someone who is familiar with
>meta-analytic models can address my questions. Wolfgang definitely knows the
>answers, but I guess he is so busy that he has no time to answer my questions.
>Anyway, I briefly describe my question below.
>
>Assume I am conducting a bivariate meta-analysis to estimate (1) the overall
>effects of two outcomes (cognition and anxiety), and (2) the correlation between
>the two outcomes. I am confused with the two types of correlations (probably
>three types) involved in this bivariate meta-analysis.
>
>(i) rho in the variance-covariance matrix of dependent effect size's sampling
>errors:
>
>when constructing (or approximately imputing) variance-covariance matrix of
>dependent effect size's sampling errors (via vcalc() function), we can use the
>argument rho (��) to specify the correlation of observed effect sizes or outcomes
>measured concurrently:
>
>### construct the variance-covariance matrix assuming rho = 0.66 for effect sizes
>corresponding to the 'verbal' and 'math' outcome types
>
> V <- vcalc(vi, cluster=studyID, type=outcome, data=dat, rho=0.66)
>
> (ii) rho in the variance-covariance matrix of random effects structure:
>
>say I am using ~ inner | outer to define the random effects structure ~ outcome |
>studyID. For any of the variance structures (e.g., compound symmetric structure
>[CS], heteroscedastic compound symmetric structure [HCS], UN [unstructured]),
>there is a correlation coefficient rho (��) denoting the correlation between the
>different levels of inner variable (in the our case, outcome). Then we can fit
>the bivariate random-effects model using rma.mv(), for example:
>
>rma.mv(yi, vi, mods = ~ outcome, random = ~ group | studyID, struct="UN",
>data=mydata)
>
>(iii) correlation between true effects size/outcomes
>
>My question is,
>
>  1.   what are the differences and relationships between �� (sampling
>correlation; scenario i) in V matrix and �� in the random effects structure
>(scenario ii) in the context of my example and a more general condition?
>  2.  whether the second �� (correlation in the random effects structure;
>scenario ii) is exactly the third �� (correlation between the underlying true
>effects size/outcomes; scenario iii).
>  3.  Is �� (sampling correlation; scenario i) meaning the correlation between
>the observed effects/outcomes and �� in the random effects structure (scenario
>ii) is the correlation between the true effects/outcomes.
>  4.  Is there any other �� in multivariate or multilevel meta-analytic models?
>
>Best,
>
>Yefeng Yang PhD
>https://scholar.google.com/citations?user=V1WGHHIAAAAJ
>UNSW, Sydney
>City University of Hong Kong, China