[R-meta] clarification on "V" in "rma.mv()" from the "metafor" package

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer using maastrichtuniversity.nl
Wed Apr 28 11:52:58 CEST 2021


Dear Tim,

It's not clear to me how many effect sizes you want to compute in this example, and you did not mention what effect size measure you are using.

Are you computing three effect size estimates (for the 3 outcomes) at the pre-test and three estimates at the post-test, so 6 in total? But how does this relate to "control group info"? Are there multiple treatment groups that are being compared to a single control group?

Best,
Wolfgang

>-----Original Message-----
>From: Tip But [mailto:fswfswt using gmail.com]
>Sent: Tuesday, 27 April, 2021 0:50
>To: Viechtbauer, Wolfgang (SP)
>Cc: r-sig-meta-analysis using r-project.org
>Subject: Re: [R-meta] clarification on "V" in "rma.mv()" from the "metafor"
>package
>
>Dear Dr. Viechtbauer,
>
>This is the clearest response I could ever expect. Thank you.
>
>WV: For example, if the numbers 2 and 4 above are the means of two variables x and
>y measured in a single group of n individuals and the observed correlation between
>variables x and y was r in the sample, then r is also the estimated correlation
>between those two means. But without knowing r, you cannot know what the
>correlation is between the means 2 and 4.
>
>TB: Suppose I have 3 sources of dependency among effect sizes within individual
>studies (i.e., the use of several outcomes, multiple measurements, and the use of
>control group info. in calculating effect sizes).
>
>Say, for study 1, I know the correlation between pre- and post-test performances
>across its 2 time points (r = .6) and the correlations between performances on
>its 3 outcomes (r12 = .3, r13 = .4, r23 = .5). [I assume that this is the
>information we need so far to form the blocks of the "V" matrix.]
>
>In my above scenario, do I need to know anything else from the primary studies
>(e.g., correlations, data) to account for the dependency due to the use of
>control group info. in calculating effect sizes?
>
>Once again, thank you for the very clear explanation,
>Tim
>
>On Mon, Apr 26, 2021 at 3:00 AM Viechtbauer, Wolfgang (SP)
><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>Dear Tim,
>
>Please see my responses below.
>
>Best,
>Wolfgang
>
>>-----Original Message-----
>>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>>Behalf Of Tip But
>>Sent: Saturday, 17 April, 2021 20:40
>>To: r-sig-meta-analysis using r-project.org
>>Subject: [R-meta] clarification on "V" in "rma.mv()" from the "metafor" package
>>
>>Dear All,
>>
>>I had some clarification questions regarding the "rma.mv()" from the
>>"metafor" package.
>>
>>In regular (i.e., non-meta-regression) multivariate multilevel models, we
>>naturally get an estimated variance-covariance matrix for the DV values
>>across different levels of the grouping variable (ID) by specifying the
>>random effects:
>>
>>nlme::lme(DV_values ~ DV_indx-1, random = ~ DV_indx -1 | ID,
>>          data = data, correlation = NULL, weights = NULL)     ## DON'T RUN
>>
>>But in "rma.mv()", there is an additional "V" argument to provide a list of
>>known/guesstimated variance-covariance matrices between the DV values
>>[i.e., individual effect sizes] in each study (i.e., grouping variable) as
>>well.
>>
>>The R documentation on the "V" argument in "rma.mv()" is very terse. But,
>
>:( It is true that the help page for rma.mv() does not explain the theory in great
>detail (although more so than is typical, at least for most packages I know). But
>I don't think the help pages are the place to explain the theory in the first
>place. That's what the references at the bottom are for.
>
>>(1) Does the use of "V" arise whenever each study generally produces
>>multiple dependent effect sizes, or is it reserved for when we have a pool
>>of, e.g., multi-outcome and/or longitudinal studies?
>
>The V matrix plays a role whenever the sampling errors of the estimates are
>correlated. One cannot estimate the covariance/correlation between two estimates
>based on just the observed values of the two estimates. In essence, that would be
>like asking for the correlation between the numbers 2 and 4. So, what we have to do is
>use the statistical theory underlying the outcome measure to derive an estimate of
>their covariance/correlation. For example, if the numbers 2 and 4 above are the
>means of two variables x and y measured in a single group of n individuals and the
>observed correlation between variables x and y was r in the sample, then r is also
>the estimated correlation between those two means. But without knowing r, you
>cannot know what the correlation is between the means 2 and 4. So, this kind of
>'outside' information needs to be brought in to construct an appropriate V matrix.
>That is what makes meta-analytic models somewhat different from standard mixed-
>effects models, where we can (typically) estimate everything from the data.
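>
>As a small illustration of this point, here is a simulation sketch (the values
>for r, n, and the number of replicates are made up): across repeated samples,
>the correlation between the two means is indeed r.
>
>set.seed(1234)
>r <- 0.6; n <- 50
>Sigma <- matrix(c(1, r, r, 1), nrow = 2)  # correlation matrix of x and y
>means <- t(replicate(10000, colMeans(MASS::mvrnorm(n, mu = c(2, 4), Sigma))))
>cor(means[,1], means[,2])  # close to r = 0.6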
>
>That difference already comes into play even in much simpler models where we only
>have to think about the sampling variances of the estimates. In a meta-analytic
>model, we again estimate the sampling variances of the estimates based on
>information / statistical theory other than the data itself. For example, you cannot
>know what the sampling variance of the mean 2 is without additional information.
>But if you know that the observed SD of x was 0.5, then we know that the
>(estimated) sampling variance of the mean is 0.5^2 / n.
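>
>In R terms (the value of n is made up, since the example does not specify it):
>
>sd_x <- 0.5; n <- 25
>sd_x^2 / n  # estimated sampling variance of the observed mean of x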
>
>Since every estimate has its 'own' sampling variance, we have as many sampling
>variances as there are estimates. Similarly, for every pair of estimates, there is
>one covariance/correlation. This is what goes into the V matrix.
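>
>For two estimates within one study, the corresponding block of V could be
>constructed like this (v1, v2, and r are made-up values):
>
>v1 <- 0.05; v2 <- 0.08; r <- 0.6
>cov12 <- r * sqrt(v1 * v2)                 # covariance implied by r
>matrix(c(v1, cov12, cov12, v2), nrow = 2)  # the 2x2 block that goes into V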
>
>Now to get back to your actual question: I would roughly say that we have to think
>about the covariance between the sampling errors of two estimates whenever there
>is at least one subject (or whatever the unit of analysis is) that contributes
>data to both estimates. That covers the example above, where all n subjects
>contribute data to both means. Those two means might be two different 'outcomes'
>(variables) or the same outcome measured at two different time points.
>
>There does not need to be full overlap of subjects either. One case that arises
>occasionally in meta-analysis is studies with multiple treatment groups and a
>single control group (or something similar). We might then compute T1 - C and T2 -
>C, where T1, T2, and C are the means (or risks or something analogous) for the
>different conditions. Since the n_C control subjects contribute data to both T1 -
>C and T2 - C, we again have *at least one subject* that has contributed data to
>those two estimates and again correlated sampling errors.
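>
>For raw (unstandardized) mean differences, the covariance induced by the shared
>control group is simply the sampling variance of the control mean, that is,
>Cov(T1 - C, T2 - C) = sd_C^2 / n_C. A quick sketch with made-up numbers:
>
>sd_C <- 1.2; n_C <- 40
>sd_C^2 / n_C  # covariance between the two contrasts due to the shared control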
>
>On the other hand, different studies have (presumably) used completely different
>subjects. Hence, no (i.e. zero) covariance and so we end up with that block-
>diagonal matrix in V. For example, as in:
>
>library(metafor)  # provides dat.berkey1998 and bldiag()
>dat <- dat.berkey1998
># split the reported variances/covariances (v1i, v2i) into per-trial 2x2 blocks
>V <- lapply(split(dat[c("v1i", "v2i")], dat$trial), as.matrix)
>V <- bldiag(V)  # assemble the block-diagonal V matrix across trials
>V
>
>>(2) Given the lack of extended documentation, are there any general
>>equivalents for the "V" argument in the context of regular (i.e.,
>>non-meta-regression) multilevel modeling packages (e.g., a combination of
>>the "correlation" and "weights" arguments from "nlme::lme()")?
>
>If you have the raw data, then you can directly estimate the sampling variances
>and covariances from them as part of the model fitting. And then you would indeed
>use the "correlation" and "weights" arguments from lme() to do so.
>
>>(3) Why can't the multilevel structure alone account for the correlations
>>among effect sizes within each study, requiring us to specify an additional
>>"V" list of variance-covariance matrices?
>
>For the same reason that the sampling variances of the estimates do not tell you
>anything about the variance in the underlying true outcomes. Suppose I have the
>observed means 1, 3, 2, 5 with sampling variances 0.6^2 / 60, 0.3^2 / 40, 0.5^2 /
>100, 0.4^2 / 80. Those sampling variances in essence tell you how much a mean
>would vary *within a study* if that study was repeated over and over again (under
>identical circumstances). But those sampling variances do not tell you anything
>about how much the underlying true means might vary. That is why we estimate
>'tau^2' in a random-effects model - to estimate the variance in the underlying
>true effects/outcomes. The same applies to the covariances among the sampling
>errors. Those do not tell you anything about the covariance among the underlying
>true effects/outcomes.
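>
>To make this concrete, one could fit a random-effects model to those four
>means (the numbers are just the illustrative values above, not real data):
>
>library(metafor)
>yi <- c(1, 3, 2, 5)
>vi <- c(0.6^2/60, 0.3^2/40, 0.5^2/100, 0.4^2/80)
>rma(yi, vi)  # tau^2 here estimates the variance of the underlying true means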
>
>If you want to think about it more from a 'traditional' multilevel analysis
>context: It's a bit like the within- and between- group relationship issue. What
>you observe at one level (within groups) does not tell you anything about the
>relationship at another level (between groups).
>
>>Thank you very much for your knowledge and expertise,
>>Tim

