[R-meta] clarification on "V" in "rma.mv()" from the "metafor" package

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Mon Apr 26 09:59:51 CEST 2021


Dear Tim,

Please see my responses below.

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces@r-project.org] On
>Behalf Of Tip But
>Sent: Saturday, 17 April, 2021 20:40
>To: r-sig-meta-analysis@r-project.org
>Subject: [R-meta] clarification on "V" in "rma.mv()" from the "metafor" package
>
>Dear All,
>
>I had some clarification questions regarding the "rma.mv()" from the
>"metafor" package.
>
>In regular (i.e., non-meta-regression) multivariate multilevel models, we
>naturally get an estimated variance-covariance matrix for the DV values
>across different levels of the grouping variable (ID) by specifying the
>random effects:
>
>nlme::lme(DV_values ~ DV_indx - 1, random = ~ DV_indx - 1 | ID,
>          data = data, correlation = NULL, weights = NULL)     ## DON'T RUN
>
>But in "rma.mv()", there is an additional "V" argument to provide a list of
>known/guesstimated variance-covariance matrices between the DV values
>[i.e., individual effect sizes] in each study (i.e., grouping variable) as
>well.
>
>The R documentation on the "V" argument in "rma.mv()" is very terse. But,

:( It is true that the help page for rma.mv() does not explain the theory in great detail (although it does so more than is typical, at least for most packages I know). But I don't think the help pages are the place to explain the theory in the first place. That's what the references at the bottom are for.

>(1) Does the use of "V" arise whenever each study generally produces
>multiple dependent effect sizes OR is it reserved for when we have a pool
>of, e.g., multi-outcome and/or longitudinal studies?

The V matrix plays a role whenever the sampling errors of the estimates are correlated. One cannot estimate the covariance/correlation between two estimates based on just the observed values of the two estimates. In essence, that would be like asking for the correlation between the numbers 2 and 4. So, what we have to do is use the statistical theory underlying the outcome measure to derive an estimate of their covariance/correlation. For example, if the numbers 2 and 4 above are the means of two variables x and y measured in a single group of n individuals and the observed correlation between variables x and y was r in the sample, then r is also the estimated correlation between those two means. But without knowing r, you cannot know what the correlation is between the means 2 and 4. So, this kind of 'outside' information needs to be brought in to construct an appropriate V matrix. That is what makes meta-analytic models somewhat different from standard mixed-effects models, where we can (typically) estimate everything from the data.
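To illustrate, here is a quick simulation sketch (with made-up values): across repeated samples, the correlation between the two means matches the correlation r between x and y within a sample.

set.seed(1234)
n <- 50   # subjects per sample
r <- 0.6  # true correlation between variables x and y
means <- t(replicate(10000, {
   x <- rnorm(n)
   y <- r * x + sqrt(1 - r^2) * rnorm(n)
   c(mean(x), mean(y))
}))
cor(means)[1,2] # approximately equal to r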

That difference already comes into play even in much simpler models where we only have to think about the sampling variances of the estimates. In a meta-analytic model, we again estimate the sampling variances of the estimates based on information / statistical theory other than the data itself. For example, you cannot know what the sampling variance of the mean 2 is without additional information. But if you know that the observed SD of x was 0.5, then we know that the (estimated) sampling variance of the mean is 0.5^2 / n.
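Again, a small simulation sketch (with made-up numbers) to illustrate:

set.seed(1234)
n   <- 100
sdx <- 0.5  # observed SD of x
means <- replicate(10000, mean(rnorm(n, mean = 2, sd = sdx)))
var(means)  # close to the theoretical sampling variance of the mean
sdx^2 / n   # 0.5^2 / 100 = 0.0025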

Since every estimate has its 'own' sampling variance, we have as many sampling variances as there are estimates. Similarly, for every pair of estimates, there is one covariance/correlation. This is what goes into the V matrix.

Now to get back to your actual question: I would roughly say that we have to think about the covariance between the sampling errors of two estimates whenever there is at least one subject (or whatever the unit of analysis is) that contributes data to both estimates. That covers the example above, where all n subjects contribute data to both means. Those two means might be two different 'outcomes' (variables) or the same outcome measured at two different time points.

There does not need to be full overlap of subjects either. One case that arises occasionally in meta-analysis is studies with multiple treatment groups and a single control group (or something similar). We might then compute T1 - C and T2 - C, where T1, T2, and C are the means (or risks or something analogous) for the different conditions. Since the n_C control subjects contribute data to both T1 - C and T2 - C, we again have *at least one subject* that has contributed data to those two estimates and hence again correlated sampling errors.
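For raw mean differences, the covariance between T1 - C and T2 - C is simply the sampling variance of C. A sketch with hypothetical summary statistics:

sd_C  <- 1.2; n_C  <- 40  # control group
sd_T1 <- 1.0; n_T1 <- 50  # treatment group 1
sd_T2 <- 1.1; n_T2 <- 45  # treatment group 2
v1  <- sd_T1^2 / n_T1 + sd_C^2 / n_C  # sampling variance of T1 - C
v2  <- sd_T2^2 / n_T2 + sd_C^2 / n_C  # sampling variance of T2 - C
c12 <- sd_C^2 / n_C                   # covariance due to the shared control group
V   <- matrix(c(v1, c12, c12, v2), nrow = 2)
V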

On the other hand, different studies have (presumably) used completely different subjects. Hence, there is no (i.e., zero) covariance between estimates from different studies, and so we end up with a block-diagonal V matrix. For example, as in:

library(metafor)

dat <- dat.berkey1998
V <- lapply(split(dat[c("v1i", "v2i")], dat$trial), as.matrix) # one 2x2 var-cov matrix per trial
V <- bldiag(V) # combine into a block-diagonal V matrix
V
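This V matrix can then be passed to rma.mv(); the model below mirrors the bivariate example from the metafor documentation for this dataset:

res <- rma.mv(yi, V, mods = ~ outcome - 1,
              random = ~ outcome | trial, struct = "UN", data = dat)
res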

>(2) Given the lack of extended documentation, is there any general
>equivalent for the "V" argument in the context of regular (i.e.,
>non-meta-regression) multilevel modeling packages (e.g., the combination of
>the "correlation" and "weights" arguments from "nlme::lme()")?

If you have the raw data, then you can directly estimate the sampling variances and covariances from them as part of the model fitting. And then you would indeed use the "correlation" and "weights" arguments from lme() to do so.
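A rough sketch of such a call (not run; 'data', 'DV_values', 'DV_indx', and 'ID' as in your example above):

library(nlme)
fit <- lme(DV_values ~ DV_indx - 1,
           random = ~ DV_indx - 1 | ID,
           correlation = corSymm(form = ~ 1 | ID),   # unstructured within-ID correlations
           weights = varIdent(form = ~ 1 | DV_indx), # outcome-specific variances
           data = data)                              ## DON'T RUN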

>(3) Why can't the multilevel structure alone account for the correlations
>among effect sizes within each study, requiring us to specify an additional
>"V" list of variance-covariance matrices?

For the same reason that the sampling variances of the estimates do not tell you anything about the variance in the underlying true outcomes. Suppose I have the observed means 1, 3, 2, 5 with sampling variances 0.6^2 / 60, 0.3^2 / 40, 0.5^2 / 100, 0.4^2 / 80. Those sampling variances in essence tell you how much a mean would vary *within a study* if that study was repeated over and over again (under identical circumstances). But those sampling variances do not tell you anything about how much the underlying true means might vary. That is why we estimate 'tau^2' in a random-effects model - to estimate the variance in the underlying true effects/outcomes. The same applies to the covariances among the sampling errors: those do not tell you anything about the covariance among the underlying true effects/outcomes.
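Using the numbers above, a quick sketch of fitting a random-effects model that estimates tau^2 on top of the known sampling variances:

library(metafor)
yi <- c(1, 3, 2, 5)
vi <- c(0.6^2/60, 0.3^2/40, 0.5^2/100, 0.4^2/80)
res <- rma(yi, vi) # random-effects model; tau^2 is the between-study variance
res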

If you want to think about it more from a 'traditional' multilevel analysis context: It's a bit like the within- and between-group relationship issue. What you observe at one level (within groups) does not tell you anything about the relationship at another level (between groups).

>Thank you very much for your knowledge and expertise,
>Tim


