[R-meta] Some additions to metafor and a general workflow for a meta-analysis with dependent estimates

Mon Nov 1 13:39:16 CET 2021

Dear R-sig-meta-analysis readers,

I have made some additions to the metafor package, currently available in the 'development' version (at https://github.com/wviechtb/metafor), that I would like to bring to your attention. Given that questions about dependencies, multilevel/multivariate models, and cluster-robust inference (i.e., robust variance estimation) frequently come up on this mailing list, I think this might be of interest to at least some of you.

Dependencies in meta-analytic data can arise for various reasons. For example, we may compute multiple estimates from the same group of subjects (based on different response variables measuring the same underlying construct, different response variables measuring different constructs, and/or repeated assessments thereof). Dependencies also arise when an effect size measure reflects the difference between two groups (e.g., standardized mean differences, risk/odds ratios, risk differences) and multiple estimates are computed where the information from one group is shared across computations (e.g., treatment group A versus control and treatment group B versus control). Whenever there is at least some overlap in subjects (or whatever the unit of analysis is) across estimates, this induces dependency in the sampling errors of these estimates. A first step in the analysis is therefore to compute the covariances between the estimates (in addition to their sampling variances) and hence to construct a var-cov matrix (what I call the 'V' matrix) that contains these variances and covariances.
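As a minimal illustration of where such covariances come from, take the shared-control case with log odds ratios: when treatments A and B are both compared against the same control group, the control-group log odds enter both estimates, so their large-sample covariance is 1/c + 1/d (the sampling variance of the control-group log odds). The sketch below builds the 2 x 2 V matrix by hand in base R; all counts are hypothetical.

```r
# Hypothetical 2x2 counts: two treatment groups (A, B) compared against one shared control group
a1 <- 12; b1 <- 38   # treatment A: events, non-events
a2 <- 15; b2 <- 35   # treatment B: events, non-events
c0 <- 20; d0 <- 30   # shared control group: events, non-events

# log odds ratios (each treatment versus the shared control)
yi <- c(log((a1 / b1) / (c0 / d0)),
        log((a2 / b2) / (c0 / d0)))

# large-sample sampling variances of the log odds ratios
vi <- c(1/a1 + 1/b1 + 1/c0 + 1/d0,
        1/a2 + 1/b2 + 1/c0 + 1/d0)

# the covariance comes only from the shared control-group log odds
cov12 <- 1/c0 + 1/d0

# assemble the var-cov ('V') matrix
V <- matrix(c(vi[1], cov12,
              cov12, vi[2]), nrow = 2, byrow = TRUE)
V
```

Ignoring the off-diagonal element here would treat the two estimates as if they carried twice the control-group information.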

Equations for computing the covariances under certain types of dependency for various effect size measures can be found in several sources (e.g., Gleser & Olkin, 2009; Lajeunesse, 2011; Wei & Higgins, 2013; Steiger, 1980). Unfortunately, the information needed to compute the covariances is often not available and implementing the equations can be difficult. This is where the first addition comes into play, namely the vcalc() function:

https://wviechtb.github.io/metafor/reference/vcalc.html

This function provides a fairly general framework for constructing an approximate V matrix in a wide variety of circumstances. One cannot get around the fact that some information needs to be specified (e.g., correlations if there are multiple estimates based on different response variables, autocorrelations if the same effect was repeatedly assessed), but one can often make at least some reasonable guesses about the size of these inputs. Since some simplifying assumptions are unavoidable when constructing an approximate V matrix, the goal is not to get things perfectly right, but to create a var-cov matrix that reflects, to a reasonable degree, the various dependencies underlying the estimates.
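A short sketch of a typical vcalc() call, assuming a metafor version that includes vcalc() and using the dat.assink2016 example dataset (with study and esid identifying studies and estimates within studies). The within-study correlation rho = 0.6 is purely an assumed guess of the kind described above, not a value derived from the data.

```r
# Assumes a metafor version that includes vcalc()
library(metafor)

dat <- dat.assink2016

# Assumed input: estimates within the same study (cluster) but from different
# observations (obs) have sampling errors correlated at rho = 0.6
V <- vcalc(vi, cluster = study, obs = esid, data = dat, rho = 0.6)

# Inspect the block of V that belongs to the first study
sel <- dat$study == dat$study[1]
round(V[sel, sel], 4)
```

The diagonal of V simply reproduces the sampling variances in vi; only the within-study off-diagonal elements are added.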

The next step in the analysis is to fit an appropriate multilevel/multivariate model that includes random effects to capture the various sources of heterogeneity and dependency (not in the sampling errors but in the underlying true effects!) deemed to be relevant. This is a science unto itself, as we can see from questions about functions like rma.mv() coming up again and again on this list. For more details on this function, see:
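To make the distinction between the two sources of dependency concrete, here is a sketch of the common nested (three-level) random-effects structure for estimates within studies, again assuming the dat.assink2016 dataset. For illustration only, the sampling errors are treated as independent here (vi instead of a full V matrix), so all of the dependency is carried by the random effects for the true effects.

```r
library(metafor)

dat <- dat.assink2016

# Random effects for studies and for estimates within studies capture
# heterogeneity and dependency in the underlying true effects;
# the sampling errors are (for illustration only) treated as independent
res <- rma.mv(yi, vi, random = ~ 1 | study/esid, data = dat)
res

# Variance components: between studies, and between estimates within studies
res$sigma2
```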

https://wviechtb.github.io/metafor/reference/rma.mv.html

Because the V matrix is only an approximation, and because we may be worried that the fitted model does not fully capture all sources of heterogeneity and dependency underlying the data, a final step would be to use cluster-robust inference methods. The robust() function was already available in metafor, but even better are the methods implemented in the clubSandwich package (which have better small-sample performance). With the agreement of James Pustejovsky (the author of the clubSandwich package), robust() now allows you to set clubSandwich=TRUE, in which case the function makes use of the clubSandwich methods:

https://wviechtb.github.io/metafor/reference/robust.html

(Note that, for backwards compatibility, this is not the default.)

So, to summarize, the general workflow is this:

V <- vcalc(...)
res <- rma.mv(yi, V, random = ...)
robust(res, cluster=..., clubSandwich=TRUE)
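A concrete (hedged) instance of this workflow, assuming a metafor version with vcalc() and the clubSandwich option in robust(), the clubSandwich package installed, and the dat.assink2016 dataset; rho = 0.6 is again an assumed within-study correlation. The default robust() call is included only for comparison.

```r
# Assumes a metafor version with vcalc() and robust(..., clubSandwich=TRUE);
# the clubSandwich package must be installed for step 3
library(metafor)

dat <- dat.assink2016

# 1) approximate V matrix (rho = 0.6 is an assumed within-study correlation)
V <- vcalc(vi, cluster = study, obs = esid, data = dat, rho = 0.6)

# 2) multilevel model for the underlying true effects
res <- rma.mv(yi, V, random = ~ 1 | study/esid, data = dat)

# 3) cluster-robust inference with the clubSandwich small-sample methods
sav <- robust(res, cluster = dat$study, clubSandwich = TRUE)
sav

# for comparison: the (non-default-changing) original robust() method
robust(res, cluster = dat$study)
```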

Because I like lame jokes/puns, I guess we can call this the Triple Decker Club Sandwich.

James Pustejovsky and Elizabeth Tipton have recently published a paper that also describes a workflow along these lines:

https://www.jepusto.com/publication/rve-meta-analysis-expanding-the-range/

Although vcalc() now goes a long way toward making this workflow easier, I would be interested in feedback regarding its implementation and/or documentation. The same of course goes for any other function that is part of this workflow.

Thanks for reading and have a lovely week.

Best,
Wolfgang