[R-meta] Meta-analysis of R^2 Values

Thu Jun 1 14:51:14 CEST 2023

Hi all,

On a number of occasions, the question has been raised on this mailing list whether it is possible to meta-analyze R^2 values (I have also received this question a number of times via email). See, for example:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-March/002708.html
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2023-January/004325.html
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2023-April/004554.html

In these discussions, valid concerns have been raised. For example, R^2 values are 'directionless' (in contrast to the outcome measures more commonly used in meta-analyses, where positive and negative values can cancel each other out). There is also the question of how to compute the sampling variance of R^2 values and whether some kind of transformation is needed (to normalize the sampling distribution).
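
To make these two concerns a bit more concrete, here is a small simulation sketch in base R. All settings (sample size, number of predictors, slope, number of iterations) are hypothetical values picked only for illustration; they do not come from any of the linked threads. It simulates many studies from the same model and collects the observed R^2 values, which can then be compared against the true R^2 implied by the settings.

```r
set.seed(1234)

# hypothetical settings, chosen only for illustration
n     <- 40     # sample size per simulated study
m     <- 3      # number of predictors
beta  <- 0.2    # common slope for each predictor
iters <- 2000   # number of simulated studies

# simulate studies with independent standard normal predictors and errors
r2 <- replicate(iters, {
   X <- matrix(rnorm(n * m), nrow = n, ncol = m)
   y <- drop(X %*% rep(beta, m)) + rnorm(n)
   summary(lm(y ~ X))$r.squared
})

# true R^2 implied by these settings: var(X %*% beta) / (var(X %*% beta) + 1)
r2.true <- m * beta^2 / (m * beta^2 + 1)

# compare the true value with the mean and SD of the simulated values
round(c(true = r2.true, mean = mean(r2), sd = sd(r2)), 3)

# all values fall within [0, 1], so they cannot cancel each other out
range(r2)

# inspect the shape of the sampling distribution
hist(r2, breaks = 40, xlab = expression(R^2), main = "Simulated R^2 Values")
```

With a small true R^2 and a modest sample size, one would expect the histogram to be visibly skewed and the mean of the simulated values to lie above the true value, which is part of the motivation for considering a transformation.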

I share (and raised some of) these concerns, but I would also say that it is not inherently wrong to meta-analyze R^2 values. Therefore, after a bit of further reading, thinking, and running some simulations, I have now implemented measures "R2" and "ZR2" in escalc(). The former is for raw R^2 values, although the latter should generally be preferable, as it uses a variance-stabilizing transformation of R^2 that also has normalizing properties (similar to the well-known r-to-z transformation for raw correlation coefficients). You can find the documentation here:

https://wviechtb.github.io/metafor/reference/escalc.html

(if you search for 'R-squared', you will find the right place in this ever-growing help page).

Some of the caveats / limitations are also mentioned there (e.g., the equations assume that we are in a multivariate normal setting and that the true R^2 values are non-zero).
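
As a rough illustration of what 'variance-stabilizing' means here, the following base R sketch applies the delta method to a transformation of the form z = atanh(sqrt(R^2)). It assumes the simple large-sample variance 4 * R^2 * (1 - R^2)^2 / n for a raw R^2 value; the exact equations used by escalc() may differ (for example, by including finite-sample adjustments), so please treat the help page linked above as authoritative. The helper functions are hypothetical names written for this example and are not part of metafor.

```r
# hypothetical helpers for illustration only (not metafor functions)
r2_to_z <- function(r2) atanh(sqrt(r2))   # same as 1/2 * log((1 + sqrt(r2)) / (1 - sqrt(r2)))
z_to_r2 <- function(z)  tanh(z)^2         # inverse of the transformation

# assumed large-sample variance of a raw R^2 value
var_r2 <- function(r2, n) 4 * r2 * (1 - r2)^2 / n

# derivative of atanh(sqrt(r2)) with respect to r2 (needed for the delta method)
dz_dr2 <- function(r2) 1 / (2 * sqrt(r2) * (1 - r2))

r2 <- c(0.05, 0.20, 0.50, 0.80)   # illustrative R^2 values
n  <- 100                          # illustrative sample size

z     <- r2_to_z(r2)
var_z <- dz_dr2(r2)^2 * var_r2(r2, n)   # delta method: (dz/dR2)^2 * Var(R2)

# var_r2 depends on r2, whereas var_z does not; 'back' recovers the R^2 values
data.frame(r2 = r2, z = z, var_r2 = var_r2(r2, n), var_z = var_z, back = z_to_r2(z))
```

Under these assumptions, the variance on the z scale works out to 1/n regardless of the R^2 value, which is the same pattern as for the r-to-z transformation of correlations. Note also that the derivative is undefined at R^2 = 0, in line with the caveat that the true R^2 values are assumed to be non-zero.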

If you want to try this out, first install the 'devel' version of metafor:

install.packages("remotes")
remotes::install_github("wviechtb/metafor")

and then this will work:

library(metafor)

# example dataset that includes R^2 values, number of predictors, and sample sizes
dat <- dat.aloe2013

# stack the two forest plots vertically
par(mfrow=c(2,1))

# meta-analysis of the raw R^2 values
dat <- escalc(measure="R2", r2i=R2, mi=preds, ni=n, data=dat, slab=study)
res <- rma(yi, vi, data=dat)
res
forest(res, header=TRUE, xlim=c(-0.6,1.4), alim=c(0,1), refline=coef(res), efac=2)
title(expression(bold("Using Raw " * R^2 * " Values")))

# meta-analysis of the z-transformed R^2 values
dat <- escalc(measure="ZR2", r2i=R2, mi=preds, ni=n, data=dat, slab=study)
res <- rma(yi, vi, data=dat)
res
# back-transform the pooled estimate to the R^2 scale
pred <- predict(res, transf=transf.ztor2)
pred
forest(res, header=TRUE, xlim=c(-0.6,1.4), alim=c(0,1), transf=transf.ztor2, refline=pred$pred, efac=2)
title(expression(bold("Using z-transformed " * R^2 * " Values (back-transformed)")))

I cannot say whether a meta-analysis of the R^2 values for this particular dataset is sensible; I am just using it for illustration purposes.

If somebody has a dataset with R^2 values where they have a legitimate reason for such a meta-analysis, I would love to hear about it. Any feedback in general is of course welcome.

Best,
Wolfgang