[R-meta] Interpretation of the Q-test statistic in a multilevel meta-analysis

Wed Sep 11 10:22:41 CEST 2024

Dear List Members,
We employed the rma.mv function from the metafor package to perform a 
meta-analysis where effect sizes were nested within samples, and samples 
were nested within countries. The total number of effect sizes exceeded 
8,000. Below, I provide a toy example, in which I randomly sampled 626 
effect sizes from 351 samples across 87 countries.
We specified a variance-covariance matrix (vcov_mat) to account for the 
dependence among the observed effect sizes within each sample. The 
corresponding code was as follows:

M1 <- rma.mv(yi = Corrz, V = vcov_mat, data = tmp_es_dat,
             random = list(~ 1 | COUNTRY / SampleID / ESID),
             sparse = FALSE)

Here are the results:
Multivariate Meta-Analysis Model (k = 626; method: REML)

     logLik    Deviance         AIC         BIC        AICc 
   728.1443  -1456.2886  -1448.2886  -1430.5376  -1448.2241 

Variance Components:

             estim    sqrt  nlvls  fixed                 factor
sigma^2.1  0.0042  0.0648     87     no                COUNTRY
sigma^2.2  0.0037  0.0610    351     no       COUNTRY/SampleID
sigma^2.3  0.0021  0.0459    626     no  COUNTRY/SampleID/ESID

Test for Heterogeneity:
Q(df = 625) = 23584.2025, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub     
  -0.2620  0.0085  -30.7263  <.0001  -0.2788  -0.2453  ***

In addition to I² and the variance components at various levels (effect 
sizes, samples, and countries), we used the Q-test statistic to assess the 
heterogeneity of effect sizes.
An expert reviewer of our meta-analysis pointed out potential ambiguities in 
how we interpreted the Q-test statistic. Specifically, the reviewer said 
that the Q-test statistic is "the test of the between-clusters variation 
(whatever the clusters are in the model)."
However, I am unsure how to apply this interpretation to the Q-test 
statistic included in the metafor output. I learned from the help section of 
the rma.mv function that the Q "is the generalized/weighted least squares 
extension of Cochran's Q-test, which tests whether the variability in the 
observed effect sizes or outcomes is larger than one would expect based on 
sampling variability (and the given covariances among the sampling errors) 
alone. A significant test suggests that the true effects/outcomes are 
heterogeneous."
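To make that definition concrete, here is a minimal base-R sketch of how I understand the statistic to be computed for an intercept-only model. It uses made-up toy numbers (not our data), and the within-sample correlation of 0.6 between sampling errors is an arbitrary assumption chosen only for illustration; the names yi, vi, sample_id, and rho are hypothetical.

```r
# Toy data (invented for illustration): 6 effect sizes nested in 3 samples
yi        <- c(-0.30, -0.25, -0.10, -0.45, -0.20, -0.35)  # observed effects
vi        <- c(0.010, 0.012, 0.008, 0.015, 0.009, 0.011)  # sampling variances
sample_id <- c(1, 1, 2, 2, 3, 3)
rho       <- 0.6  # ASSUMED correlation of sampling errors within a sample

# Block-diagonal V: covariances only between effect sizes from the same sample
S <- sqrt(vi)
V <- outer(S, S) * ifelse(outer(sample_id, sample_id, "=="), rho, 0)
diag(V) <- vi

# Generalized least squares fit under homogeneity (no random effects at all)
W    <- solve(V)                                   # weight matrix
X    <- matrix(1, nrow = length(yi), ncol = 1)     # intercept-only design
b_fe <- solve(t(X) %*% W %*% X, t(X) %*% W %*% yi) # weighted average
res  <- yi - X %*% b_fe                            # residuals around it

# Q is the weighted residual sum of squares, referred to a chi-square
Q  <- drop(t(res) %*% W %*% res)
df <- length(yi) - ncol(X)
p  <- pchisq(Q, df = df, lower.tail = FALSE)

cat("Q =", Q, " df =", df, " p =", p, "\n")
```

Nothing about the random-effects structure (COUNTRY / SampleID / ESID) enters this calculation, only the observed effect sizes and V, which is why I read the Q as a test of the total heterogeneity rather than of one particular level. If I am not mistaken, the corresponding values in the fitted model object are M1$QE and M1$QEp.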
In our case, the Q suggests that the observed effect sizes vary 
significantly (p < .0001) around the average effect size (r = -0.26). 
Furthermore, the Q provided by metafor points to statistically significant 
heterogeneity, with heterogeneity referring to the total variance 
encompassing all potential sources of variance, including effect sizes, 
samples, and countries. However, I am unsure whether this is what the 
reviewer meant by interpreting the Q as "between-clusters variation."
I would highly appreciate any help in clarifying the interpretation of the 
Q-test statistic.
Thank you!
Best regards,
Martin

PS: I apologize for the poor formatting of the metafor output, but my email 
program does not support better formatting options.