[R-meta] RVE or not RVE in meta-regressions with small number of studies?

James Pustejovsky jepusto at gmail.com
Thu Apr 20 19:51:03 CEST 2023


Wolfgang, thanks for jumping in (have been swamped so not much time for
mailing list correspondence).

Surprising nobody, my perspective is very much in agreement with the
argument Wolfgang laid out. I think it's useful to think about these
questions in three stages (a short code sketch tying the three stages
together follows after point 3):

1. What working model should you use? The cited paper used robumeta, so
either a CE or HE working model. As Sebastian points out, the CHE working
model is more flexible and lets you decompose heterogeneity across levels
of the model. Generally, the best working model is the one that most
closely approximates the real data-generating process.

2. How should you calculate standard errors? From the analyst's point of
view, the more work you put into checking the working model specification,
the better positioned you will be to trust its assumptions. If you are
willing to accept the assumptions (including homoskedasticity of random
effects at each level, etc.), then it is reasonable to use the model-based
standard errors generated by rma.mv(). On the other hand, if there are
substantial differences between the model-based SEs and the cluster-robust
SEs, that is probably a sign that the working model is mis-specified in
some way, which casts doubt on trusting the model-based SEs.

3. How should you do inference (hypothesis testing and confidence
intervals)? Here, there is a further difference between model-based
approaches and cluster-robust approaches. The model-based approaches
(either Wald tests with model-based standard errors or likelihood ratio
tests) involve asymptotic approximations, so you need to gauge whether your
database includes a large enough number of studies to trust the asymptotic
approximation. (In principle, one could use small-sample adjustments to
Wald tests, such as the Kenward-Roger corrections, but these are not
implemented in metafor). Robust variance estimation as implemented in
robumeta or clubSandwich uses methods with small-sample adjustments (such
as Satterthwaite degrees of freedom) that perform well even in quite small
samples. Thus, another reason there might be differences between
model-based CIs and robust CIs is that the robust CIs are based on more
accurate approximations, so apparent advantages of the model-based CI might
be illusory.
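
To make this concrete, here is a rough sketch of the whole workflow in R.
All of the object and variable names (dat, yi, vi, study, esid) are
placeholders, as is the assumed sampling correlation of r = 0.6:

    library(metafor)
    library(clubSandwich)

    # Stage 1: CHE working model. 'dat' has one row per effect size, with
    # columns yi (estimate), vi (sampling variance), study (study ID), and
    # esid (effect ID). Impute a block-diagonal working covariance matrix,
    # assuming a correlation of r = 0.6 among sampling errors within a study.
    V <- impute_covariance_matrix(vi = dat$vi, cluster = dat$study, r = 0.6)
    che <- rma.mv(yi, V, random = ~ 1 | study/esid, data = dat)

    # Stage 2: compare model-based and cluster-robust (CR2) standard errors.
    # Large discrepancies suggest the working model is mis-specified.
    sqrt(diag(vcov(che)))
    sqrt(diag(vcovCR(che, cluster = dat$study, type = "CR2")))

    # Stage 3: small-sample adjusted inference, i.e., CR2 standard errors
    # with Satterthwaite degrees of freedom.
    coef_test(che, vcov = "CR2", cluster = dat$study)
    conf_int(che, vcov = "CR2", cluster = dat$study)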

Further inline comments below.

James

On Tue, Apr 18, 2023 at 9:13 AM Röhl, Sebastian via R-sig-meta-analysis
<r-sig-meta-analysis at r-project.org> wrote:

> Dear all,
>
> I came across an article in RER that argues that one could or should forgo
> RVE for the analysis of categorical moderators when the number of studies is small:
>
> Cao, Y., Grace Kim, Y.‑S., & Cho, M. (2022). Are Observed Classroom
> Practices Related to Student Language/Literacy Achievement? Review of
> Educational Research, 003465432211306.
> https://doi.org/10.3102/00346543221130687
> Page 10: “We acknowledge the superiority of robust variance estimation
> (RVE) for handling dependent effect sizes. However, it has a few important
> limitations. First, it neither
> models heterogeneity at multiple levels nor provides corresponding
> hypothesis tests.


When the authors refer to "RVE" here, I think they are referencing the
models implemented in the robumeta package. These models (the CE and HE
working models) are indeed limited in terms of modeling heterogeneity at
multiple levels and limited in that they do not provide means of conducting
hypothesis tests about variance components. As Sebastian noted, the first
limitation can be resolved by using the CHE or related working models. The
second limitation can be resolved in some sense by using ML or REML
estimation of variance components. One can then use likelihood ratio tests
for the variance components, although such tests are not "robust" in the
sense of RVE. Rather, they are predicated (at least to some extent?) on
having correctly specified the working model.
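
For example, reusing the hypothetical CHE fit from the sketch above, one
could test the within-study variance component by fixing it to zero in a
reduced model and comparing via anova(). (Keep in mind that a variance of
zero lies on the boundary of the parameter space, so the likelihood ratio
test is conservative.)

    # Full CHE model versus a reduced model with the within-study
    # variance component fixed to zero
    full    <- rma.mv(yi, V, random = ~ 1 | study/esid, data = dat)
    reduced <- rma.mv(yi, V, random = ~ 1 | study/esid, data = dat,
                      sigma2 = c(NA, 0))

    # Likelihood ratio test for the within-study variance component
    anova(full, reduced)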


> Second, the power of the categorical moderator highly depends on the
> number of studies and features of the covariate (Tanner-Smith, Tipton, &
> Polanin, 2016). When the number of studies is small, the test statistics
> and confidence intervals based on RVE can have inflated Type I error
> (Hedges et al., 2010; Tipton & Pustejovsky, 2015).


Inflated Type I error is indeed a concern for RVE without small-sample
corrections (i.e., the approaches called CR0 or CR1 in clubSandwich, or the
approach implemented in metafor::robust() with clubSandwich = FALSE).
Inflated Type I error is much less of an issue with the CR2 adjustment and
Satterthwaite degrees of freedom.
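
The difference is easy to see by requesting both variants from the same
fitted model (continuing the hypothetical example above):

    # No small-sample correction: CR0 with a naive t reference distribution
    coef_test(che, vcov = "CR0", cluster = dat$study, test = "naive-t")

    # CR2 correction with Satterthwaite degrees of freedom (recommended)
    coef_test(che, vcov = "CR2", cluster = dat$study, test = "Satterthwaite")

    # The same choice arises within metafor itself:
    robust(che, cluster = dat$study, clubSandwich = FALSE)  # no CR2/Satterthwaite
    robust(che, cluster = dat$study, clubSandwich = TRUE)   # CR2 + Satterthwaite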


> Relating to our cases, many of our moderators had imbalanced distributions
> […]. Consequently, tests of particular moderators may be severely
> underpowered.”
> Of course, the first argument can be invalidated by the use of correlated
> hierarchical effects models with RVE. However, I find the second argument
> very relevant from my experience.
>

As Wolfgang noted, the question here is: "severely underpowered" relative
to what alternative?

> In the social sciences, after all, we more often conduct meta-analyses with
> a relatively small study corpus (n < 100 or n < 50). In high-ranked journals in
> this research field (e.g., Psychological Bulletin, Review of Educational
> Research, Educational Research Review…) I very rarely find the use of RVE /
> CRVE.
>

I think this is changing (finally). Recent submissions to Psych Bulletin
regularly use RVE/CRVE, but RER and ERR have been slower to shift practice.


> In the mentioned types of moderator analyses with a small number of studies in
> one category, I also often face the same problem that effects become
> non-significant when using CRVE as soon as moderator levels are populated
> with fewer than 10-15 studies. Joshi et al. (2022) also talk about RVE being
> (too) highly conservative in these cases.


Joshi's comments about tests being too conservative here pertain to
hypothesis tests involving multiple contrasts, such as testing the equality
of effect sizes across a moderator with 3 or 4 categories (mu_1 = mu_2 =
mu_3, etc.). For single-parameter tests and confidence intervals, CR2
standard errors and Satterthwaite degrees of freedom are well calibrated
unless the degrees of freedom are very small (df < 4, as suggested in
Tipton, 2015).
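
For instance, a joint test of equality across a hypothetical three-category
moderator (here called 'subgroup') would look like this; the default
small-sample method in clubSandwich::Wald_test() for such tests is the HTZ
(approximate Hotelling's T-squared) test:

    # Meta-regression on a hypothetical three-level categorical moderator
    mod <- rma.mv(yi, V, mods = ~ 0 + factor(subgroup),
                  random = ~ 1 | study/esid, data = dat)

    # Joint test that the three category means are equal,
    # using CR2 with the small-sample HTZ approximation
    Wald_test(mod, constraints = constrain_equal(1:3),
              vcov = "CR2", cluster = dat$study)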


> I have also used cluster wild bootstrapping for significance testing of
> individual effects in this case. However, the problem of missing SEs and
> C.I.s as well as the high computation time arises here.
>

Have you tried the latest version of wildmeta? From version 0.3.1
(released in February), parallel processing is supported, which can help
with computation time quite a bit. But again, this is really only relevant
for hypothesis tests involving multiple contrasts.
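
Roughly, and assuming the Wald_test_cwb() interface together with the
hypothetical moderator model from above, the bootstrap test with parallel
processing (via the future framework) looks like this:

    library(wildmeta)
    library(future)

    plan(multisession)  # run the bootstrap replicates in parallel

    # Cluster wild bootstrap test of equality across the hypothetical categories
    Wald_test_cwb(full_model = mod,
                  constraints = constrain_equal(1:3),
                  R = 1999)

    plan(sequential)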


> Right now, I am again facing the problem of model selection for a
> meta-analysis with about 50 studies and 500 ES (correlations). Since we are
> dealing with ES within studies, I would choose a correlated hierarchical
> effects model with CRVE, which also works very well for the main effects,
> but again leads to the aforementioned very large SEs for the moderators. As a pure
> CHE model (which in my opinion still fits better than the pure HE model in the
> above-mentioned article by Cao et al.) the SEs are of course somewhat more
> moderate.
> Do you have any tips or hints for an alternative?
>
Two things to consider:
A. Have you tried group-mean centering the predictors? It could be a
contextual effects issue that leads to discrepancies between model-based
and robust SEs.
B. If that doesn't resolve the issue, then it seems like the discrepancy
could be driven by mis-specification of the working model (see my point #2
above). If you group-mean center the predictors, you could include random
slopes in the model to see if there is heterogeneity in the within-study
slopes. Unmodeled random slopes could again lead to discrepancies between
model-based and robust SEs.
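
Something along these lines, with x standing in for one of your moderators
and the rest carried over from the hypothetical setup above (x_bar, x_wc,
and es_id are made-up names; the struct = "GEN" option for random slopes
requires a reasonably recent version of metafor):

    # A. Split the moderator into between-study and within-study components
    dat$x_bar <- ave(dat$x, dat$study)   # study-level mean
    dat$x_wc  <- dat$x - dat$x_bar       # group-mean centered deviation

    centered <- rma.mv(yi, V, mods = ~ x_bar + x_wc,
                       random = ~ 1 | study/esid, data = dat)

    # B. Additionally allow the within-study slope to vary across studies
    dat$es_id <- seq_len(nrow(dat))      # unique ID per effect size
    slopes <- rma.mv(yi, V, mods = ~ x_bar + x_wc,
                     random = list(~ x_wc | study, ~ 1 | es_id),
                     struct = "GEN", data = dat)

    # Then compare model-based and CR2 standard errors again for each fit
    coef_test(slopes, vcov = "CR2", cluster = dat$study)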

> ****************************
> Dr. Sebastian Röhl
> Eberhard Karls Universität Tübingen
> Institut für Erziehungswissenschaft
> Tübingen School of Education (TüSE)
> Wilhelmstraße 31 / Raum 302
> 72074 Tübingen
