[R-meta] RVE or not RVE in meta-regressions with a small number of studies?

Thu Apr 20 23:31:14 CEST 2023

Just a follow-up out of curiosity, Wolfgang: are you aware of any
reference(s) evaluating the performance of the dfs="contain" method in
metafor (when it applies) for fully nested rma.mv models, in terms of Type I
error rates and/or CI coverage for the fixed-effect estimates, relative to
Wald-type and/or RVE-based inference?

Thanks,
Reza

On Thu, Apr 20, 2023 at 12:51 PM James Pustejovsky via R-sig-meta-analysis <
r-sig-meta-analysis at r-project.org> wrote:

> Wolfgang, thanks for jumping in (have been swamped so not much time for
> mailing list correspondence).
>
> To nobody's surprise, my perspective is very much in agreement with the
> argument Wolfgang laid out. I think it's useful to think about these
> questions in three stages:
>
> 1. What working model should you use? The cited paper used robumeta, so
> either a CE or an HE working model. As Sebastian points out, the CHE working
> model is more flexible and lets you decompose heterogeneity across levels
> of the model. Generally, the best working model is the one that most
> closely approximates the real data-generating process.
>
> 2. How should you calculate standard errors? From the analyst's point of
> view, the more work you put into checking the working model specification,
> the better position you will be in to trust its assumptions. If you are
> willing to accept the assumptions (including homoskedasticity of random
> effects at each level, etc.), then it is reasonable to use the model-based
> standard errors generated by rma.mv(). On the other hand, if there are
> substantial differences between the model-based SEs and the cluster-robust
> SEs, that is probably a sign that the working model is mis-specified in
> some way, which casts doubt on trusting the model-based SEs.
>
> 3. How should you do inference (hypothesis testing and confidence
> intervals)? Here, there is a further difference between model-based
> approaches and cluster-robust approaches. The model-based approaches
> (either Wald tests with model-based standard errors or likelihood ratio
> tests) involve asymptotic approximations, so you need to gauge whether your
> database includes a large enough number of studies to trust the asymptotic
> approximation. (In principle, one could use small-sample adjustments to
> Wald tests, such as the Kenward-Roger correction, but these are not
> implemented in metafor.) Robust variance estimation as implemented in
> robumeta or clubSandwich uses methods with small-sample adjustments (such
> as Satterthwaite degrees of freedom) that perform well even in quite small
> samples. Thus, another reason there might be differences between
> model-based CIs and robust CIs is that the robust CIs are based on more
> accurate approximations, so apparent advantages of the model-based CI might
> be illusory.
>
> Further inline comments below.
>
> James
>
> On Tue, Apr 18, 2023 at 9:13 AM Röhl, Sebastian via R-sig-meta-analysis <
> r-sig-meta-analysis at r-project.org> wrote:
>
> > Dear all,
> >
> > I came across an article in RER that argues that one could, or even
> > should, forgo RVE for the analysis of categorical moderators when the
> > number of studies is small:
> >
> > Cao, Y., Grace Kim, Y.‑S., & Cho, M. (2022). Are Observed Classroom
> > Practices Related to Student Language/Literacy Achievement? Review of
> > Educational Research, 003465432211306.
> > https://doi.org/10.3102/00346543221130687
> > Page 10: “We acknowledge the superiority of robust variance estimation
> > (RVE) for handling dependent effect sizes. However, it has a few
> > important limitations. First, it neither models heterogeneity at
> > multiple levels nor provides corresponding hypothesis tests.
>
>
> When the authors refer to "RVE" here, I think they are referencing the
> models implemented in the robumeta package. These models (the CE and HE
> working models) are indeed limited in terms of modeling heterogeneity at
> multiple levels and limited in that they do not provide means of conducting
> hypothesis tests about variance components. As Sebastian noted, the first
> limitation can be resolved by using the CHE or related working models. The
> second limitation can be resolved in some sense by using ML or REML
> estimation of variance components. One can then use likelihood ratio tests
> for the variance components, although such tests are not "robust" in the
> sense of RVE. Rather, they are predicated (at least to some extent?) on
> having correctly specified the working model.
>
>
> > Second, the power of the categorical moderator highly depends on the
> > number of studies and features of the covariate (Tanner-Smith, Tipton, &
> > Polanin, 2016). When the number of studies is small, the test statistics
> > and confidence intervals based on RVE can have inflated Type I error
> > (Hedges et al., 2010; Tipton & Pustejovsky, 2015).
>
>
> Inflated Type I error is indeed a problem for RVE without small-sample
> corrections (i.e., the approaches called CR0 or CR1 in clubSandwich, or the
> approach implemented in metafor::robust() with clubSandwich = FALSE).
> It is much less of an issue with the CR2 adjustment and Satterthwaite
> degrees of freedom.
>
>
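A minimal Python sketch of the CR0/CR1 distinction, and of comparing model-based with cluster-robust SEs (point 2 above). It assumes a deliberately simple working model: an intercept-only meta-analysis with inverse-variance weights w = 1/v and no random effects. This is not the CHE model and not how rma.mv() or clubSandwich compute things internally; the function `weighted_mean_ses` and the toy data are hypothetical. CR0 sums the weighted residuals within each cluster (study); the CR1-type correction shown multiplies the CR0 variance by m/(m - 1), where m is the number of clusters. The CR2 adjustment and Satterthwaite degrees of freedom are more involved and are not implemented here.

```python
import math
from collections import defaultdict


def weighted_mean_ses(y, v, cluster):
    """Intercept-only meta-analysis with weights w = 1/v (hypothetical helper).

    Returns the weighted mean plus its model-based, CR0, and CR1-type SEs.
    """
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    beta = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Model-based SE: trustworthy only if the working model holds
    # (independent effects with known sampling variances v).
    se_model = math.sqrt(1.0 / sum_w)

    # Cluster-level "scores": sum of w_i * residual_i within each cluster.
    scores = defaultdict(float)
    for wi, yi, cj in zip(w, y, cluster):
        scores[cj] += wi * (yi - beta)
    m = len(scores)
    if m < 2:
        raise ValueError("need at least two clusters")

    # Sandwich estimator: bread = 1 / sum_w, meat = sum of squared cluster scores.
    var_cr0 = sum(s * s for s in scores.values()) / sum_w ** 2
    # CR1-type small-sample correction: inflate the CR0 variance by m / (m - 1).
    var_cr1 = var_cr0 * m / (m - 1)
    return beta, se_model, math.sqrt(var_cr0), math.sqrt(var_cr1)


# Toy data (made up): 5 effect sizes nested in 3 studies.
est, se_m, se0, se1 = weighted_mean_ses(
    y=[0.2, 0.3, 0.4, 0.5, 0.1],
    v=[0.04] * 5,
    cluster=["a", "a", "b", "b", "c"],
)
print(f"estimate={est:.3f} model-based SE={se_m:.4f} CR0 SE={se0:.4f} CR1 SE={se1:.4f}")
```

With only three clusters, the m/(m - 1) factor alone inflates the CR0 variance by 50%, which illustrates why uncorrected CR0 tends to understate uncertainty when there are few studies.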
> > Relating to our cases, many of our moderators had imbalanced distributions
> > […]. Consequently, tests of particular moderators may be severely
> > underpowered.”
> > Of course, the first argument can be invalidated by the use of correlated
> > hierarchical effects models with RVE. However, in my experience the second
> > argument is very relevant.
> >
>
> As Wolfgang noted, the question here is: "severely underpowered" relative
> to what alternative?
>
> > In the social sciences, after all, we more often conduct meta-analyses with
> > relatively small study corpora (n < 100 or even n < 50). In high-ranked
> > journals in this research field (e.g., Psychological Bulletin, Review of
> > Educational Research, Educational Research Review…) I very rarely see
> > RVE/CRVE being used.
> >
>
> I think this is changing (finally). Recent submissions to Psych Bulletin
> regularly use RVE/CRVE, but RER and ERR have been slower to shift practice.
>
>
> > In moderator analyses of the kind mentioned, with a small number of
> > studies in one category, I also often face the problem that effects
> > become non-significant with CRVE as soon as a moderator level contains
> > fewer than 10-15 studies. Joshi et al. (2022) also describe RVE as being
> > (too) conservative in these cases.
>
>
> Joshi's comments about tests being too conservative here pertain to
> hypothesis tests involving multiple contrasts, such as testing the equality
> of effect sizes across a moderator with 3 or 4 categories (mu_1 = mu_2 =
> mu_3, etc.). For single-parameter tests and confidence intervals, CR2
> standard errors and Satterthwaite degrees of freedom are well calibrated
> unless the degrees of freedom are very small (df < 4, as suggested in
> Tipton, 2015).
>
>
> > I have also used cluster wild bootstrapping for significance testing of
> > individual effects in this case. However, this approach provides no SEs
> > or CIs, and the computation time is high.
> >
>
> Have you tried the latest version of wildmeta? From version 0.3.1
> (released in February), parallel processing is supported, which can help
> with computation time quite a bit. But again, this is really only relevant
> for hypothesis tests involving multiple contrasts.
>
>
> > Right now, I am again facing the problem of model selection for a
> > meta-analysis with about 50 studies and 500 ES (correlations). Since we
> > are dealing with multiple ES within studies, I would choose a correlated
> > hierarchical effects model with CRVE, which works very well for the main
> > effects but again leads to the very large SEs for the moderators described
> > above. With a pure CHE model (which in my opinion still fits better than
> > the pure HE model in the above-mentioned article by Cao et al.), the SEs
> > are of course somewhat more moderate.
> > Do you have any tips or hints for an alternative?
> >
> Two things to consider:
> A. Have you tried group-mean centering the predictors? It could be a
> contextual effects issue that leads to discrepancies between model-based
> and robust SEs.
> B. If that doesn't resolve the issue, then it seems like the discrepancy
> could be driven by mis-specification of the working model (see my point #2
> above). If you group-mean center the predictors, you could include random
> slopes in the model to see if there is heterogeneity in the within-study
> slopes. Unmodeled random slopes could again lead to discrepancies between
> model-based and robust SEs.
>
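A minimal Python sketch of group-mean centering, assuming each effect size carries a study ID; the helper `group_mean_center` and the toy data are hypothetical. Each predictor value is split into its study mean (the between-study component) and its deviation from that mean (the within-study component). The two columns would then enter the meta-regression as separate moderators (for example, in the `mods` formula of `rma.mv()`), so that the between-study and within-study slopes are allowed to differ.

```python
from collections import defaultdict


def group_mean_center(x, study):
    """Split predictor x into between-study and within-study parts (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, s in zip(x, study):
        totals[s] += xi
        counts[s] += 1
    study_mean = {s: totals[s] / counts[s] for s in totals}

    # Between-study component: the study's mean of x, repeated for each effect size.
    x_between = [study_mean[s] for s in study]
    # Within-study component: deviation of each effect size's x from its study mean.
    x_within = [xi - study_mean[s] for xi, s in zip(x, study)]
    return x_between, x_within


# Toy example: a 0/1 moderator coded at the effect-size level, 5 effects in 3 studies.
x_between, x_within = group_mean_center([1, 0, 1, 1, 0], ["a", "a", "b", "b", "c"])
print(x_between)  # → [0.5, 0.5, 1.0, 1.0, 0.0]
print(x_within)   # → [0.5, -0.5, 0.0, 0.0, 0.0]
```

Study "c" has a single effect size, so its within-study deviation is always 0 and it informs only the between-study slope.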
> > ****************************
> > Dr. Sebastian Röhl
> > Eberhard Karls Universität Tübingen
> > Institut für Erziehungswissenschaft
> > Tübingen School of Education (TüSE)
> > Wilhelmstraße 31 / Raum 302
> > 72074 Tübingen
> >
> > Telefon: +49 7071 29-75527
> > Fax: +49 7071 29-35309
> > E-Mail: sebastian.roehl at uni-tuebingen.de
> > Twitter: @sebastian_roehl  @ResTeacherEdu
> >
> > _______________________________________________
> > R-sig-meta-analysis mailing list @ R-sig-meta-analysis at r-project.org
> > To manage your subscription to this mailing list, go to:
> > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> >