[R-meta] Wald_test - is it powerful enough?

James Pustejovsky jepusto at gmail.com
Thu Sep 2 16:31:06 CEST 2021


Catia,

I'll add a few observations to Wolfgang's points. I agree that it doesn't
really make sense to compare the model-based QM test to the robust HTZ test
because they don't necessarily have the same Type I error rates. As
Wolfgang noted, the QM test will only maintain Type I error if you've
correctly modeled the correlations and random effects structure (which
seems implausible given that r = 0.6 is a very rough, arbitrary
assumption). A further issue is that the QM test also relies on
large-sample approximation (because it uses a chi-squared reference
distribution) and so requires a sufficiently large number of studies to
provide calibrated Type I error. With a small number of studies, it will
tend to have inflated Type I error rates (yielding p-values that are
smaller than they should be). So two strikes against it.
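
To make the chi-squared reference concrete (just an illustration, using the
QM value and df from your output below):

# the model-based QM test compares the statistic to a chi-squared
# reference distribution with df = number of coefficients tested
pchisq(28.0468, df = 3, lower.tail = FALSE)
# roughly 3.5e-06, i.e., p < .0001, regardless of the number of studies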

The cluster-robust HTZ test provides a level of insurance against model
mis-specification. It also uses small-sample adjustments so that it should
properly control Type I error rates even when based on a fairly small
number of studies. However, it does still entail a degree of approximation
and will not necessarily have *exactly* calibrated Type I error rates. In
Tipton & Pustejovsky (2015, cited in Wolfgang's response), we found that
the HTZ test properly controls Type I error rates, meaning that error rates
were always at or below the nominal level. But we also observed that HTZ
can be conservative, in the sense that it sometimes has Type I error rates
substantially below the nominal level (such as .01 when alpha = .05). This
suggests that the test can sometimes have very limited power. It seems
you've identified one such situation.
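
If you want to see how much the small-sample correction matters in your
example, one option (a quick check, assuming a reasonably recent version of
clubSandwich) is to request the naive tests alongside HTZ:

# compare the naive large-sample chi-squared test, the naive F-test,
# and the small-sample corrected HTZ test for the same constraints
Wald_test(res, constraints = constrain_zero(1:3), vcov = "CR2",
          test = c("chi-sq", "Naive-F", "HTZ"))

The gap between the naive p-values and the HTZ p-value gives a rough sense
of how much the small-sample correction is doing here.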

As Wolfgang noted, the denominator degrees of freedom of the robust HTZ
test are very small here, which indicates a scenario where the HTZ might
have problems. For this particular example, I expect that it is because
effect sizes of type "covert" occur only in a single study and effects of
type "overt" occur only in a few:

library(dplyr)
dat %>%
  group_by(deltype) %>%
  summarise(
    studies = n_distinct(study),
    effects = n()
  )

# A tibble: 3 x 3
  deltype studies effects
  <chr>     <int>   <int>
1 covert        1       9
2 general      17      78
3 overt         3      13

This is a situation where RVE is not going to work well (if at all) because
RVE is based only on between-study variation in effect sizes. Another way
to check this is to look at the Satterthwaite degrees of freedom of the
individual coefficients you are testing against zero:

conf_int(res, vcov = "CR2")

            Coef Estimate     SE  d.f. Lower 95% CI Upper 95% CI
1  deltypecovert   -0.290 0.0823  4.20       -0.515      -0.0658
2 deltypegeneral    0.416 0.0997 14.76        0.203       0.6288
3   deltypeovert    0.160 0.0860  2.67       -0.134       0.4539

As you can see, the first and the third coefficients have very few degrees
of freedom, so the uncertainty around them will be less well quantified.

In situations like this, I think it is advisable to limit the tests to
coefficients for which at least some minimum number of studies have effect
size estimates (e.g., at least 4 studies). Applying that rule here would
mean limiting the test to only deltype = "general":

Wald_test(res, constraints = constrain_zero(2), vcov = "CR2")

test Fstat df_num df_denom  p_val sig
  HTZ  17.4      1     14.8 <0.001 ***

or equivalently:
coef_test(res, vcov = "CR2", coefs = 2)

           Coef. Estimate     SE t-stat d.f. p-val (Satt) Sig.
1 deltypegeneral    0.416 0.0997   4.17 14.8       <0.001  ***

James


On Thu, Sep 2, 2021 at 5:07 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

> Dear Cátia,
>
> A comparison of power is only really appropriate if the two tests would
> have the same Type I error rate. I can always create a test that
> outperforms all other tests in terms of power by *always* rejecting, but
> then my test also has a 100% Type I error rate, so it is useless.
>
> So, whether the cluster-robust Wald test (i.e., Wald_test()) and the
> standard Wald-type QM test differ in power is a futile question unless we
> know that both tests control the Type I error rate. This is impossible to
> say in general - it depends on many factors.
>
> In this example, you are using an approximate V matrix and fitting a
> multilevel model (using the multivariate parameterization). That might be a
> reasonable working model, although the V matrix is just a very rough
> approximation (one would have to look at the details of all articles to see
> what kind of dependencies there are between the estimates within studies)
> and r=0.6 might or might not be reasonable.
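>
> One simple check (just a sketch, re-using the functions from your example
> with a few arbitrary values of r) is to refit the working model under
> different assumed correlations and see whether the conclusions change much:
>
> # sensitivity check: refit the model for several assumed values of r
> rs <- c(0.2, 0.4, 0.6, 0.8)
> fits <- lapply(rs, function(r) {
>   V_r <- impute_covariance_matrix(dat$vi, cluster = dat$study, r = r)
>   rma.mv(yi, V_r, mods = ~ deltype - 1,
>          random = ~ factor(esid) | study, data = dat)
> })
> # model-based moderator test (QM) under each assumed correlation
> data.frame(r = rs,
>            QM = sapply(fits, function(f) f$QM),
>            p  = sapply(fits, function(f) f$QMp))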
>
> So using cluster-robust inference is a sensible further step as an
> additional 'safeguard', although there are 'only' 17 studies.
> Cluster-robust inference methods work asymptotically, so as the number of
> studies goes to infinity. How 'close to infinity' we have to be before we
> can trust the cluster-robust inferences is another difficult question that
> is impossible to answer in general. These articles should provide some
> discussion around this:
>
> Tanner-Smith, E. E., & Tipton, E. (2014). Robust variance estimation with
> dependent effect sizes: Practical considerations including a software
> tutorial in Stata and SPSS. Research Synthesis Methods, 5(1), 13-30.
> https://doi.org/10.1002/jrsm.1091
>
> Tipton, E., & Pustejovsky, J. E. (2015). Small-sample adjustments for
> tests of moderators and model fit using robust variance estimation in
> meta-regression. Journal of Educational and Behavioral Statistics, 40(6),
> 604-634. https://doi.org/10.3102/1076998615606099
>
> Tipton, E. (2015). Small sample adjustments for robust variance estimation
> with meta-regression. Psychological Methods, 20(3), 375-393.
> https://doi.org/10.1037/met0000011
>
> Tanner-Smith, E. E., Tipton, E., & Polanin, J. R. (2016). Handling complex
> meta-analytic data structures using robust variance estimates: A tutorial
> in R. Journal of Developmental and Life-Course Criminology, 2(1), 85-112.
> https://doi.org/10.1007/s40865-016-0026-5
>
> Here, the cluster-robust Wald-test makes use of a small-sample correction
> that should improve its performance when the number of studies is small. I
> assume though that also with this correction, there are limits to how well
> the test works when the number of studies gets really low. James or
> Elizabeth might be in a better position to comment on this.
>
> An interesting question is whether the degree of discrepancy between the
> standard and the cluster-robust Wald-test could be used as a rough measure
> of how reasonable the working model is, and if so, how to quantify the
> degree of the discrepancy. Despite the difference in p-values, the test
> statistics are actually quite large for both tests. It's just that the
> estimated denominator degrees of freedom (which I believe are based on a
> Satterthwaite approximation, which is also an asymptotic method) for
> the F-test (1.08) are very small, so that even with F=40.9 (and df=3 in the
> numerator), the test ends up being not significant (p=0.0998) -- but that
> just narrowly misses being a borderline trend approaching the brink of
> statistical significance ... :/
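>
> (Just to make that concrete, the HTZ p-value is obtained by referring the
> statistic to an F distribution with those estimated degrees of freedom:
>
> # p-value implied by an F(3, 1.08) reference distribution for F = 40.9
> pf(40.9, df1 = 3, df2 = 1.08, lower.tail = FALSE)
> # roughly 0.0998 -- with so few denominator df, even a very large F
> # statistic does not reach significance
>
> whereas the model-based QM test refers 28.05 to a chi-squared distribution
> with 3 df, which is far out in its tail.)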
>
> I personally would say that the tests are actually not *that* discrepant,
> although I have a relatively high tolerance for discrepancies when it comes
> to such sensitivity analyses (I just know how much fudging is typically
> involved when it comes to things like the extraction / calculation of the
> effect size estimates themselves, so that discussions around these more
> subtle statistical details - which are definitely fun and help me
> procrastinate - kind of miss the elephant in the room).
>
> Best,
> Wolfgang
>
> >-----Original Message-----
> >From: R-sig-meta-analysis [mailto:
> r-sig-meta-analysis-bounces using r-project.org] On
> >Behalf Of Cátia Ferreira De Oliveira
> >Sent: Thursday, 02 September, 2021 2:23
> >To: R meta
> >Subject: [R-meta] Wald_test - is it powerful enough?
> >
> >Hello,
> >
> >I hope you are well.
> >Is the Wald_test a lot less powerful than the QM test? I ask this because
> >in the example below the QM test is significant but the Wald test is not.
> >Shouldn't they be equivalent?
> >If it is indeed the case that the Wald_test is not powerful enough to
> >detect a difference, is there a good equivalent test more powerful than
> the
> >Wald test that can be used alongside the robumeta package?
> >
> >Best wishes,
> >
> >Catia
> >
> >dat <- dat.assink2016
> >V <- impute_covariance_matrix(dat$vi, cluster=dat$study, r=0.6)
> >
> ># fit multivariate model with delinquency type as moderator
> >
> >res <- rma.mv(yi, V, mods = ~ deltype-1, random = ~
> >factor(esid) | study, data=dat)
> >res
> >
> >Multivariate Meta-Analysis Model (k = 100; method: REML)
> >
> >Variance Components:
> >
> >outer factor: study        (nlvls = 17)
> >inner factor: factor(esid) (nlvls = 22)
> >
> >            estim    sqrt  fixed
> >tau^2      0.2150  0.4637     no
> >rho        0.3990             no
> >
> >Test for Residual Heterogeneity:
> >QE(df = 97) = 639.0911, p-val < .0001
> >
> >Test of Moderators (coefficients 1:3):
> >QM(df = 3) = 28.0468, p-val < .0001
> >
> >Model Results:
> >
> >                estimate      se     zval    pval    ci.lb   ci.ub
> >deltypecovert    -0.2902  0.2083  -1.3932  0.1635  -0.6984  0.1180
> >deltypegeneral    0.4160  0.0975   4.2688  <.0001   0.2250  0.6070  ***
> >deltypeovert      0.1599  0.1605   0.9963  0.3191  -0.1546  0.4743
> >
> >---
> >Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> >Wald_test(res, constraints=constrain_zero(1:3), vcov="CR2",
> >cluster=dat$study)
> >
> >test Fstat df_num df_denom  p_val sig
> >  HTZ  40.9      3     1.08 0.0998   .
> >
> >Thank you,
> >
> >Catia
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
