[R-meta] Unrealistic confidence limits for heterogeneity?

Will Hopkins willthekiwi at gmail.com
Wed Mar 29 01:51:26 CEST 2023


Uncertainty in heterogeneity in a meta-analysis is a big deal, because it
addresses the issue of the trustworthiness of the estimate of real
differences in the effect between settings. The uncertainty in the
heterogeneity contributes to the prediction interval, which represents the
uncertainty in the effect in any given real setting. The prediction interval
is more important than the confidence interval for the mean effect. I know
that R is the most popular platform for meta-analysis, so this post is about
what I think are unrealistic confidence limits for the estimate of
heterogeneity provided by the metafor package. But I am not a user of R, so
I might be misreading the documentation at
https://cran.r-project.org/web/packages/metafor/metafor.pdf. I've
Google-Scholared "Viechtbauer unrealistic meta-analysis", and I also
searched this list's archive, but the hits didn't look like they addressed
this issue.  Please put me on the right track, if you think the following
argument is flawed. 
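For R users: if I read the metafor documentation correctly, the prediction
interval comes out of predict(). A minimal sketch with the package's bundled
BCG data, purely as an illustration:

library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)
res <- rma(yi, vi, data = dat)  # random-effects model (REML by default)
predict(res)                    # pi.lb and pi.ub are the prediction limits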

I use the mixed model in SAS to do meta-analyses. I routinely allow negative
variance for the random effect(s) representing heterogeneity, because I have
found that allowing only positive variance results in unrealistically high
upper confidence limits when the effective degrees of freedom of the
heterogeneity (DF = 2*Z^2, where Z = estimate/SE) is less than 10 or so.
(Trust me, they are unrealistic: I take into account smallest important and
other magnitude thresholds when I evaluate effects and standard deviations.)
The metafor documentation is a bit hard for me to follow, but it looks like
negative variance is not an option. When I last looked at R a few years ago,
it wasn't an option in the mixed-model packages either. Is that still the case? I
understand that the methods in metafor for estimating confidence limits for
the heterogeneity give good coverage in terms of including the true value in
simulations, but it seems to me that this is achieved at the expense of
unrealistically wide intervals, and sometimes a zero variance estimate when
it really should be negative.
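As far as I can tell from the metafor documentation, rma() puts a lower
bound of zero on tau^2 by default, but the control argument seems to allow a
negative bound. A sketch of what I mean (the details are my assumption,
since I haven't run this myself):

library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)
res <- rma(yi, vi, data = dat,
           control = list(tau2.min = -min(dat$vi)))  # permit negative tau^2
res$tau2      # point estimate, now free to go below zero
confint(res)  # confidence limits for tau^2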

Of course, you could argue that allowing negative variance gives an
unrealistic (negative) lower confidence limit and even sometimes an
unrealistic (negative) point estimate, because you can't have negative
variance. Well, actually, you CAN have negative variance that has a
meaningful real interpretation. If the observed between-study variance is
less than what you would expect to arise purely from the sampling variation
in each study, then the heterogeneity is negative. That could happen simply
because of sampling variation, when the study-estimates have sufficiently
large standard errors and the true variance is positive and sufficiently
close to zero. It could also happen if the study-estimates are not
independent, so allowing for negative variance accommodates these
possibilities. It can also happen when you have multiple effects from some
studies, and you treat those multiple effects as if they come from separate
studies, especially if the multiple effects come from the same subjects. The
correct way to deal with multiple effects is to include an extra random
effect for heterogeneity within studies, but again, if the within-study
effects come from the same subjects, you can expect negative variance for
the within-study heterogeneity. I presume that combining the within- and
between-study variances gives a realistic estimate of between-study
heterogeneity, even when the within-study variance is negative.
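To make the sampling-variation point concrete, here is a toy simulation (my
own construction, not metafor): with true heterogeneity of zero and large
standard errors, the untruncated method-of-moments (DerSimonian-Laird)
estimate of the variance comes out negative about half the time.

set.seed(1)
k  <- 10
vi <- rep(0.5, k)                          # large sampling variances
yi <- rnorm(k, mean = 0.3, sd = sqrt(vi))  # true heterogeneity is zero
wi <- 1 / vi
Q  <- sum(wi * (yi - sum(wi * yi) / sum(wi))^2)
C  <- sum(wi) - sum(wi^2) / sum(wi)
(Q - (k - 1)) / C                          # negative whenever Q < k - 1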

I know there's an issue of the coverage of the interval when you allow
negative variance. In simulations I have done, 10 studies in the
meta-analysis gives intervals that are too narrow (90% intervals cover the
true value 82-85% of the time, depending on the standard errors in each
study and the true heterogeneity). When I used the t distribution rather
than the z distribution to derive the intervals (with the effective DF given
above), the coverage was closer to 90%, but for some combinations of true
heterogeneity, study-effect standard errors and number of studies, the
coverage sometimes exceeded 90% (which is not a bad thing: it's better for
the interval to be too wide than too narrow). I'll be using the t
distribution in future, unless someone on this list can point me to better
confidence limits when allowing for negative variance.
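In case anyone wants to reproduce the t-based interval, it amounts to
something like this (hypothetical numbers):

tau2    <- 0.04                           # heterogeneity estimate (variance)
se.tau2 <- 0.03                           # its standard error
df      <- 2 * (tau2 / se.tau2)^2         # effective DF = 2*Z^2
tau2 + c(-1, 1) * qt(0.95, df) * se.tau2  # 90% limits; lower can be negative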

One more point... This posting arose indirectly from someone drawing my
attention to a recent article in which Bayesian priors were used to offset
apparent underestimation of heterogeneity. That prompted me to revisit my
simulations to look at bias and coverage. The original article promoting the
method is here: https://psyarxiv.com/7tbrm/. I suspect that this method is
akin to using a sledgehammer to crack a nut (which, apart from being
overkill, also damages the kernel). In my simulations allowing negative
variance, there is no underestimation in the estimate of heterogeneity as a
variance, regardless of standard errors, true heterogeneity, and number of
studies (even as low as 5). There IS underestimation when you take the
square root to express the heterogeneity as a standard deviation, but that
appears to be just the well-known small-sample bias in standard deviations.
In my simulations, the bias appears to be corrected practically perfectly
with the Gurland and Tripathi factor of 1 + 1/(4*DF), where DF is the effective
degrees of freedom of the variance.
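In code, the correction is simply this (same hypothetical numbers as above):

tau2    <- 0.04                   # heterogeneity as a variance
se.tau2 <- 0.03                   # its standard error
df      <- 2 * (tau2 / se.tau2)^2 # effective degrees of freedom
sqrt(tau2) * (1 + 1/(4 * df))     # bias-corrected heterogeneity SD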

Will Hopkins
https://sportsci.org
https://sportsci.org/will


