[R-meta] Calculation of p values in selmodel

Viechtbauer, Wolfgang (NP) wolfgang.viechtbauer at maastrichtuniversity.nl
Mon Mar 18 12:36:40 CET 2024


This thread intermingles several issues, so let me add my 2 cents to each one:

This thread started out with Will's criticism that the selection models implemented in selmodel() compute p-values based on a standard Wald-type test of zi = yi / sqrt(vi). This issue was also recently raised in this post:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005098.html

to which I and James provided the following replies:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005099.html
https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2024-March/005100.html

Let me reiterate: p-values are not just computed for the observed effect sizes, but also as part of an additional step that involves integrating over the weighted density of the effect size estimates. The weights are a function of the p-values, which in turn must be computed for the values over which we integrate. So, if one wants to use p-values that are not computed based on a standard Wald-type test, one doesn't just have to change how the p-values of the observed effect sizes are computed, but also how the p-values in this integration step are computed. And this is not automatically solved by supplying degrees of freedom and then using a t-distribution. What if a study author has used a Mann–Whitney U test instead of a t-test? What if they used a permutation test? The p-values then do not come from a t-test, so those dfs are irrelevant. For other effect size measures (e.g., odds ratios), the authors may have used a Pearson chi^2 test (with or without a continuity correction), or maybe the odds ratio came from a logistic regression model with additional predictors. If one wants to use the p-values as reported, then the p-values in the integration step would have to be computed in the same manner. Implementing this for all of these possibilities is essentially impossible.
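
To make this concrete, here is a minimal sketch (not metafor's internal code) of the per-study normalizing integral for a one-step selection model, assuming Wald-type one-sided p-values; mu, tau2, and delta are placeholder values for the model parameters:

wfun <- function(y, vi, delta, alpha=0.025) {
   # one-sided Wald-type p-value for a hypothetical estimate y
   p <- pnorm(y / sqrt(vi), lower.tail=FALSE)
   ifelse(p <= alpha, 1, delta)
}

# denominator of the likelihood contribution of study i: the weight function
# integrated against the (assumed normal) density of y ~ N(mu, vi + tau2)
Ai <- function(vi, mu, tau2, delta)
   integrate(function(y) wfun(y, vi, delta) * dnorm(y, mu, sqrt(vi + tau2)),
             lower=-Inf, upper=Inf)$value

If the p-values of the observed estimates were computed one way (e.g., taken as reported from t-tests) but the p-values inside wfun() another way, the numerator and denominator of the likelihood contributions would be inconsistent.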

Also, if we want to be so precise, then one should not assume that yi is normally distributed, but use the exact distribution for the density and the integration step. So for d it would be a scaled and shifted non-central t-distribution, for r it would be the exact distribution of a Pearson correlation coefficient, for raw proportions it would be a binomial distribution, and so on and so on -- this will be different for every single effect size measure out there. Implementing this again is essentially impossible.
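
To illustrate the first of these, here is a sketch of the exact density of a two-sample standardized mean difference d (without the small-sample bias correction), compared against the usual large-sample normal approximation; the group sizes and the true effect of 0.5 are just placeholder values:

dsmd <- function(x, n1, n2, delta) {
   # d * sqrt(m) follows a non-central t-distribution with df = n1 + n2 - 2
   # and non-centrality parameter delta * sqrt(m), where m = n1*n2/(n1+n2)
   m <- n1 * n2 / (n1 + n2)
   sqrt(m) * dt(x * sqrt(m), df = n1 + n2 - 2, ncp = delta * sqrt(m))
}
curve(dsmd(x, n1=10, n2=10, delta=0.5), from=-1, to=2)
# large-sample normal approximation for comparison
curve(dnorm(x, mean=0.5, sd=sqrt(1/10 + 1/10 + 0.5^2/(2*(10+10)))), add=TRUE, lty="dashed")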

Will also asked: "Is this issue likely to make any real difference to the performance of selmodel with meta-analyses of realistic small-sample studies?" The answer depends on many factors (which effect size measure? how many studies? what are their sample sizes? how were the actual p-values computed (this could differ across studies)? which selection model is being used? what is the true selection function?), and as far as I know, there are no simulation studies addressing this question, so the answer is: nobody knows.

Based on this, I did a small simulation study for standardized mean differences and found no noteworthy difference between using Wald-type tests and the 'proper' t-tests with k=50 studies and sample sizes between 20 and 60. But there are many factors one could manipulate here, so this might not generalize.

Also, as I mentioned before, this concern is probably a relatively minor issue in comparison to the fact that selection models are really rough approximations to a much more complex data generating mechanism anyway. The moment some study authors deviate in their behavior from the assumed selection model (which is guaranteed to be the case), this is likely to have a much greater impact on the performance of selection models.

However, based on this discussion, I have now implemented the possibility to supply p-values directly to selmodel() via the *undocumented* 'pval' argument. To illustrate with Will's dataset:

library(metafor)

dat <- structure(list(Sim = c(448L, 448L, 448L, 448L, 448L, 448L, 448L, 448L,
448L, 448L, 448L, 448L, 448L, 448L), StudID = c(1L, 6L, 11L, 21L, 26L, 31L,
36L, 10L, 14L, 17L, 30L, 38L, 39L, 40L), Sex = c("Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male"), SSize = c(10L, 10L, 10L, 28L, 12L, 22L, 10L, 18L,
10L, 13L, 10L, 10L, 10L, 28L), Ydelta = c(3.72, 3.08, 4.49, 4.95, 3.82, 2.13,
3.27, 4.46, 3.2, 4.32, 1.16, 3.61, 2.49, 1.92), YdelSE = c(0.684, 0.901,
0.926, 0.777, 1.25, 0.991, 1.13, 2.29, 1.64, 1.97, 0.467, 1.24, 0.828, 0.602),
tOrig = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 2.03, 1.98, 2.19, 2.48,
2.91, 3.01, 3.19), tNew = c(5.44, 3.42, 4.85, 6.37, 3.06, 2.15, 2.89, 1.95,
1.95, 2.19, 2.48, 2.91, 3.01, 3.19), pValue = c(0.000413, 0.00766, 0.000906,
8.08e-07, 0.0109, 0.0433, 0.0177, 0.0578, 0.0795, 0.049, 0.0348, 0.0175,
0.0148, 0.0036)), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"))

res <- rma(Ydelta, sei=YdelSE, data=dat, method="ML")
res

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1))
sel

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, 1), pval=dat$pValue)
sel

To be clear, the p-values in the integration step are still computed based on the Wald-type test and this still assumes normally distributed effect size estimates. The impact of doing this is unclear and it might be horribly wrong. Use at your own discretion.
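
For reference, the default one-sided Wald-type p-values for this dataset (which should correspond to what selmodel() uses with alternative="greater" when 'pval' is not supplied) can be computed directly and compared against the supplied values:

round(pnorm(dat$Ydelta / dat$YdelSE, lower.tail=FALSE), 4)
dat$pValue

This makes it easy to see which studies fall on different sides of the .025 cutpoint under the two computations.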

The other issue that was raised is that of empty intervals in step-function selection models. This is really a separate issue (even if it arose in one particular example because the p-values were computed differently in the studies themselves than by the selection model). James nicely explained why empty intervals cause problems for ML estimation. In connection with this, let me illustrate the problem with an example (and some features of the selmodel() function):

library(metafor)

dat <- dat.hackshaw1998

res <- rma(yi, vi, data=dat, method="ML")
res

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.025, .05, 1), verbose=TRUE)
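
One can also check by hand which intervals the (default Wald-type, one-sided) p-values fall into; a quick diagnostic along these lines (not part of metafor):

pvals <- pnorm(dat$yi / sqrt(dat$vi), lower.tail=FALSE)
table(cut(pvals, breaks=c(0, .025, .05, 1), include.lowest=TRUE))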

Another undocumented argument that was already implemented is the possibility to skip the interval check:

sel <- selmodel(res, type="stepfun", alternative="greater",
                steps=c(.025, .05, 1), skipintcheck=TRUE)
sel

As one can see, the estimate of delta[2] drifts to 0 (which in turn leads to problems with inverting the Hessian -- yet another separate issue).

To illustrate the opposite:

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1))
sel

Now delta[2] wants to drift to infinity, which it cannot, because 100 is the default upper limit for the delta values (bounds are imposed for reasons of numerical stability). This can be adjusted:

sel <- selmodel(res, type="stepfun", alternative="greater", steps=c(.001, 1),
                control=list(delta.max=Inf))
sel

It doesn't quite get to infinity, but close enough I would say.

I had considered automatically collapsing empty intervals (or at least offering this as an option), but decided against it. I leave it up to users to implement such a step themselves if they like.
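
For those who want to do this, here is a rough user-side sketch (again assuming the default Wald-type one-sided p-values) that drops the cutpoint of any empty interval before refitting:

pvals  <- pnorm(dat$yi / sqrt(dat$vi), lower.tail=FALSE)
steps  <- c(.025, .05, 1)
counts <- table(cut(pvals, breaks=c(0, steps), include.lowest=TRUE))
empty  <- which(counts == 0)
if (length(empty) > 0) {
   empty[empty == length(steps)] <- length(steps) - 1 # never drop the final cutpoint of 1
   steps <- steps[-unique(empty)]
}
sel <- selmodel(res, type="stepfun", alternative="greater", steps=steps)
sel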

Best,
Wolfgang

> -----Original Message-----
> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org> On Behalf
> Of Will Hopkins via R-sig-meta-analysis
> Sent: Sunday, March 17, 2024 21:06
> To: 'R Special Interest Group for Meta-Analysis' <r-sig-meta-analysis using r-
> project.org>
> Cc: Will Hopkins <willthekiwi using gmail.com>
> Subject: Re: [R-meta] Calculation of p values in selmodel
>
> Thanks for the suggestion of a Bayesian approach, James. I want to avoid priors,
> if possible, and go as far as I can with the selmodel approaches, for now. And I
> don't want to move the p-value threshold around, since it's the studies with
> p>0.05 that are less likely to get published. The 3-parameter selection model,
> with one step at 0.025, works brilliantly in the simulations when there isn't
> too much publication bias, including, importantly, when there is none, where it
> works much better than the PEESE method. Of course, you don't know how much
> publication bias there is, so it's important to use a method that works across
> the possible range of none through lots, including 100% failure to publish non-
> significant effects. That's why it's so disappointing that the 3PSM doesn't work
> with no non-significant effects.
>
> When I looked at the data I showed in my last message, you could get the
> impression that the problem is simply that selmodel needs at least one non-
> significant study estimate for each level of the factor Sex in the model. But it
> isn't so. There are plenty of sims where there are no non-significant estimates
> for the females and just one for the males. For example, one sim has 11 study
> estimates consisting of 5 significant females, 5 significant males, and one non-
> significant male (p=0.58). No problem. So maybe the error message is misleading.
> For about 5% of the simulations I get the warning message "Error when trying to
> invert Hessian", but it still produces adjusted point estimate for the fixed
> effects and tau2, so that's not the problem. The problem is the occasional sim
> (about 1 in 300, with the current simulation) where the error message "One or
> more intervals do not contain any observed p-values" is wrong, and where it then
> crashes out of the list processing.
>
> Will
>
> -----Original Message-----
> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org> On Behalf
> Of James Pustejovsky via R-sig-meta-analysis
> Sent: Monday, March 18, 2024 5:47 AM
> To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-
> project.org>
> Cc: James Pustejovsky <jepusto using gmail.com>
> Subject: Re: [R-meta] Calculation of p values in selmodel
>
> This is an issue with maximum likelihood estimation of the step function
> selection model generally (rather than a problem with the software
> implementation).
>
> The step function model assumes that there are different selection probabilities
> for effect size estimates with p-values that fall into different intervals. For
> a 3-parameter model, the intervals are [0, .025] and (.025, 1], with the first
> interval fixed to have selection probability
> 1 and the second interval having selection probability lambda > 0 (an unknown
> parameter of the model). If there are no observed ES estimates in the first
> interval, then the ML estimate of lambda is infinite. If there are no observed
> ES estimates in the second interval, then the ML estimate of lambda is zero,
> outside of the parameter space.
>
> In some of my work, I've implemented an ad hoc fix for the issue by moving the
> p-value threshold around so that there are at least three ES estimates in each
> interval. This isn't based on any principle in particular, although Jack Vevea
> once suggested to me that this might be the sort of thing an analyst might do
> just to get the model to converge.
>
> A more principled way to fix the issue would be to use penalized likelihood or
> Bayesian methods with an informative prior on lambda. See the publipha package
> (https://cran.r-project.org/package=publipha) for one implementation.
>
> James
>
> On Sat, Mar 16, 2024 at 10:23 PM Will Hopkins via R-sig-meta-analysis < r-sig-
> meta-analysis using r-project.org> wrote:
>
> > No-one has responded to this issue. It's now causing a problem in my
> > simulations when I am analyzing for publication bias arising from
> > deletion of 90% of nonsignificant study estimates and ending up with
> > small numbers
> > (10-30) of included studies. See below (and attached as an
> > easier-to-read text file) for an example. Two of the 14 study
> > estimates (Row 8 and 9) were non-significant, but the original t value
> > (tOrig) would have made them significant in selmodel(…, type = "step",
> > steps = (0.025)). So I processed any observations with non-significant
> > p values and t>1.96 by replacing the standard error (YdelSE) with
> > Ydelta/1.95. The resulting new t values (tNew) are 1.95 for both those
> > observations, whereas all the other t values are unchanged. So they
> > should be non-significant in selmodel, right?  But I still get this error
> message:
> >
> > Error in selmodel.rma.uni(x, type = "step", steps = (0.025)) :
> >
> >   One or more intervals do not contain any observed p-values (use
> > 'verbose=TRUE' to see which).
> >
> > I must be doing something idiotic, but what? Help, please!
> >
> > Oh, and thanks again to Tobias Saueressig for his help with
> > list-processing of the objects created by rma, selmodel and confint.
> > My original for-loop approach fell over when the values of the Sim
> > variable were not consecutive integers (for example, when I had
> > generated the sims and then deleted any lacking non-significant study
> > estimates), but separate processing of the lists as suggested by
> > Tobias worked perfectly. It stops working when it crashes out with the
> > above error, but hopefully someone will solve that problem.
> >
> > Will
> >
> > *From:* Will Hopkins <willthekiwi using gmail.com>
> > *Sent:* Friday, March 15, 2024 8:39 AM
> > *To:* 'R Special Interest Group for Meta-Analysis' <
> > r-sig-meta-analysis using r-project.org>
> > *Subject:* Calculation of p values in selmodel
> >
> > According to your documentation, Wolfgang, the selection models in
> > selmodel are based on the p values of the study estimates, but these
> > are computed by assuming the study estimate divided by its standard
> > error has a normal distribution, whereas significance in the original
> > studies of mean effects of continuous variables would have been based on a t
> distribution.
> > It could make a difference when sample sizes in the original studies
> > are
> > ~10 or so, because some originally non-significant effects would be
> > treated as significant by selmodel. For example, with a sample size of
> > 10, a mean change has 9 degrees of freedom, so a p value of 0.080
> > (i.e., non-significant, p>0.05) in the original study will be given a
> > p value of
> > 0.049 (i.e., significant, p<0.05) by selmodel. Is this issue likely to
> > make any real difference to the performance of selmodel with
> > meta-analyses of realistic small-sample studies? I guess that only a
> > small (negligible?) proportion of p values will fall between 0.05 and
> > 0.08, in the worst-case scenario of a true effect close to the
> > critical value and with only 9 degrees of freedom for the SE. If it is
> > an issue, you could include the SE's degrees of freedom in the rma object that
> gets passed to selmodel.
> >
> > Will

