[R-meta] Calculation of p values in selmodel

Sun Mar 17 21:05:51 CET 2024

Thanks for the suggestion of a Bayesian approach, James. I want to avoid priors, if possible, and go as far as I can with the selmodel approaches, for now. And I don't want to move the p-value threshold around, since it's the studies with p>0.05 that are less likely to get published. The 3-parameter selection model, with one step at 0.025, works brilliantly in the simulations when there isn't too much publication bias, including, importantly, when there is none, where it works much better than the PEESE method. Of course, you don't know how much publication bias there is, so it's important to use a method that works across the possible range of none through lots, including 100% failure to publish non-significant effects. That's why it's so disappointing that the 3PSM doesn't work with no non-significant effects.

When I looked at the data I showed in my last message, you could get the impression that the problem is simply that selmodel needs at least one non-significant study estimate for each level of the factor Sex in the model. But it isn't so. There are plenty of sims where there are no non-significant estimates for the females and just one for the males. For example, one sim has 11 study estimates consisting of 5 significant females, 5 significant males, and one non-significant male (p=0.58). No problem. So maybe the error message is misleading. For about 5% of the simulations I get the warning message "Error when trying to invert Hessian", but it still produces adjusted point estimates for the fixed effects and tau2, so that's not the problem. The problem is the occasional sim (about 1 in 300, with the current simulation) where the error message "One or more intervals do not contain any observed p-values" is wrong, and where it then crashes out of the list processing.
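
One possible way to stop a single failing sim from crashing the whole list processing is to wrap each fit in tryCatch. The following is only a minimal sketch in base R: safe_fit and toy_fit are hypothetical names, and toy_fit is a stand-in for the real call (for example selmodel(x, type = "step", steps = 0.025)), not the actual model.

```r
# Hypothetical wrapper: return NULL instead of stopping when a fit fails
safe_fit <- function(fit_fun, x) {
  tryCatch(
    fit_fun(x),
    error = function(e) {
      message("Fit failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for the real fitting call; it errors when every z value is
# "significant", mimicking the empty-interval error
toy_fit <- function(z) {
  if (all(z > 1.96)) {
    stop("One or more intervals do not contain any observed p-values")
  }
  mean(z)
}

# Hypothetical sims: a named list of z statistics
sims <- list(a = c(2.5, 1.2, 3.1), b = c(2.2, 2.9, 3.4), c = c(0.8, 2.0, 2.7))

fits   <- lapply(sims, function(z) safe_fit(toy_fit, z))
failed <- vapply(fits, is.null, logical(1))  # TRUE for sims whose fit errored
```

The failed vector can then be used to drop or inspect the problem sims before the remaining results are processed.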

Will

-----Original Message-----
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org> On Behalf Of James Pustejovsky via R-sig-meta-analysis
Sent: Monday, March 18, 2024 5:47 AM
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>
Cc: James Pustejovsky <jepusto using gmail.com>
Subject: Re: [R-meta] Calculation of p values in selmodel

This is an issue with maximum likelihood estimation of the step function selection model generally (rather than a problem with the software implementation).

The step function model assumes that there are different selection probabilities for effect size estimates with p-values that fall into different intervals. For a 3-parameter model, the intervals are [0, .025] and (.025, 1], with the first interval fixed to have selection probability 1 and the second interval having selection probability lambda > 0 (an unknown parameter of the model). If there are no observed ES estimates in the first interval, then the ML estimate of lambda is infinite. If there are no observed ES estimates in the second interval, then the ML estimate of lambda is zero, outside of the parameter space.
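
As a rough illustration of the two intervals, the sketch below counts how many p-values fall into [0, .025] and (.025, 1]. It assumes one-sided p-values computed from a normal approximation to estimate/SE; count_by_interval and the z values are hypothetical and made up for illustration only.

```r
# Count one-sided normal-theory p-values in each step-function interval
count_by_interval <- function(z, steps = 0.025) {
  p <- pnorm(z, lower.tail = FALSE)        # one-sided p-value for each z
  breaks <- unique(c(0, steps, 1))         # intervals [0, .025] and (.025, 1]
  table(cut(p, breaks = breaks, include.lowest = TRUE))
}

# Hypothetical z statistics (estimate divided by its standard error)
z_mixed   <- c(5.44, 3.42, 2.15, 1.95, 1.20)  # some p-values above .025
z_all_sig <- c(5.44, 3.42, 2.15, 2.48)        # every p-value below .025

counts_mixed   <- count_by_interval(z_mixed)
counts_all_sig <- count_by_interval(z_all_sig)
```

With z_all_sig the second interval is empty, which is the case where the ML estimate of lambda would be pushed to zero.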

In some of my work, I've implemented an ad hoc fix for the issue by moving the p-value threshold around so that there are at least three ES estimates in each interval. This isn't based on any principle in particular, although Jack Vevea once suggested to me that this might be the sort of thing an analyst might do just to get the model to converge.

A more principled way to fix the issue would be to use penalized likelihood or Bayesian methods with an informative prior on lambda. See the publipha package (https://cran.r-project.org/package=publipha) for one implementation.

James

On Sat, Mar 16, 2024 at 10:23 PM Will Hopkins via R-sig-meta-analysis < r-sig-meta-analysis using r-project.org> wrote:

> No-one has responded to this issue. It's now causing a problem in my
> simulations when I am analyzing for publication bias arising from
> deletion of 90% of nonsignificant study estimates and ending up with
> small numbers (10-30) of included studies. See below (and attached as
> an easier-to-read text file) for an example. Two of the 14 study
> estimates (Rows 8 and 9) were non-significant, but the original t value
> (tOrig) would have made them significant in selmodel(…, type = "step",
> steps = (0.025)). So I processed any observations with non-significant
> p values and t>1.96 by replacing the standard error (YdelSE) with
> Ydelta/1.95. The resulting new t values (tNew) are 1.95 for both those
> observations, whereas all the other t values are unchanged. So they
> should be non-significant in selmodel, right? But I still get this
> error message:
>
> Error in selmodel.rma.uni(x, type = "step", steps = (0.025)) :
>
>   One or more intervals do not contain any observed p-values (use 
> 'verbose=TRUE' to see which).
>
> I must be doing something idiotic, but what? Help, please!
>
>
>
> Oh, and thanks again to Tobias Saueressig for his help with 
> list-processing of the objects created by rma, selmodel and confint. 
> My original for-loop approach fell over when the values of the Sim 
> variable were not consecutive integers (for example, when I had 
> generated the sims and then deleted any lacking non-significant study 
> estimates), but separate processing of the lists as suggested by 
> Tobias worked perfectly. It stops working when it crashes out with the 
> above error, but hopefully someone will solve that problem.
>
>
>
> Will
>
>
>
>      Sim StudID Sex    SSize Ydelta YdelSE tOrig  tNew pValue
>    <dbl>  <dbl> <fct>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
>  1   448      1 Female    10   3.72  0.684  5.44  5.44 0.000413
>  2   448      6 Female    10   3.08  0.901  3.42  3.42 0.00766
>  3   448     11 Female    10   4.49  0.926  4.85  4.85 0.000906
>  4   448     21 Female    28   4.95  0.777  6.37  6.37 0.000000808
>  5   448     26 Female    12   3.82   1.25  3.06  3.06 0.0109
>  6   448     31 Female    22   2.13  0.991  2.15  2.15 0.0433
>  7   448     36 Female    10   3.27   1.13  2.89  2.89 0.0177
>  8   448     10 Male      18   4.46   2.29  2.03  1.95 0.0578
>  9   448     14 Male      10    3.2   1.64  1.98  1.95 0.0795
> 10   448     17 Male      13   4.32   1.97  2.19  2.19 0.049
> 11   448     30 Male      10   1.16  0.467  2.48  2.48 0.0348
> 12   448     38 Male      10   3.61   1.24  2.91  2.91 0.0175
> 13   448     39 Male      10   2.49  0.828  3.01  3.01 0.0148
> 14   448     40 Male      28   1.92  0.602  3.19  3.19 0.0036
>
>
>
> *From:* Will Hopkins <willthekiwi using gmail.com>
> *Sent:* Friday, March 15, 2024 8:39 AM
> *To:* 'R Special Interest Group for Meta-Analysis' < 
> r-sig-meta-analysis using r-project.org>
> *Subject:* Calculation of p values in selmodel
>
>
>
> According to your documentation, Wolfgang, the selection models in
> selmodel are based on the p values of the study estimates, but these
> are computed by assuming the study estimate divided by its standard
> error has a normal distribution, whereas significance in the original
> studies of mean effects of continuous variables would have been based
> on a t distribution. It could make a difference when sample sizes in
> the original studies are ~10 or so, because some originally
> non-significant effects would be treated as significant by selmodel.
> For example, with a sample size of 10, a mean change has 9 degrees of
> freedom, so a p value of 0.080 (i.e., non-significant, p>0.05) in the
> original study will be given a p value of 0.049 (i.e., significant,
> p<0.05) by selmodel. Is this issue likely to make any real difference
> to the performance of selmodel with meta-analyses of realistic
> small-sample studies? I guess that only a small (negligible?)
> proportion of p values will fall between 0.05 and 0.08, in the
> worst-case scenario of a true effect close to the critical value and
> with only 9 degrees of freedom for the SE. If it is an issue, you could
> include the SE's degrees of freedom in the rma object that gets passed
> to selmodel.
>
>
>
> Will
> _______________________________________________
> R-sig-meta-analysis mailing list @ R-sig-meta-analysis using r-project.org 
> To manage your subscription to this mailing list, go to:
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
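
The normal-versus-t discrepancy described in the original message quoted above (a two-sided p value of 0.080 with 9 degrees of freedom becoming roughly 0.049 under a normal distribution) can be checked in base R. This is only a sketch of that arithmetic; the variable names are illustrative and nothing here calls selmodel itself.

```r
# Two-sided p-value as reported by the original study (t distribution)
p_t <- 0.080
df  <- 9   # a mean change with a sample size of 10

# t statistic that gives p_t with df degrees of freedom
t_stat <- qt(1 - p_t / 2, df = df)

# Two-sided p-value when the same statistic is treated as standard normal
p_z <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)

round(c(t_stat = t_stat, p_t = p_t, p_z = p_z), 3)
```

The statistic exceeds 1.96, so the normal-based p-value drops below 0.05 even though the t-based one does not.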
