[R] confidence intervals around p-values

Thu Sep 9 18:29:49 CEST 2010

One other case where a confidence interval on a p-value may make sense is permutation (or other resampling) tests.  The "population parameter" p-value is the p-value that would be obtained from the distribution over all possible permutations. In practice we only sample from that population, so the p-value we report is an estimate.  A confidence interval then reflects the number of sampled permutations and can indicate whether that number was large enough.  If the whole confidence interval lies below alpha, you can be confident that the "true" p-value would give significance; if it lies entirely above alpha, the result is not significant.  The real problem comes when the confidence interval includes alpha, which indicates that B (the number of resamples/permutations) was not large enough.  Be careful, though: running a small number of permutations and then deciding to run more based on the CI would likely introduce bias (how much is another question).
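A minimal sketch of such a Monte Carlo permutation test in R. It assumes a two-sample comparison, the difference in means as the test statistic and a one-sided alternative; these choices, and the helper name perm_pvalue, are illustrative only. The estimated p-value is the proportion of the B sampled permutations whose statistic is at least as extreme as the observed one:

```r
# Hypothetical helper: Monte Carlo permutation test for a difference in means.
# Assumed alternative: mean(y) > mean(x), so large statistics count as extreme.
perm_pvalue <- function(x, y, B = 1000) {
  pooled <- c(x, y)
  nx <- length(x)
  obs <- mean(y) - mean(x)              # observed test statistic
  stats <- replicate(B, {
    idx <- sample(length(pooled), nx)   # random relabelling of the pooled data
    mean(pooled[-idx]) - mean(pooled[idx])
  })
  count <- sum(stats >= obs)            # sampled permutations at least as extreme
  list(count = count, B = B, p = count / B)
}

set.seed(1)
x <- rnorm(12)
y <- rnorm(12, mean = 0.5)
res <- perm_pvalue(x, y, B = 2000)
res$p   # estimate of the p-value over all possible permutations
```

Rerunning with a different seed gives a slightly different res$p. That Monte Carlo variability is what the confidence interval describes.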

The nice thing is that in this case the p-value is a simple proportion and the confidence interval can be computed using binom.test.
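For example, take hypothetical numbers: 43 of B = 1000 sampled permutations were at least as extreme as the observed statistic, and alpha = 0.05. binom.test gives an exact (Clopper-Pearson) interval, and comparing its ends with alpha gives the three-way decision described above:

```r
# Hypothetical Monte Carlo result (illustrative numbers, not from a real test):
# 43 of B = 1000 sampled permutations were at least as extreme as observed.
count <- 43
B <- 1000
alpha <- 0.05

ci <- binom.test(count, B)$conf.int   # exact (Clopper-Pearson) 95% CI
ci

decision <- if (ci[2] < alpha) {
  "significant"
} else if (ci[1] > alpha) {
  "not significant"
} else {
  "inconclusive: alpha is inside the CI, so B may be too small"
}
decision
```

Here the interval straddles 0.05. Roughly, its width shrinks like 1/sqrt(B), so quadrupling B about halves it.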

But I fully agree that in most cases the idea of a CI for a p-value is not meaningful: you need some case where your p-value is an estimate of a "population parameter p-value" that has some meaning.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ted Harding
> Sent: Thursday, September 09, 2010 8:25 AM
> To: r-help at r-project.org
> Cc: Fernando Marmolejo Ramos
> Subject: Re: [R] confidence intervals around p-values
> 
> On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
> >   On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
> >> Dear all
> >>
> >> I wonder if anyone has heard of confidence intervals around
> >> p-values...
> >
> > That doesn't really make sense.  p-values are statistics, not
> > parameters. You would compute a confidence interval around a
> > population mean because that's a parameter, but you wouldn't
> > compute a confidence interval around the sample mean: you've
> > observed it exactly.
> >
> > Duncan Murdoch
> 
> Duncan has succinctly stated the essential point in the standard
> interpretation. The P-value is calculated from the sample in
> hand, a definite null hypothesis, and the distribution of the
> test statistic given the null hypothesis, so (given all of these)
> there is no scope for any other answer.
> 
> However, there are circumstances in which the notion of "confidence
> interval for a P-value" makes some sense. One such might be the
> Mann-Whitney test for identity of distribution of two samples
> of continuous variables, where (because of discretisation of the
> values when they were recorded) there are ties.
> 
> Then you know in theory that the "underlying values" are all
> different, but because you don't know where these lie in the
> discretisation intervals you don't know which way a tie may
> split. So it would make sense to simulate by splitting ties
> at random (e.g. uniformly distribute each "1.5" value over the
> interval (1.5,1.6) if values were truncated, or (1.45,1.55) if
> they were rounded).
> 
> For each such simulated tie-broken sample, calculate the P-value.
> Then you get a distribution of exact P-values calculated from
> samples without ties which are consistent with the recorded data.
> The central 95% of this distribution could be interpreted as a 95%
> confidence interval for the true P-value.
> 
> To bring this back on topic, here is an example in R
> (rounding to intervals of 0.2):
> 
>   set.seed(51324)
>   X <- sort(2*round(0.5*rnorm(12),1))
>   Y <- sort(2*round(0.5*rnorm(12)+0.25,1))
>   rbind(X,Y)
> #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
> # X -1.8 -1.2 -0.8 -0.6  0.0    0  0.2  0.2  1.2   1.8     2   2.2
> # Y -1.2 -0.4 -0.2  0.4  0.4    1  1.0  1.0  1.2   1.8     2   2.6
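> # So several ties between X and Y (-1.2, 1.2, 1.8, 2.0), as well as
> # ties within X or within Y (0.0, 0.2, 0.4, 1.0), which don't matter.
>   # With ties, R warns that it cannot compute an exact p-value and
>   # falls back to the normal approximation (no continuity correction here).
>   wilcox.test(X,Y,alternative="less",exact=TRUE,correct=FALSE)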
> # data:  X and Y   W = 54, p-value = 0.1488
> 
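>   Ps <- numeric(1000)
>   for(i in seq_len(1000)){
>     # spread each value uniformly over its rounding interval of width 0.2
>     # (one random offset per observation: X and Y each have 12 values)
>     Xr <- (X-0.1) + 0.2*runif(length(X))
>     Yr <- (Y-0.1) + 0.2*runif(length(Y))
>     Ps[i] <- wilcox.test(Xr,Yr,alternative="less",
>              exact=TRUE,correct=FALSE)$p.value
>   }
>   hist(Ps)
>   table(round(Ps,4))
>   # output from the original post; the counts depend on the random draws
>   # 0.1328 0.1457 0.1593 0.1737 0.1888
>   #     81    267    336    226     90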
> 
> So this gives you a picture of the uncertainty in the P-value
> (0.1488, calculated from the rounded data) relative to what it
> really should have been (if calculated from unrounded data).
> Since each possible "true" (tie-broken) sample can be viewed
> as a hypothesis about unobserved "truth", it does make a certain
> sense to view these results as a kind of confidence distribution
> for the P-value you should have got. However, this is more of a
> Bayesian argument, since the above calculation has assigned
> equal prior probability to the tie-breaks!
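The central 95% interval can be read directly off the simulated P-values with quantile. Below is a self-contained sketch that regenerates the rounded data and the tie-breaking loop from the example. The helper name tiebreak_pvalues and its interval-width argument h are hypothetical; h = 0.2 matches the rounding used above:

```r
# Hypothetical helper: P-values from samples whose recorded values are
# spread uniformly over their rounding intervals of width h.
tiebreak_pvalues <- function(X, Y, h = 0.2, nsim = 1000) {
  vapply(seq_len(nsim), function(i) {
    Xr <- X - h/2 + h * runif(length(X))
    Yr <- Y - h/2 + h * runif(length(Y))
    wilcox.test(Xr, Yr, alternative = "less",
                exact = TRUE, correct = FALSE)$p.value
  }, numeric(1))
}

set.seed(51324)
X <- sort(2*round(0.5*rnorm(12), 1))
Y <- sort(2*round(0.5*rnorm(12) + 0.25, 1))
Ps <- tiebreak_pvalues(X, Y)
quantile(Ps, c(0.025, 0.975))   # central 95% of the tie-broken P-values
```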
> 
> One could also, I suppose, consider the question of what
> distribution of P-values might arise if the/an alternative
> hypothesis were true, and where in this does the P-value that
> we actually got lie? But these are murkier waters ...
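One rough way to explore that question by simulation, under stated assumptions: take as the alternative that Y is normal and shifted up by 0.5 relative to X. That shift is, up to rounding, how Y was generated in the example, since 2*(0.5*z + 0.25) = z + 0.5. Draw many unrounded samples of size 12 under it, and see where the observed P-value falls. The function name is hypothetical:

```r
# Hypothetical sketch: distribution of one-sided Wilcoxon P-values when the
# alternative "Y is shifted up by delta" is true (normal data assumed).
pvalues_under_shift <- function(n = 12, delta = 0.5, nsim = 2000) {
  replicate(nsim, {
    x <- rnorm(n)
    y <- rnorm(n, mean = delta)
    wilcox.test(x, y, alternative = "less")$p.value
  })
}

set.seed(1)
P_alt <- pvalues_under_shift()
p_obs <- 0.1488        # P-value reported for the rounded data above
mean(P_alt <= p_obs)   # share of alternative-hypothesis P-values at or below it
```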
> 
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 09-Sep-10                                       Time: 15:24:29
> ------------------------------ XFMail ------------------------------
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.