[R] Depletion of small p values upon iterative testing of identical normal distributions
Thomas Lumley
tlumley at u.washington.edu
Mon Sep 20 17:14:30 CEST 2010
On Mon, 20 Sep 2010, Duncan Murdoch wrote:
> On 20/09/2010 9:54 AM, A wrote:
>> Dear all,
>>
>> I'm performing a t-test on two normal distributions with identical mean&
>> standard deviation, and repeating this tests a very large number of times
>> to
>> describe an representative p value distribution in a null case. As a part
>> of
>> this, the program bins these values in 10 evenly distributed bins between 0
>> and 1 and reports the number of observations in each bin. What I have
>> noticed is that even after 500,000 replications the number in my lowest bin
>> is consistently ~5% smaller than the number in all the other bins, which
>> are
>> similar within about 1% of each other. Is there any reason, perhaps to do
>> with random number generation in R or the nature of the normal distribution
>> simulated by the rnorm function that could explain this depletion?
>
> No, equal sized bins should expect equal numbers of entries. But your code
> may have errors in it.
>
It's actually more interesting than that.
Here's the code in my preferred format for small simulations
one.test <- function (m = -0.065, s = 0.0837)
{
x <- rnorm(6, m, s)
y <- rnorm(6, m, s)
t.test(x, y)$p.value
}
pvals <- replicate(1e5, one.test())
Now, binning could be done wrong, and in any case isn't a good way to compare distributions, so we try
qqplot(pvals, ppoints(1e5))
and, since the issue is with small p-values
qqplot(log10(pvals), log10(ppoints(1e5)))
and we see that there are in fact fewer small p-values than there should be, starting at about 10^(-2.5).
After thinking about this for a while, we realize that the t.test() function shouldn't be expected to give exactly uniform p-values -- it's not as if it is doing an exact test, after all.
We smugly repeat the exercise with
one.ev.test <- function (m = -0.065, s = 0.0837)
{
x <- rnorm(6, m, s)
y <- rnorm(6, m, s)
t.test(x, y, var.equal=TRUE)$p.value
}
ev.pvals <- replicate(1e5, one.test())
and the problem is solved.
-thomas
Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
More information about the R-help
mailing list