[R] p-values < 2.2e-16 not reported
Will Eagle
will.eagle at gmx.net
Thu May 20 18:16:09 CEST 2010
> On 2010-05-20 08:52, Shi, Tao wrote:
>> Will,
>>
>> I'm wondering if you have any
>> insights after looking at the cor.test source code. It seems to be fine to me, as the p value is either calculated by "your first method" or a
>> .C code.
>>
>> ...Tao
Dear Tao,
I think the described problem of p-values < 2.2e-16 not being reported
(or set to zero, respectively) applies at least to the cor.test(,
method="pearson", alternative=c("greater","two.sided")) function, but
also other statistical test functions. You find the critical piece of
code with methods(cor.test) and getAnywhere(cor.test.default) in the
following lines:
...
p <- pt(STATISTIC, df) # line 27
...
PVAL <- switch(alternative, # lines 144-145, reformatted
less = p,
greater = 1 - p,
two.sided = 2 * min(p, 1 - p))
This means that if you calculate a pearson correlation (which is the
default) with option alternativ="two.sided" (which is the default) or
alternative="greater", an additive operation of the 2nd kind is
performed, which causes that a very small p-value is set to zero. This
problem should not apply to alternative="less" since the p-value has
been generated without an additive operation with the pt() function.
What I still find confusing is that:
> pt(10,100,lower=FALSE)
[1] 4.950844e-17
> pt(10,100,lower=TRUE) # should give 1 - 4.950844e-17
[1] 1
> 1-pt(10,100,lower=TRUE) # should give 4.950844e-17
[1] 0
So it seems that this distribution function(s) only give the exact value
if the option lower.tail=FALSE is used. Therefore, also cor.test(...,
method="pearson", alternative="less") might not report p-values <
2.2e-16. I find this behavior of R very unintuitive.
The .C code routines are by the way only used for methods=c("kendall",
"spearman") as far as I have seen.
Best, Will
More information about the R-help
mailing list