[Rd] Spearman's rank correlation test
Petr Savicky
savicky at cs.cas.cz
Thu Feb 12 13:33:33 CET 2009
Hi All:
help(cor.test) claims:
  "For Spearman's test, p-values are computed using algorithm AS 89."
Algorithm AS 89 was introduced in the paper
D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail
Probabilities of Spearman's rho. Applied Statistics, Vol. 24, No. 3, 377-379.
Table 1(a) in this paper presents the maximum absolute error |\Delta_m| of the
approximation over all possible values of the statistic S for sample sizes
n = 7, 9, 11, 13. The presented errors are
   n   |\Delta_m|
   7   0.0046
   9   0.0011
  11   0.0006
  13   0.0005
Due to the problem explained in detail, together with a patch, at
https://stat.ethz.ch/pipermail/r-devel/2009-January/051936.html
the error of the R implementation of Spearman's rank correlation test is larger
than the above bound for the sample size n = 11 and some values of S
that correspond to positive correlation.
For example, for n = 11 and S = 90, we have
  x <- 1:11
  y <- c(6:1, 7, 11:8)
  out <- cor.test(x, y, method="spearman", alternative="greater")
  out$statistic # 90
  out$p.value   # 0.02921104
while the correct p-value is 0.03044548, so the absolute difference
is 0.00123444. This is larger than the absolute error 0.0006 guaranteed
for AS 89. In my opinion, this means that the claim from help(cor.test)
cited above is not correct.
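The exact value can be cross-checked without any approximation. The following is a minimal sketch, not code from R or from pspearman: it counts, for every possible value of S, the permutations of 1..n that produce it, using a dynamic program over the sets of ranks already used. The function name spearman_exact_counts is only an illustration. For alternative="greater" the p-value is P(S <= 90) under the null hypothesis, which should agree with the value 0.03044548 quoted above.

```r
# Count permutations pi of 1..n by S = sum((i - pi(i))^2).
# State: the set of ranks already assigned to positions 1..k, coded as a bit mask.
spearman_exact_counts <- function(n) {
  max_s <- n * (n^2 - 1) / 3               # S of the reversed order, the maximum
  counts <- matrix(0, nrow = 2^n, ncol = max_s + 1)
  counts[1, 1] <- 1                         # nothing assigned yet, S = 0
  for (mask in 0:(2^n - 2)) {
    used <- ((mask %/% 2^(0:(n - 1))) %% 2) == 1
    pos <- sum(used) + 1                    # next position to receive a rank
    for (j in which(!used)) {
      d <- (pos - j)^2                      # contribution of rank j at position pos
      new_mask <- mask + 2^(j - 1)
      src <- 1:(max_s + 1 - d)
      counts[new_mask + 1, src + d] <-
        counts[new_mask + 1, src + d] + counts[mask + 1, src]
    }
  }
  counts[2^n, ]                             # entry s + 1 holds the count for S = s
}

n <- 11
counts <- spearman_exact_counts(n)
p_exact <- sum(counts[1:(90 + 1)]) / factorial(n)   # P(S <= 90)
print(p_exact, digits = 8)
```

The table has 2^11 rows and 441 columns, so this runs quickly for n = 11, but the memory grows as 2^n and the approach is only meant for small samples.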
To see the error of AS 89 in the example above, one can use
  cor.test(x, -y, method="spearman", alternative="less")$p.value # 0.03036413
since on the side of negative correlation, R calls AS 89 correctly.
So, for the x, y above, correctly called AS 89 has absolute error 0.00008135.
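The comparison with the bound from Table 1(a) can be written out as plain arithmetic. This sketch uses only the numbers quoted in this message (it does not call cor.test, whose output depends on the R version), and the variable names are only illustrative.

```r
p_exact   <- 0.03044548   # exact p-value for n = 11, S = 90
p_r       <- 0.02921104   # cor.test(x, y, alternative="greater")
p_as89    <- 0.03036413   # cor.test(x, -y, alternative="less")
bound_n11 <- 0.0006       # |\Delta_m| for n = 11 from Table 1(a)

err_r    <- abs(p_r - p_exact)      # error on the positive-correlation side
err_as89 <- abs(p_as89 - p_exact)   # error of correctly called AS 89

err_r > bound_n11      # TRUE: the bound is violated
err_as89 < bound_n11   # TRUE: within the bound
```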
The package pspearman, currently available on CRAN, provides a
correction of the problem without the need to modify base R.
Petr.