[Rd] NAs by integer overflow in Spearman's test p-value (PR#8087)
jtk@cmp.uea.ac.uk
jtk at cmp.uea.ac.uk
Tue Aug 23 15:15:09 CEST 2005
Full_Name: Jan T. Kim
Version: 2.1.0 (and better)
OS: Linux
Submission from: (NULL) (139.222.3.229)
The p value in Spearman's test is NA if the length of x exceeds 46340, due to
an integer overflow, occurring if length(n) > sqrt(2^31):
> n <- 46341;
> set.seed(1);
> x <- runif(n);
> y <- runif(n);
> cor.test(x, y, method = "spearman");
Spearman's rank correlation rho
data: x and y
S = 1.654965e+13, p-value = NA
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.002199426
Warning message:
NAs produced by integer overflow in: n * n
The integer overflow occurs in src/library/stats/R/cor.test.R, and it can be
fixed
by converting n to double appropriately (see *** fix label ***, lines 110
onwards
are shown):
## Use the test statistic S = sum(rank(x) - rank(y))^2
## and AS 89 for obtaining better p-values than via the
## simple normal approximation.
## In the case of no ties, S = (1-rho) * (n^3-n)/6.
pspearman <- function(q, n, lower.tail = TRUE) {
if(n <= 1290) # n*(n^2 - 1) does not overflow
.C("prho",
as.integer(n),
as.double(q + 1),
p = double(1),
integer(1),
as.logical(lower.tail),
PACKAGE = "stats")$p
else { # for large n: aymptotic t_{n-2}
n <- as.double(n); # *** fix ***
r <- 1 - 6 * q / (n*(n*n - 1))
pt(r / sqrt((1 - r^2)/(n-2)), df = n-2,
lower.tail= !lower.tail)
}
}
q <- round((n^3 - n) * (1 - r) / 6)
STATISTIC <- c(S = q)
PVAL <-
switch(alternative,
"two.sided" = {
p <- if(q > (n^3 - n) / 6)
pspearman(q - 1, n, lower.tail = FALSE)
else
pspearman(q, n, lower.tail = TRUE)
min(2 * p, 1)
},
"greater" = pspearman(q, n, lower.tail = TRUE),
"less" = pspearman(q - 1, n, lower.tail = FALSE))
if(TIES)
warning("p-values may be incorrect due to ties")
Inserting the typecast only in the pspearman function is a minimally invasive
fix -- alternatively, replacing at line 17
n <- length(x)
with
n <- as.double(length(x))
achieves the same fix and may take care of other unnecessary integer overflows,
but it may also introduce new problems e.g. in .C calls etc.
More information about the R-devel
mailing list