[Rd] kendall tau correlation test for ties: Potential error (PR#8076)

Thu Aug 18 13:43:29 CEST 2005

Full_Name: Dirk Koschuetzki
Version: 2.1.1
OS: source code
Submission from: (NULL) (194.94.136.34)

Hello,

>From the source code (R-2.1.1, file: .../R-2.1.1/src/library/stats/R/)

******************************
cor.test.default <-
function(x, y, alternative = c("two.sided", "less", "greater"),
         method = c("pearson", "kendall", "spearman"), exact = NULL,
         conf.level = 0.95, ...)
{
    alternative <- match.arg(alternative)
    method <- match.arg(method)
    DNAME <- paste(deparse(substitute(x)), "and", deparse(substitute(y)))

    if(length(x) != length(y))
	stop("'x' and 'y' must have the same length")
    OK <- complete.cases(x, y)
    x <- x[OK]
    y <- y[OK]
    n <- length(x)

    PVAL <- NULL
    NVAL <- 0
    conf.int <- FALSE

    if(method == "pearson") {
	// Omitted
    }
    else {
	if(n < 2)
	    stop("not enough finite observations")
	PARAMETER <- NULL
	TIES <- (min(length(unique(x)), length(unique(y))) < n)
	if(method == "kendall") {
	    method <- "Kendall's rank correlation tau"
	    names(NVAL) <- "tau"
	    r <- cor(x,y, method = "kendall")
            ESTIMATE <- c(tau = r)

            if(!is.finite(ESTIMATE)) {  # all x or all y the same
                ESTIMATE[] <- NA
                STATISTIC <- c(T = NA)
                PVAL <- NA
            }
            else {
                if(is.null(exact))
                    exact <- (n < 50)
                if(exact && !TIES) {
                    q <- round((r + 1) * n * (n - 1) / 4)
                    pkendall <- function(q, n) {
                        .C("pkendall",
                           length(q),
                           p = as.double(q),
                           as.integer(n),
                           PACKAGE = "stats")$p
                    }
                    PVAL <-
                        switch(alternative,
                               "two.sided" = {
                                   if(q > n * (n - 1) / 4)
                                       p <- 1 - pkendall(q - 1, n)
                                   else
                                       p <- pkendall(q, n)
                                   min(2 * p, 1)
                               },
                               "greater" = 1 - pkendall(q - 1, n),
                               "less" = pkendall(q, n))
                    STATISTIC <- c(T = q)
                } else {
                    STATISTIC <- c(z = r / sqrt((4 * n + 10) / (9 * n*(n-1))))
                    p <- pnorm(STATISTIC)
                    if(exact && TIES)
                        warning("Cannot compute exact p-value with ties")
                }
            }
	} else {
	// OMITTED
        }
    }

    if(is.null(PVAL)) # for "pearson" only, currently
	PVAL <- switch(alternative,
		       "less" = p,
		       "greater" = 1 - p,
		       "two.sided" = 2 * min(p, 1 - p))

    RVAL <- list(statistic = STATISTIC,
                 parameter = PARAMETER,
                 p.value = as.numeric(PVAL),
                 estimate = ESTIMATE,
                 null.value = NVAL,
                 alternative = alternative,
                 method = method,
                 data.name = DNAME)
    if(conf.int)
        RVAL <- c(RVAL, list(conf.int = cint))
    class(RVAL) <- "htest"
    RVAL
}
*************

Please look at the computation of the p-value for Kendalls tau. There is an
assignment to "p" right above the warning. In the bottom of the function there
is a comment that for the pearson case we have to use the modification and set
PVAL. 

The problem is: 
* Either the comment is wrong because the modification should be done with
kendall too, or
* The variable PVAL has to be assigned in the kendall block.

I hope this is clear so far.

Please send me some comments, because I'm not sure if my observation is ok. And
currently I try to figure out the significance in the biserial case which of
course makes heavy use of the tied case.

Cheers,
Dirk