[R] The three routines in R that calculate the wilcoxon signed-rank test give different p-values.......which is correct?

Wed Apr 13 13:12:54 CEST 2011

On Apr 13, 2011, at 01:57 , Michael G Rupert wrote:

> I have a question concerning the Wilcoxon signed-rank test, and 
> specifically, which R subroutine I should use for my particular dataset. 
> There are three different commands in R (that I'm aware of) that calculate 
> the Wilcoxon signed-rank test; wilcox.test, wilcox.exact, and 
> wilcoxsign_test. When I run the three commands on the same dataset, I get 
> different p-values. I'm hoping that someone can give me guidance on the 
> strengths and weaknesses of each command, why they produce different 
> p-values, and which one is the most appropriate for my particular needs.

Well, there are two version of zero-handling, and for each of these, you can have exact p values or asymptotic p values with or without continuity correction, so that's 6 possibilities already. 

> 
> First, let me describe the dataset I am working with. The project I am 
> working on collected water samples from groups/networks of about 30 water 
> wells and analyzed them for nitrate, major ions, and other chemical 
> constituents. We revisited those same wells about 10 years later and 
> analyzed the water samples for the same chemical constituents. I now have 
> a paired dataset, and the question I would like to answer is whether there 
> was a "significant" change in concentrations of those chemical 
> constituents (such as nitrate or chloride). Concentrations measured in 
> water from some wells have increased, some have decreased, and some have 
> stayed the same over the ten-year time period. In water from some wells, 
> the concentrations were below the laboratory detection limits, so those 
> concentrations are "tied" at the reporting level. The following is an 
> example of the data I am evaluating. 
> 
> x <- c(13.60, 9.10, 22.01, 9.08, 1.97, 2.81, 0.66, 0.97, 0.21, 2.23, 0.08, 
> 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 
> 0.06, 0.06, 3.44, 15.18, 5.25, 4.27, 17.81)
> y <- c( 4.32, 3.39, 16.36, 7.10, 0.08, 2.02, 0.19, 0.59, 0.06, 2.15, 0.06, 
> 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 
> 0.06, 0.06, 4.02, 16.13, 7.30, 7.98, 24.37)
> 
> The nonparametric Wilcoxon signed-rank test seems to be the most 
> appropriate test for these data. There are two different methods to 
> calculate the signed-rank test. The first is by Wilcoxon (1945), who 
> discards any tied data and then calculates the signed ranks. The second 
> method incorporates tied values in the ranking procedure (see J.W. Pratt, 
> 1959, Remarks on zeros and ties in the Wilcoxon signed rank procedure: 
> Journal of the American Statistical Association, Vol. 54, No. 287, pp. 
> 655-667). There are two commands in R that calculate the original method 
> by Wilcoxon (that I know of), wilcox.test and wilcoxsign_test (make sure 
> to include the argument "zero.method = c("Wilcoxon")"). There are two 
> other commands in R that incorporate ties in the signed-rank test, 
> wilcox.exact and wilcoxsign_test (make sure to include the 
> argument"zero.method = c("Pratt")").
> 
> Here's my problem. I get different p-values from each of the 4 signed-rank 
> tests in R, and I don't know which one to believe. Wilcox.test and 
> wilcoxsign_test(zero.method = c("Wilcoxon") calculate the standard 
> Wilcoxon signed-rank test. Even though they are not designed to deal with 
> tied data, they should at least calculate the same p-value, but they do 
> not.

They do if you turn off the continuity correction in wilcox.test:

> wilcox.test(x, y, alternative='two.sided', paired=TRUE, correct=F)

	Wilcoxon signed rank test

data:  x and y 
V = 39, p-value = 0.05061
alternative hypothesis: true location shift is not equal to 0 

> I ran the same datasets in SYSTAT and Minitab to check on the results 
> from R. Minitab gives the same results as wilcox.test, and SYSTAT gives 
> the same results as wilcoxsign_test(zero.method = c("Wilcoxon").

So one does continuity correction and the other not.

>  
> Similarly, wilcox.exact and wilcoxsign_test(zero.method = c("Pratt")) are 
> designed to incorporate ties, but they give different p-values from each 
> other.

They still handle zeros differently. wilcox.exact does not handle the Pratt ranking.

To get exact p values for Pratt ranks, try

> perm.test(c(-3,-4,-5,6:11))

	1-sample Permutation Test

data:  c(-3, -4, -5, 6:11) 
T = 51, p-value = 0.08984
alternative hypothesis: true mu is not equal to 0 

... and for the asymptotic counterpart:

> perm.test(c(-3,-4,-5,6:11), exact=F)

	Asymptotic 1-sample Permutation Test

data:  c(-3, -4, -5, 6:11) 
T = 51, p-value = 0.08144
alternative hypothesis: true mu is not equal to 0 

> The signed-rank test procedure is relatively straightforward, so 
> I'm surprised I'm not getting identical results. 
> 
> To check on these R commands, I calculated the signed-rank tests using the 
> dataset shown on page 658-659 of Pratt (1959).

Not found. Apparently, you _constructed_ a data set to get the same set of ranks. 

> These R routines do not 
> produce the same results as that listed in Pratt, which makes me think 
> that the R routines are not calculating the statistics correctly.

Apparently, the Pratt paper predates the convention that a p value is the probability of observing "the test statistic or more extreme" and he switches back and forth between "less than" and "less than or equal" (to a negative rank sum of 6 and 12 resp.). Also, his p-values are one-sided.

Using modern technology, it is pretty easy to generate the enumerations that Pratt is referring to:

> M <- as.matrix(do.call("expand.grid", rep(list(0:1),9)))
> table(M%*%1:9)

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 1  1  1  2  2  3  4  5  6  8  9 10 12 13 15 17 18 19 21 21 22 23 23 
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 
23 23 22 21 21 19 18 17 15 13 12 10  9  8  6  5  4  3  2  2  1  1  1 
> table(M%*%3:11)

 0  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
 1  1  1  1  1  2  2  3  3  4  4  5  6  7  7  8 10 10 11 12 13 13 15 
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 
15 16 16 17 17 18 17 17 18 17 17 16 16 15 15 13 13 12 11 10 10  8  7 
48 49 50 51 52 53 54 55 56 57 58 59 60 63 
 7  6  5  4  4  3  3  2  2  1  1  1  1  1 

From these, you can see that the probability of "6 or less" with the Wilcoxon ranking is 
14/512 and that of "12 or less" with the Pratt procedure is 23/512. To get two-sided tests, include the (symmetric) opposite tail, i.e., multiply by two and get

> 14/256
[1] 0.0546875
> 23/256
[1] 0.08984375

Both of those should be recognizable from the exact tests in coin/exactRankTests.

> The 
> following text shows the commands I use in R to calculate the signed-rank 
> test using these different R commands:
> 
> Thanks in advance for any assistance.
> 
> --Mike
> 
> #################################################################################
> library(exactRankTests) #this loads the package for calculating the 
> modified signed-rank test
> library(coin) #this adds additional routines for the wilcoxon signed-rank 
> test and the Pratt signed-rank test
> #
> # Data from Page 658 of Pratt
> x <- c(1, 1, 1, 1, 1, 7, 10, 12, 13, 16, 17)
> y <- c(1, 1, 3, 4, 6, 1,  1,  1,  1,  1, 1)
> #
> # STANDARD WILCOXON SIGNED RANK USING WILCOX.TEST
> wilcox.test(x, y, alternative='two.sided', paired=TRUE)
> # STANDARD WILCOXON SIGNED-RANK USING WILCOXSIGN_TEST. 
> wilcoxsign_test (x ~ y, zero.method = c("Wilcoxon"))
> #
> #
> # MODIFIED WILCOXON SIGNED-RANK USING WILCOX.EXACT
> wilcox.exact(x, y, alternative='two.sided', paired=TRUE, mu=0)
> # PRATT SIGNED-RANK TEST
> wilcoxsign_test (x ~ y, zero.method = c("Pratt"))
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com