[R] Weird differing results when using the Wilcoxon-test

Tue Aug 17 13:48:10 CEST 2010

Hi,

I became a little bit confused when working with the Wilcoxon test in R.
As far as I understood, there are mainly two versions:
1) wilcox.test{stats}, which is the default and an approximation, especially, 
when ties are involved
2) wilcox_test{coin}, which does calculate the distribution _exactly_ even, 
with ties.

I have the following scenario:

#---BeginCode---
# big example
size = 60
big1 = rnorm(size, 0, 1)
big2 = rnorm(size, 0.5, 1

g1f = rep(1, size)
g2f = rep(2, size)
big = c(big1, big2)
data_frame = data.frame(big, gr=as.factor(c(g1f, g2f)))

wilcox_approx = wilcox.test(big1, big2)
wilcox_exact = wilcox_test(big ~ gr, data=data_frame, distribution="exact")
#---EndCode---

I found here http://www-stat.stanford.edu/~susan/courses/s141/hononpara.pdf 
that wilcox.test (at least for the signed rank test) relies on exact 
(p-)values until N = n1 + n2 = 50.
I can reproduce this, when using e.g. size = 15. The p-values then are the 
same, as I would expect it, having read the info from the link.

#---BeginCode---
print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
#---EndCode---

That said, if I set e.g. size = 60, then the p-values of wilcox.test and 
wilcox_test differ, as expected.

What's puzzling me particularly is the differing results when wanting to 
calculate the p-value manually, for bigger sample sizes.

So, if we get the W-score from wilcox.test:

#---BeginCode---
W_big = wilcox.test(big1, big2))$statistic
#---EndCode---

and "convert" it to a Z-score, like this:

#---BeginCode---
mu_big = (size^2)/2
sd_big = sqrt(size*size*(size + size + 1)/12)
N = size + size
sd_big_corr = sqrt( (size * size) / (N * (N - 1)) * (N^3 - N) / 12 )

Z_big = (((W_big - mu_big)/sd_big)
#---EndCode---

The Z-Score (Z_big) is equal to the statistic of wilcox_test.
So far so good. And now comes the main problem.
When I follow the documentation correctly, the p-value for a given W-score/-
statistic ist calculated using the normal-approximation with the Z-score.
However, when I do that, I get a different result than what I would expect.
Because I would expect the p-value of wilcox.test to be equal to 
2*pnorm(Z_big), which is in fact _not_ equal. Please see:

#---BeginCode---
p_value_manual = 2 * pnorm(Z_big)

print("--- Resulting pvalues --- ", quote=F)
print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
print(paste("P-value manual:", p_value_manual), quote=F)
#---EndCode---

So how is the calculation of the p-value performed in wilcox.test, when the 
sample sizes are big? Because this might explain why the value differs from 
that being calculated manually.

Best regards,

Cedric