[R] trouble with wilcox.test

Greg Hather ghather at berkeley.edu
Thu Aug 18 20:53:54 CEST 2005


OK, I will think more about whether the Wilcoxon test is appropriate 
here. I was using:

R version 2.1.1, 2005-06-20
Windows XP SP2
512MB RAM

--Greg

----- Original Message ----- 
From: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
To: "Greg Hather" <ghather at berkeley.edu>
Cc: <r-help at stat.math.ethz.ch>
Sent: Wednesday, August 17, 2005 11:45 PM
Subject: Re: [R] trouble with wilcox.test


> On Wed, 17 Aug 2005, Greg Hather wrote:
>
>> I'm having trouble with the wilcox.test command in R.
>
> Are you sure it is not the concepts that are giving 'trouble'?
> What real problem are you trying to solve here?
>
>> To demonstrate the anomalous behavior of wilcox.test, consider
>>
>>> wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value
>> [1] 0.01438390
>>> wilcox.test(c(1.5,5.5), c(1:10000), exact = T)$p.value
>> [1] 6.39808e-07
>> (this calculation takes noticeably longer)
>>> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
>> (R closes/crashes)
>>
>> I believe that wilcox.test(c(1.5,5.5), c(1:10000), exact = F)$p.value 
>> yields a bad result because of the normal approximation which R uses 
>> when exact = F.
>
> Expecting an approximation to be good in the tail for m=2 is pretty 
> unrealistic.  But then so is believing the null hypothesis of a common 
> *continuous* distribution.  Why worry about the distribution under a 
> hypothesis that is patently false?
>
> People often refer to this class of tests as `distribution-free', but 
> they are not.  The Wilcoxon test is designed for power against shift 
> alternatives, but here there appears to be a very large difference in 
> spread.  So
>
>> wilcox.test(5000+c(1.5,5.5), c(1:10000), exact = T)$p.value
> [1] 0.9989005
>
> even though the two samples differ in important ways.
>
>
>> Any suggestions for how to compute wilcox.test(c(1.5,5.5), 
>> c(1:20000), exact = T)$p.value?
>
> I get (current R 2.1.1 on Linux)
>
>> wilcox.test(c(1.5,5.5), c(1:20000), exact = T)$p.value
> [1] 1.59976e-07
>
> and no crash.  So the suggestion is to use a machine adequate to the 
> task, and that probably means an OS with adequate stack size.
>
>> [[alternative HTML version deleted]]
>
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html
>
> Please do heed it.  What version of R and what machine is this?  And 
> do take note of the request about HTML mail.
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
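The p-values quoted in this thread can be reproduced by hand. Doing so shows where the normal approximation goes wrong in the tail. It also shows that the m = 2 case can be computed without the general recursive algorithm, which Prof Ripley links to stack size.

The R code below is a minimal sketch and is not the code inside wilcox.test. The helper names normal_p and exact_p_m2 are hypothetical. exact_p_m2 assumes there are no ties and that the first sample has exactly two observations.

Under the null hypothesis, every pair of ranks r1 < r2 among N = n + 2 is equally likely, and the statistic is W = r1 + r2 - 3. An exact tail probability is therefore a simple count divided by choose(N, 2).

```r
## Normal approximation with continuity correction, as used by
## wilcox.test(..., exact = FALSE) when there are no ties.
normal_p <- function(x, y) {
  m <- length(x)
  n <- length(y)
  w <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2  # rank-sum statistic W
  z <- w - m * n / 2                                     # centre at the null mean
  sigma <- sqrt(m * n * (m + n + 1) / 12)                # null sd (no ties)
  z <- (z - sign(z) * 0.5) / sigma                       # continuity correction
  2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))        # two-sided p-value
}

## Hypothetical helper: exact two-sided p-value when x has exactly two
## values and nothing is tied. Under H0 every pair of ranks r1 < r2 among
## N = n + 2 is equally likely, and W = r1 + r2 - 3.
exact_p_m2 <- function(x, y) {
  stopifnot(length(x) == 2, !anyDuplicated(c(x, y)))
  n <- length(y)
  N <- n + 2
  w <- sum(rank(c(x, y))[1:2]) - 3
  ## W is symmetric about m * n / 2 = n, so reflect an upper-tail value
  ## into the lower tail: P(W >= w) = P(W <= 2 * n - w).
  w_low <- if (w > n) 2 * n - w else w
  r1 <- seq_len(N - 1)
  ## For each r1, count the r2 in (r1, N] with r1 + r2 - 3 <= w_low.
  count <- sum(pmax(0, pmin(N, w_low + 3 - r1) - r1))
  min(1, 2 * count / choose(N, 2))
}

x <- c(1.5, 5.5)
normal_p(x, 1:10000)           # about 0.0144 (z is only about -2.45)
exact_p_m2(x, 1:10000)         # 32 / choose(10002, 2), about 6.4e-07
exact_p_m2(x, 1:20000)         # about 1.6e-07, with no recursion at all
exact_p_m2(5000 + x, 1:10000)  # about 0.9989
```

With y = 1:10000, W = 6 against a null mean of 10000. The null standard deviation is about 4083, though, so the corrected z is only about -2.45. By contrast, the exact lower tail contains just 16 of the 50015001 equally likely rank pairs.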



