[R] wilcox.test p-value = 0

Wed Sep 16 16:54:36 CEST 2009

That's right: if the test is exact, it is not possible to get a p-value of zero.  wilcox.test does not provide an exact p-value in the presence of ties, so if there are any ties in your data you are getting a normal approximation.  Incidentally, if there are any ties in your data set I would strongly recommend computing the *exact* p-value, because using the normal approximation on tied data will either inflate the type I error rate or reduce power, depending on how the ties are distributed.  Depending on the pattern of ties, this can result in gross under- or overestimation of the p-value.

I guess this is all by way of saying that you should always compute the exact p-value if possible.
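
To make the ties point concrete, here is a minimal sketch with made-up data (the vectors and variable names are purely illustrative, not from the original question). In the R versions I am aware of, the second call warns that an exact p-value cannot be computed with ties and reports the continuity-corrected normal approximation; the method string in the result tells you which calculation you actually got.

```r
# Made-up samples without ties: small enough for the exact distribution
a <- c(1.1, 2.2, 3.3)
b <- c(4.4, 5.5, 6.6)
r_exact <- wilcox.test(a, b)

# Made-up samples with ties (2.3 and 4.0 occur in both groups)
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)
y <- c(2.3, 4.0, 6.1, 7.4, 8.2, 9.5)

# Collect any warnings instead of printing them, so they can be inspected
msgs <- character(0)
r_ties <- withCallingHandlers(
  wilcox.test(x, y),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The method string says whether an approximation was used
print(r_exact$method)
print(r_ties$method)
print(msgs)

# Neither p-value is zero
print(c(exact = r_exact$p.value, ties = r_ties$p.value))
```

Checking the method element (or the warnings) before reporting is a cheap way to find out whether a suspiciously tiny p-value came from the exact distribution or from the approximation.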

The package exactRankTests uses the algorithm by Mehta, Patel, and Tsiatis (1984).  If your sample sizes are larger, there is a freely available .exe by Cheung and Klotz (1995) that will compute exact p-values for sample sizes larger than 100 in each group!

You can find it at http://pages.cs.wisc.edu/~klotz/

Bryan

> Hi Murat,
> I am not an expert in either statistics or R, but I can imagine that since the 
> default is exact=TRUE, it numerically computes the probability, and it may 
> indeed be 0. If you use wilcox.test(x, y, exact=FALSE) it will give you a 
> normal approximation, which will most likely be different from zero.

No, the exact p-value can't be zero for a discrete distribution. The smallest possible value in this case would, I think, be 1/choose(length(x)+length(y),length(x)), or perhaps twice that.
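
As a rough check of that bound, here is a small sketch for the two-sample case with made-up, completely separated samples (every x below every y, no ties). The data and variable names are illustrative assumptions only. With this most extreme arrangement the exact two-sided p-value should come out as twice 1/choose(n+m, n).

```r
# Made-up, fully separated samples: the most extreme ranking possible
x <- 1:10
y <- 11:25

# Number of ways to assign ranks to the x group
n_arrangements <- choose(length(x) + length(y), length(x))

# Smallest attainable exact p-values
p_min_one_sided <- 1 / n_arrangements
p_min_two_sided <- 2 / n_arrangements

# Exact two-sided test on the same data
p_obs <- wilcox.test(x, y, exact = TRUE)$p.value

print(c(observed = p_obs, bound = p_min_two_sided))
print(isTRUE(all.equal(p_obs, p_min_two_sided)))
```

So for an exact test, a defensible report is "p <= 2/choose(n+m, n)" evaluated at the actual sample sizes, rather than p = 0.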

More generally, the approach used by format.pval() is to display very small p-values as <2e-16, where 2e-16 is roughly machine epsilon.  I wouldn't want to claim optimality for this choice, but it seems a reasonable way to represent "very small".
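
A short sketch of that display convention, using illustrative values only. format.pval() takes an eps argument that defaults to .Machine$double.eps; anything below eps is shown as "<" followed by the formatted eps, and the exact number of digits depends on the digits setting.

```r
# A p-value that is numerically zero on the machine (illustrative)
p_zero <- 0

# Default threshold is machine epsilon
eps_default <- .Machine$double.eps
shown_default <- format.pval(p_zero)

# A coarser, user-chosen threshold (an arbitrary example value)
shown_custom <- format.pval(1e-20, eps = 1e-10)

# p-values at or above the threshold are printed as ordinary numbers
shown_regular <- format.pval(0.0312)

print(eps_default)
print(shown_default)
print(shown_custom)
print(shown_regular)
```

This only changes how the number is printed; the stored p.value is untouched.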

     -thomas

> Hope this helps.
> Keo.
>
> Murat Tasan wrote:
>> hi, folks,
>> 
>> how have you gone about reporting a p-value from a test when the
>> returned value from a test (in this case a rank-sum test) is
>> numerically equal to 0 according to the machine?
>> 
>> the next lowest value greater than zero that is distinct from zero on
>> the machine is likely algorithm-dependent (the algorithm of the test
>> itself), but without knowing the explicit steps of the algorithm
>> implementation, it is difficult to provide any non-zero value.  i
>> initially thought to look at .Machine$double.xmin, but i'm not
>> comfortable with reporting p < .Machine$double.xmin, since without
>> knowing the specifics of the implementation, this may not be true!
>> 
>> to be clear, if i have data x, and i run the following line, the
>> returned value is TRUE.
>> 
>> wilcox.test(x)$p.value == 0
>> 
>> thanks for any help on this!
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

------------------------------

-------------
Bryan Keller, Doctoral Student/Project Assistant
Educational Psychology - Quantitative Methods
The University of Wisconsin - Madison