[R] wilcox.test - difference between p-values of R and online calculators

David L Carlson dcarlson at tamu.edu
Wed Sep 3 19:13:01 CEST 2014


Since they all have the same W/U value, it seems likely that the difference is how the different versions adjust the standard error for ties. Here are a couple of posts addressing the issues of ties:

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and
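
A minimal sketch of the tie-adjustment explanation, using the data from the original post. It computes the normal-approximation p-value from the same W with and without the tie term in the variance. The variable names are illustrative, and the guess that the online calculators omit the tie term is an assumption, not something confirmed in this thread.

```r
# Data from the original post
x <- c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
       359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359)
y <- c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286,
       355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359, 359, 128)

n1 <- length(x); n2 <- length(y); N <- n1 + n2

# W as wilcox.test() reports it: rank sum of x (midranks) minus n1(n1+1)/2
r <- rank(c(x, y))
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

mu <- n1 * n2 / 2                          # mean of W under the null
tie_sizes <- as.vector(table(c(x, y)))     # size of each group of equal values

# Variance of W without and with the adjustment for ties
var_plain <- n1 * n2 * (N + 1) / 12
var_ties  <- n1 * n2 / 12 *
  ((N + 1) - sum(tie_sizes^3 - tie_sizes) / (N * (N - 1)))

# Two-sided p-values from the normal approximation
p_plain   <- 2 * pnorm(-abs(W - mu) / sqrt(var_plain))          # no tie term
p_ties    <- 2 * pnorm(-abs(W - mu) / sqrt(var_ties))           # correct = FALSE
p_ties_cc <- 2 * pnorm(-(abs(W - mu) - 0.5) / sqrt(var_ties))   # correct = TRUE

print(c(W = W, p_plain = p_plain, p_ties = p_ties, p_ties_cc = p_ties_cc))
```

With these data the unadjusted variance gives a p-value of roughly 0.0026, while the tie-adjusted variance reproduces the two values R reported in this thread (0.0002481 and 0.0002594), so identical W/U values can still lead to different p-values.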

David C

From: wbradleyknox at gmail.com [mailto:wbradleyknox at gmail.com] On Behalf Of W Bradley Knox
Sent: Wednesday, September 3, 2014 9:20 AM
To: David L Carlson
Cc: Tal Galili; r-help at r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators

Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for the exact and correct parameters. Running with correct=FALSE makes only a tiny change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these online calculators might come from how ties are handled, but the fact that R and two of the online calculators reach the same U/W values seems to indicate that ties aren't the issue, since (I believe) the U or W values contain all of the information needed to calculate the p-value, assuming the number of samples is also known for each condition. (However, it's been a while since I looked into how MWU tests work, so maybe now's the time to refresh.) If that's correct, the discrepancy seems to lie in what R does with the W value, which is identical to the U values of two of the online calculators. (I'm also assuming that U and W have the same meaning, which seems likely.)
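
The assumption that U and W mean the same thing can be checked directly. The sketch below (variable names are illustrative) computes the Mann-Whitney U for the first sample by counting pairs, with ties counting one half, and compares it with the W statistic returned by wilcox.test().

```r
# Data from the original post
x <- c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
       359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359)
y <- c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286,
       355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359, 359, 128)

# U for x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 1/2
U_x <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# U for y is the complement, since the two always sum to n1 * n2
U_y <- length(x) * length(y) - U_x

# W as reported by R (the tie warning is expected and suppressed here)
W <- unname(suppressWarnings(wilcox.test(x, y))$statistic)

print(c(U_x = U_x, U_y = U_y, W = W))
```

Here U_x and W are both 485, so the two statistics do coincide for these data. Note that some calculators report the smaller of U_x and U_y instead.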

- Brad

____________________
W. Bradley Knox, PhD
http://bradknox.net
bradknox at mit.edu

On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson <dcarlson at tamu.edu> wrote:
That does not change the results. The problem is likely to be the way ties are handled. The first sample has 25 values of which 23 are identical (359). The second sample has 26 values of which 12 are identical (359). The difference between the implementations may be a result of the way the ties are ranked. For example the R function rank() offers 5 different ways of handling the rank on tied observations. With so many ties, that could make a substantial difference.
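
As a rough illustration of how much the ranking rule matters with this many ties, the sketch below recomputes the W statistic under several of the ties.method options of rank(). The helper layout and names are illustrative only.

```r
# Data from the original post
x <- c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
       359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359)
y <- c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286,
       355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359, 359, 128)

n1 <- length(x)
methods <- c("average", "min", "max", "first")

# W = rank sum of x minus n1(n1+1)/2, under each way of ranking ties
W_by_method <- sapply(methods, function(m) {
  r <- rank(c(x, y), ties.method = m)
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
})

print(W_by_method)
```

Only the "average" (midrank) rule gives W = 485 here, the value that R and two of the calculators agree on, which suggests they rank ties the same way and that any difference arises after the ranking step.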

Package coin has wilcox_test(), which can use Monte Carlo resampling to approximate the p-value when ties rule out the usual exact calculation.
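
The Monte Carlo idea can be sketched in base R without any extra package. This is not coin's implementation, only a simple permutation test on the midranks; the seed, the number of resamples B, and the variable names are arbitrary choices made for illustration.

```r
# Data from the original post
x <- c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
       359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359)
y <- c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286,
       355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359, 359, 128)

set.seed(1)                          # arbitrary seed, for reproducibility
r  <- rank(c(x, y))                  # midranks of the pooled data
n1 <- length(x)

obs    <- sum(r[seq_len(n1)])        # observed rank sum of x
center <- n1 * (length(r) + 1) / 2   # expected rank sum of x under the null

# Rank sums obtained by randomly relabelling which n1 values belong to x
B <- 100000
perm <- replicate(B, sum(sample(r, n1)))

# Two-sided Monte Carlo p-value; the +1 terms count the observed labelling
p_mc <- (sum(abs(perm - center) >= abs(obs - center)) + 1) / (B + 1)
print(p_mc)
```

Because the resampling works on the actual tied ranks, no separate tie correction is needed; the estimate carries Monte Carlo error that shrinks as B grows.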

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tal Galili
Sent: Wednesday, September 3, 2014 5:24 AM
To: W Bradley Knox
Cc: r-help at r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online calculators

It seems your numbers have ties. What happens if you run wilcox.test with
correct=FALSE? Will the results be the same as the online calculators?



----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------



On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox <bradknox at mit.edu> wrote:

> Hi.
>
> I'm taking the long-overdue step of moving from using online calculators to
> compute results for Mann-Whitney U tests to a more streamlined system
> involving R.
>
> However, I'm finding that R computes a different result than the 3 online
> calculators that I've used before (all of which approximately agree). These
> calculators are here:
>
> http://elegans.som.vcu.edu/~leon/stats/utest.cgi
> http://vassarstats.net/utest.html
> http://www.socscistatistics.com/tests/mannwhitney/
>
> An example calculation is
>
>
> wilcox.test(
>   c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
>     359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359),
>   c(332, 85, 359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286,
>     355, 250, 105, 359, 359, 298, 359, 359, 359, 28.6, 359, 359, 128))
>
> which prints
>
>
>         Wilcoxon rank sum test with continuity correction
>
> data:  c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, and c(332, 85, 359, 359, 359,
> 220, 231, 300, 359, 237, 359, 183, 359, 359, 359, 359, 359, 359, 359, 359,
> 359, 303, 359, 359, and 286, 355, 250, 105, 359, 359, 298, 359, 359, 359,
> 28.6, 359, 359) and 359, 128)
> W = 485, p-value = 0.0002594
> alternative hypothesis: true location shift is not equal to 0
>
> Warning message:
> In wilcox.test.default(c(359, 359, 359, 359, 359, 359, 335, 359, :
>   cannot compute exact p-value with ties
>
>
> However, all of the online calculators find p-values close to 0.0025, 10x
> the value output by R. All results are for a two-tailed case. Importantly,
> the W value computed by R *does agree* with the U values output by the
> first two online calculators listed above, yet it has a different p-value.
>
> Can anyone shed some light on how and why R's calculation differs from that
> of these online calculators? Thanks for your time.
>
> ____________________
> W. Bradley Knox, PhD
> http://bradknox.net
> bradknox at mit.edu
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
