[R] Wilcoxon-Mann-Whitney U value: outcomes from different stat packages

Wed May 30 09:33:46 CEST 2012

On May 29, 2012, at 17:55, maxbre wrote:

> Given this example
> 
> #start code
> 
> a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
>        760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
> 
> b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
>        3220,490,20790,290,740,5350,940,3910,0,640,850,260)
> 
> wilcox.test(a, b, paired=FALSE)
> 
> # sum of ranks for the first sample
> sum.rank.a <- sum(rank(c(a, b))[seq_along(a)]) # sum of ranks assigned to group a
> W1 <- sum.rank.a - (length(a)*(length(a)+1)) / 2
> W1
> 
> U1 <- length(a)*length(b)/2 - W1 # null mean m*n/2 minus W1
> U1
> 
> # sum of ranks for the second sample
> sum.rank.b <- sum(rank(c(a, b))[length(a) + seq_along(b)]) # sum of ranks assigned to group b
> W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
> W2
> 
> U2 <- length(a)*length(b)/2 - W2 # null mean m*n/2 minus W2
> U2
> 
> #end code
> 
> And given the fact that:
> 
> - the Note in R's wilcox.test help page clearly states: “The literature is not
> unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney
> tests. The two most common definitions correspond to the sum of the ranks of
> the first sample with the minimum value subtracted or not. R subtracts [...],
> giving a value which is larger by m(m+1)/2 for a first sample of size m.”

NB: You are quoting like the Devil reads the Bible: The bit in [...] is "and S-PLUS does not". So R's value is _smaller_ by m(m+1)/2. 
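To make the two conventions concrete, here is a small R sketch (not part of the original exchange) using maxbre's data. It computes the plain rank sum of the first sample (the convention the help page attributes to S-PLUS) and the R-style statistic with the minimum possible rank sum m(m+1)/2 subtracted, and checks the latter against what wilcox.test reports. The variable names are illustrative only.

```r
# Data from the original post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

m <- length(a)
r <- rank(c(a, b))                  # tied values get midranks

rank_sum_a <- sum(r[seq_len(m)])    # rank sum of the first sample, nothing subtracted
W_R <- rank_sum_a - m * (m + 1) / 2 # R's convention: subtract the minimum possible rank sum

# wilcox.test warns that an exact p-value is unavailable with ties;
# only the statistic is needed here
W_test <- unname(suppressWarnings(wilcox.test(a, b))$statistic)

cat("rank sum of a:", rank_sum_a, "\n")
cat("R-style W:    ", W_R, "\n")
cat("wilcox.test W:", W_test, "\n")
```

With these data the rank sum of a is 1075.5 and both W values are 640.5, so R's value is smaller by 29*30/2 = 435, as stated above.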

> 
> - as a result of the same test performed with other stat packages (i.e.
> STATISTICA and PAST), I got a U value of 200.5, as in W2 (see my script),
> with the same p-value
> 
> What can I conclude about the STATISTICA and PAST packages? Are they
> reporting W2 (see my script) instead of U?

Most likely. Or, equivalently, they are basing U on the 2nd group instead of the first. This varies between software packages, as do the conventions for which way you subtract in a two-sample t test. Some textbooks say that you use the _smaller_ group, and tabulate critical regions only for those cases, to save paper.
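A hedged R sketch of why basing U on the second group is equivalent: the pairwise-count statistics for the two groups always add up to m*n, so reporting either one carries the same information, and swapping the arguments to wilcox.test leaves the two-sided p-value unchanged. The names U_a, U_b, test_ab and test_ba are illustrative. Whether STATISTICA and PAST report the second-group value or min(U_a, U_b) is not established here; with these data both equal 200.5.

```r
# Data from the original post
a <- c(0, 70, 50, 100, 70, 650, 1300, 6900, 1780, 4930, 1120, 700, 190, 940,
       760, 100, 300, 36270, 5610, 249680, 1760, 4040, 164890, 17230, 75140,
       1870, 22380, 5890, 2430)
b <- c(0, 0, 10, 30, 50, 440, 1000, 140, 70, 90, 60, 60, 20, 90, 180, 30, 90,
       3220, 490, 20790, 290, 740, 5350, 940, 3910, 0, 640, 850, 260)

m <- length(a)
n <- length(b)

# Mann-Whitney U for each group as a count of pairwise "wins"; a tie counts 1/2
U_a <- sum(outer(a, b, ">")) + 0.5 * sum(outer(a, b, "=="))
U_b <- sum(outer(b, a, ">")) + 0.5 * sum(outer(b, a, "=="))

# R's W for both argument orders; ties rule out the exact p-value, hence suppressWarnings
test_ab <- suppressWarnings(wilcox.test(a, b))
test_ba <- suppressWarnings(wilcox.test(b, a))

cat("U_a =", U_a, " U_b =", U_b, " U_a + U_b =", U_a + U_b, " m*n =", m * n, "\n")
cat("W(a, b) =", test_ab$statistic, " W(b, a) =", test_ba$statistic, "\n")
cat("p(a, b) =", test_ab$p.value, " p(b, a) =", test_ba$p.value, "\n")
cat("min(U_a, U_b) =", min(U_a, U_b), "\n")
```

Incidentally, the U1 and U2 in the original script are m*n/2 - W1 and m*n/2 - W2, i.e. -220 and 220 here: the statistic centred at its null mean (with the sign flipped), not U itself.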

> 
> A crucial point is that the variant of the algorithm used by each package
> is very rarely indicated in the output or documented in the help facility
> and the manuals.
> See also this link (which I found after a long meander around the web) on the
> comparison of Wilcoxon-Mann-Whitney U test outcomes from different stat
> packages: 
> http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617 
> 
> Have any of you faced the same kind of issue? Or am I completely wrong?
> 
> maxbre
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com