[R] Mann-Whitney U

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Aug 15 21:49:13 CEST 2007


Lucke, Joseph F wrote:
> R and SPSS are using different but equivalent statistics.  R is using
> the rank sum of group1 adjusted for the mean rank. SPSS is using the
> rank sum of group2 adjusted for the mean rank. 
>
>   
Close: It is the _minimum_ possible rank sum that is getting subtracted. 
If everyone in group1 is less than everyone in group2, R's W statistic  
will be zero. Other way around in SPSS.
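
A small sketch of the R side of this, using made-up numbers (the vectors `lo` and `hi` are illustrative and not from this thread). It only checks what wilcox.test() reports; the SPSS behaviour is taken from the discussion, not verified here.

```r
# Hypothetical toy data: every value in lo is below every value in hi.
lo <- c(1.1, 1.4, 2.0)
hi <- c(3.2, 3.5, 4.1, 4.8)

n_lo <- length(lo)
n_hi <- length(hi)

# Rank sum of the first group minus its minimum possible value n(n+1)/2.
r <- rank(c(lo, hi))
w_manual <- sum(r[seq_len(n_lo)]) - n_lo * (n_lo + 1) / 2

# wilcox.test() reports this quantity as W for its first argument.
w_lo_first <- unname(wilcox.test(lo, hi)$statistic)
w_hi_first <- unname(wilcox.test(hi, lo)$statistic)

print(c(manual = w_manual, lo_first = w_lo_first, hi_first = w_hi_first))
```

With complete separation the first call gives W = 0, and swapping the arguments gives the maximum, n_lo * n_hi = 12.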

> Example.
>   
>> G1=group1
>> G2=group2[-length(group2)] #get rid of the NA
>> n1=length(G1) #n1=28
>> n2=length(G2) #n2=27
>>     
> # convert to ranks
>   
>> W=rank(c(G1,G2))
>> R1=W[1:n1] #put the ranks back into the groups
>> R2=W[n1+(1:n2)] #':' binds tighter than '+', so this is ranks n1+1..n1+n2
>>     
> #Get the sum of the ranks for each group
>   
>> W1=sum(R1)
>> W2=sum(R2)
>>     
> #Adjust for mean rank for group 1
>   
>> W1-n1*(n1+1)/2
>>     
> [1] 405.5
> #Adjust for mean rank for group 2
>   
>> W2-n2*(n2+1)/2
>>     
> [1] 350.5
>
> W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.
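
The two statistics are tied together by a simple identity: they always add up to n1*n2, so either one determines the other. A sketch using the data posted further down in this thread (with the NA dropped); the names `U1` and `U2` are just labels chosen for this example.

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73, 1.81,
            1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41, 2.41, 2.46, 2.5,
            2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75, 1.8,
            1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5, 2.55, 2.57,
            2.64, 2.73, 2.77, 2.9, 2.94)

n1 <- length(group1)  # 28
n2 <- length(group2)  # 27

# Midranks of the pooled sample; ties get the average rank.
r <- rank(c(group1, group2))

# Rank sum of each group minus that group's minimum possible rank sum.
U1 <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
U2 <- sum(r[n1 + seq_len(n2)]) - n2 * (n2 + 1) / 2

print(c(U1 = U1, U2 = U2, sum = U1 + U2, n1n2 = n1 * n2))
```

Here U1 is the 405.5 that R reports and U2 is the 350.5 from SPSS; their sum is 28 * 27 = 756.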
>
> Ties throw a wrench in the works.  R uses a continuity correction by
> default, SPSS does not.
> Taking out the continuity correction,
>   
>> wilcox.test(G1,G2,correct=FALSE)
>>     
>
>         Wilcoxon rank sum test
>
> data:  G1 and G2 
> W = 405.5, p-value = 0.6433
> alternative hypothesis: true location shift is not equal to 0 
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(G1, G2,
> correct = FALSE) 
>
> This p-value is the same as SPSS's.
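
For anyone who wants to see where the two p-values come from, here is a sketch of the normal approximation with a tie-corrected variance, written out by hand. The helper name `normal_approx_p` is made up for this example, and the formula is my reading of what wilcox.test() does for a two-sided test, so treat it as an illustration rather than a reference implementation.

```r
g1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73, 1.81,
        1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41, 2.41, 2.46, 2.5,
        2.6, 2.8, 2.8, 3.07, 3.3)
g2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75, 1.8,
        1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5, 2.55, 2.57,
        2.64, 2.73, 2.77, 2.9, 2.94)

# Hypothetical helper: two-sided p-value from the normal approximation.
normal_approx_p <- function(x, y, correct = TRUE) {
  n_x <- length(x)
  n_y <- length(y)
  n <- n_x + n_y
  r <- rank(c(x, y))
  w <- sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2
  ties <- table(r)  # sizes of the groups of tied values
  sigma <- sqrt(n_x * n_y / 12 *
                ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))
  d <- w - n_x * n_y / 2              # distance of W from its null mean
  if (correct) d <- d - sign(d) * 0.5 # continuity correction
  2 * pnorm(-abs(d / sigma))
}

p_corrected <- normal_approx_p(g1, g2, correct = TRUE)
p_plain     <- normal_approx_p(g1, g2, correct = FALSE)

# exact = FALSE asks for the normal approximation directly.
p_r_corrected <- wilcox.test(g1, g2, exact = FALSE, correct = TRUE)$p.value
p_r_plain     <- wilcox.test(g1, g2, exact = FALSE, correct = FALSE)$p.value

print(round(c(p_corrected, p_r_corrected, p_plain, p_r_plain), 4))
```

The corrected value corresponds to the 0.6494 in the original post and the uncorrected one to the 0.6433 above.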
>
>
> Consult a serious non-parametrics text.  I used
> Lehmann, E. L., Nonparametrics: Statistical methods based on ranks.
> 1975. Holden-Day. San Francisco, CA.
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Natalie O'Toole
> Sent: Wednesday, August 15, 2007 1:07 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Mann-Whitney U
>
> Hi,
>
> I do want to use the Mann-Whitney test which ranks my data and then uses
> those ranks rather than the actual data.
>
> Here is the R code i am using:
>
> group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
>             2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,
>             2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95,
>                       na.action)
>
> paired = FALSE so that the Wilcoxon rank sum test, which is equivalent to
> the Mann-Whitney test, is used (my samples are NOT paired).
> conf.level = 0.95 to specify the confidence level.
> na.action is used because i have an NA value (i suspect i am not using
> na.action in the correct manner).
>
> When i use this code i get the following error message:
>
> Error in arg == choices : comparison (1) is possible only for atomic and
> list types
>
> When i use this code:
>
> group1 <- c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
>             2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
> group2 <- c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.97,
>             2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
> result <- wilcox.test(group1, group2, paired=FALSE, conf.level = 0.95)
>>     
>
> I get the following result:
>
>   Wilcoxon rank sum test with continuity correction
>
> data:  group1 and group2
> W = 405.5, p-value = 0.6494
> alternative hypothesis: true location shift is not equal to 0 
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(group1,
> group2, paired = FALSE, conf.level = 0.95) 
>
> The W value here is 405.5 with a p-value of 0.6494
>
>
> in SPSS, i am ranking my data and then performing a Mann-Whitney U by
> selecting analyze - non-parametric tests - 2 independent samples  and
> then checking off the Mann-Whitney U test.
>
> For the Mann-Whitney test in SPSS i am getting the following results:
>
> Mann-Whitney U = 350.5
>  2- tailed p value = 0.643
>
> I think maybe the discrepancy has to do with the specification of the NA
> values in R, but i'm not sure.
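
On the NA question: as far as I can tell the default method of wilcox.test() drops missing values from x and y on its own, so the NA should not be the source of the difference. The sketch below uses made-up vectors `x` and `y` (not the thread data) to check that, and also shows that an unnamed `na.action` in the call is an error, most likely because it is matched by position to the `alternative` argument.

```r
# Hypothetical toy data; y carries one missing value.
x <- c(1.1, 2.3, 3.7, 4.2)
y <- c(0.5, 2.9, 5.1, NA)

# Same call with the NA left in and with the NA removed by hand.
with_na    <- wilcox.test(x, y, paired = FALSE, conf.level = 0.95)
without_na <- wilcox.test(x, y[!is.na(y)], paired = FALSE, conf.level = 0.95)

same_result <- identical(unname(with_na$statistic),
                         unname(without_na$statistic)) &&
  isTRUE(all.equal(with_na$p.value, without_na$p.value))

# Passing the function na.action as a bare extra argument fails.
bad_call <- tryCatch(
  wilcox.test(x, y, paired = FALSE, conf.level = 0.95, na.action),
  error = function(e) e
)
is_error <- inherits(bad_call, "error")

print(c(same_result = same_result, is_error = is_error))
```

An `na.action` argument belongs to the formula interface, e.g. wilcox.test(value ~ group, data = d, na.action = na.omit) for a data frame `d` with columns `value` and `group` (names assumed here); with two plain vectors it can simply be left out.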
>
>
> If anyone has any suggestions, please let me know!
>
> I hope i have provided enough information to convey my problem.
>
> Thank-you, 
>
> Nat
> __________________
>
>
> Natalie,
>
> It's best to provide at least a sample of your data.  Your field names
> suggest that your data might be collected in units of mm^2 or some
> similar measurement of area.  Why do you want to use Mann-Whitney, which
> will rank your data and then use those ranks rather than your actual
> data?  Unless your sample is quite small, why not use a two sample
> t-test?  Also, are your samples paired?  If they aren't, did you use the
> "paired = FALSE" option?
>
> JWDougherty
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------------------------------------------------
> ------------------------------------------------ 
>
> This communication is intended for the use of the recipient to which it
> is 
> addressed, and may
> contain confidential, personal, and or privileged information. Please 
> contact the sender
> immediately if you are not the intended recipient of this communication,
>
> and do not copy,
> distribute, or take action relying on it. Any communication received in 
> error, or subsequent
> reply, should be deleted or destroyed.


