[R] Mann-Whitney U
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Aug 15 21:49:13 CEST 2007
Lucke, Joseph F wrote:
> R and SPSS are using different but equivalent statistics. R is using
> the rank sum of group1 adjusted for the mean rank. SPSS is using the
> rank sum of group2 adjusted for the mean rank.
>
>
Close: It is the _minimum_ possible rank sum that is getting subtracted.
If everyone in group1 is less than everyone in group2, R's W statistic
will be zero. Other way around in SPSS.
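
As a rough illustration with made-up numbers (g1 and g2 below are
hypothetical, not the poster's data), the subtraction of the minimum
possible rank sum n*(n+1)/2 can be checked directly:

```r
# Hypothetical toy data: every value in g1 is below every value in g2
g1 <- c(1.2, 1.5, 1.9)
g2 <- c(2.4, 3.1, 3.3, 4.0)
n1 <- length(g1)
n2 <- length(g2)

r  <- rank(c(g1, g2))               # joint ranks 1..7, no ties here
W1 <- sum(r[seq_len(n1)])           # rank sum of g1
W2 <- sum(r[n1 + seq_len(n2)])      # rank sum of g2

# Subtract the smallest rank sum each group could possibly have
U1 <- W1 - n1 * (n1 + 1) / 2        # what R reports as W
U2 <- W2 - n2 * (n2 + 1) / 2        # the complementary statistic

W_R <- unname(wilcox.test(g1, g2)$statistic)
print(c(U1 = U1, U2 = U2, W_R = W_R))
```

Here U1 is 0 and U2 equals n1*n2 = 12. The two statistics always add up
to n1*n2, just as 405.5 + 350.5 = 756 = 28*27 in the example quoted below,
so either one carries the same information.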
> Example.
>
>> G1=group1
>> G2=group2[-length(group2)] #get rid of the NA
>> n1=length(G1) #n1=28
>> n2=length(G2) #n2=27
>>
> # convert to ranks
>
>> W=rank(c(G1,G2))
>> R1=W[1:n1] #put the ranks back into the groups
>> R2=W[n1+1:n2]
>>
> #Get the sum of the ranks for each group
>
>> W1=sum(R1)
>> W2=sum(R2)
>>
> #Adjust for mean rank for group 1
>
>> W1-n1*(n1+1)/2
>>
> [1] 405.5
> #Adjust for mean rank for group 2
>
>> W2-n2*(n2+1)/2
>>
> [1] 350.5
>
> W1-n1*(n1+1)/2 gives R's result; W2-n2*(n2+1)/2 gives SPSS's result.
>
> Ties throw a wrench in the works. R uses a continuity correction by
> default, SPSS does not.
> Taking out the continuity correction,
>
>> wilcox.test(G1,G2,correct=FALSE)
>>
>
> Wilcoxon rank sum test
>
> data: G1 and G2
> W = 405.5, p-value = 0.6433
> alternative hypothesis: true location shift is not equal to 0
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(G1, G2,
> correct = FALSE)
>
> This p-value is the same as SPSS's.
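
For what it is worth, the large-sample p-value can be reproduced by hand,
which makes the role of the continuity correction visible. The sketch
below uses hypothetical toy vectors a and b containing ties, together with
the usual tie-corrected normal approximation. It is meant as an
illustration under those assumptions, not as a copy of R's internals.

```r
# Hypothetical toy data containing ties
a <- c(1, 2, 2, 3, 5, 6, 7)
b <- c(2, 4, 4, 5, 8, 9)
n_a <- length(a)
n_b <- length(b)
N   <- n_a + n_b

r <- rank(c(a, b))                          # midranks for tied values
U <- sum(r[seq_len(n_a)]) - n_a * (n_a + 1) / 2

# Standard deviation of U under the null, reduced for ties
t_k   <- table(r)                           # sizes of the tie groups
sigma <- sqrt(n_a * n_b / 12 *
              ((N + 1) - sum(t_k^3 - t_k) / (N * (N - 1))))

d <- U - n_a * n_b / 2                      # deviation from the null mean
p_plain <- 2 * pnorm(-abs(d) / sigma)                   # no correction
p_cc    <- 2 * pnorm(-abs(d - sign(d) * 0.5) / sigma)   # 0.5 correction

# Compare with wilcox.test, forcing the normal approximation
p_R_plain <- wilcox.test(a, b, exact = FALSE, correct = FALSE)$p.value
p_R_cc    <- wilcox.test(a, b, exact = FALSE, correct = TRUE)$p.value
print(c(p_plain = p_plain, p_R_plain = p_R_plain,
        p_cc = p_cc, p_R_cc = p_R_cc))
```

The correction pulls the statistic half a unit towards its null mean, so
the corrected p-value is the larger of the two, in line with 0.6494
versus 0.6433 in this thread.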
>
>
> Consult a serious non-parametrics text. I used
> Lehmann, E. L., Nonparametrics: Statistical methods based on ranks.
> 1975. Holden-Day. San Francisco, CA.
>
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Natalie O'Toole
> Sent: Wednesday, August 15, 2007 1:07 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] Mann-Whitney U
>
> Hi,
>
> I do want to use the Mann-Whitney test which ranks my data and then uses
> those ranks rather than the actual data.
>
> Here is the R code i am using:
>
> group1<-
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
> 2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
>
>> group2<-
>>
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
> 7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
>
>> result <- wilcox.test(group1, group2, paired=FALSE, conf.level =
>> 0.95,
>>
> na.action)
>
> paired = FALSE so that the Wilcoxon rank sum test which is equivalent to
> the Mann-Whitney test is used (my samples are NOT paired).
> conf.level = 0.95 to specify the confidence level.
> na.action is used because i have a NA value (i suspect i am not using
> na.action in the correct manner).
>
> When i use this code i get the following error message:
>
> Error in arg == choices : comparison (1) is possible only for atomic and
> list types
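
The error most likely arises because the bare na.action is matched by
position to the third argument of the default method, alternative, which
must be a character string. For two plain vectors no na.action is needed,
since wilcox.test drops missing values itself. A small sketch with
hypothetical vectors x and y (not the data above):

```r
# Hypothetical vectors; y contains a missing value
x <- c(1.1, 2.3, 3.7, 4.2)
y <- c(0.9, 2.8, 5.1, NA)

# Default method: the NA in y is removed before ranking
res_na      <- wilcox.test(x, y, paired = FALSE)
res_dropped <- wilcox.test(x, y[!is.na(y)], paired = FALSE)

# Passing a bare na.action positionally fails, as in the message above
bad_call <- try(wilcox.test(x, y, paired = FALSE, na.action), silent = TRUE)

print(res_na$statistic)
print(res_dropped$statistic)
print(inherits(bad_call, "try-error"))
```

The na.action argument belongs to the formula interface, as in
wilcox.test(value ~ group, data = ...), where it has to be given by name.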
>
> When i use this code:
>
> group1<-
> c(1.34,1.47,1.48,1.49,1.62,1.67,1.7,1.7,1.7,1.73,1.81,1.84,1.9,1.96,2,2,
> 2.19,2.29,2.29,2.41,2.41,2.46,2.5,2.6,2.8,2.8,3.07,3.3)
>
>> group2<-
>>
> c(0.98,1.18,1.25,1.33,1.38,1.4,1.49,1.57,1.72,1.75,1.8,1.82,1.86,1.9,1.9
> 7,2.04,2.14,2.18,2.49,2.5,2.55,2.57,2.64,2.73,2.77,2.9,2.94,NA)
>
>> result <- wilcox.test(group1, group2, paired=FALSE, conf.level =
>> 0.95)
>>
>
> I get the following result:
>
> Wilcoxon rank sum test with continuity correction
>
> data: group1 and group2
> W = 405.5, p-value = 0.6494
> alternative hypothesis: true location shift is not equal to 0
>
> Warning message:
> cannot compute exact p-value with ties in: wilcox.test.default(group1,
> group2, paired = FALSE, conf.level = 0.95)
>
> The W value here is 405.5 with a p-value of 0.6494
>
>
> in SPSS, i am ranking my data and then performing a Mann-Whitney U by
> selecting analyze - non-parametric tests - 2 independent samples and
> then checking off the Mann-Whitney U test.
>
> For the Mann-Whitney test in SPSS i am getting the following results:
>
> Mann-Whitney U = 350.5
> 2-tailed p value = 0.643
>
> I think maybe the discrepancy has to do with the specification of the NA
> values in R, but i'm not sure.
>
>
> If anyone has any suggestions, please let me know!
>
> I hope i have provided enough information to convey my problem.
>
> Thank-you,
>
> Nat
> __________________
>
>
> Natalie,
>
> It's best to provide at least a sample of your data. Your field names
> suggest that your data might be collected in units of mm^2 or some
> similar measurement of area. Why do you want to use Mann-Whitney, which
> will rank your data and then use those ranks rather than your actual
> data? Unless your sample is quite small, why not use a two sample
> t-test? Also, are your samples paired? If they aren't, did you use the
> "paired = FALSE" option?
>
> JWDougherty
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------------------------------------------------
>
> This communication is intended for the use of the recipient to which it
> is addressed, and may contain confidential, personal, and or privileged
> information. Please contact the sender immediately if you are not the
> intended recipient of this communication, and do not copy, distribute,
> or take action relying on it. Any communication received in error, or
> subsequent reply, should be deleted or destroyed.
>
>
>