[R] high p values

Wed Mar 20 00:16:59 CET 2019

Hi,

Since folks are taking the time to point out some subtle issues here, here is an example taken from the UCLA Stats web site:

https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-is-the-mann-whitney-significant-when-the-medians-are-equal/

Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)

> Grp1
 [1] -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2  0  0
[23]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  5  5  5
[45]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
> Grp2
 [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0
[23]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 10 10 10 10
[45] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

> median(Grp1)
[1] 0
> median(Grp2)
[1] 0

> wilcox.test(Grp1, Grp2)

	Wilcoxon rank sum test with continuity correction

data:  Grp1 and Grp2
W = 1400, p-value = 0.03096
alternative hypothesis: true location shift is not equal to 0

So, in contrast to the original problem, here is an example where you have equal medians, but a significant test result.

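The key concept is that the Wilcoxon Rank Sum test is not strictly a test of differences in medians. That is, the null hypothesis for the test is not that the medians are equal. The null is that the two samples come from the same distribution, and the test is sensitive to one group tending to yield larger values than the other. So you are not rejecting, or failing to reject, a null of equal medians.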
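
To make that concrete, here is a minimal sketch that rebuilds the W statistic by hand for the Grp1 and Grp2 vectors from the UCLA example. It counts the (Grp1, Grp2) pairs in which the Grp1 value is larger, with ties counted as one half. The names gt, tie and W_by_hand are mine, introduced only for illustration.

```r
Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)

# Compare every Grp1 value with every Grp2 value (60 x 60 = 3600 pairs)
gt  <- outer(Grp1, Grp2, ">")    # TRUE where the Grp1 value is larger
tie <- outer(Grp1, Grp2, "==")   # TRUE where the two values are equal

# W = number of pairs with Grp1 > Grp2, plus half the number of tied pairs
W_by_hand <- sum(gt) + 0.5 * sum(tie)
W_by_hand                                   # 1400
unname(wilcox.test(Grp1, Grp2)$statistic)   # 1400, the W reported by wilcox.test

# Share of pairs "won" by Grp1 (ties counted as half); 0.5 would mean
# that neither group tends to be larger than the other
W_by_hand / (length(Grp1) * length(Grp2))   # about 0.39
```

If neither group tended to be larger, W would be near 60 * 60 / 2 = 1800. It is 1400 here, so Grp1 values tend to be smaller than Grp2 values even though both medians are 0, and that tendency is what the test picks up.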

Javed, I would suggest spending some time with a good tutorial on non-parametric statistics.

Regards,

Marc Schwartz

> On Mar 19, 2019, at 6:25 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
> 
> Hi Javed,
> Easy.
> 
> A<-c(2000,2100,2300,2400,6900,7000,7040,7050,7060)
> median(A)
> [1] 6900
> B<-c(3300,3350,3400,3450,3500,7000,7100,7200,7300)
> median(B)
> [1] 3500
> wilcox.test(A,B,paired=FALSE)
> 
>       Wilcoxon rank sum test with continuity correction
> 
> data:  A and B
> W = 26.5, p-value = 0.233
> alternative hypothesis: true location shift is not equal to 0
> 
> Jim
> 
> On Wed, Mar 20, 2019 at 3:48 AM javed khan <javedbtk111 using gmail.com> wrote:
>> 
>> Hi
>> 
>> This is my function:
>> 
>> wilcox.test(A,B, data = data, paired = FALSE)
>> 
>> It gives me a high p value, though the median of column A is 6900 and that of
>> column B is 3500.
>> 
>> Why does it give a high p value if there is a difference in the medians?
>> 
>> Regards