[R] high p values
Marc Schwartz
m@rc_@chw@rtz @end|ng |rom me@com
Wed Mar 20 00:16:59 CET 2019
Hi,
Since folks are taking the time to point out some subtle issues here, here is an example taken from the UCLA Stats web site:
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-is-the-mann-whitney-significant-when-the-medians-are-equal/
Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)
> Grp1
[1] -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 0 0
[23] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5
[45] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
> Grp2
[1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0
[23] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 10 10 10
[45] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
> median(Grp1)
[1] 0
> median(Grp2)
[1] 0
> wilcox.test(Grp1, Grp2)

Wilcoxon rank sum test with continuity correction

data: Grp1 and Grp2
W = 1400, p-value = 0.03096
alternative hypothesis: true location shift is not equal to 0

So, in contrast to the original problem, here is an example where you have equal medians, but a significant test result.
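The key concept is that the Wilcoxon Rank Sum test is not strictly a test of differences in medians. That is, the null hypothesis for the test is not that the medians are equal, so a significant or non-significant result is not a decision about equality of the medians. The null is that both samples come from the same distribution, and the test responds to one group tending to yield larger values than the other.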
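As a rough illustration of what the test is picking up in the example above, one can compare every value in Grp1 with every value in Grp2 and count how often each group is the larger one. This is only a sketch and not part of the UCLA page; the names d, p_greater, p_less, p_tie, W_by_hand and W_test are made up for the illustration.

```r
Grp1 <- rep(c(-2, 0, 5), each = 20)
Grp2 <- rep(c(-1, 0, 10), each = 20)

# All 60 x 60 pairwise differences (a Grp1 value minus a Grp2 value)
d <- outer(Grp1, Grp2, "-")

# Share of pairs in which Grp1 is larger, smaller, or tied
p_greater <- mean(d > 0)
p_less    <- mean(d < 0)
p_tie     <- mean(d == 0)
c(greater = p_greater, less = p_less, tie = p_tie)

# W as reported by wilcox.test(): pairs with Grp1 > Grp2,
# with each tied pair counted as one half
W_by_hand <- sum(d > 0) + 0.5 * sum(d == 0)
W_test    <- unname(wilcox.test(Grp1, Grp2)$statistic)
c(by_hand = W_by_hand, from_test = W_test)
```

The medians are both 0, yet Grp1 is the smaller value in well over half of the pairs and the larger one in only about a third of them. That imbalance, not the medians, is what moves W away from its null expectation of 60 * 60 / 2 = 1800.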
Javed, I would suggest spending some time with a good tutorial on non-parametric statistics.
Regards,
Marc Schwartz
> On Mar 19, 2019, at 6:25 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Javed,
> Easy.
>
> A<-c(2000,2100,2300,2400,6900,7000,7040,7050,7060)
> median(A)
> [1] 6900
> B<-c(3300,3350,3400,3450,3500,7000,7100,7200,7300)
> median(B)
> [1] 3500
> wilcox.test(A,B,paired=FALSE)
>
> Wilcoxon rank sum test with continuity correction
>
> data: A and B
> W = 26.5, p-value = 0.233
> alternative hypothesis: true location shift is not equal to 0
>
> Jim
>
> On Wed, Mar 20, 2019 at 3:48 AM javed khan <javedbtk111 using gmail.com> wrote:
>>
>> Hi
>>
>> This is my function:
>>
>> wilcox.test(A,B, data = data, paired = FALSE)
>>
>> It gives me a high p-value, though the median of column A is 6900 and
>> the median of column B is 3500.
>>
>> Why does it give a high p-value if there is a difference in the medians?
>>
>> Regards