[RsR] About the inconsistency p-value of two sample Kolmogorov–Smirnov test in R and Matlab

Phillip Alday phillip.alday using mpi.nl
Sun Nov 24 18:05:08 CET 2019


Both MATLAB and R generate the same test statistic, but differ in their
p-values. For a p-value that small, you're getting to the point where
the probabilities involved are on the order of a random bit flip in your
computer's memory, so I wouldn't worry about that too much. A quick look
at the R documentation suggests that exact p-values are not available in
every case and that the asymptotic approximation used otherwise can be
inaccurate for small sample sizes. The MATLAB documentation doesn't give
as much detail about how the test statistic is converted to a p-value.
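
One plausible reading of the two numbers quoted below, offered as a sketch
and not confirmed anywhere in this thread: both programs may be using the
one-sided asymptotic formula p = exp(-2 * N * D^2) with N = m*n/(m+n), and
MATLAB may additionally be applying a small-sample correction to sqrt(N).
The correction constants 0.12 and 0.11 and the function name below are my
assumptions, not taken from either program's source. D = 1 because every
value in 1:10 lies below every value in 20:200.

```python
import math


def one_sided_ks_pvalues(d, m, n):
    """Return (uncorrected, corrected) asymptotic one-sided KS p-values.

    Hypothetical helper for illustration only; it is not the code of
    R's ks.test or MATLAB's kstest2.
    """
    n_eff = m * n / (m + n)  # effective sample size N
    # Plain asymptotic one-sided formula (assumed to be what R reports here).
    p_plain = math.exp(-2.0 * n_eff * d ** 2)
    # Same formula with a small-sample correction applied to sqrt(N)
    # (assumed to be what MATLAB reports here).
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    p_corrected = math.exp(-2.0 * lam ** 2)
    return p_plain, p_corrected


# Sample sizes from the demo: 1:10 has 10 values, 20:200 has 181 values.
p_plain, p_corrected = one_sided_ks_pvalues(d=1.0, m=10, n=181)
print(f"uncorrected: {p_plain:.3e}")
print(f"corrected:   {p_corrected:.4e}")
```

If this reading is right, the two printed values should land close to the
5.873e-09 and 8.2222e-10 quoted below, which would mean the gap comes from
a choice of approximation and not from a bug in either program.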

In other words, don't worry about it. You shouldn't be trying to interpret
p-values as precise numbers anyway, especially with such a small sample.


Best,

Phillip

On 19/11/2019 22:35, Zhou, Xionghui wrote:
> Dear Phillip,
> Thanks for your reply. If I use the two-sample KS test in R to test whether one vector is significantly smaller than another, I use the command below as a demo: p = ks.test((1:10), (20:200), alternative = "greater"). The p-value is 5.873e-09. However, if I use Matlab for the same case: [~,p]=kstest2((1:10)',(20:200)','tail','larger'), the p-value is 8.2222e-10. Differences are also present in other methods, such as the Wilcoxon rank sum test and the Poisson test. Thanks!
>
> Regards,
>
>
> Xionghui 
>
> On 11/18/19, 11:26 AM, "Phillip Alday" <phillip.alday using mpi.nl> wrote:
>
>     Dear Xionghui,
>     
>     Cross-posting to two lists simultaneously generally isn't desirable.
>     
>     Can you provide a minimum working example (code+data) for this? 
>     Otherwise, it's hard to see what's going on.
>     
>     Best,
>     Phillip
>     
>     
>     
>     On 11/11/19 9:50 pm, Zhou, Xionghui wrote:
>     > Hi guys,
>     > 
>     > 
>     > Recently, when I tried to reproduce in R a method that I had previously implemented in Matlab, I found that the p-values for the two-sample KS test differ between the two languages, even with the same data and parameters (the p-value in R is greater than the one in Matlab). Meanwhile, the p-values of the two-sample KS test are the same in Matlab and Python. In addition, I also compared the p-values of the Mann-Whitney-Wilcoxon test and the Poisson test, and these also differ between R and Matlab. Of course, the difference in the two-sample KS test is the largest. Can anyone tell me the reason for this, and which language is more reliable? Thanks in advance!
>     > 
>     > 
>     > Best,
>     > 
>     > 
>     > 
>     > Xionghui
>     > 
>     > 
>     > Xionghui Zhou Ph.D.
>     > Research Fellow
>     > Division of Human Genetics
>     > Cincinnati Children's Hospital Medical Center
>     > 
>     > Phone: +1 (513) 636-4200
>     > Email: Xionghui.Zhou using cchmc.org
>     > Office: R1.1026
>     > 3333 Burnet Ave
>     > Cincinnati, OH 45229
>     > 
>     > 
>     > 
>     > 
>     > _______________________________________________
>     > R-SIG-Robust using r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-sig-robust
>     > 
>     
>


