[R] Kolmogorov-Smirnov test

Greg Snow Greg.Snow at imail.org
Sat Sep 24 00:51:06 CEST 2011

Are you doing the two-sample KS test, that is, testing whether two samples come from the same distribution?

With 3,000 points per sample you will likely still have enough power to detect differences that are too small to matter in practice. What exactly are you trying to accomplish with the comparison?
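
A minimal sketch of the power issue in R, under assumptions that are not part of the question: both samples are simulated from normal distributions, and the 0.1 standard deviation shift, the 5% level, and the names n, n_sims and power_est are chosen only for illustration. Whether a shift of that size matters depends entirely on the application.

```r
# Hypothetical simulation: two normal samples of 3,000 points each whose
# means differ by an assumed 0.1 standard deviations.
set.seed(1)
n      <- 3000   # points per sample, as in the question
n_sims <- 200    # number of simulated data sets (arbitrary choice)

# For each simulated pair, record whether ks.test() rejects at the 5% level.
reject <- replicate(n_sims, {
  x <- rnorm(n, mean = 0,   sd = 1)
  y <- rnorm(n, mean = 0.1, sd = 1)
  ks.test(x, y)$p.value < 0.05
})

# Estimated power: the proportion of simulated data sets that were rejected.
power_est <- mean(reject)
power_est
```

The estimate is the fraction of simulated data sets in which the test flags the small shift as significant.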

I am really only familiar with the KS test as done in R (which did not make your list, yet you are asking on an R mailing list). Differences could be due to errors, different assumptions, different algorithms, sunspots, or any number of other things. Are the differences meaningful? R lets you see exactly what it is doing, so you can check for errors, assumptions, and algorithms; I don't know about the programs you list.
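
One way to check what R is doing is to recompute the two-sample statistic by hand. The sketch below assumes simulated data (the sample sizes 50 and 80 and the names Fx, Fy and D_manual are illustrative only) and uses the standard definition of the two-sided statistic D as the largest absolute difference between the two empirical CDFs.

```r
# Hypothetical data: two samples of unequal size.
set.seed(42)
x <- rnorm(50)
y <- rnorm(80, mean = 0.5)

# Empirical CDFs of each sample.
Fx <- ecdf(x)
Fy <- ecdf(y)

# The largest gap between two step functions occurs at an observed data
# point, so it is enough to evaluate both ECDFs on the pooled sample.
pooled   <- c(x, y)
D_manual <- max(abs(Fx(pooled) - Fy(pooled)))

# Statistic reported by R's ks.test() for the same data.
D_r <- unname(ks.test(x, y)$statistic)

# TRUE when the two values agree up to numerical tolerance.
isTRUE(all.equal(D_manual, D_r))
```

The same hand calculation can be compared against the output of any other program to see where a discrepancy arises.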

You will need to ask someone who knows the programs you reference to determine what input they are expecting.  R expects the raw data. 
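
As a sketch of what "raw data" means for R, assuming simulated samples of unequal size (3,000 and 2,500 points, with an arbitrary binning chosen only for this illustration): ks.test() takes the two vectors of observations directly, and the vectors do not need to have the same length. Passing bin counts from a frequency table instead makes R treat the counts themselves as the observations, which answers a different question.

```r
# Hypothetical raw samples of unequal size.
set.seed(123)
a <- rnorm(3000)
b <- rnorm(2500)

# Raw observations go in directly; equal lengths are not required.
res_raw <- ks.test(a, b)
res_raw$statistic
res_raw$p.value

# For contrast only: bin counts from a frequency table (assumed bin width 0.5).
breaks   <- seq(-6, 6, by = 0.5)
counts_a <- as.numeric(table(cut(a, breaks)))
counts_b <- as.numeric(table(cut(b, breaks)))

# This compares the distributions of the 24 counts, not of the original
# measurements. Warnings about ties are suppressed for the illustration.
res_counts <- suppressWarnings(ks.test(counts_a, counts_b))
res_counts$statistic
```

Only res_raw is a test of whether the two sets of measurements come from the same distribution.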

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of rommel
Sent: Friday, September 23, 2011 7:51 AM
To: r-help at r-project.org
Subject: Re: [R] Kolmogorov-Smirnov test

Dear Dr. Snow,

I would like to ask for help with three questions regarding the
Kolmogorov-Smirnov test.

'With a sample size over 10,000 you will have power to detect differences
that are not practically meaningful.'
    - Is a sample size of 3,000 for each sample acceptable for the
Kolmogorov-Smirnov test?

I am checking whether my KS procedure is correct.
I have compared the results of KS tests from the following three programs:
1. Statistica
2. http://www.wessa.net/rwasp_Reddy-Moores%20K-S%20Test.wasp
3. http://www.physics.csbsju.edu/stats/KS-test.html

I have observed that the three programs produce the same results only if
the sample sizes are equal.
However, when the sample sizes are not equal, I did not get similar results,
particularly from the wessa.net calculator.
Is it valid to use a KS test to compare samples of unequal sizes?

Is it valid to use the raw data values in a KS test? Or should I use
the frequencies obtained from a frequency distribution table of the raw data
from each sample?
I think that if I supply the frequencies, the KS test will construct new
cumulative fractions from those frequencies, which I think is not right.

Hope you can assist me. Thanks!


