[R] Kolmogorov-Smirnov test

Greg Snow Greg.Snow at imail.org
Mon Sep 26 19:45:17 CEST 2011


There are criteria to tell whether differences are meaningless, but they come from the science and the researcher, not from statistical tests and algorithms.  Consider the question: "Is one second of difference important?"  Answering that needs a good deal of context.  One second can be a long time in nuclear physics or the 100 yard dash, but a short time in geology or a marathon.

Consider the density function that is equal to 1 when 0 < x < 0.99 or 99.99 < x < 100 and 0 otherwise.  Is this distribution meaningfully different from the uniform between 0 and 1?  In some cases yes, in others probably not (and some distribution tests would have an easier or harder time finding this difference).
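To put rough numbers on that example, here is a small sketch in plain Python (not R, and not anything from the original analysis).  The function names and the grid step are my own choices for illustration.  It compares the two cumulative distribution functions directly, with no sampling involved.

```python
def cdf_uniform(x):
    """CDF of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


def cdf_split(x):
    """CDF of the density equal to 1 on (0, 0.99) and (99.99, 100)."""
    lower = min(max(x, 0.0), 0.99)          # mass picked up on (0, 0.99)
    upper = min(max(x - 99.99, 0.0), 0.01)  # mass picked up on (99.99, 100)
    return lower + upper


# Evaluate both CDFs on a grid from 0 to 100 in steps of 0.001.
grid = [i / 1000 for i in range(100001)]
ks_distance = max(abs(cdf_split(x) - cdf_uniform(x)) for x in grid)

# Means obtained by integrating x over the support of each density.
mean_uniform = 0.5
mean_split = 0.99 ** 2 / 2 + (100 ** 2 - 99.99 ** 2) / 2

print("largest CDF gap:", round(ks_distance, 4))
print("difference in means:", round(mean_split - mean_uniform, 4))
```

The largest gap between the two CDFs is only about 0.01, so a KS-type comparison needs a lot of data to see it, while the means differ by about 1, which a comparison of means could pick up much sooner.  Whether either difference matters is still a question for the science.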

As for the differences in output between the programs: when the sample sizes are the same the KS statistic is pretty straightforward; when they differ there needs to be some type of interpolation of one or both datasets to get the comparison points.  The differences you are seeing are probably due to differences in how that interpolation is being done.  If the differences are small and do not change the decision, then I would not worry about them.
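For reference, here is a minimal sketch of one common way to get the comparison points: treat both empirical CDFs as step functions and evaluate them at every pooled observation.  This is plain Python for illustration, not the code used by R's ks.test or by any of the programs mentioned, and the function names and the large-sample p-value approximation are my assumptions.  A program that picks its comparison points differently could report a slightly different Dmax.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Largest gap between the two empirical CDFs, checked at all pooled points."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / n  # fraction of a that is <= x
        fb = bisect_right(b_sorted, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def ks_asymptotic_p(d, n, m, terms=100):
    """Large-sample (Kolmogorov series) approximation to the two-sided p-value."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here; the p-value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Unequal sample sizes are fine: no values need to be invented for either sample.
print(ks_two_sample([1, 2, 3, 4], [2.5, 3.5]))

# The figures quoted below: Dmax = 0.07 with 2700 and 3012 points.
print(ks_asymptotic_p(0.07, 2700, 3012))
```

With Dmax = 0.07 and samples of 2700 and 3012 the approximate p-value is far below 0.001, which agrees with the result quoted below.  That is the large-sample effect at work: the test is very sure the distributions differ, but it says nothing about whether a gap of 0.07 matters for the fish.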

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of rommel
> Sent: Saturday, September 24, 2011 2:30 AM
> To: r-help at r-project.org
> Subject: Re: [R] Kolmogorov-Smirnov test
> 
> Dear Dr. Snow,
>  
> Thank you for your reply.
>  
> 1. Are you doing the 2 sample KS test? Comparing if 2 samples come from
> the same distribution?
> - Yes, I am doing the 2-sample KS test.
> 
> 2. With 3,000 points you will still likely have power to find
> meaningless differences, what exactly are you trying to accomplish by
> doing the comparison?
> - I am comparing the swimming parameters of fish larvae, such as move
> duration and move length.
> - The comparison is between treatments.
> - Sample sizes, for example in one comparison pair: Control (2700
> data pts) vs Medium (3012 pts)
>   Dmax = 0.07, p-value < 0.001
> - Are there criteria to know if the differences are meaningless or not?
>  
> 3. I am really only familiar with the KS test done in R (which did not
> make your list, yet you are asking on an R mailing list). Differences
> could be due to errors, different assumptions, different algorithms,
> sunspots, or any number of other things. Are the differences
> meaningful? R lets you see exactly what it is doing so you can check
> errors/assumptions/algorithms, I don't know about the ones you show.
> - Sorry, I forgot to list R. I thought wessa.net was using R already,
> but I also made the software comparisons using R. The results were:
>     with equal data points: the results are the same in both
> Dmax and p-value
>     with unequal data points: the conclusions were the same, in that
> the significant difference between samples holds across the different
> software packages. Only the Dmax and p-values differ a bit.
> (Please see the attached file for the comparisons.)
>  
> 4. You will need to ask someone who knows the programs you reference to
> determine what input they are expecting. R expects the raw data.
> - Thanks! I expected this also.
>  
> Thank you.
>  
> -Rommel
>  
> ----- Original Message -----
> From: "Greg Snow-2 [via R]" <ml-node+s789695n3838250h62 at n4.nabble.com>
> Date: Saturday, 24 September 2011, 12:52 am
> Subject: Re: Kolmogorov-Smirnov test
> To: rommel <rmaneja at ifm-geomar.de>
> 
> Are you doing the 2 sample KS test? Comparing if 2 samples come from
> the same distribution? With 3,000 points you will still likely have
> power to find meaningless differences, what exactly are you trying to
> accomplish by doing the comparison? I am really only familiar with the
> KS test done in R (which did not make your list, yet you are asking on
> an R mailing list).  Differences could be due to errors, different
> assumptions, different algorithms, sunspots, or any number of other
> things.  Are the differences meaningful?  R lets you see
> exactly what it is doing so you can check
> errors/assumptions/algorithms, I don't know about the ones you show.
> You will need to ask someone who knows the programs you reference to
> determine what input they are expecting.  R expects the raw data.
> 
> -----Original Message-----
> From: [hidden email] [mailto: [hidden email] ] On Behalf Of rommel
> Sent: Friday, September 23, 2011 7:51 AM
> To: [hidden email]
> Subject: Re: [R] Kolmogorov-Smirnov test
> 
> Dear Dr. Snow,
> 
> I would like to ask for help on my three questions regarding the
> Kolmogorov-Smirnov test.
> 
> 1. 'With a sample size over 10,000 you will have power to detect
> differences that are not practically meaningful.'
>     - Is a sample size of 3000 for each sample okay for using the
> Kolmogorov-Smirnov test?
> 
> 2. I am checking whether my KS procedure is correct. I have compared
> results of KS tests using the following 3 software packages:
>     1. Statistica
>     2. http://www.wessa.net/rwasp_Reddy-Moores%20K-S%20Test.wasp
>     3. http://www.physics.csbsju.edu/stats/KS-test.html
> I have observed that the three packages produced the same results
> only if the sample sizes are equal. However, when the sample sizes are
> not equal, I did not get similar results, particularly from the
> wessa.net calculator. Is it allowed to do a KS test to compare samples
> with unequal sizes?
> 
> 3. Is it allowed to use the raw data values in doing a KS test? Or
> should I use the frequencies obtained from a frequency distribution
> table of the raw data from each sample? I think that when I use the
> frequencies, the KS test will construct new cumulative fractions from
> the frequencies, which I think is not right.
> 
> Hope you can assist me. Thanks!
> -rommel


