[R] KS Test question (2)

Thu Aug 5 19:05:46 CEST 2010

The warning (it is a warning, not an error; with an error you would not see any results) means that there are ties in your data. The theory behind the KS test assumes continuous distributions, for which the probability of seeing ties is 0. Your data and the theory do not match, so the p-value is suspect (though an OK approximation for some uses).
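
If you want to see how many ties you have before deciding how much to trust the p-value, something like the following rough sketch would do. The data here are made up (rounded normals, since rounding is a common source of ties), and the exact wording of the warning may differ between R versions, so it is suppressed rather than shown.

```r
set.seed(42)
x <- round(rnorm(500), 1)   # made-up data; rounding to one decimal creates ties
y <- round(rnorm(500), 1)

pooled <- c(x, y)
n_tied <- sum(duplicated(pooled))   # observations that repeat an earlier value
n_tied > 0                          # TRUE means the "no ties" assumption is violated

# The test still runs; only the p-value is in doubt
res <- suppressWarnings(ks.test(x, y, exact = FALSE))
res$statistic
```

A handful of ties in a large sample is usually less of a worry than heavily rounded or discrete data.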

These types of tests are useful for showing differences (often in a non-meaningful way), not for showing similarity.  You really need to decide what you mean by similar.

Consider two population distributions. The first is the standard uniform, with density height equal to 1 between 0 and 1 (0 elsewhere). The 2nd distribution has height 1 from 0 to 0.99 and from 99.99 to 100 (0 elsewhere). Are these 2 populations similar?  By some measures they are (the KS statistic for one: the two distribution functions never differ by more than 0.01); by other measures they are not (comparing mean and variance, as an example).  Whether they are similar or not really depends on what you want to do with them.
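
A quick simulation of those two populations, as a sketch: the sample size and seed are arbitrary choices, and the second sample is generated by shifting the top 1% of a uniform draw up by 99.

```r
set.seed(123)
n <- 10000                          # arbitrary sample size

x <- runif(n)                       # standard uniform on (0, 1)

u <- runif(n)
y <- ifelse(u < 0.99, u, u + 99)    # mass 0.99 on (0, 0.99), mass 0.01 on (99.99, 100)

# KS statistic: small (the population value is 0.01)
d_stat <- unname(ks.test(x, y)$statistic)
d_stat

# Means and variances: very different
# (population values: mean 0.5 vs 1.49, variance 1/12 vs about 98)
c(mean_x = mean(x), mean_y = mean(y))
c(var_x = var(x), var_y = var(y))
```

So the same pair of samples looks nearly identical or wildly different depending on which summary you pick.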

One additional "test" you might consider is to use the vis.test function in the TeachingDemos package. Write a function that will either draw a standard qqplot of your 2 datasets, or pool them together, split them randomly, and draw the qqplot of the two random pieces.  Use this with vis.test; if you cannot pick out the plot of the real data from the shuffled ones, then it is less likely to matter if you interchange them.  (This assumes 2 random samples from the respective populations; if there is something more going on, then you will need to come up with a different comparison that accounts for any structure.)
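
A minimal sketch of such a function follows. The names pool_and_split and vt.qq2 are made up for illustration, the example data are simulated, and I am assuming that vis.test passes the data through to the plotting function and calls it with orig = TRUE for the real data and orig = FALSE for the shuffled versions; check ?vis.test before relying on this.

```r
# Pool the two samples and randomly split them into groups of the original sizes
pool_and_split <- function(x, y) {
  pooled <- sample(c(x, y))                      # shuffle the pooled data
  list(x = pooled[seq_along(x)],
       y = pooled[length(x) + seq_along(y)])
}

# Plotting function (hypothetical name): real qqplot if orig is TRUE,
# qqplot of a random split of the pooled data otherwise
vt.qq2 <- function(x, y, orig = TRUE) {
  if (!orig) {
    s <- pool_and_split(x, y)
    x <- s$x
    y <- s$y
  }
  qqplot(x, y, xlab = "", ylab = "")
  abline(0, 1, col = "grey")                     # reference line y = x
}

# vis.test is interactive (you click on the plot you think is the real one)
if (interactive() && requireNamespace("TeachingDemos", quietly = TRUE)) {
  set.seed(1)
  x <- rexp(200)                                 # made-up example data
  y <- rexp(200, rate = 1.2)
  TeachingDemos::vis.test(x, y, FUN = vt.qq2)
}
```

If you consistently pick out the real plot, the two datasets differ in a way that is visible to you, and you can then judge whether that visible difference matters for your purpose.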

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Ralf B
> Sent: Wednesday, August 04, 2010 3:50 PM
> To: r-help at r-project.org
> Subject: [R] KS Test question (2)
> 
> Hi R Users,
> 
> I have two vectors, x and y, of equal length representing two types of
> data from two studies. I would like to test if they are similar enough
> to use them interchangeably. No assumptions about distributions can be
> made (initial tests clearly show that they are not normal).
> Here are some results:
> 
> Two-sample Kolmogorov-Smirnov test
> 
> data:  x and y
> D = 0.1091, p-value < 2.2e-16
> alternative hypothesis: two-sided
> 
> Warning message:
> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>   cannot compute correct p-values with ties
> 
> Here are some questions:
> 
> a) What does the error message mean and what does it imply?
> b) The data is very noisy and the initial result shows that there is
> no relation between x and y. Is there a way to calculate an effect
> size?
> c) Can the p-value be used, when running tests over a large amount of
> different data sets, as a metric for ranking similarity between x and
> y data sets?
> 
> Best
> R.
> 