# [R] ks.test - continuous vs discrete

David Middleton dmiddleton at fisheries.gov.fk
Tue Mar 26 18:23:21 CET 2002

I frequently want to test for differences between animal size frequency
distributions.  The obvious test (I think) to use is the Kolmogorov-Smirnov
two-sample test (provided in R as the function ks.test in package ctest).
The KS test is for continuous variables, and this obviously includes length,
weight etc.  However, limitations in measuring (e.g. length to the nearest
cm/mm, weight to the nearest g/mg etc.) have the obvious effect of
"discretising" real data.

The ks.test function checks for the presence of ties, noting in the help page
that "continuous distributions do not generate them".  Given the problem of
"measuring to the nearest..." noted above, I frequently find that my data have
ties and ks.test generates a warning.

I was interested to note that the example of a two-sample KS test given in
Sokal & Rohlf's "Biometry" (I have the 2nd edition, where the example is on
p.441) has exactly the same problem:
> A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
> B <- c(100,105,107,107,108,111,116,120,121,123)
> ks.test(A,B)

Two-sample Kolmogorov-Smirnov test

data:  A and B
D = 0.475, p-value = 0.1244
alternative hypothesis: two.sided

Warning message:
cannot compute correct p-values with ties in: ks.test(A, B)
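
To see where the warning comes from, the ties can be listed directly. This is only a minimal sketch using the A and B vectors above: it counts repeated values in the pooled sample, on the assumption that this is the kind of check ks.test makes. The helper names (pooled, tied_values, n_pooled, n_distinct) are illustrative and not part of ks.test.

```r
# Data from the Sokal & Rohlf example quoted above.
A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
B <- c(100,105,107,107,108,111,116,120,121,123)

# Pool both samples; a tie is any value that occurs more than once.
pooled <- c(A, B)
tab <- table(pooled)
tied_values <- as.numeric(names(tab)[tab > 1])

n_pooled   <- length(pooled)          # total number of observations
n_distinct <- length(unique(pooled))  # number of distinct measured values

print(tied_values)
cat(n_pooled, "observations,", n_distinct, "distinct values\n")
```

Whenever n_distinct is smaller than n_pooled, at least one measured value is shared, either within a sample or between the two samples.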

In their chapter 2, "Data in Biology", Sokal & Rohlf note "any given reading
of a continuous variable ... is therefore an approximation to the exact
reading, which is in practice unknowable.  However, for the purposes of
computation these approximations are usually sufficient..."

I am interested to know whether this can be made more exact.  Are there
methods to test whether data are measured at a fine enough scale to be
regarded as sufficiently continuous for a KS test, or is a common-sense choice
of measurement precision widely regarded as sufficient?
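
One informal way to explore the question (not an established method, just a sketch) is to simulate data that really are continuous, round them to progressively coarser units, and see how far the KS statistic D moves. Everything here is an assumption made for illustration: the normal distributions, the sample sizes, the rounding units, and the helper ks_stat, which computes D directly from the empirical CDFs.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical "true" lengths; distributions and sample sizes are assumptions.
x <- rnorm(50, mean = 118, sd = 8)
y <- rnorm(50, mean = 114, sd = 8)

# Two-sample KS statistic: the largest absolute difference between the
# two empirical CDFs, evaluated at every observed value.
ks_stat <- function(a, b) {
  pts <- sort(unique(c(a, b)))
  max(abs(ecdf(a)(pts) - ecdf(b)(pts)))
}

# D for the unrounded data, then for data rounded to coarser units.
D_exact <- ks_stat(x, y)
units <- c(0.01, 0.1, 1, 5)
D_rounded <- sapply(units, function(u) {
  ks_stat(round(x / u) * u, round(y / u) * u)
})

print(data.frame(unit = units, D = D_rounded, D_exact = D_exact))
```

If D barely changes between the unrounded data and a given rounding unit, that suggests (for data resembling the simulation) that the measurement precision is not driving the result.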