[R] ks.test - continuous vs discrete
Torsten Hothorn
Torsten.Hothorn at rzmail.uni-erlangen.de
Wed Mar 27 16:15:11 CET 2002
>
> I frequently want to test for differences between animal size frequency
> distributions. The obvious test (I think) to use is the Kolmogorov-Smirnov
> two sample test (provided in R as the function ks.test in package ctest).
"obvious" depends on the problem you want to test: KS tests the hypothesis
    H_0: F(z) = G(z) for all z  vs.  H_1: F(z) != G(z) for at least one z

ks.test assumes that both F and G are continuous distribution functions.
However, if you want to test

    H_0: F(z) = G(z)  vs.  H_1: F(z) = G(z - delta); delta != 0

as "test for differences" indicates, the Wilcoxon rank sum test is
"obvious". Or, more generally, if your hypothesis is "exchangeability", a
permutation test can be used.
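
To make the KS hypothesis concrete: the two-sample statistic D is the largest vertical gap between the two empirical distribution functions. A minimal base-R sketch, using the A and B vectors from the question quoted in this thread (the names Fa, Fb and z are only for illustration):

```r
# Data from the Sokal & Rohlf example quoted in this thread
A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
B <- c(100,105,107,107,108,111,116,120,121,123)

# Empirical distribution functions of the two samples
Fa <- ecdf(A)
Fb <- ecdf(B)

# Both ECDFs are step functions that jump only at observed values,
# so the largest gap is attained at one of the pooled data points
z <- sort(unique(c(A, B)))
D <- max(abs(Fa(z) - Fb(z)))
D   # agrees with D = 0.475 reported by ks.test(A, B)
```

This only reproduces the statistic. The p-value is the part that relies on the continuity assumption.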
> The KS test is for continuous variables and this obviously includes length,
> weight etc. However, limitations in measuring (e.g. length to the nearest
> cm/mm, weight to the nearest g/mg etc.) have the obvious effect of
> "discretising" real data.
Or maybe the underlying distribution is discrete?
Anyway: ks.test and wilcox.test in ctest assume data from continuous
distributions, and an asymptotic approximation (the normal approximation
in the case of wilcox.test) is used if ties occur.
For the Wilcoxon and permutation test, the conditional distribution (that
is: conditional on the ties) can be computed using the exactRankTests
package.
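
As a rough illustration of what "conditional on the ties" refers to: tied observations share midranks, and the exact test works with the distribution of the rank sum over rearrangements of exactly those midranks. The following base-R sketch (variable names are mine) only reproduces the statistic for the data quoted in this thread, not the exact p-value:

```r
# Data from the Sokal & Rohlf example quoted in this thread
A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
B <- c(100,105,107,107,108,111,116,120,121,123)

pooled <- c(B, A)
m <- length(B)

# Values that occur more than once in the pooled sample (the ties)
tab <- table(pooled)
tab[tab > 1]

# rank() assigns midranks (average ranks) to tied values by default
r <- rank(pooled)

# Wilcoxon rank sum statistic for the first sample:
# sum of its midranks minus the smallest possible rank sum m(m+1)/2
W <- sum(r[seq_len(m)]) - m * (m + 1) / 2
W   # 36.5, the value wilcox.exact(B, A) reports
```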
>
> The ks.test function checks for the presence of ties noting in the help page
> that "continuous distributions do not generate them". Given the problem of
> "measuring to the nearest..." noted above I frequently find that my data has
> ties and ks.test generates a warning.
> I was interested to note that the example of a two-sample KS test given in
> Sokal & Rohlf's "Biometry" (I have the 2nd edition where the example is on
> p.441) has exactly the same problem:
> > A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
> > B <- c(100,105,107,107,108,111,116,120,121,123)
For your example:

R> library(exactRankTests)
R> wilcox.exact(B, A)

        Exact Wilcoxon rank sum test

data:  B and A
W = 36.5, p-value = 0.02039
alternative hypothesis: true mu is not equal to 0

R> perm.test(B, A)

        2-sample Permutation Test

data:  B and A
T = 1118, p-value = 0.01864
alternative hypothesis: true mu is not equal to 0

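
If exactRankTests is not at hand, the idea behind the permutation test can be approximated by random relabelling in base R. This is only a Monte Carlo sketch, not what perm.test does internally: perm.test computes the exact conditional distribution, whereas here n_perm, the seed and the two-sided convention (distance from the permutation mean) are my own assumptions, so the p-value will only be close to the exact one.

```r
# Data from the Sokal & Rohlf example quoted in this thread
A <- c(104,109,112,114,116,118,118,117,121,123,125,126,126,128,128,128)
B <- c(100,105,107,107,108,111,116,120,121,123)

set.seed(1)                 # arbitrary seed, for reproducibility only
pooled <- c(B, A)
m <- length(B)

# Observed statistic: sum of the observations in the first sample
T_obs <- sum(B)             # 1118, as reported by perm.test(B, A)

# Under exchangeability every choice of m values out of the pooled
# sample is equally likely; draw random relabellings
n_perm <- 10000
T_perm <- replicate(n_perm, sum(sample(pooled, m)))

# Two-sided Monte Carlo p-value: share of relabellings at least as far
# from the permutation mean as the observed statistic (add-one corrected)
mu <- m * mean(pooled)
p_mc <- (1 + sum(abs(T_perm - mu) >= abs(T_obs - mu))) / (n_perm + 1)
p_mc
```

Ties cause no difficulty here, because the relabelling is done on the observed values themselves.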
Torsten
> > ks.test(A,B)
>
> Two-sample Kolmogorov-Smirnov test
>
> data: A and B
> D = 0.475, p-value = 0.1244
> alternative hypothesis: two.sided
>
> Warning message:
> cannot compute correct p-values with ties in: ks.test(A, B)
> In their chapter 2, "Data in Biology", Sokal & Rohlf note "any given reading
> of a continuous variable ... is therefore an approximation to the exact
> reading, which is in practice unknowable. However, for the purposes of
> computation these approximations are usually sufficient..."
> I am interested to know whether this can be made more exact. Are there
> methods to test that data are measured at an appropriate scale so as to be
> regarded as sufficiently continuous for a KS test, or is common sense choice
> of measurement precision widely regarded as sufficient?
> Any comments/references would be appreciated!
> David Middleton
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>