[R] ks.test - continuous vs discrete

Jason W. Martinez jmartinez at uia.net
Thu Mar 28 16:43:10 CET 2002

```
Hello,

You may want to check out Handcock and Morris's book and R/S-PLUS code on
"relative distribution methods."

See their website for more info. Last time I checked, the documentation for
their code was somewhat lacking, though.

http://www.stat.washington.edu/~handcock/RelDist/

jason

On Thursday 28 March 2002 02:48 am, David Middleton wrote:
> Thanks for the input, and sorry for the delay in returning to the thread.
>
> > > I frequently want to test for differences between animal size frequency
> > > distributions.  The obvious test (I think) to use is the
> > > Kolmogorov-Smirnov two sample test (provided in R as the function
> > > ks.test in package ctest).
>
> > "obvious" depends on the problem you want to test: KS tests the
> > hypothesis
> >
> > H_0: F(z) = G(z) for all z vs. H_1: F(z) != G(z) for at least one z
> >
> > ks.test assumes that both F and G are continuous distributions. However,
> > if you want to test
> >
> > H_0: F(z) = G(z)  vs. H_1: F(z) = G(z - delta); delta != 0
> >
> > as "test for differences" indicates, the Wilcoxon rank sum test is
> > "obvious". Or, more generally, if your hypothesis is "exchangeability", a
> > permutation test can be used.
>
> Apologies for my vague description.  The Wilcoxon rank sum test is a test
> of difference in location, as is the permutation test I believe.  I am
> interested in more than just location - the animal size distributions I
> have in mind are often multimodal, encompassing different cohorts for
> example - so I am interested in a more general test of differences in the
> distributions, both for exploratory purposes and to see if it is
> appropriate to lump samples.  Thus the KS test seems the "obvious" choice.
>
> > > The KS test is for continuous variables and this obviously includes
> > > length, weight etc.  However, limitations in measuring (e.g. length to
> > > the nearest cm/mm, weight to the nearest g/mg etc.) have the obvious
> > > effect of "discretising" real data.
> >
> > or maybe the underlying distribution is discrete?
>
> In the case I described (animal size) it is pretty clear that the variable
> is continuous, and likewise the underlying distribution.  The ties really
> are the result of rounding error.
>
> Off list both Don MacQueen and Ross Darnell came up with the idea of
> "jittering" the values (adding a random number from a uniform distribution
> half the width of the measurement unit) to remove the ties, and re-testing
> to see if the rounding was influencing the results.  This seems to be what
> I need.
>
> David Middleton
>
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

```
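
The jittering check described in the quoted message (add uniform noise to break the ties caused by rounding, then re-test) could look roughly like the sketch below. It is only an illustration in standard-library Python, not code from the thread: the function names `jitter`, `ks_statistic` and `jittered_ks` are hypothetical, the sample lengths are made up, and the noise is assumed to be uniform on [-unit/2, +unit/2], which is one reading of "half the width of the measurement unit". It computes only the two-sample KS statistic D, not a p-value; in R one would presumably pass the jittered vectors to `ks.test` instead.

```python
import random
from bisect import bisect_right


def jitter(values, unit, rng):
    """Add uniform noise in [-unit/2, +unit/2] (assumed width) to each value."""
    half = unit / 2.0
    return [v + rng.uniform(-half, half) for v in values]


def ks_statistic(x, y):
    """Two-sample KS statistic: max |F_x(z) - F_y(z)| over all observed z."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for z in xs + ys:
        # Empirical CDFs: proportion of each sample that is <= z.
        fx = bisect_right(xs, z) / len(xs)
        fy = bisect_right(ys, z) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def jittered_ks(x, y, unit, n_rep=200, seed=1):
    """Recompute D on n_rep independently jittered copies of both samples."""
    rng = random.Random(seed)
    return [
        ks_statistic(jitter(x, unit, rng), jitter(y, unit, rng))
        for _ in range(n_rep)
    ]


# Made-up lengths, rounded to the nearest cm (illustration only).
sample_a = [12, 13, 13, 14, 15, 15, 15, 16, 18, 21, 22, 22, 23]
sample_b = [11, 12, 12, 13, 13, 14, 14, 15, 19, 20, 20, 21, 24]

d_rounded = ks_statistic(sample_a, sample_b)
d_jittered = jittered_ks(sample_a, sample_b, unit=1.0)

print("D on rounded data:", round(d_rounded, 3))
print("D on jittered data: min", round(min(d_jittered), 3),
      "max", round(max(d_jittered), 3))
```

If D on the jittered copies stays close to D on the rounded data, the ties are probably not driving the result; a wide spread suggests the rounding matters at this sample size.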