[R] ks.test - continuous vs discrete
Jason W. Martinez
jmartinez at uia.net
Thu Mar 28 16:43:10 CET 2002
You may want to check out Handcock and Morris's book and R/S-PLUS code on
``relative distribution methods.''
See their website for more info. Last time I checked, the documentation for
their code was somewhat lacking, though.
On Thursday 28 March 2002 02:48 am, David Middleton wrote:
> Thanks for the input, and sorry for the delay in returning to the thread.
> > > I frequently want to test for differences between animal size frequency
> > > distributions. The obvious test (I think) to use is the two-sample
> > > Kolmogorov-Smirnov test (provided in R as the function ks.test in package
> > "obvious" depends on the problem you want to test: KS tests the
> > hypothesis
> > H_0: F(z) = G(z) for all z vs. H_1: F(z) != G(z) for at least one z
> > ks.test assumes that both F and G are continuous distribution functions.
> > However, if
> > you want to test
> > H_0: F(z) = G(z) vs. H_1: F(z) = G(z - delta); delta != 0
> > as "test for differences" indicates, the Wilcoxon rank sum test is
> > "obvious". Or, more generally, if your hypothesis is "exchangeability", a
> > permutation test can be used.
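To make the difference between the two hypotheses concrete, here is a small sketch on simulated data. The samples x and y and all their parameters are made up for illustration and are not taken from this thread. Both samples are centred on the same value but have different shapes, so a shift-type test and a general test need not agree.

```r
# Simulated data only: same overall centre, different shape.
set.seed(1)
x <- rnorm(200, mean = 50, sd = 5)            # a single cohort
y <- c(rnorm(100, mean = 40, sd = 2),
       rnorm(100, mean = 60, sd = 2))         # two cohorts around the same centre

ks <- ks.test(x, y)      # H_0: F = G against F != G for at least one z
wx <- wilcox.test(x, y)  # rank sum test, aimed mainly at a shift in location

print(ks$p.value)
print(wx$p.value)
```

With data like these the KS test should pick up the difference in shape, while the rank sum test has little to work with because there is no shift between the samples.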
> Apologies for my vague description. The Wilcoxon rank sum test is a test
> of difference in location, as is the permutation test I believe. I am
> interested in more than just location - the animal size distributions I
> have in mind are often multimodal, encompassing different cohorts for
> example - so I am interested in a more general test of differences in the
> distributions, both for exploratory purposes and to see if it is
> appropriate to lump samples. Thus the KS test seems the "obvious" choice.
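A permutation test is not tied to location: what it is sensitive to depends on the statistic being permuted. As a rough sketch, the KS distance itself can be used as the statistic, which also avoids relying on a null distribution derived for continuous data. Everything below (the simulated samples a and b, the helper ks_stat, the 2000 reshuffles) is an assumption chosen for illustration.

```r
set.seed(7)
a <- round(rnorm(60, mean = 30, sd = 4))            # simulated sizes, rounded
b <- round(c(rnorm(30, mean = 25, sd = 2),
             rnorm(30, mean = 35, sd = 2)))         # two cohorts, rounded

# Largest vertical gap between the two empirical distribution functions.
ks_stat <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

d_obs  <- ks_stat(a, b)
pooled <- c(a, b)
n_a    <- length(a)

# Reshuffle the group labels and recompute the statistic each time.
d_perm <- replicate(2000, {
  idx <- sample(length(pooled), n_a)
  ks_stat(pooled[idx], pooled[-idx])
})

# Monte Carlo p-value; the observed arrangement counts as one permutation.
# The small tolerance guards against floating-point noise in tied statistics.
p_perm <- (1 + sum(d_perm >= d_obs - 1e-12)) / (1 + length(d_perm))
print(c(D = d_obs, p = p_perm))
```

Because the reference distribution is built from the observed (tied) values themselves, ties need no special treatment here.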
> > > The KS test is for continuous variables and this obviously includes
> > > weight etc. However, limitations in measuring (e.g. length to the nearest
> > > cm/mm, weight to the nearest g/mg etc.) have the obvious effect of
> > > "discretising" real data.
> > or maybe the underlying distribution is discrete?
> In the case I described (animal size) it is pretty clear that the variable
> is continuous, and likewise the underlying distribution. The ties really
> are the result of rounding error.
> Off list both Don MacQueen and Ross Darnell came up with the idea of
> "jittering" the values (adding a random number from a uniform distribution
> half the width of the measurement unit) to remove the ties, and re-testing
> to see if the rounding was influencing the results. This seems to be what
> I need.
> David Middleton
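The jittering idea might look something like the sketch below. It assumes the values were rounded to the nearest unit, so that each true value lies within half a unit either side of the recorded one. The data, the unit of 1 and the helper name unround are all invented for illustration. Since every jittered run gives a slightly different answer, the test is repeated and the spread of p-values is inspected.

```r
set.seed(42)
unit  <- 1                                        # assumed measurement resolution
len_a <- round(rnorm(80, mean = 100, sd = 10))    # simulated lengths, rounded
len_b <- round(rnorm(90, mean = 104, sd = 10))

# Spread each recorded value uniformly over the interval that rounds to it.
unround <- function(v, unit) v + runif(length(v), -unit / 2, unit / 2)

# Repeat the jittering and keep the p-value from each run.
p_jit <- replicate(200,
  ks.test(unround(len_a, unit), unround(len_b, unit))$p.value)

# For comparison: the test on the tied data (R may warn about ties).
p_raw <- suppressWarnings(ks.test(len_a, len_b)$p.value)

print(p_raw)
print(summary(p_jit))
```

If the jittered p-values all sit close to the one from the raw data, the rounding is probably not driving the result. Base R's jitter(x, amount = unit / 2) adds the same kind of uniform noise, if you prefer not to write the helper.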
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch