(PR#1007) [Rd] ks.test doesn't compute correct empirical

ripley@stats.ox.ac.uk
Sun, 1 Jul 2001 08:50:39 +0200 (MET DST)


On Sun, 1 Jul 2001 mcdowella@mcdowella.demon.co.uk wrote:

> Full_Name: Andrew Grant McDowell
> Version: R 1.1.1 (but source in 1.3.0 looks fishy as well)
> OS: Windows 2K Professional (Consumer)
> Submission from: (NULL) (194.222.243.209)

Please upgrade: we've found a number of Win2k bugs and worked around them
since then, let alone the bug fixes and improvements in R ....


> In article <xeQ_6.1949$xd.353840@typhoon.snet.net>,
> johnt@tman.dnsalias.com writes
> >Can someone help?  In R, I am generating a vector of 1000 samples from
> >Bin (1000, 0.25).  I then do a Kolmogorov Smirnov test to test if the
> >vector has been drawn from a population of Bin (1000, 0.25).  I would
> >expect a reasonably high p-value.....

You do realize that the Kolmogorov tests (and the Kolmogorov-Smirnov
extension) assume continuous distributions, so the distribution theory
is not valid in this case?
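
A rough way to see the effect is to simulate samples for which the null
hypothesis is true and count how often the test rejects at the 5% level.
The sketch below is only an illustration: the seed, the number of
replicates and the names n_rep, p_cont and p_disc are arbitrary choices
made here, and it assumes ks.test evaluates the statistic without
allowing for ties, as in the output quoted in this thread.

```r
set.seed(42)
n_rep <- 200   # number of simulated samples (arbitrary choice)
n <- 1000      # sample size, as in the original post

# Null true, continuous distribution: the distribution theory applies.
p_cont <- replicate(n_rep, ks.test(runif(n), "punif")$p.value)

# Null true, discrete distribution: the theory does not apply.
# Newer versions of R warn about ties; the warning is silenced here.
p_disc <- replicate(n_rep,
  suppressWarnings(
    ks.test(rbinom(n, 100, 0.25), "pbinom", 100, 0.25)$p.value))

# Proportion of samples rejected at the 5% level in each case
reject_cont <- mean(p_cont < 0.05)
reject_disc <- mean(p_disc < 0.05)
```

If the assumption holds, reject_cont should be close to the nominal 0.05
and reject_disc should be far above it.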

S-PLUS does stop you doing this:

> ks.gof(o, dist="binomial", size=100, prob=0.25)
Problem in not.cont1(ttest = d.test, nx = nx, alt.ex..: For testing
discrete distributions when sample size > 50, use the
       Chi-square test
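
For comparison, a chi-square goodness-of-fit test along the lines that
message suggests might look like the sketch below in R. The pooling of
the tails into "17 or fewer" and "34 or more" is an assumption made here
to keep the expected counts reasonably large, not something prescribed
by either package, and the seed is arbitrary.

```r
set.seed(1)
o <- rbinom(1000, 100, 0.25)

# Cells: <= 17, 18, 19, ..., 33, >= 34 (tails pooled, an arbitrary choice)
cuts <- 17:33
observed <- table(cut(o, breaks = c(-Inf, cuts, Inf)))

# Cell probabilities under Bin(100, 0.25), from differences of the CDF
p <- diff(c(0, pbinom(cuts, 100, 0.25), 1))

# Goodness-of-fit test with fully specified cell probabilities
res <- chisq.test(observed, p = p)
res
```

Since no parameters are estimated from the data, the degrees of freedom
are the number of cells minus one, here 17.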


> >Either I am doing something wrong in R, or I am misunderstanding how this
> >test should work (both quite possible)...
> >
> >
> >Thanks,
> >JT..
> >
> >
> >
> >> #### 1000 random samples from binomial dist with mean =.25, n=100...
> >> o<-rbinom (1000, 100, .25)
> >> mean (o);
> >[1] 25.178
> >> var (o);
> >[1] 19.61193
> >> ks.test (o, "pbinom", 100, .25);
> >
> >        One-sample Kolmogorov-Smirnov test
> >
> >data:  o
> >D = 0.0967, p-value = 1.487e-08
> >alternative hypothesis: two.sided
> >
> >
> >
> >p-value is mighty small, leading me to reject the null hypothesis that
> >the sample has been drawn from the Bin(100, 0.25) distribution!!!

That's OK.  The p-value comes from distribution theory that assumes a
continuous distribution, so that's not what you tested (see above).

An S language point: the `;' are unnecessary.


> Some more oddities:
>
> > o<-rbinom(10000, 1, 0.25)
> > ks.test(o, "pbinom", 1, 0.25)
>
>          One-sample Kolmogorov-Smirnov test
>
> data:  o
> D = 0.75, p-value = < 2.2e-16
> alternative hypothesis: two.sided
>
> > length(o[o==0])
> [1] 7491
> > length(o[o==1])
> [1] 2509
> > o<-rep(0,10000)
> > ks.test(o, "pbinom", 1, 0.25)
>
>          One-sample Kolmogorov-Smirnov test
>
> data:  o
> D = 0.75, p-value = < 2.2e-16
> alternative hypothesis: two.sided
>
> > length(o[o==0])
> [1] 10000
> > length(o[o==1])
> [1] 0
>
> Here zeroing out the data does not change the reported D value

Nor does it change the maximum discrepancy.

> ks.test(rep(1,10000), "pbinom", 1, 0.25)

        One-sample Kolmogorov-Smirnov test

data:  rep(1, 10000)
D = 1, p-value = < 2.2e-16
alternative hypothesis: two.sided

shows 0 is special here.
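
The D values above can be reproduced by hand. The sketch below evaluates
the usual one-sample statistic at the sorted sample points only, with no
allowance for ties; naive_D is a hypothetical helper written for this
illustration and is not claimed to be the code in any particular version
of R. For Bin(1, 0.25) the CDF at 0 is 0.75, so whenever the smallest
observation is 0 the first term is 0.75 - 0 = 0.75, whatever the rest of
the sample looks like. If every observation is 1, the first term is
1 - 0 = 1.

```r
# Hypothetical helper: KS statistic evaluated at the sorted data points,
# ignoring ties (valid only for continuous distributions).
naive_D <- function(x, cdf, ...) {
  n <- length(x)
  d <- cdf(sort(x), ...) - (0:(n - 1)) / n   # F(x_(i)) - (i - 1)/n
  max(c(d, 1 / n - d))                       # two-sided statistic
}

d_mixed <- naive_D(c(rep(0, 7491), rep(1, 2509)), pbinom, 1, 0.25)
d_zeros <- naive_D(rep(0, 10000), pbinom, 1, 0.25)
d_ones  <- naive_D(rep(1, 10000), pbinom, 1, 0.25)

c(mixed = d_mixed, zeros = d_zeros, ones = d_ones)
```

This gives 0.75, 0.75 and 1, matching the D values reported above.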

>
> After playing about with
> ks.test(c(rep(0, x), rep(1, 1000-x)), "pbinom", 1, p)
> for a bit I conjecture that ks.test() takes no account
> whatsoever of ties, but merely sorts the input values
> and looks for max (position/N - pbinom(value, 1, p)).
> Anybody got the source handy?
>
> After 30 minutes of download, the relevant part of ks.test.R would appear to be

Eh? Just type ks.test in your R session for the source ....
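
For instance (a small sketch; the name src is arbitrary, and the remark
about ks.test.default assumes a later version of R in which ks.test is a
generic function, which is not the case for the versions discussed
here):

```r
# Typing the name of a function prints its definition
ks.test

# The same text as a character vector, e.g. for searching
src <- deparse(stats::ks.test)
length(src)

# Where ks.test is generic, the one-sample code lives in the default
# method; getAnywhere() simply reports no match if it does not exist
getAnywhere("ks.test.default")
```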


-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._