[Rd] ks.test doesn't compute correct empirical distribution if there are ties in the data (PR#1007)
mcdowella@mcdowella.demon.co.uk
Sun, 1 Jul 2001 07:53:54 +0200 (MET DST)
Full_Name: Andrew Grant McDowell
Version: R 1.1.1 (but source in 1.3.0 looks fishy as well)
OS: Windows 2K Professional (Consumer)
Submission from: (NULL) (194.222.243.209)
In article <xeQ_6.1949$xd.353840@typhoon.snet.net>,
johnt@tman.dnsalias.com writes
>Can someone help? In R, I am generating a vector of 1000 samples from
>Bin (1000, 0.25). I then do a Kolmogorov Smirnov test to test if the
>vector has been drawn from a population of Bin (1000, 0.25). I would
>expect a reasonably high p-value.....
>
>Either I am doing something wrong in R, or I am misunderstanding how this
>test should work (both quite possible)...
>
>
>Thanks,
>JT..
>
>
>
>> #### 1000 random samples from binomial dist with mean =.25, n=100...
>> o<-rbinom (1000, 100, .25)
>> mean (o);
>[1] 25.178
>> var (o);
>[1] 19.61193
>> ks.test (o, "pbinom", 100, .25);
>
> One-sample Kolmogorov-Smirnov test
>
>data: o
>D = 0.0967, p-value = 1.487e-08
>alternative hypothesis: two.sided
>
>
>
>p-value is mighty small, leading me to reject the null hypothesis that
>the sample has been drawn from the Bin(100, 0.25) distribution!!!
>
>
>
Some more oddities:
> o<-rbinom(10000, 1, 0.25)
> ks.test(o, "pbinom", 1, 0.25)
One-sample Kolmogorov-Smirnov test
data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided
> length(o[o==0])
[1] 7491
> length(o[o==1])
[1] 2509
> o<-rep(0,10000)
> ks.test(o, "pbinom", 1, 0.25)
One-sample Kolmogorov-Smirnov test
data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided
> length(o[o==0])
[1] 10000
> length(o[o==1])
[1] 0
Here, zeroing out the data does not change the reported D value.
After experimenting for a bit with
    ks.test(c(rep(0, X), rep(1, 1000 - X)), "pbinom", 1, p)
I conjecture that ks.test() takes no account whatsoever of ties,
but merely sorts the input values and looks for
max(position/N - pbinom(value, 1, p)).
Does anybody have the source handy?
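For comparison, here is a minimal sketch of what a tie-aware statistic might look like for these two data sets. The helper name tie_aware_d and its support argument are my own inventions for illustration, not anything in R. It assumes a discrete null distribution whose support is known and contains every observed value, so that both step functions can only jump at the support points.

```r
# Hypothetical helper (not part of R): compare the empirical CDF with the
# hypothesised CDF at every point of the distribution's support.
# Assumes every observation lies in `support`.
tie_aware_d <- function(x, support, cdf, ...) {
  empirical <- ecdf(x)(support)   # proportion of observations <= each point
  max(abs(empirical - cdf(support, ...)))
}

# The two data sets from the transcript above, rebuilt from the reported counts.
mixed    <- c(rep(0, 7491), rep(1, 2509))
all_zero <- rep(0, 10000)

d_mixed <- tie_aware_d(mixed,    0:1, pbinom, 1, 0.25)   # |0.7491 - 0.75|
d_zero  <- tie_aware_d(all_zero, 0:1, pbinom, 1, 0.25)   # |1 - 0.75|
print(c(d_mixed, d_zero))
```

Computed this way the two samples give clearly different distances (about 0.0009 and 0.25), whereas ks.test reported D = 0.75 for both.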
--
A. G. McDowell
After 30 minutes of download, the relevant part of ks.test.R would appear to be:

    METHOD <- "One-sample Kolmogorov-Smirnov test"
    n <- length(x)
    x <- y(sort(x), ...) - (0 : (n-1)) / n
    STATISTIC <- switch(alternative,
                        "two.sided" = max(c(x, 1/n - x)),
                        "greater" = max(1/n - x),
                        "less" = max(x))
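To check that this excerpt accounts for the D = 0.75 reported above, the same arithmetic can be traced by hand on the tied data. This is only a sketch: the wrapper name excerpt_d is hypothetical, it covers just the "two.sided" branch, and the data are rebuilt from the counts shown in the transcript.

```r
# Hypothetical wrapper around the arithmetic of the quoted excerpt
# ("two.sided" branch only); y is the hypothesised CDF, e.g. pbinom.
excerpt_d <- function(x, y, ...) {
  n <- length(x)
  x <- y(sort(x), ...) - (0:(n - 1)) / n
  max(c(x, 1 / n - x))
}

mixed    <- c(rep(0, 7491), rep(1, 2509))
all_zero <- rep(0, 10000)

d_mixed <- excerpt_d(mixed,    pbinom, 1, 0.25)
d_zero  <- excerpt_d(all_zero, pbinom, 1, 0.25)
print(c(d_mixed, d_zero))   # both 0.75, up to floating point
```

The first sorted observation alone contributes pbinom(0, 1, 0.25) - 0/n = 0.75, and no later term exceeds it, so the result is the same however many zeros follow.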