[R] Kolmogorov-Smirnov test

m.marcinmichal m.marcinmichal at gmail.com
Thu Apr 28 23:53:39 CEST 2011


Hi,
thanks for the response.

>> The Kolmogorov-Smirnov test is designed for distributions on continuous
>> variable, not discrete like the poisson.  That is why you are getting
>> some of your warnings.

I read in "Fitting distributions with R" by Vito Ricci, page 19, that "...
Kolmogorov-Smirnov test is used to decide if a sample comes from a
population with a specific distribution. It can be applied both for discrete
(count) data and continuous binned (even if some Authors do not agree on
this point) and both for continuous variables", but on page 16 I read that
"... while the Kolmogorov-Smirnov and Anderson-Darling tests are restricted
to continuous distribution". I was a little confused, but I tried this test
on my discrete data anyway.

Generally, as a first step, I try to fit my data to a discrete or continuous
distribution (task: find a distribution for empirical data). Question: can I
approximate my discrete data by a continuous distribution? I know that
sometimes the Poisson distribution can be approximated by the normal
distribution. But what happens if I use another distribution, such as the
log-normal or gamma?
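
One way to judge how well a continuous distribution stands in for a discrete
one is to compare their CDFs directly. The sketch below is only an
illustration: it assumes a moment-matched normal (mean = lambda, sd =
sqrt(lambda)) with a continuity correction of 0.5, and the helper name
max_cdf_gap is hypothetical. The same comparison could be repeated with
pgamma() or plnorm() in place of pnorm().

```r
# Largest gap between the Poisson CDF and a moment-matched normal CDF.
# Assumption: normal with mean = lambda, sd = sqrt(lambda), evaluated at
# k + 0.5 (continuity correction).
max_cdf_gap <- function(lambda) {
  k <- 0:qpois(0.9999, lambda)   # grid covering almost all of the mass
  max(abs(ppois(k, lambda) -
          pnorm(k + 0.5, mean = lambda, sd = sqrt(lambda))))
}

lambdas <- c(2, 10, 50, 200)
gaps <- sapply(lambdas, max_cdf_gap)
print(data.frame(lambda = lambdas, max_gap = round(gaps, 4)))
```

The gap shrinks as lambda grows, so the approximation is mainly reasonable
when the counts are large.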

I ran three more tests, all chi-squared tests, but they return three
different results. Suppose that we have the same data, i.e. vectorSentence.
Tests:
1. One
library(MASS)  # provides fitdistr()
param <- fitdistr(vectorSentence, "poisson")
chisq.test(table(vectorSentence), p = dpois(1:9, lambda=param[[1]][1]),
rescale.p = TRUE)

X-squared = 272.8958, df = 8, p-value < 2.2e-16

2. Two
library(vcd)
gf <- goodfit(vectorSentence, type="poisson", method="MinChisq")
summary(gf)

             X^2 df     P(> X^2)
Pearson 404.3607  8 2.186332e-82

3. Three
library(fitdistrplus)  # provides fitdist() and gofstat()
fdistc <- fitdist(vectorSentence, "pois")
g<-gofstat(fdistc, print.test = TRUE) 

Chi-squared statistic:  535.344 
Degree of freedom of the Chi-squared distribution:  8 
Chi-squared p-value:  1.824112e-110 

I know that I can reject the null hypothesis in every case, i.e. conclude
that the data do not come from a Poisson distribution. But which of the
three results is correct?
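
Part of the disagreement between such tests can come from how the cells are
defined and how lambda is estimated. The sketch below is only an
illustration on simulated data (x stands in for vectorSentence, which is not
available here). The cell layout, the helper name pearson and the use of
optimize() are assumptions made for this example, not what chisq.test,
goodfit or gofstat do internally.

```r
set.seed(1)
x <- rpois(500, lambda = 3)   # simulated stand-in for vectorSentence (assumption)
n <- length(x)
K <- 8                        # cells: 0, 1, ..., K-1 plus a pooled tail ">= K"

# Observed counts per cell, including empty cells and the pooled tail
obs <- tabulate(pmin(x, K) + 1, nbins = K + 1)

# Pearson chi-squared statistic for a given Poisson mean;
# the cell probabilities include the tail, so they sum to 1
pearson <- function(lambda) {
  p <- c(dpois(0:(K - 1), lambda),
         ppois(K - 1, lambda, lower.tail = FALSE))
  sum((obs - n * p)^2 / (n * p))
}

lambda_ml <- mean(x)                                            # maximum likelihood
lambda_mc <- optimize(pearson, interval = c(0.1, 10))$minimum   # minimum chi-squared

stat_ml <- pearson(lambda_ml)
stat_mc <- pearson(lambda_mc)
df <- (K + 1) - 1 - 1         # cells - 1 - one estimated parameter
print(c(ml = stat_ml, minchisq = stat_mc, df = df))
print(pchisq(c(stat_ml, stat_mc), df, lower.tail = FALSE))
```

The two statistics differ even though the data and the cells are identical,
and leaving out the zero cell or the tail, or rescaling the probabilities,
changes them again.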

On the other hand, I am trying to solve another problem:
1. Suppose that we have reference data (dr) from some process (pr), which is
saved in vectorSentence.
2. Suppose that we have two other data samples, d1 and d2, from two other
processes, p1 and p2.
3. We know that all the data are discrete.

Task:
One: check whether the data d1, d2 are equal to the reference data (dr).
This is not a problem: I use the CDF, a histogram, other measures etc., and
the chi-squared test. But can I use Kolmogorov-Smirnov to test the
cumulative distribution function hypothesis, i.e. F(d1) = F(dr), for my
data?
Two: find the distribution of dr, discrete or, if possible, continuous.
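
For task One, a chi-squared test of homogeneity on the pooled counts is one
common alternative for discrete samples. The sketch below is only an
illustration: dr, d1 and d2 are simulated here, and the helper name
homogeneity_p and the pooling threshold cap = 7 are assumptions chosen for
this example.

```r
set.seed(2)
dr <- rpois(300, lambda = 3)     # simulated reference sample (assumption)
d1 <- rpois(300, lambda = 3)     # simulated sample from the same process
d2 <- rpois(300, lambda = 4.5)   # simulated sample from a different process

# Chi-squared test of homogeneity for two discrete samples.
# Values above `cap` are pooled into one cell so that expected counts
# do not become too small.
homogeneity_p <- function(a, b, cap = 7) {
  lev <- 0:cap
  tab <- rbind(table(factor(pmin(a, cap), levels = lev)),
               table(factor(pmin(b, cap), levels = lev)))
  tab <- tab[, colSums(tab) > 0, drop = FALSE]  # drop cells empty in both samples
  chisq.test(tab)$p.value
}

p1 <- homogeneity_p(dr, d1)
p2 <- homogeneity_p(dr, d2)
print(c(dr_vs_d1 = p1, dr_vs_d2 = p2))
```

A small p-value suggests that the two samples do not share one distribution.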

Best

Marcin M.


