ks.test {stats}  R Documentation 
Perform a one- or two-sample Kolmogorov-Smirnov test.
ks.test(x, ...)
## Default S3 method:
ks.test(x, y, ...,
alternative = c("two.sided", "less", "greater"),
exact = NULL, simulate.p.value = FALSE, B = 2000)
## S3 method for class 'formula'
ks.test(formula, data, subset, na.action, ...)
x 
a numeric vector of data values. 
y 
either a numeric vector of data values, or a character string
naming a cumulative distribution function or an actual cumulative
distribution function such as 
... 
for the default method, parameters of the distribution
specified (as a character string) by y.
alternative 
indicates the alternative hypothesis and must be
one of "two.sided" (default), "less", or "greater".
exact 

simulate.p.value 
a logical indicating whether to compute p-values by Monte Carlo simulation. (Ignored for the one-sample test.) 
B 
an integer specifying the number of replicates used in the Monte Carlo test. 
formula 
a formula of the form 
data 
an optional matrix or data frame (or similar: see

subset 
an optional vector specifying a subset of observations to be used. 
na.action 
a function which indicates what should happen when
the data contain 
If y is numeric, a two-sample (Smirnov) test of the null hypothesis
that x and y were drawn from the same distribution is performed.
Alternatively, y can be a character string naming a continuous
(cumulative) distribution function, or such a function. In this case,
a one-sample (Kolmogorov) test is carried out of the null that the
distribution function which generated x is distribution y with
parameters specified by the ... arguments.
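As a minimal sketch of the one-sample form, the call below passes the hypothesized CDF once by name and once as a function object, with the distribution parameter supplied through the ... arguments. The exponential sample and the rate value are illustrative assumptions, not part of the original page.

```r
set.seed(10)
x <- rexp(40, rate = 2)                  # illustrative sample

# y given as a character string naming the CDF; 'rate' is passed via ...
r_chr <- ks.test(x, "pexp", rate = 2)

# y given as the CDF function itself
r_fun <- ks.test(x, pexp, rate = 2)

# Both calls evaluate the same hypothesized CDF, so the statistics agree.
same_stat <- isTRUE(all.equal(unname(r_chr$statistic),
                              unname(r_fun$statistic)))
same_stat
```

Any function that takes a vector of quantiles as its first argument and returns cumulative probabilities should work in the same position, provided it describes a continuous distribution.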
The presence of ties always generates a warning in the one-sample case,
as continuous distributions do not generate them. If the ties arose
from rounding, the tests may be approximately valid, but even modest
amounts of rounding can have a significant effect on the calculated
statistic.
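The following sketch shows how the ties warning can be detected programmatically. The data vector is hypothetical (values as they might look after rounding to one decimal place), and the warning text is captured rather than printed.

```r
# Hypothetical data: rounding has produced two tied pairs.
x <- c(-1.2, -0.5, -0.5, 0.1, 0.3, 0.3, 0.9, 1.4)
has_ties <- anyDuplicated(x) > 0

# Capture the warning message instead of letting it print.
msg <- tryCatch(ks.test(x, "pnorm"),
                warning = function(w) conditionMessage(w))
has_ties
msg
```

Because tryCatch() returns from the handler, msg holds the warning text and not the test result; use withCallingHandlers() or suppressWarnings() if the result is still needed.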
Missing values are silently omitted from x and (in the two-sample
case) y.
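A small check of this behaviour, using a made-up vector with two NA entries: the result should match the one obtained after dropping the missing values by hand.

```r
x <- c(0.3, NA, -1.1, 0.8, NA, 1.9, -0.4)   # illustrative data with NAs

r_with_na <- ks.test(x, "pnorm")             # NAs are dropped internally
r_manual  <- ks.test(x[!is.na(x)], "pnorm")  # NAs dropped by hand

same_result <- isTRUE(all.equal(r_with_na$statistic, r_manual$statistic)) &&
  isTRUE(all.equal(r_with_na$p.value, r_manual$p.value))
same_result
```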
The possible values "two.sided", "less" and "greater" of alternative
specify the null hypothesis that the true cumulative distribution
function (CDF) of x is equal to, not less than or not greater than the
hypothesized CDF (one-sample case) or the CDF of y (two-sample case),
respectively. The test compares the CDFs taking their maximal
difference as test statistic, with the statistic in the "greater"
alternative being D^+ = \max_u [ F_x(u) - F_y(u) ].
Thus in the two-sample case alternative = "greater" includes
distributions for which x is stochastically smaller than y (the CDF of
x lies above and hence to the left of that for y), in contrast to
t.test or wilcox.test.
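To make the definition of D^+ concrete, the sketch below recomputes it from the two empirical CDFs and compares it with the statistic reported by ks.test. The samples and the seed are illustrative assumptions; y is shifted to the right so that the CDF of x lies above that of y.

```r
set.seed(1)
x <- rnorm(20)
y <- rnorm(25, mean = 0.5)     # y tends to be larger than x

# The empirical CDFs only change at observed values, so it is enough
# to evaluate their difference on the pooled sample.
u <- sort(c(x, y))
d_plus <- max(ecdf(x)(u) - ecdf(y)(u))

res <- ks.test(x, y, alternative = "greater")
matches <- isTRUE(all.equal(unname(res$statistic), d_plus))
matches
```

Here a large D^+ is evidence that x is stochastically smaller than y, which is the reverse of the direction used by alternative = "greater" in t.test and wilcox.test.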
Exact p-values are not available for the one-sample case in the
presence of ties.
If exact = NULL (the default), an exact p-value is computed if the
sample size is less than 100 in the one-sample case and there are no
ties, and if the product of the sample sizes is less than 10000 in the
two-sample case, with or without ties (using the algorithm described
in Schröer and Trenkler, 1995).
Otherwise, the p-value is computed via Monte Carlo simulation in the
two-sample case if simulate.p.value is TRUE, or else asymptotic
distributions are used whose approximations may be inaccurate in small
samples. In the one-sample two-sided case, exact p-values are obtained
as described in Marsaglia, Tsang & Wang (2003) (but not using the
optional approximation in the right tail, so this can be slow for
small p-values). The formula of Birnbaum & Tingey (1951) is used for
the one-sample one-sided case.
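The three ways of obtaining a two-sample p-value can be compared side by side. This is a sketch under stated assumptions: the samples are made up, and it relies on the simulate.p.value and B arguments shown in the usage above, with exact = FALSE so that the exact computation is not chosen first.

```r
set.seed(123)
x <- rnorm(15)
y <- rnorm(12, mean = 1)       # illustrative small samples

p_exact <- ks.test(x, y, exact = TRUE)$p.value
p_asymp <- ks.test(x, y, exact = FALSE)$p.value
p_sim   <- ks.test(x, y, exact = FALSE,
                   simulate.p.value = TRUE, B = 2000)$p.value

p_all <- c(exact = p_exact, asymptotic = p_asymp, simulated = p_sim)
p_all
```

With samples this small, the asymptotic value may differ noticeably from the exact one, and the simulated value varies with the random seed and with B.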
If a one-sample test is used, the parameters specified in ... must be
pre-specified and not estimated from the data.
There is some more refined distribution theory for the KS test with
estimated parameters (see Durbin, 1973), but that is not implemented
in ks.test.
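A small simulation can illustrate why this matters. In the sketch below (sample size, number of replicates and seed are arbitrary choices), normal samples are tested once against the true N(0, 1) and once against a normal distribution whose mean and standard deviation were estimated from the same sample. Plugging in estimates pulls the hypothesized CDF towards the data, so the resulting p-values tend to be too large.

```r
set.seed(2024)
n_rep <- 500                   # number of simulated samples (arbitrary)

# Parameters fixed in advance: p-values are roughly uniform under the null.
p_fixed <- replicate(n_rep, {
  x <- rnorm(30)
  ks.test(x, "pnorm", 0, 1)$p.value
})

# Parameters estimated from the sample itself: NOT a valid use of ks.test.
p_estimated <- replicate(n_rep, {
  x <- rnorm(30)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

reject_rates <- c(fixed = mean(p_fixed < 0.05),
                  estimated = mean(p_estimated < 0.05))
reject_rates
```

The rejection rate with estimated parameters is typically far below the nominal 5% level, so the test loses power when used this way.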
A list inheriting from classes "ks.test" and "htest" containing the
following components:
statistic 
the value of the test statistic. 
p.value 
the p-value of the test. 
alternative 
a character string describing the alternative hypothesis. 
method 
a character string indicating what type of test was performed. 
data.name 
a character string giving the name(s) of the data. 
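The components listed above can be extracted from the returned object with the usual list accessors, as in this brief sketch (the sample is illustrative):

```r
set.seed(7)
res <- ks.test(rnorm(25), "pnorm")

is_htest <- inherits(res, "htest")   # result is an "htest" object
d_value  <- unname(res$statistic)    # test statistic as a plain number
p_value  <- res$p.value
res$alternative
res$method
res$data.name
```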
The two-sided one-sample distribution comes via Marsaglia, Tsang and Wang (2003).
Exact distributions for the two-sample (Smirnov) test are computed by the algorithm proposed by Schröer (1991) and Schröer & Trenkler (1995) using numerical improvements along the lines of Viehmann (2021).
Z. W. Birnbaum and Fred H. Tingey (1951). One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592–596. doi:10.1214/aoms/1177729550.
William J. Conover (1971). Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample Kolmogorov test), 309–314 (two-sample Smirnov test).
Durbin, J. (1973). Distribution theory for tests based on the sample distribution function. SIAM.
W. Feller (1948). On the Kolmogorov-Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics, 19(2), 177–189. doi:10.1214/aoms/1177730243.
George Marsaglia, Wai Wan Tsang and Jingbo Wang (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18. doi:10.18637/jss.v008.i18.
Gunar Schröer (1991). Computergestützte statistische Inferenz am Beispiel der Kolmogorov-Smirnov Tests. Diplomarbeit Universität Osnabrück.
Gunar Schröer and Dietrich Trenkler (1995). Exact and Randomization Distributions of Kolmogorov-Smirnov Tests for Two or Three Samples. Computational Statistics & Data Analysis, 20(2), 185–202. doi:10.1016/0167-9473(94)00040-P.
Thomas Viehmann (2021). Numerically more stable computation of the p-values for the two-sample Kolmogorov-Smirnov test. https://arxiv.org/abs/2102.08037.
shapiro.test which performs the Shapiro-Wilk test for normality.
require("graphics")
x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x+2, "pgamma", 3, 2) # two-sided, exact
ks.test(x+2, "pgamma", 3, 2, exact = FALSE)
ks.test(x+2, "pgamma", 3, 2, alternative = "gr")
# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim = range(c(x, x2)))
plot(ecdf(x2), add = TRUE, lty = "dashed")
t.test(x, x2, alternative = "g")
wilcox.test(x, x2, alternative = "g")
ks.test(x, x2, alternative = "l")
# with ties, example from Schröer and Trenkler (1995)
# D = 3/7, p = 8/33 = 0.242424..
ks.test(c(1, 2, 2, 3, 3),
c(1, 2, 3, 3, 4, 5, 6))# -> exact
# formula interface, see ?wilcox.test
ks.test(Ozone ~ Month, data = airquality,
subset = Month %in% c(5, 8))