R: Distribution of the Smirnov Statistic

Smirnov {stats}

R Documentation

Distribution of the Smirnov Statistic

Description

Distribution function, quantile function and random generation for the distribution of the Smirnov statistic.

Usage

psmirnov(q, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"),
         exact = TRUE, simulate = FALSE, B = 2000,
         lower.tail = TRUE, log.p = FALSE)
qsmirnov(p, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"),
         exact = TRUE, simulate = FALSE, B = 2000)
rsmirnov(n, sizes, z = NULL,
         alternative = c("two.sided", "less", "greater"))

Arguments

q

a numeric vector of quantiles.

p

a numeric vector of probabilities.

sizes

an integer vector of length two giving the sample sizes.

z

a numeric vector of the pooled data values in both samples, needed when the exact conditional distribution of the Smirnov statistic given the data is to be computed.

alternative

one of "two.sided" (default), "less", or "greater" indicating whether absolute (two-sided, default) or raw (one-sided) differences of frequencies define the test statistic. See ‘Details’.

exact

NULL or a logical indicating whether the exact (conditional on the pooled data values in z) distribution or the asymptotic distribution should be used.

simulate

a logical indicating whether to compute the distribution function by Monte Carlo simulation.

B

an integer specifying the number of replicates used in the Monte Carlo simulation.

lower.tail

a logical, if TRUE (default), probabilities are P[D < q], otherwise, P[D \ge q].

log.p

a logical; if TRUE, probabilities are given as log-probabilities. The default is FALSE.

n

an integer giving the number of observations.
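As a rough illustration of how lower.tail and log.p interact, the sketch below evaluates the distribution function three ways. The sample sizes and the value of q are made up for illustration, and the code assumes an R installation whose stats package provides psmirnov with the signature shown under ‘Usage’.

```r
# Hypothetical inputs: sample sizes n_x = 5 and n_y = 7, and a value q
# at which to evaluate the distribution function.
sizes <- c(5L, 7L)
q <- 0.5

# Lower tail (the default): P[D < q].
p_lower <- psmirnov(q, sizes = sizes)

# Upper tail: P[D >= q], the complement of the lower tail.
p_upper <- psmirnov(q, sizes = sizes, lower.tail = FALSE)

# Same lower-tail probability, reported on the log scale.
log_p <- psmirnov(q, sizes = sizes, log.p = TRUE)
```

The two tails should sum to one, and exp(log_p) should agree with p_lower up to floating-point error.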

Details

For samples x and y with respective sizes n_x and n_y and empirical cumulative distribution functions F_{x,n_x} and F_{y,n_y}, the Smirnov statistic is

D = \sup_c | F_{x,n_x}(c) - F_{y,n_y}(c) |

in the two-sided case,

D^+ = \sup_c ( F_{x,n_x}(c) - F_{y,n_y}(c) )

in the one-sided "greater" case, and

D^- = \sup_c ( F_{y,n_y}(c) - F_{x,n_x}(c) )

in the one-sided "less" case.
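The three statistics can be computed directly from the empirical distribution functions. The following is a minimal sketch using ecdf on two small made-up samples without ties; the variable names are illustrative only. Because both ECDFs are step functions that jump only at observed values, it is enough to evaluate their difference at the pooled data points.

```r
# Two small illustrative samples (made-up values, no ties).
x <- c(0.61, 0.29, 0.06, 0.59, -1.73, -0.74, 0.51)
y <- c(-0.56, 1.45, 0.44, -0.02, 1.20)

# Empirical cumulative distribution functions of each sample.
Fx <- ecdf(x)
Fy <- ecdf(y)

# Evaluate F_x - F_y at every pooled data point; the difference is
# constant between consecutive points and zero outside their range.
z <- sort(c(x, y))
diffs <- Fx(z) - Fy(z)

D_two_sided <- max(abs(diffs))  # D:   sup of |F_x - F_y|
D_greater   <- max(diffs)       # D^+: sup of (F_x - F_y)
D_less      <- max(-diffs)      # D^-: sup of (F_y - F_x)
```

The two-sided statistic is the larger of the two one-sided ones, and each value should match the statistic reported by ks.test(x, y) with the corresponding alternative.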

These statistics are used in the Smirnov test of the null hypothesis that x and y were drawn from the same distribution; see ks.test.

If the underlying common distribution function F is continuous, the distribution of the test statistics does not depend on F, and has a simple asymptotic approximation. For arbitrary F, one can compute the conditional distribution given the pooled data values z of x and y, either exactly (feasible provided that the product n_x n_y of the sample sizes is “small enough”) or approximately by Monte Carlo simulation. If the pooled data values z are not specified, a pooled sample without ties is assumed.
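The difference between these cases can be sketched as follows. The pooled values in z_tied are invented to contain ties, and the Monte Carlo step is done by hand with rsmirnov rather than through the simulate argument, so it is only one plausible way to approximate the conditional distribution. The code assumes an R installation whose stats package provides psmirnov and rsmirnov as shown under ‘Usage’.

```r
# Hypothetical setup: n_x = 5, n_y = 7, and 12 pooled values with ties.
sizes  <- c(5L, 7L)
z_tied <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 7)
q <- 0.5

# No z supplied: a pooled sample without ties is assumed.
p_no_ties <- psmirnov(q, sizes = sizes)

# z supplied: exact conditional distribution given the tied pooled values.
p_exact <- psmirnov(q, sizes = sizes, z = z_tied)

# Hand-rolled Monte Carlo approximation of P[D < q]: draw B statistics
# from the conditional distribution and take the fraction below q.
set.seed(1)
B <- 2000
draws <- rsmirnov(B, sizes = sizes, z = z_tied)
p_mc <- mean(draws < q)
```

With B = 2000 replicates the simulated value should be close to, but not identical to, p_exact; p_no_ties will generally differ from both because it ignores the ties.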

Value