Smirnov {stats} R Documentation

## Distribution of the Smirnov Statistic

### Description

Distribution function, quantile function and random generation for the distribution of the Smirnov statistic.

### Usage

```r
psmirnov(q, sizes, z = NULL, two.sided = TRUE,
         exact = TRUE, simulate = FALSE, B = 2000,
         lower.tail = TRUE, log.p = FALSE)
qsmirnov(p, sizes, z = NULL, two.sided = TRUE,
         exact = TRUE, simulate = FALSE, B = 2000)
rsmirnov(n, sizes, z = NULL, two.sided = TRUE)
```

### Arguments

- q: a numeric vector of quantiles.
- p: a numeric vector of probabilities.
- sizes: an integer vector of length two giving the sample sizes.
- z: a numeric vector of the pooled data values of both samples, used when the exact conditional distribution of the Smirnov statistic given the data is to be computed.
- two.sided: a logical indicating whether absolute (TRUE) or raw differences of frequencies define the test statistic.
- exact: NULL or a logical indicating whether the exact distribution (conditional on the pooled data values in z) or the asymptotic distribution should be used.
- simulate: a logical indicating whether to compute the distribution function by Monte Carlo simulation.
- B: an integer specifying the number of replicates used in the Monte Carlo simulation.
- lower.tail: a logical; if TRUE (default), probabilities are P[D < q], otherwise P[D ≥ q].
- log.p: a logical; if TRUE, probabilities p are given as log(p). The default is FALSE.
- n: an integer giving the number of observations.

### Details

For samples x and y with respective sizes n_x and n_y and empirical cumulative distribution functions F_{x,n_x} and F_{y,n_y}, the Smirnov statistic is

D = \sup_c | F_{x,n_x}(c) - F_{y,n_y}(c) |

in the two-sided case and

D = \sup_c ( F_{x,n_x}(c) - F_{y,n_y}(c) )

otherwise.
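As a concrete reading of the two definitions above, here is a minimal R sketch that assumes only base R's ecdf(). The helper smirnov_stat() is hypothetical and not part of stats. It relies on both empirical distribution functions being right-continuous step functions that jump only at observed values, so the supremum over c can be taken over the pooled sample values.

```r
# Hypothetical helper (not part of stats): the Smirnov statistic D for
# samples x and y, following the definitions above.
smirnov_stat <- function(x, y, two.sided = TRUE) {
  # The ECDFs only jump at data points, so the supremum over c is
  # attained at one of the pooled sample values.
  z <- sort(unique(c(x, y)))
  d <- ecdf(x)(z) - ecdf(y)(z)
  # At the largest pooled value both ECDFs equal 1, so d contains 0 and
  # the one-sided max(d) is never negative (matching c -> -Inf).
  if (two.sided) max(abs(d)) else max(d)
}

x <- c(0.1, 0.4, 0.7)
y <- c(0.2, 0.3, 0.9, 1.2)
smirnov_stat(x, y)                     # 0.5
smirnov_stat(y, x, two.sided = FALSE)  # 1/6, the sup of F_y - F_x
```

For samples without ties the two-sided value should agree with the statistic reported by ks.test(x, y).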

These statistics are used in the Smirnov test of the null hypothesis that x and y were drawn from the same distribution; see ks.test.

If the underlying common distribution function F is continuous, the distribution of the test statistic does not depend on F and has a simple asymptotic approximation: with m = n_x n_y / (n_x + n_y), \sqrt{m} D converges in distribution to the Kolmogorov distribution in the two-sided case, while in the one-sided case P[\sqrt{m} D \ge t] \to e^{-2t^2}. For arbitrary F, one can compute the conditional distribution given the pooled data values z of x and y, either exactly (feasible provided that the product n_x n_y of the sample sizes is “small enough”) or approximately by Monte Carlo simulation. If the pooled data values z are not specified, a pooled sample without ties is assumed.
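The three routes to the distribution described above (exact conditional computation, Monte Carlo simulation, and the asymptotic approximation) can be sketched in base R as follows. This is an illustrative sketch, not the algorithm used by psmirnov (see ks.test for references on that). In particular, the exact part swaps in brute-force enumeration of all choose(n_x + n_y, n_x) splits of z for clarity, which is feasible only for very small samples. All helper names are hypothetical and not part of stats.

```r
# Hypothetical helpers (not part of stats) returning upper-tail P[D >= q].

# Smirnov statistic of samples x and y (sup attained at pooled values).
smirnov_stat <- function(x, y, two.sided = TRUE) {
  z <- sort(unique(c(x, y)))
  d <- ecdf(x)(z) - ecdf(y)(z)
  if (two.sided) max(abs(d)) else max(d)
}

# Exact conditional P[D >= q | z]: under the null every assignment of nx of
# the pooled values to x is equally likely, so enumerate all of them.
smirnov_exact_upper <- function(q, z, nx, two.sided = TRUE) {
  splits <- combn(length(z), nx)  # one column per split of the indices
  d <- apply(splits, 2, function(i) smirnov_stat(z[i], z[-i], two.sided))
  mean(d >= q - 1e-9)  # tolerance absorbs floating-point error in d
}

# Monte Carlo approximation of the same conditional probability.
smirnov_mc_upper <- function(q, z, nx, two.sided = TRUE, B = 2000) {
  d <- replicate(B, {
    i <- sample.int(length(z), nx)
    smirnov_stat(z[i], z[-i], two.sided)
  })
  mean(d >= q - 1e-9)
}

# Asymptotic P[D >= q] for continuous F, from the limits of sqrt(m) * D
# with m = nx * ny / (nx + ny).
smirnov_asymp_upper <- function(q, nx, ny, two.sided = TRUE, kmax = 100) {
  t <- sqrt(nx * ny / (nx + ny)) * q
  if (t <= 0) return(1)
  if (!two.sided) return(exp(-2 * t^2))
  k <- seq_len(kmax)
  if (t < 1) {
    # Series for the Kolmogorov CDF that converges quickly for small t.
    1 - sqrt(2 * pi) / t * sum(exp(-(2 * k - 1)^2 * pi^2 / (8 * t^2)))
  } else {
    # Alternating series for the upper tail, fast for larger t.
    2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
  }
}

z <- c(1.2, 2.5, 2.5, 3.1, 4.0, 6.7)  # pooled data with one tie
smirnov_exact_upper(0.5, z, nx = 3)
set.seed(42)
smirnov_mc_upper(0.5, z, nx = 3)
smirnov_asymp_upper(0.5, nx = 3, ny = 3)
```

Because the exact and Monte Carlo versions permute the observed pooled values, ties in z are carried through to the conditional distribution, whereas the asymptotic version assumes a continuous F.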

### Value

psmirnov gives the distribution function, qsmirnov gives the quantile function, and rsmirnov generates random deviates.
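The following hedged sketch illustrates what rsmirnov() draws when z is not given. Under the null with continuous F, only the ranks of the pooled sample matter, so a deviate of D can be produced by splitting the tie-free pooled sample 1, ..., n_x + n_y at random. rsmirnov_sketch() is a hypothetical name. The empirical proportion and quantile at the end are only Monte Carlo analogues of psmirnov() and qsmirnov(), and the quantile convention type = 1 is an assumption, not necessarily the one qsmirnov() uses.

```r
# Hypothetical sketch (not stats::rsmirnov): random deviates of D under the
# null for a continuous F, i.e. a pooled sample without ties.
rsmirnov_sketch <- function(n, sizes, two.sided = TRUE) {
  nx <- sizes[1]
  N <- sum(sizes)
  grid <- seq_len(N)  # tie-free pooled sample; only ranks matter
  replicate(n, {
    i <- sample.int(N, nx)  # ranks assigned to sample x
    d <- ecdf(i)(grid) - ecdf(grid[-i])(grid)
    if (two.sided) max(abs(d)) else max(d)
  })
}

set.seed(1)
draws <- rsmirnov_sketch(2000, sizes = c(5, 7))
mean(draws < 0.5)                # Monte Carlo analogue of P[D < 0.5]
quantile(draws, 0.95, type = 1)  # Monte Carlo analogue of a 0.95 quantile
```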

### See Also

ks.test for references on the algorithms used for computing exact distributions.