Hypergeometric {stats} R Documentation

## The Hypergeometric Distribution

### Description

Density, distribution function, quantile function and random generation for the hypergeometric distribution.

### Usage

dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)


### Arguments

| Argument | Description |
|---|---|
| x, q | vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. |
| m | the number of white balls in the urn. |
| n | the number of black balls in the urn. |
| k | the number of balls drawn from the urn, hence must be in 0, 1, …, m+n. |
| p | probability, it must be between 0 and 1. |
| nn | number of observations. If length(nn) > 1, the length is taken to be the number required. |
| log, log.p | logical; if TRUE, probabilities p are given as log(p). |
| lower.tail | logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. |
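As a small illustration of the log, log.p and lower.tail arguments, the following sketch uses the parameter values from the Examples section (m = 10, n = 7, k = 8); the specific quantile 4 and probability 0.5 are arbitrary choices made here.

```r
m <- 10; n <- 7; k <- 8

# log = TRUE returns the log of the density
all.equal(dhyper(4, m, n, k, log = TRUE), log(dhyper(4, m, n, k)))

# lower.tail = FALSE returns P[X > x] instead of P[X <= x]
all.equal(phyper(4, m, n, k, lower.tail = FALSE), 1 - phyper(4, m, n, k))

# log.p = TRUE means the probability argument is supplied on the log scale
all.equal(qhyper(log(0.5), m, n, k, log.p = TRUE), qhyper(0.5, m, n, k))
```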

### Details

The hypergeometric distribution is used for sampling without replacement. The density of this distribution with parameters m, n and k (named Np, N-Np, and n, respectively in the reference below, where N := m+n is also used in other references) is given by

p(x) = choose(m, x) choose(n, k-x) / choose(m+n, k)

for x = 0, …, k.

Note that p(x) is non-zero only for max(0, k-n) <= x <= min(k, m).
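The formula and its support can be checked directly. This is a minimal sketch with illustrative values m = 10, n = 7, k = 8; the names p_manual and support are introduced here only for the check.

```r
m <- 10; n <- 7; k <- 8
x <- 0:k

# Density written out from the formula above
p_manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
all.equal(p_manual, dhyper(x, m, n, k))

# Values of x with non-zero probability
support <- x[dhyper(x, m, n, k) > 0]
range(support)                    # compare with the bounds below
c(max(0, k - n), min(k, m))
```

Here x = 0 has probability zero because drawing 8 balls with no white ones would need 8 black balls, but the urn holds only 7.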

With p := m/(m+n) (hence Np = N × p in the reference's notation), the first two moments are mean

E[X] = μ = k p

and variance

Var(X) = k p (1 - p) * (m+n-k)/(m+n-1),

which shows the closeness to the Binomial(k,p) (where the hypergeometric has smaller variance unless k = 1).
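The moment formulas can be compared with sums over the density. The sketch below again assumes m = 10, n = 7, k = 8; the variable names are chosen here for illustration.

```r
m <- 10; n <- 7; k <- 8
x <- 0:k
d <- dhyper(x, m, n, k)
p <- m / (m + n)

# Moments computed numerically from the density
mean_num <- sum(x * d)
var_num  <- sum((x - mean_num)^2 * d)

# Closed-form moments from the text
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (m + n - k) / (m + n - 1)

all.equal(mean_num, mean_formula)
all.equal(var_num, var_formula)

# The factor (m+n-k)/(m+n-1) makes the variance smaller than the
# Binomial(k, p) variance k p (1 - p) whenever k > 1
var_formula < k * p * (1 - p)
```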

The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.
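To illustrate this definition, the sketch below searches phyper for the smallest x with F(x) ≥ p and compares the result with qhyper. The probabilities are arbitrary values chosen away from the jump points of F, and q_manual is a name introduced here.

```r
m <- 10; n <- 7; k <- 8
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)

x  <- 0:k
Fx <- phyper(x, m, n, k)

# Smallest x such that F(x) >= p, found by direct search
q_manual <- sapply(p, function(pr) x[which(Fx >= pr)[1]])

all(q_manual == qhyper(p, m, n, k))
```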

In rhyper(), if one of m, n, k exceeds .Machine$integer.max, currently the equivalent of qhyper(runif(nn), m,n,k) is used, which is comparatively slow; in that case a binomial approximation may be considerably more efficient.
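The two approaches mentioned here can be sketched as follows. The first part applies the inversion idea with small parameters, only to show the mechanism. The second part draws from Binomial(k, m/(m+n)) as an approximation, under the assumption that k is very small relative to m+n. All parameter values are illustrative, and this is not the internal implementation of rhyper.

```r
set.seed(42)
nn <- 10000

# Inversion: apply the quantile function to uniform random numbers
m <- 10; n <- 7; k <- 8
r_inv <- qhyper(runif(nn), m, n, k)
mean(r_inv)          # should be close to k * m / (m + n)
k * m / (m + n)

# Binomial approximation for a very large urn (assumes k << m + n)
m_big <- 3e9; n_big <- 5e9; k_small <- 100
r_binom <- rbinom(nn, size = k_small, prob = m_big / (m_big + n_big))
mean(r_binom)        # should be close to k_small * m_big / (m_big + n_big)
```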

### Value

dhyper gives the density, phyper gives the distribution function, qhyper gives the quantile function, and rhyper generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by nn for rhyper, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
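A short sketch of these length and recycling rules, and of the NaN return for invalid arguments. The argument values are illustrative only.

```r
set.seed(1)

# rhyper: the first argument sets the number of draws
length(rhyper(5, m = 10, n = 7, k = 8))             # 5
length(rhyper(c(1, 2, 3), m = 10, n = 7, k = 8))    # 3, the length of the first argument

# Other functions: result has the length of the longest numerical argument,
# and shorter arguments (here k) are recycled
length(dhyper(0:3, m = 10, n = 7, k = c(4, 8)))     # 4

# m is recycled in rhyper: with m = 0 no white ball can ever be drawn
r <- rhyper(6, m = c(10, 0), n = 7, k = 3)
r[c(2, 4, 6)]

# Invalid arguments (k > m + n here) give NaN with a warning
bad <- suppressWarnings(dhyper(1, m = 10, n = 7, k = 20))
is.nan(bad)
```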

### Source

dhyper computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

phyper is based on calculating dhyper and phyper(...)/dhyper(...) (as a summation), based on ideas of Ian Smith and Morten Welinder.

qhyper is based on inversion (of an earlier phyper() algorithm).

rhyper is based on a corrected version of

Kachitvichyanukul, V. and Schmeiser, B. (1985). Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation, 22, 127–145.

### References

Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.

### See Also

Distributions for other standard distributions.

### Examples

m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k)))  # FALSE
## but error is very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)


[Package stats version 4.1.0 Index]