R: Distribution of the Wilcoxon Rank Sum Statistic

Wilcoxon {stats}

R Documentation

Distribution of the Wilcoxon Rank Sum Statistic

Description

Density, distribution function, quantile function and random generation for the distribution of the Wilcoxon rank sum statistic obtained from samples with size m and n, respectively.

Usage

dwilcox(x, m, n, log = FALSE)
pwilcox(q, m, n, lower.tail = TRUE, log.p = FALSE)
qwilcox(p, m, n, lower.tail = TRUE, log.p = FALSE)
rwilcox(nn, m, n)

Arguments

x, q

vector of quantiles.

p

vector of probabilities.

nn

number of observations. If length(nn) > 1, the length is taken to be the number required.

m, n

numbers of observations in the first and second sample, respectively. Can be vectors of positive integers.

log, log.p

logical; if TRUE, probabilities are given as logarithms.

lower.tail

logical; if TRUE (default), probabilities are P[X \le x], otherwise, P[X > x].

Details

This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

If any of the first three arguments are vectors, the recycling rule is used to do the calculations for all combinations of the three up to the length of the longest vector.

Value

dwilcox gives the density, pwilcox is the cumulative distribution function, qwilcox is the quantile function, and rwilcox generates random deviates.

The length of the result is determined by nn for rwilcox, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.

Note

S-PLUS used a different (but equivalent) definition of the Wilcoxon statistic: see wilcox.test for details.

Author(s)

Originally by Kurt Hornik, more recent revisions by Andreas Löffler, Aidan Lakshman, and Ivan Krylov.

Source

These ("d","p","q") are calculated based on cwilcox(k, m, n), the number of choices with statistic k from samples of size m and n. cwilcox() is calculated using a formula introduced by Andreas Löffler to avoid recursion and reduce memory complexity. Then dwilcox and pwilcox sum appropriate values of cwilcox, and qwilcox is based on inversion.

rwilcox generates a random permutation of ranks and evaluates the statistic. Note that it is based on the same C code as sample(), and hence is determined by .Random.seed, notably from RNGkind(sample.kind = ..) which changed with R version 3.6.0.

References

Löffler, Andreas (1983) Über eine Partition der nat. Zahlen und ihre Anwendung beim U-Test. Wissenschaftliche Zeitschrift der Martin-Luther-Universität Halle-Wittenberg; Mathematisch-Naturwissenschaftliche Reihe, XXXII'83 M, Heft 5, 87–89; available as https://upload.wikimedia.org/wikipedia/commons/f/f5/LoefflerWilcoxonMannWhitneyTest.pdf and in English as https://upload.wikimedia.org/wikipedia/de/1/19/MannWhitney_151102.pdf

Examples

require(graphics)

x <- -1:(4*6 + 1)
fx <- dwilcox(x, 4, 6)
Fx <- pwilcox(x, 4, 6)

layout(rbind(1,2), widths = 1, heights = c(3,2))
plot(x, fx, type = "h", col = "violet",
     main =  "Probabilities (density) of Wilcoxon-Statist.(n=6, m=4)")
plot(x, Fx, type = "s", col = "blue",
     main =  "Distribution of Wilcoxon-Statist.(n=6, m=4)")
abline(h = 0:1, col = "gray20", lty = 2)
layout(1) # set back

N <- 200
hist(U <- rwilcox(N, m = 4,n = 6), breaks = 0:25 - 1/2,
     border = "red", col = "pink", sub = paste("N =",N))
mtext("N * f(x),  f() = true \"density\"", side = 3, col = "blue")
 lines(x, N*fx, type = "h", col = "blue", lwd = 2)
points(x, N*fx, cex = 2)

## Better is a Quantile-Quantile Plot
qqplot(U, qw <- qwilcox((1:N - 1/2)/N, m = 4, n = 6),
       main = paste("Q-Q-Plot of empirical and theoretical quantiles",
                     "Wilcoxon Statistic,  (m=4, n=6)", sep = "\n"))
n <- as.numeric(names(print(tU <- table(U))))
text(n+.2, n+.5, labels = tU, col = "red")

[Package stats version 4.6.0 Index]