Wilcoxon {stats}  R Documentation 
Distribution of the Wilcoxon Rank Sum Statistic
Description
Density, distribution function, quantile function and random
generation for the distribution of the Wilcoxon rank sum statistic
obtained from samples with size m and n, respectively.
Usage
dwilcox(x, m, n, log = FALSE)
pwilcox(q, m, n, lower.tail = TRUE, log.p = FALSE)
qwilcox(p, m, n, lower.tail = TRUE, log.p = FALSE)
rwilcox(nn, m, n)
Arguments
x , q 
vector of quantiles. 
p 
vector of probabilities. 
nn 
number of observations. If length(nn) > 1, the length is taken to be the number required. 
m , n 
numbers of observations in the first and second sample, respectively. Can be vectors of positive integers. 
log , log.p 
logical; if TRUE, probabilities p are given as log(p). 
lower.tail 
logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. 

Details
This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
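The definition above can be checked numerically. The following is a minimal sketch: pair_count() is a hypothetical helper written for this illustration (it is not part of stats), and the sample sizes m = 4, n = 6 are arbitrary. The mean and variance are computed directly from the exact probabilities returned by dwilcox and compared with the closed forms.

```r
## Hypothetical helper (not part of stats): number of pairs (x[i], y[j])
## for which y[j] is not greater than x[i].
pair_count <- function(x, y) sum(outer(x, y, ">="))

set.seed(1)
m <- 4; n <- 6
x <- rnorm(m); y <- rnorm(n)
U <- pair_count(x, y)          # one realisation, somewhere in 0..(m * n)

## Mean and variance from the exact distribution on 0..(m * n)
k  <- 0:(m * n)
pk <- dwilcox(k, m, n)
mu     <- sum(k * pk)
sigma2 <- sum((k - mu)^2 * pk)

c(mean = mu, closed.form = m * n / 2)
c(variance = sigma2, closed.form = m * n * (m + n + 1) / 12)
```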
If any of the first three arguments are vectors, the recycling rule is used: shorter arguments are repeated to the length of the longest vector, and the calculation is done element by element.
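As an illustration of the recycling rule, the sketch below passes a vector for m while the quantile and n have length one. The particular values are arbitrary; the point is only that the vectorised call matches the three separate calls.

```r
## One call: x and n are recycled against m = c(3, 4, 5).
vectorised <- dwilcox(2, m = c(3, 4, 5), n = 6)

## The same three values computed one at a time.
one.by.one <- c(dwilcox(2, 3, 6), dwilcox(2, 4, 6), dwilcox(2, 5, 6))

all.equal(vectorised, one.by.one)
```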
Value
dwilcox gives the density, pwilcox gives the distribution function, qwilcox gives the quantile function, and rwilcox generates random deviates.
The length of the result is determined by nn for rwilcox, and is the maximum of the lengths of the numerical arguments for the other functions.
The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
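The relationships between the four functions can be sketched as follows. The sample sizes and probabilities are arbitrary choices for illustration; the checks assume the usual convention for discrete distributions that qwilcox returns the smallest value whose cumulative probability reaches p.

```r
m <- 4; n <- 6
q <- 0:(m * n)

d  <- dwilcox(q, m, n)                       # density on the full support
p  <- pwilcox(q, m, n)                       # P[X <= q]
up <- pwilcox(q, m, n, lower.tail = FALSE)   # P[X > q]
ld <- dwilcox(q, m, n, log = TRUE)           # log-density

all.equal(p, cumsum(d))      # distribution function is the running sum
all.equal(up, 1 - p)         # upper tail complements the lower tail
all.equal(ld, log(d))        # log = TRUE returns logarithms

## Quantiles: smallest value whose cumulative probability reaches prob
prob <- c(0.05, 0.5, 0.95)
qq   <- qwilcox(prob, m, n)
cbind(prob, qq, reached = pwilcox(qq, m, n), below = pwilcox(qq - 1, m, n))

## Result lengths: nn for rwilcox, longest numerical argument otherwise
length(rwilcox(5, m, n))
length(dwilcox(0:3, m, n))
```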
Note
S-PLUS used a different (but equivalent) definition of the Wilcoxon
statistic: see wilcox.test for details.
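To make the equivalence concrete: as described in the wilcox.test documentation, the alternative convention reports the rank sum of the first sample without subtracting its minimum value m * (m + 1) / 2. The sketch below assumes continuous data without ties and uses arbitrary simulated samples; it compares the unshifted rank sum, the shifted rank sum, the pair count from Details, and the statistic reported by wilcox.test.

```r
set.seed(42)
m <- 5; n <- 7
x <- rnorm(m); y <- rnorm(n)

rank.sum  <- sum(rank(c(x, y))[seq_len(m)])  # unshifted rank sum of the first sample
W.shifted <- rank.sum - m * (m + 1) / 2      # shifted so that the minimum is 0
W.pairs   <- sum(outer(x, y, ">="))          # pair-counting definition from Details
W.test    <- unname(wilcox.test(x, y)$statistic)

c(rank.sum = rank.sum, shifted = W.shifted, pairs = W.pairs, wilcox.test = W.test)
```

The two conventions differ only by the constant m * (m + 1) / 2, so they lead to the same test.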
Author(s)
Originally by Kurt Hornik, more recent revisions by Andreas Löffler, Aidan Lakshman, and Ivan Krylov.
Source
These ("d","p","q") are calculated based on cwilcox(k, m, n), the number of choices with statistic k from samples of size m and n. cwilcox() is calculated using a formula introduced by Andreas Löffler to avoid recursion and reduce memory complexity. Then dwilcox and pwilcox sum appropriate values of cwilcox, and qwilcox is based on inversion.
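The role of the counts can be illustrated by brute force for small samples. This sketch is not Löffler's formula and not the internal C code: it simply enumerates, with combn, every way of assigning m of the m + n ranks to the first sample and tabulates the resulting statistic. The name counts is a hypothetical stand-in for the values of cwilcox(k, m, n), and m = 3, n = 4 are arbitrary.

```r
m <- 3; n <- 4

## Every choice of which m of the m + n ranks belong to the first sample
rank.sets <- combn(m + n, m)

## Statistic for each choice: rank sum minus its minimum m * (m + 1) / 2
stat <- colSums(rank.sets) - m * (m + 1) / 2

## Hypothetical stand-in for the counts: number of choices per value 0..(m * n)
counts <- tabulate(stat + 1, nbins = m * n + 1)

k     <- 0:(m * n)
total <- choose(m + n, m)

all.equal(counts / total, dwilcox(k, m, n))          # density
all.equal(cumsum(counts) / total, pwilcox(k, m, n))  # distribution function
```

Enumeration grows as choose(m + n, m), so it is only practical as a check for small m and n.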
rwilcox generates a random permutation of ranks and evaluates the statistic. Note that it is based on the same C code as sample(), and hence is determined by .Random.seed, notably from RNGkind(sample.kind = ..) which changed with R version 3.6.0.
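A short sketch of both points: resetting the seed reproduces the draws, and the idea of sampling ranks can be imitated at R level. rwilcox_sketch() is a hypothetical function written for this illustration; it follows the same idea but is not the C implementation and is not expected to reproduce the exact random stream of rwilcox.

```r
## Same seed, same draws
set.seed(123); a <- rwilcox(5, m = 4, n = 6)
set.seed(123); b <- rwilcox(5, m = 4, n = 6)
identical(a, b)

## Hypothetical R-level imitation: draw the ranks of the first sample at
## random and evaluate the statistic (rank sum minus m * (m + 1) / 2).
rwilcox_sketch <- function(nn, m, n) {
  replicate(nn, sum(sample.int(m + n, m)) - m * (m + 1) / 2)
}

u <- rwilcox_sketch(1000, 4, 6)
range(u)   # stays within 0..(4 * 6)
mean(u)    # should be near 4 * 6 / 2
```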
References
Löffler, Andreas (1983) Über eine Partition der nat. Zahlen und ihre Anwendung beim U-Test. Wissenschaftliche Zeitschrift der Martin-Luther-Universität Halle-Wittenberg; Mathematisch-Naturwissenschaftliche Reihe, XXXII'83 M, Heft 5, 87–89; available as https://upload.wikimedia.org/wikipedia/commons/f/f5/LoefflerWilcoxonMannWhitneyTest.pdf and in English as https://upload.wikimedia.org/wikipedia/de/1/19/MannWhitney_151102.pdf
See Also
wilcox.test to calculate the statistic from data, find p-values and so on.
Distributions for standard distributions, including dsignrank for the distribution of the one-sample Wilcoxon signed rank statistic.
Examples
require(graphics)

x <- -1:(4*6 + 1)
fx <- dwilcox(x, 4, 6)
Fx <- pwilcox(x, 4, 6)

layout(rbind(1,2), widths = 1, heights = c(3,2))
plot(x, fx, type = "h", col = "violet",
     main = "Probabilities (density) of Wilcoxon-Statist.(n=6, m=4)")
plot(x, Fx, type = "s", col = "blue",
     main = "Distribution of Wilcoxon-Statist.(n=6, m=4)")
abline(h = 0:1, col = "gray20", lty = 2)
layout(1) # set back

N <- 200
hist(U <- rwilcox(N, m = 4,n = 6), breaks = 0:25 - 1/2,
     border = "red", col = "pink", sub = paste("N =",N))
mtext("N * f(x), f() = true \"density\"", side = 3, col = "blue")
lines(x, N*fx, type = "h", col = "blue", lwd = 2)
points(x, N*fx, cex = 2)

## Better is a Quantile-Quantile Plot
qqplot(U, qw <- qwilcox((1:N - 1/2)/N, m = 4, n = 6),
       main = paste("Q-Q-Plot of empirical and theoretical quantiles",
                    "Wilcoxon Statistic, (m=4, n=6)", sep = "\n"))
n <- as.numeric(names(print(tU <- table(U))))
text(n+.2, n+.5, labels = tU, col = "red")