R: Box Plot Statistics

boxplot.stats {grDevices}

R Documentation

Box Plot Statistics

Description

This function is typically called by another function to gather the statistics necessary for producing box plots, but may be invoked separately.

Usage

boxplot.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE)

Arguments

x

a numeric vector for which the boxplot will be constructed (NAs and NaNs are allowed and omitted).

coef

this determines how far the plot ‘whiskers’ extend out from the box. If coef is positive, the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers are returned).

do.conf, do.out

logicals; if FALSE, the conf or out component respectively will be empty in the result.
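
The whisker rule described for coef can be reproduced by hand. The following is a minimal sketch, not the function's actual source: it assumes the "length of the box" is the distance between the two hinges returned by fivenum(), and the names box_len, lower_fence and upper_fence are illustrative only.

```r
# Sketch of the whisker rule for coef > 0 (variable names are illustrative).
x <- c(1:100, 1000)
coef <- 1.5

h <- fivenum(x)             # min, lower hinge, median, upper hinge, max
box_len <- h[4] - h[2]      # assumed: box length = distance between hinges

# Fences lie coef box lengths beyond each hinge
lower_fence <- h[2] - coef * box_len
upper_fence <- h[4] + coef * box_len

inside   <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)   # most extreme points still within the fences
outliers <- x[x < lower_fence | x > upper_fence]

b <- boxplot.stats(x, coef = coef)
```

For this data, whiskers should agree with b$stats[c(1, 5)] and outliers with b$out. A larger coef widens the fences, so fewer points are reported in out.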

Details

The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n ≡ 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n ≡ 2 mod 4), and are in the middle of two observations otherwise.
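
The relationship between hinges and quartiles can be checked on small samples. The sketch below compares fivenum() with quantile() at its default type for n = 5, ..., 8; compare_hinges is a hypothetical helper written for this illustration, not part of any package.

```r
# Compare hinges (fivenum) with quartiles (quantile, default type) for
# samples 1..n. compare_hinges is a hypothetical helper for illustration.
compare_hinges <- function(n) {
  x <- seq_len(n)
  hinges    <- fivenum(x)[c(2, 4)]
  quartiles <- unname(quantile(x, c(1, 3) / 4))
  data.frame(n = n,
             lower_hinge = hinges[1], upper_hinge = hinges[2],
             q1 = quartiles[1],       q3 = quartiles[2])
}

cmp <- do.call(rbind, lapply(5:8, compare_hinges))

# TRUE where both hinges coincide with the quartiles
cmp$same <- abs(cmp$lower_hinge - cmp$q1) < 1e-12 &
            abs(cmp$upper_hinge - cmp$q3) < 1e-12
print(cmp)
```

The same column should be TRUE for the odd sample sizes and FALSE for the even ones, in line with the statement above.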

The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers, Cleveland, Kleiner, and Tukey (1983, p. 62), given in McGill, Tukey, and Larsen (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.
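
As a rough check of the notch formula, the conf component can be recomputed from the median, the box spread and n. This sketch assumes the spread is measured between the hinges; for the odd-length sample used here the hinges coincide with the quartiles, so the distinction does not affect the result.

```r
# Recompute the notch limits by hand and compare with boxplot.stats().
x <- c(1:100, 1000)
b <- boxplot.stats(x)

h      <- fivenum(x)        # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]       # assumed: spread taken between the hinges
notch  <- h[3] + c(-1.58, 1.58) * spread / sqrt(b$n)

all.equal(notch, b$conf)
```

When two box plots are compared, notches that do not overlap are usually read as evidence that the two medians differ.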

Value

A list with named components as follows:

stats

a vector of length 5, containing the extreme of the lower whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and the extreme of the upper whisker. For coef = 0, this vector is identical to fivenum(x, na.rm = TRUE).

n

the number of non-NA observations in the sample.

conf

the lower and upper extremes of the ‘notch’ (if(do.conf)). See the details.

out

the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).

Note that stats and conf are sorted in increasing order, unlike S, and that n and out include any ±Inf values.

References

Chambers J. M., Cleveland W. S., Kleiner B., Tukey P. A. (1983). Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole. ISBN 0871504138.

Emerson J. D., Strenio J. (1983). “Boxplots and Batch Comparison.” In Mosteller F., Hoaglin D. C., Tukey J. W. (eds.), Understanding Robust and Exploratory Data Analysis, chapter 3. Wiley.

McGill R., Tukey J. W., Larsen W. A. (1978). “Variations of Box Plots.” The American Statistician, 32(1), 12–16. doi:10.1080/00031305.1978.10479236.

Tukey J. W. (1977). Exploratory Data Analysis. Addison-Wesley Series in Behavioral Science, number 2. Addison-Wesley Publishing Company. ISBN 9780201076165. Section 2C.

Velleman P. F., Hoaglin D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Duxbury Series in Statistics and Decision Sciences. Duxbury Press. ISBN 9780878722730.

Examples

require(stats)
x <- c(1:100, 1000)
(b1 <- boxplot.stats(x))
(b2 <- boxplot.stats(x, do.conf = FALSE, do.out = FALSE))
stopifnot(b1 $ stats == b2 $ stats) # do.out = FALSE is still robust
boxplot.stats(x, coef = 3, do.conf = FALSE)

## no outlier treatment:
(b3 <- boxplot.stats(x, coef = 0))
stopifnot(b3$stats == fivenum(x))

## missing values are ignored
stopifnot(identical(boxplot.stats(c(x, NA)), b1))
## ... infinite values are not:
(r <- boxplot.stats(c(x, -1:1/0)))
stopifnot(r$out == c(1000, -Inf, Inf))

[Package grDevices version 4.6.0 Index]