bandwidth {stats}

R Documentation

Bandwidth Selectors for Kernel Density Estimation

Description

Bandwidth selectors for Gaussian kernels in density.

Usage

bw.nrd0(x)

bw.nrd(x)

bw.ucv(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
       tol = 0.1 * lower)

bw.bcv(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
       tol = 0.1 * lower)

bw.SJ(x, nb = 1000, lower = 0.1 * hmax, upper = hmax,
      method = c("ste", "dpi"), tol = 0.1 * lower)

Arguments

x

numeric vector.

nb

number of bins to use.

lower, upper

range over which to minimize. The default is almost always satisfactory. hmax is calculated internally from a normal reference bandwidth.

method

either "ste" ("solve-the-equation") or "dpi" ("direct plug-in"). Can be abbreviated.

tol

for method "ste", the convergence tolerance for uniroot. The default leads to bandwidth estimates with only slightly more than one digit accuracy, which is sufficient for practical density estimation, but possibly not for theoretical simulation studies.

Details

bw.nrd0 implements a rule of thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34, multiplied by the sample size raised to the negative one-fifth power (Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31))). If the quartiles coincide, a positive result is still guaranteed.
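The rule can be checked by hand. The following is a minimal sketch, assuming the built-in precip dataset (as in the Examples); the names h_manual and x_tied are chosen here for illustration, and the code is not the internal implementation.

```r
# Silverman's rule of thumb written out, compared with bw.nrd0().
x <- precip                      # built-in dataset, also used in the Examples
n <- length(x)

# 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
h_manual <- 0.9 * min(sd(x), IQR(x) / 1.34) * n^(-1/5)

h_manual
bw.nrd0(x)
all.equal(h_manual, bw.nrd0(x))  # agreement expected when the quartiles differ

# Coinciding quartiles: the IQR is 0, yet bw.nrd0() still returns a
# positive bandwidth.
x_tied <- c(rep(1, 10), 5)
IQR(x_tied)
bw.nrd0(x_tied) > 0
```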

bw.nrd is the more common variation given by Scott (1992), using factor 1.06.
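As a rough illustration of how the two rules relate, the sketch below compares them on precip. It assumes the quartiles of the data do not coincide, so that both rules use the same spread estimate and differ only in the leading factor.

```r
# bw.nrd() and bw.nrd0() on the same data: the ratio of the two bandwidths
# should equal the ratio of the factors, 1.06 / 0.9, when min(sd, IQR / 1.34)
# is non-zero.
x <- precip
ratio <- bw.nrd(x) / bw.nrd0(x)
ratio
all.equal(ratio, 1.06 / 0.9)
```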

bw.ucv and bw.bcv implement unbiased and biased cross-validation respectively.
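A short usage sketch for the two cross-validation selectors follows. The nb, lower and upper values in the last call are arbitrary illustrations, not recommendations; either selector may issue a warning if the minimum falls at one end of the search range.

```r
x <- precip

h_ucv <- bw.ucv(x)   # unbiased cross-validation
h_bcv <- bw.bcv(x)   # biased cross-validation
c(ucv = h_ucv, bcv = h_bcv)

# Finer binning and an explicit search range (illustrative values only).
bw.ucv(x, nb = 2000, lower = 1, upper = 10)
```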

bw.SJ implements the methods of Sheather and Jones (1991) to select the bandwidth using pilot estimation of derivatives.
The algorithm for method "ste" solves an equation (via uniroot) and therefore enlarges the interval c(lower, upper) when the boundaries were not user-specified and do not bracket the root.
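The two Sheather-Jones variants can be compared directly. This is a minimal sketch on precip; the tolerance 1e-4 in the last call is an arbitrary illustrative value for cases where more digits are wanted.

```r
x <- precip

h_ste <- bw.SJ(x)                  # default method: "ste" (solve-the-equation)
h_dpi <- bw.SJ(x, method = "dpi")  # direct plug-in
c(ste = h_ste, dpi = h_dpi)

# A tighter convergence tolerance for uniroot (only relevant for "ste").
bw.SJ(x, method = "ste", tol = 1e-4)
```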

The last three methods use all pairwise binned distances: they are of complexity O(n^2) up to n = nb/2 and O(n) thereafter. Because of the binning, the results differ slightly when x is translated or sign-flipped.
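The effect of binning can be inspected by shifting or sign-flipping the data. The sketch below prints the bandwidths rather than asserting particular values, since the size of any difference depends on the data and on nb; nb = 5000 is an arbitrary illustrative choice.

```r
x <- precip

# The rule-of-thumb selectors use no binning, so a shift leaves them
# unchanged (up to floating-point error).
all.equal(bw.nrd0(x), bw.nrd0(x + 100))

# The binned selectors may change slightly under a shift or a sign flip.
c(original = bw.SJ(x), shifted = bw.SJ(x + 100), flipped = bw.SJ(-x))

# A larger nb gives a finer grid for the binned pairwise distances.
bw.SJ(x, nb = 5000)
```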

Value

A bandwidth on a scale suitable for the bw argument of density.
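For instance, a selected bandwidth can be passed to density either as a number or by naming the selector, as the Examples do. A minimal sketch, with h, d and d2 as illustrative names:

```r
x <- precip

# Compute the bandwidth first, then pass it to density().
h <- bw.SJ(x, method = "dpi")
d <- density(x, bw = h)
d$bw
all.equal(d$bw, h)

# Equivalent call naming the selector by a character string.
d2 <- density(x, bw = "SJ-dpi")
all.equal(d2$bw, h)
```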

Note

Long vectors x are not supported. Neither does density support them, and for more than a few thousand points a histogram would be preferred to a kernel density estimate.

Author(s)

B. D. Ripley, taken from early versions of package MASS.

References

Scott DW (1992). Multivariate Density Estimation. Theory, Practice and Visualization. Wiley, New York.

Sheather SJ, Jones MC (1991). “A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 53(3), 683–690. doi:10.1111/j.2517-6161.1991.tb01857.x.

Silverman BW (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall.

Venables WN, Ripley BD (2002). Modern Applied Statistics with S, series Statistics and Computing. Springer, New York, NY. doi:10.1007/978-0-387-21706-2.

Examples

require(graphics)

## Default bandwidth ("nrd0"); the other selectors are overlaid in colours 2 to 6.
plot(density(precip, n = 1000))
rug(precip)
lines(density(precip, bw = "nrd"), col = 2)
lines(density(precip, bw = "ucv"), col = 3)
lines(density(precip, bw = "bcv"), col = 4)
lines(density(precip, bw = "SJ-ste"), col = 5)
lines(density(precip, bw = "SJ-dpi"), col = 6)
legend(55, 0.035,
       legend = c("nrd0", "nrd", "ucv", "bcv", "SJ-ste", "SJ-dpi"),
       col = 1:6, lty = 1)

[Package stats version 4.6.0]