[Rd] Inconsistent behavior of stats::bw.nrd() and stats::bw.nrd0()
Noah Greifer
no@h@gre||er @end|ng |rom gm@||@com
Wed Feb 23 17:21:18 CET 2022
Hello R-devel,
I noticed an inconsistency in stats::bw.nrd() and stats::bw.nrd0, two
functions used to compute the bandwidth for densities. According to the
documentation,
"bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a
Gaussian kernel density estimator. It defaults to 0.9 times the minimum of
the standard deviation and the interquartile range divided by 1.34 times
the sample size to the negative one-fifth power (= Silverman's ‘rule of
thumb’, Silverman (1986, page 48, eqn (3.31))) unless the quartiles
coincide when a positive result will be guaranteed.
bw.nrd is the more common variation given by Scott (1992), using factor
1.06."
This implies the result of bw.nrd() should simply be 1.06/.9 times the
result of bw.nrd0(). However, these functions are coded quite differently
and, in particular, respond to situations where the data has an IQR of 0
differently. The source of bw.nrd0 is
function (x)
{
if (length(x) < 2L)
stop("need at least 2 data points")
hi <- sd(x)
if (!(lo <- min(hi, IQR(x)/1.34)))
(lo <- hi) || (lo <- abs(x[1L])) || (lo <- 1)
0.9 * lo * length(x)^(-0.2)
}
and the source of bw.nrd is
function (x)
{
if (length(x) < 2L)
stop("need at least 2 data points")
r <- quantile(x, c(0.25, 0.75))
h <- (r[2L] - r[1L])/1.34
1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}
Importantly, when the IQR of the input is 0, bw.nrd0() falls back onto the
standard deviation, guaranteeing the positive result described in the
documentation. Whereas, bw.nrd() produces a result of 0. I am not sure
which result is more desirable, but it would seem to me that they should at
least be consistent. See examples below:
> x <- c(1,1,1,1,1000)
> stats::bw.nrd(x)
[1] 0
> stats::bw.nrd0(x)
[1] 291.4265
- Noah Greifer
[[alternative HTML version deleted]]
More information about the R-devel
mailing list