[Rd] Using sample() to sample one value from a single value?
hb at biostat.ucsf.edu
Thu Nov 4 18:59:31 CET 2010
On Thu, Nov 4, 2010 at 7:42 AM, Tim Hesterberg <timhesterberg at gmail.com> wrote:
> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
>> Hi, consider this one as an FYI, or a seed for further discussion.
>> I am aware that many traps on sample() have been reported over the
>> years. I know that these are also documented in help("sample"). Still
>> I got bitten by this while writing
>> All of the above makes sense when one studies the code of sample(), but
>> sample() is indeed dangerous, e.g. imagine how many bootstrap
>> estimates out there are quietly incorrect.
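The trap in question is the length-1 rule documented in help("sample"). If x has length 1, is numeric and x >= 1, R samples from 1:x rather than from x itself. The sketch below is a small Python mimic of that documented rule, not R code. It places the rule next to an index-based alternative, which draws indices and then subsets, so its behavior never changes with the length of x. The names `r_style_sample` and `resample` are hypothetical.

```python
import random


def r_style_sample(x, size, rng):
    """Hypothetical mimic of the rule documented in R's help("sample"):
    a length-1 numeric x >= 1 is treated as the population 1..x."""
    first = x[0] if len(x) == 1 else None
    is_number = isinstance(first, (int, float)) and not isinstance(first, bool)
    if is_number and first >= 1:
        # Like R's 1:x, a non-integer x is truncated (2.5 -> 1, 2).
        population = list(range(1, int(first) + 1))
    else:
        population = list(x)
    return [rng.choice(population) for _ in range(size)]


def resample(x, size, rng):
    """Index-based resampling: always returns elements of x, even if len(x) == 1."""
    return [x[rng.randrange(len(x))] for _ in range(size)]


rng = random.Random(2010)
print(r_style_sample([10], 5, rng))  # values drawn from 1..10, not from [10]
print(resample([10], 5, rng))        # [10, 10, 10, 10, 10]
```

Drawing indices, as `resample` does, makes the single-value case behave like every other length.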
> Nonparametric bootstrapping from a sample of size 1 is <always> incorrect.
> If you draw a single observation from a sample of size 1, you get that
> same observation back. This implies zero sampling variability, which
> is wrong. If this single sample represents one stratum or sample in
> a larger problem, this would contribute zero variability to the overall
> result, again wrong.
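Here is a minimal illustration of the zero-variability point, assuming an ordinary nonparametric bootstrap of the sample mean. The function name is hypothetical. With a single observation, every resample is that observation, so the bootstrap standard error comes out as exactly zero.

```python
import random
import statistics


def bootstrap_se_of_mean(x, n_boot, rng):
    """Ordinary nonparametric bootstrap standard error of the sample mean."""
    n = len(x)
    # Each resample draws n values from x with replacement.
    means = [statistics.fmean(rng.choices(x, k=n)) for _ in range(n_boot)]
    return statistics.pstdev(means)


rng = random.Random(1)
print(bootstrap_se_of_mean([4.2], 1000, rng))  # 0.0: every resample is [4.2]
```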
> In general, the ordinary bootstrap underestimates variability in
> small samples. For a sample mean, the ordinary bootstrap corresponds
> to using an estimate of variance equal to (1/n) sum((x - mean(x))^2),
> instead of a divisor of n-1. In stratified and multi-sample applications
> the variance is similarly biased downward by a factor of (n-1)/n.
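The (n-1)/n factor for the mean can be checked exactly on a tiny data set by enumerating all n**n equally likely bootstrap resamples. The sketch below uses exact fractions. The data and function names are illustrative only. The ideal bootstrap variance of the mean equals the divisor-n variance divided by n, and its ratio to the usual s^2/n (divisor n-1) is (n-1)/n.

```python
from fractions import Fraction
from itertools import product


def exact_bootstrap_var_of_mean(x):
    """Variance of the resample mean over all n**n equally likely bootstrap samples."""
    n = len(x)
    means = [Fraction(sum(r), n) for r in product(x, repeat=n)]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)


def var_with_divisor(x, divisor):
    """sum((x - mean(x))^2) / divisor, computed exactly."""
    xbar = Fraction(sum(x), len(x))
    return sum((xi - xbar) ** 2 for xi in x) / divisor


x = [1, 2, 3, 7]
n = len(x)
boot = exact_bootstrap_var_of_mean(x)   # ideal bootstrap variance of the mean
plug_in = var_with_divisor(x, n) / n    # (1/n) sum((x - mean(x))^2), then / n
usual = var_with_divisor(x, n - 1) / n  # s^2 / n, with divisor n - 1
print(boot, plug_in, usual, boot / usual)  # 83/64 83/64 83/48 3/4
```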
> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling - omit one observation (a jackknife sample), then
> draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
> to empirical covariance (with divisor n-1) / n.
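The three remedies above can be sketched as resampling functions. This is a minimal univariate sketch under stated assumptions. The smoothed version assumes a Gaussian kernel, since the text does not fix the kernel shape. The function names are hypothetical. Each function raises an error for n = 1, matching the point that all three are undefined there.

```python
import math
import random
import statistics


def boot_n_minus_1(x, rng):
    """Remedy 1: a bootstrap sample of size n - 1 instead of n."""
    if len(x) < 2:
        raise ValueError("undefined for samples of size 1")
    return rng.choices(x, k=len(x) - 1)


def bootknife(x, rng):
    """Remedy 2: omit one observation at random (a jackknife sample),
    then draw a bootstrap sample of size n from the remaining n - 1."""
    n = len(x)
    if n < 2:
        raise ValueError("undefined for samples of size 1")
    omit = rng.randrange(n)
    jack = x[:omit] + x[omit + 1:]
    return rng.choices(jack, k=n)


def smoothed_boot(x, rng):
    """Remedy 3 (univariate, Gaussian kernel assumed): ordinary bootstrap
    draws plus kernel noise with variance s^2 / n, s^2 using divisor n - 1."""
    n = len(x)
    if n < 2:
        raise ValueError("undefined for samples of size 1")
    h = math.sqrt(statistics.variance(x) / n)  # kernel standard deviation
    return [xi + rng.gauss(0.0, h) for xi in rng.choices(x, k=n)]


def var_of_mean(sampler, x, n_boot, rng):
    """Monte Carlo variance of the resample mean under a given sampler."""
    means = [statistics.fmean(sampler(x, rng)) for _ in range(n_boot)]
    return statistics.pvariance(means)


rng = random.Random(42)
x = [1.0, 2.0, 3.0, 7.0]
target = statistics.variance(x) / len(x)  # s^2 / n = 83/48, about 1.73
for f in (boot_n_minus_1, bootknife, smoothed_boot):
    print(f.__name__, round(var_of_mean(f, x, 20000, rng), 2), "vs", round(target, 2))
```

On this toy data the ordinary bootstrap gives 83/64, about 1.30, for the variance of the mean. Each remedy should land near s^2/n, about 1.73, up to Monte Carlo error.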
> The latter two are described in
> Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs.
> Smoothing, Proceedings of the Section on Statistics and the Environment,
> American Statistical Association, 2924-2930.
> All three are undefined for samples of size 1. You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.
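Here is one possible reading of the parametric alternative for a stratum that contributes a single observation y. Assume a normal model for that stratum and borrow its spread from the other strata, for example through a pooled standard deviation. The normal model, the pooled estimate, the data and the function names are all illustrative assumptions, not a method prescribed in the thread.

```python
import random
import statistics


def pooled_sd(groups):
    """Pooled standard deviation (divisor sum(n_i - 1)) over groups with n_i >= 2."""
    ss = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    return (ss / df) ** 0.5


def parametric_boot_singleton(y, sigma, n_boot, rng):
    """Parametric bootstrap for a stratum with a single observation y,
    assuming a Normal(y, sigma) model with sigma borrowed from other data."""
    return [rng.gauss(y, sigma) for _ in range(n_boot)]


other_strata = [[4.1, 5.0, 4.6], [6.2, 5.9, 6.8, 6.1]]  # hypothetical data
sigma = pooled_sd(other_strata)
draws = parametric_boot_singleton(7.3, sigma, 5000, random.Random(11))
# The spread of the draws should be close to the borrowed sigma.
print(round(sigma, 3), round(statistics.stdev(draws), 3))
```

Unlike the nonparametric approaches, this lets the singleton stratum contribute nonzero variability, at the cost of an explicit model assumption.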
I had a feeling that I was going to be bitten by that attention
grabber on bootstrapping. Worse, it may mislead some readers. But
honestly, thank you, Tim, for pointing this out and explaining it all
so clearly.
> Tim Hesterberg