[R] How to do bootstrap for the complex sample design?

Tim Hesterberg timhesterberg at gmail.com
Thu Nov 4 15:51:24 CET 2010

Faye wrote:
>Our survey is structured as : To be investigated area is divided into
>6 regions, within each region, one urban community and one rural
>community are randomly selected, then samples are randomly drawn from
>each selected uran and rural community.
>The problems is that in urban/rural stratum, we only have one sample.
>In this case, how to do bootstrap?

You are lucky that your sample size is 1.  If it were 2 you would
probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum.  If you proceed naturally,
drawing bootstrap samples of size 2 from each stratum, this would
underestimate variability by a factor of 2.

In general the ordinary nonparametric bootstrap estimates of variability
are biased downward by a factor of (n-1)/n -- exactly for the mean, 
approximately for other statistics.  In multiple-sample and stratified
situations, the bias depends on the stratum sizes.

Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
The latter two are described in 
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.

All three are undefined for samples of size 1.  You need to go to some
other bootstrap, e.g. a parametric bootstrap with variability estimated
from other data.

Tim Hesterberg

More information about the R-help mailing list