[R] How to do bootstrap for the complex sample design?
timhesterberg at gmail.com
Thu Nov 4 15:51:24 CET 2010
>Our survey is structured as follows: the area to be investigated is
>divided into 6 regions; within each region, one urban community and one
>rural community are randomly selected, and samples are then randomly
>drawn from each selected urban and rural community.
>The problem is that in each urban/rural stratum we have only one sample.
>In this case, how do we do the bootstrap?
You are lucky that your sample size is 1. If it were 2 you would
probably have proceeded without realizing that the answers were wrong.
Suppose you had two samples in each stratum. If you proceeded naturally,
drawing bootstrap samples of size 2 from each stratum, you would
underestimate the variance by a factor of 2.
In general, the ordinary nonparametric bootstrap estimates of variance
are biased downward by a factor of (n-1)/n, where n is the sample size:
exactly for the mean, approximately for other statistics. With n = 2
that factor is 1/2. In multiple-sample and stratified situations, the
bias depends on the stratum sizes.
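To make the (n-1)/n factor concrete, here is a minimal sketch (Python,
standard library only; the function names and the toy data are made up
for illustration). It enumerates all n^n equally likely bootstrap
samples of a tiny data set, so the bootstrap variance of the mean is
computed exactly rather than by simulation, and compares it with the
usual s^2/n.

```python
from itertools import product
from statistics import mean, pvariance, variance


def exact_bootstrap_var_of_mean(x):
    """Variance of the mean over all n**n equally likely bootstrap samples."""
    n = len(x)
    means = [mean(sample) for sample in product(x, repeat=n)]
    return pvariance(means)  # population variance: all samples are enumerated


def usual_var_of_mean(x):
    """Usual unbiased estimate s^2 / n, with s^2 computed using divisor n-1."""
    return variance(x) / len(x)


# Toy data, chosen only for illustration.
for x in ([3.0, 7.0], [1.0, 4.0, 6.0, 9.0]):
    n = len(x)
    ratio = exact_bootstrap_var_of_mean(x) / usual_var_of_mean(x)
    print(n, ratio, (n - 1) / n)
```

For each data set the printed ratio should agree with (n-1)/n up to
floating-point rounding. Full enumeration is only practical for very
small n.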
Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling - omit one observation (a jackknife sample), then
draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
to empirical covariance (with divisor n-1) / n.
The latter two are described in
Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap - Bootknife Sampling
vs. Smoothing", Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930.
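The three remedies listed above might be sketched as follows for a
single univariate sample. This is an illustration under stated
assumptions, not code from the paper: the function names are invented,
the smoothed version assumes a Gaussian kernel (the list above only
fixes the kernel variance), and only the Python standard library is
used.

```python
import random
from statistics import stdev


def _require_two(x):
    # Every remedy below needs at least two observations.
    if len(x) < 2:
        raise ValueError("need at least 2 observations")


def reduced_size_sample(x, rng):
    """Remedy 1: draw n-1 observations with replacement."""
    _require_two(x)
    return [rng.choice(x) for _ in range(len(x) - 1)]


def bootknife_sample(x, rng):
    """Remedy 2: omit one random observation, then draw n with replacement."""
    _require_two(x)
    x = list(x)
    omit = rng.randrange(len(x))
    kept = x[:omit] + x[omit + 1:]  # the jackknife sample
    return [rng.choice(kept) for _ in range(len(x))]


def smoothed_sample(x, rng):
    """Remedy 3: resample, then add kernel noise with variance s^2 / n.

    Assumption: Gaussian kernel. s^2 uses divisor n-1 (statistics.stdev).
    """
    _require_two(x)
    n = len(x)
    kernel_sd = stdev(x) / n ** 0.5
    return [rng.choice(x) + rng.gauss(0.0, kernel_sd) for _ in range(n)]


rng = random.Random(1)           # fixed seed so runs are repeatable
data = [2.0, 5.0, 6.0, 9.0]      # toy data, for illustration only
print(reduced_size_sample(data, rng))
print(bootknife_sample(data, rng))
print(smoothed_sample(data, rng))
```

In a stratified design, the chosen sampler would be applied separately
within each stratum. Each function raises ValueError when given fewer
than two observations.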
All three are undefined for samples of size 1: there is nothing left to
resample after removing an observation, and the divisor n-1 is zero. You
need to use some other bootstrap, e.g. a parametric bootstrap with
variability estimated from other data.
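As a rough illustration of that last suggestion, here is a minimal
sketch of a parametric bootstrap for a stratum with a single observed
value. Everything specific in it is an assumption made for the example:
the normal model, the function name, the observed value 12.5, and the
borrowed standard deviation 2.0, which stands in for an estimate
obtained from other data.

```python
import random
from statistics import stdev


def parametric_bootstrap(observed, borrowed_sd, n_boot, rng):
    """Draw n_boot replicates from Normal(observed, borrowed_sd).

    Assumption: a normal model centred at the one observed value, with the
    spread taken from outside this stratum rather than estimated within it.
    """
    return [rng.gauss(observed, borrowed_sd) for _ in range(n_boot)]


rng = random.Random(2010)  # fixed seed so runs are repeatable
replicates = parametric_bootstrap(
    observed=12.5,     # hypothetical single observation in the stratum
    borrowed_sd=2.0,   # hypothetical sd estimated from other data
    n_boot=2000,
    rng=rng,
)
print(len(replicates), stdev(replicates))
```

The spread of the replicates simply reflects the borrowed standard
deviation, so the result is only as good as that outside estimate.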