[R] Stratified Bootstrap question

Thu Apr 7 20:23:34 CEST 2005

Dear Tim,

Thank you very much for taking the time to advise me on my questions. I
talked with my professor about this bootstrapping question: whether to
resample clinics only, or to resample clinics and then resample patients
within each clinic.

I was told that the second method might destroy the correlation structure
among the patients within a clinic. So I am wondering whether it would be
worthwhile to run a simulation comparing the two bootstrapping methods. Is
this comparison meaningful, and is it worth doing? What do you think?
Thank you.
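
A minimal sketch of such a comparison, written in Python rather than R and
using only the standard library. Everything in it is an assumption made for
illustration: the balanced design, the normal random-effects model, the
grand mean as the statistic, and all function and parameter names. It
compares the average bootstrap standard error from each scheme with the
standard error implied by the simulated model.

```python
import random
import statistics


def simulate_clinics(rng, n_clinics, n_patients, sd_clinic, sd_patient):
    """Simulate one dataset: a random clinic effect plus patient-level noise."""
    data = []
    for _ in range(n_clinics):
        effect = rng.gauss(0.0, sd_clinic)
        data.append([effect + rng.gauss(0.0, sd_patient)
                     for _ in range(n_patients)])
    return data


def grand_mean(clinics):
    """Mean over all patients in all clinics."""
    return statistics.fmean(y for clinic in clinics for y in clinic)


def bootstrap_se(rng, data, n_boot, resample_patients):
    """Bootstrap standard error of the grand mean.

    resample_patients=False: resample whole clinics only.
    resample_patients=True:  resample clinics, then patients within each.
    """
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))
        if resample_patients:
            sample = [rng.choices(clinic, k=len(clinic)) for clinic in sample]
        estimates.append(grand_mean(sample))
    return statistics.stdev(estimates)


def compare_schemes(seed=1, n_datasets=30, n_boot=200, n_clinics=10,
                    n_patients=5, sd_clinic=1.0, sd_patient=3.0):
    """Average the two bootstrap standard errors over simulated datasets."""
    rng = random.Random(seed)
    se_clinics, se_two_stage = [], []
    for _ in range(n_datasets):
        data = simulate_clinics(rng, n_clinics, n_patients,
                                sd_clinic, sd_patient)
        se_clinics.append(bootstrap_se(rng, data, n_boot, False))
        se_two_stage.append(bootstrap_se(rng, data, n_boot, True))
    # Standard error of the grand mean under the assumed balanced model.
    true_se = (sd_clinic ** 2 / n_clinics
               + sd_patient ** 2 / (n_clinics * n_patients)) ** 0.5
    return {
        "true_se": true_se,
        "clinics_only": statistics.fmean(se_clinics),
        "two_stage": statistics.fmean(se_two_stage),
    }


if __name__ == "__main__":
    print(compare_schemes())
```

Under these assumed settings the two-stage scheme gives the larger standard
error on average, because it adds within-clinic resampling noise on top of
the clinic-level resampling. Varying n_clinics, n_patients, and the two
standard deviations would show how the gap changes with the design.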

Qian

On 1 Apr 2005, Tim Hesterberg wrote:

> Qian wrote:
> >I talked with my advisor yesterday about how to do bootstrapping for my
> >scenario: random clinic + random subject within clinic. She suggested that
> >only clinics are independent units, so I should resample only clinics. But
> >since subjects are also independent within a clinic, should I also
> >resample subjects within clinics, which would mean two-stage resampling?
> >Which one do you think makes sense?
>
> This is a tough issue; I don't have a complete answer.  I'd
> appreciate input from other r-help readers.
>
> If you randomly select clinics, then randomly select patients within
> the clinics:
>   (1) by bootstrapping just clinics, you capture both sources of
>   variation -- the between-subject variation is incorporated in the
>   results for each clinic.
>
>   (2) by bootstrapping clinics, then subjects within clinics, you
>   end up double-counting the between-subject variation.
> That argues for resampling just clinics.
>
> By analogy, if you have multiple subjects, and multiple measurements
> per subject, you should just resample subjects.
>
> However, I'm not comfortable with this if you have a small number of
> clinics, and relatively large numbers of patients in each clinic, and
> think that the between-clinic variation should be small.  Then it
> seems better to resample both clinics and patients.
>
> I'm leery about resampling just clinics if there are a small number
> of clinics.  Bootstrapping isn't particularly effective for small
> samples -- it is subject to skewness in small samples, and it
> underestimates variances (its advantages over classical methods
> really show up with medium size samples).
> There are remedies for the small variance, see
> 	Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap-Bootknife Sampling
> 	vs. Smoothing", Proceedings of the Section on Statistics and the
> 	Environment, American Statistical Association, 2924-2930
> 	www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf
>
> Tim Hesterberg
>
> ========================================================
> | Tim Hesterberg       Research Scientist              |
> | timh at insightful.com  Insightful Corp.                |
> | (206)802-2319        1700 Westlake Ave. N, Suite 500 |
> | (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
> |                      www.insightful.com/Hesterberg   |
> ========================================================
> Download the S+Resample library from www.insightful.com/downloads/libraries
>
>
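
The small-sample variance point in the quoted message can be checked
numerically for the simplest case, the mean of n independent observations.
With infinitely many resamples, the bootstrap variance of the sample mean is
the plug-in variance (divisor n) divided by n, so its expectation is
(n-1)/n times the true variance of the mean. The sketch below is in Python,
assumes standard normal data, and uses a hypothetical function name. It
illustrates only the downward bias, not the bootknife remedy from the cited
paper.

```python
import random
import statistics


def bootstrap_variance_ratio(n, n_datasets=20000, seed=1):
    """Average ideal bootstrap variance of the mean, relative to the truth.

    For each simulated sample of size n from N(0, 1), the ideal bootstrap
    variance of the sample mean is sum((x - xbar)^2) / n^2. The true
    variance of the mean is 1 / n.
    """
    rng = random.Random(seed)
    true_var = 1.0 / n
    boot_vars = []
    for _ in range(n_datasets):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        plug_in_var = sum((xi - xbar) ** 2 for xi in x) / n
        boot_vars.append(plug_in_var / n)
    return statistics.fmean(boot_vars) / true_var


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, round(bootstrap_variance_ratio(n), 3), (n - 1) / n)
```

The printed ratio should sit close to (n-1)/n, so the shortfall matters with
a handful of clinics and fades as the number of resampled units grows.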

***************************************
Qian An
Division of Biostatistics
University of Minnesota
(phone) 612-626-2263
(fax) 612-626-8892
Email: qiana at biostat.umn.edu