[R] Stratified Bootstrap question
Qian An
qiana at biostat.umn.edu
Thu Apr 7 20:23:34 CEST 2005
Dear Tim,
Thank you very much for taking the time to give me advice on my questions. I
talked with my professor about this bootstrapping question: whether to
resample only clinics, or to resample clinics and then patients within each clinic.
I was told that the second method might destroy the correlation structure
among patients within a clinic. So I am wondering whether it would be worthwhile
to run a simulation comparing the two bootstrapping methods. In other words,
is this comparison meaningful, and is it worth doing? What do you
think? Thank you.
Qian
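A simulation of the kind Qian describes could look roughly like the sketch below. It assumes a hypothetical balanced design (K clinics, m patients each) generated from a normal random-effects model y_ij = mu + a_i + e_ij, uses the overall patient mean as the statistic, and compares the average bootstrap variance from scheme (1) (resample clinics) and scheme (2) (resample clinics, then patients within each chosen clinic) against the model's true variance sd_clinic^2/K + sd_patient^2/(K*m). All function names and parameter values are illustrative assumptions, and Python is used only for the sketch.

```python
import random
import statistics


def simulate_clinics(rng, n_clinics, n_patients, sd_clinic, sd_patient, mu=0.0):
    """Hypothetical balanced model: y_ij = mu + a_i + e_ij, normal effects."""
    clinics = []
    for _ in range(n_clinics):
        a = rng.gauss(0.0, sd_clinic)  # clinic random effect shared by its patients
        clinics.append([mu + a + rng.gauss(0.0, sd_patient) for _ in range(n_patients)])
    return clinics


def grand_mean(clinics):
    """Statistic of interest: mean over all patients in all clinics."""
    return statistics.fmean([y for clinic in clinics for y in clinic])


def cluster_boot(rng, clinics):
    """Scheme (1): resample whole clinics with replacement."""
    return rng.choices(clinics, k=len(clinics))


def two_stage_boot(rng, clinics):
    """Scheme (2): resample clinics, then patients within each chosen clinic."""
    chosen = rng.choices(clinics, k=len(clinics))
    return [rng.choices(clinic, k=len(clinic)) for clinic in chosen]


def boot_var(rng, clinics, scheme, n_boot):
    """Bootstrap estimate of Var(grand mean) under the given resampling scheme."""
    reps = [grand_mean(scheme(rng, clinics)) for _ in range(n_boot)]
    return statistics.variance(reps)


def compare(n_datasets=100, n_boot=200, n_clinics=8, n_patients=10,
            sd_clinic=1.0, sd_patient=3.0, seed=1):
    """Average bootstrap variance of each scheme vs. the model's true variance."""
    rng = random.Random(seed)
    true_var = sd_clinic**2 / n_clinics + sd_patient**2 / (n_clinics * n_patients)
    v_cluster, v_two_stage = [], []
    for _ in range(n_datasets):
        clinics = simulate_clinics(rng, n_clinics, n_patients, sd_clinic, sd_patient)
        v_cluster.append(boot_var(rng, clinics, cluster_boot, n_boot))
        v_two_stage.append(boot_var(rng, clinics, two_stage_boot, n_boot))
    return true_var, statistics.fmean(v_cluster), statistics.fmean(v_two_stage)


if __name__ == "__main__":
    true_var, v1, v2 = compare()
    print(f"true Var(mean)      = {true_var:.4f}")
    print(f"cluster bootstrap   = {v1:.4f}")
    print(f"two-stage bootstrap = {v2:.4f}")
```

Under this balanced model, the clinic-only bootstrap variance averages about (K-1)/K times the true variance, while the two-stage version adds roughly sd_patient^2/(K*m) on top of that, which is the double counting of between-subject variation discussed in the reply. Varying n_clinics, n_patients and sd_clinic would be one way to probe the caveat about few clinics with small between-clinic variation.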
On 1 Apr 2005, Tim Hesterberg wrote:
> Qian wrote:
> >I talked with my advisor yesterday about how to do bootstrapping for my
> >scenario: random clinics + random subjects within clinics. She suggested that
> >only clinics are independent units, so I should resample only clinics. But I
> >think that since subjects are also independent within a clinic, shouldn't I
> >also resample subjects within clinics, which would mean two-stage resampling?
> >Which one do you think makes sense?
>
> This is a tough issue; I don't have a complete answer. I'd
> appreciate input from other r-help readers.
>
> If you randomly select clinics, then randomly select patients within
> the clinics:
> (1) by bootstrapping just clinics, you capture both sources of
> variation -- the between-subject variation is incorporated in the
> results for each clinic.
>
> (2) by bootstrapping clinics, then subjects within clinics, you
> end up double-counting the between-subject variation.
> That argues for resampling just clinics.
>
> By analogy, if you have multiple subjects, and multiple measurements
> per subject, you should just resample subjects.
>
> However, I'm not comfortable with this if you have a small number of
> clinics, and relatively large numbers of patients in each clinic, and
> think that the between-clinic variation should be small. Then it
> seems better to resample both clinics and patients.
>
> I'm leery about resampling just clinics if there are a small number
> of clinics. Bootstrapping isn't particularly effective for small
> samples -- it is subject to skewness in small samples, and it
> underestimates variances (its advantages over classical methods
> really show up with medium-size samples).
> There are remedies for the underestimated variance; see
> Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap-Bootknife Sampling
> vs. Smoothing", Proceedings of the Section on Statistics and the
> Environment, American Statistical Association, 2924-2930
> www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf
>
> Tim Hesterberg
>
> ========================================================
> | Tim Hesterberg Research Scientist |
> | timh at insightful.com Insightful Corp. |
> | (206)802-2319 1700 Westlake Ave. N, Suite 500 |
> | (206)283-8691 (fax) Seattle, WA 98109-3044, U.S.A. |
> | www.insightful.com/Hesterberg |
> ========================================================
> Download the S+Resample library from www.insightful.com/downloads/libraries
>
>
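The small-sample variance problem and the bootknife remedy mentioned in the reply can be illustrated with a minimal sketch. It follows my reading of the cited paper's idea (for each resample, omit one observation at random, then draw n values with replacement from the remaining n - 1); consult the paper for the exact definition. For the sample mean, one can check algebraically that the ordinary bootstrap variance equals (n-1)/n times s^2/n, while this omit-one version averages to s^2/n. The data values and function names are made up for illustration.

```python
import random
import statistics


def bootstrap_means(rng, x, n_boot):
    """Ordinary bootstrap: draw n values with replacement from all n."""
    n = len(x)
    return [statistics.fmean(rng.choices(x, k=n)) for _ in range(n_boot)]


def bootknife_means(rng, x, n_boot):
    """Bootknife (as sketched here): omit one observation at random,
    then draw n values with replacement from the remaining n - 1."""
    n = len(x)
    means = []
    for _ in range(n_boot):
        i = rng.randrange(n)
        rest = x[:i] + x[i + 1:]
        means.append(statistics.fmean(rng.choices(rest, k=n)))
    return means


def compare_variances(x, n_boot=100_000, seed=2005):
    """Return (s^2/n, bootstrap variance, bootknife variance) of the mean."""
    rng = random.Random(seed)
    target = statistics.variance(x) / len(x)  # usual unbiased s^2 / n
    v_boot = statistics.pvariance(bootstrap_means(rng, x, n_boot))
    v_knife = statistics.pvariance(bootknife_means(rng, x, n_boot))
    return target, v_boot, v_knife


if __name__ == "__main__":
    x = [2.1, 3.4, 1.9, 5.0, 4.2]  # hypothetical small sample, n = 5
    target, v_boot, v_knife = compare_variances(x)
    print(f"s^2/n     = {target:.4f}")
    print(f"bootstrap = {v_boot:.4f}  (about (n-1)/n of s^2/n)")
    print(f"bootknife = {v_knife:.4f}  (about s^2/n)")
```

With only a handful of clinics, x would correspond to clinic-level summaries; applying the same omit-one step to clinics in scheme (1) is a plausible but untested extension.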
***************************************
Qian An
Division of Biostatistics
University of Minnesota
(phone) 612-626-2263
(fax) 612-626-8892
Email: qiana at biostat.umn.edu