[R] bootstrapping respecting subject level information

Fri Jul 5 03:09:07 CEST 2013

Josh's comment prompted me to check mty go-to reference. Davison and
Hinckley (1997 Section 3.8) recommend sampling the Subjects, but not
within the Subjects.

Cheers

Andrew

On Thu, Jul 04, 2013 at 05:53:58PM -0700, Joshua Wiley wrote:
> Hi,
> 
> It is not the easiest to follow code, but when I was working at UCLA,
> I wrote a page demonstrating a multilevel bootstrap, where I use a two
> stage sampler, (re)sampling at each level.  In your case, could be
> first draw subjects, then draw observations within subjects.  A strata
> only option does not resample all sources of variability, which are:
> 
> 1) which subjects you get and
> 2) which observations within those
> 
> The page is here: http://www.ats.ucla.edu/stat/r/dae/melogit.htm
> 
> As a side note, it demonstrates a mixed effects model in R, although
> as I mentioned it is not geared for beginners.
> 
> Cheers,
> 
> Josh
> 
> 
> 
> On Wed, Jul 3, 2013 at 7:19 PM, Sol Lago <solcita.lago at gmail.com> wrote:
> > Hi there,
> >
> > This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
> >
> > I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
> >
> > Subject = rep(c("S1","S2","S3","S4"),4)
> > Num     = rep(c("singular","plural"),8)
> > Gram    = rep(c("gram","gram","ungram","ungram"),4)
> > RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
> > data    = data.frame(Subject,Num,Gram,RT)
> >
> > This is the code I used to get the empirical interaction value:
> >
> > summary(lm(RT ~ Num*Gram, data=data))
> >
> > As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
> >
> > #Function to create the statistic to be boostrapped
> > boot.huber <- function(data, indices) {
> > data <- data[indices, ] #select obs. in bootstrap sample
> > mod <- lm(RT ~ Num*Gram, data=data)
> > coefficients(mod)       #return coefficient vector
> > }
> >
> > #Generate bootstrap estimate
> > data.boot <- boot(data, boot.huber, 1999)
> >
> > #Get confidence interval
> > boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
> >
> > My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
> >
> > Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
> >
> > Thanks a lot for your help/advice!
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://joshuawiley.com/
> Senior Analyst - Elkhart Group Ltd.
> http://elkhartgroup.com
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson  
Deputy Director, ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Methods of Statistical Model Estimation (CRC, 2013)
http://www.crcpress.com/product/isbn/9781439858028
Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/