[R] bootstrap sample for clustered data
    Liu, Lei 
    |e|@||u @end|ng |rom wu@t|@edu
       
    Mon Sep 17 05:22:44 CEST 2018
    
    
  
Hi there,
I posted this message before, but my previous post may have been confusing, so here is a clearer version:
I'd like to draw a bootstrap sample from clustered data and then fit some complicated models (say, mixed-effects models) to the bootstrapped sample. Here id identifies the cluster. Note that different clusters have different numbers of subjects; e.g., id 2 has 2 observations and id 3 has 3 observations.
id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 )
xx=data.frame(id, x, y)
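boot.cluster <- function(x, id){
  # draw whole clusters (not rows) with replacement
  boot.id <- sample(unique(id), replace=TRUE)
  # for each drawn cluster, take all of its rows
  out <- lapply(boot.id, function(i) x[id%in%i,])
  # stack the drawn clusters into one data frame
  return( do.call("rbind",out) )
}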
boot.xx=boot.cluster(xx, xx$id)
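Because sample() draws at random, boot.xx changes from run to run. A minimal sketch, assuming you want a repeatable draw and a record of which clusters were picked: call set.seed() first and keep the sampled ids. The seed value and the name boot.id.drawn are arbitrary, hypothetical choices for illustration, not part of the original code.

```r
# Data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

set.seed(2018)  # arbitrary seed, makes the draw repeatable
# hypothetical name: the clusters drawn, in draw order
boot.id.drawn <- sample(unique(xx$id), replace = TRUE)

# Each drawn cluster contributes all of its rows, in the order drawn
boot.xx <- do.call(rbind, lapply(boot.id.drawn, function(i) xx[xx$id == i, ]))

# Counts greater than 1 mark clusters that were drawn more than once
table(boot.id.drawn)
```

Keeping boot.id.drawn makes it easy to see which original clusters were duplicated in a given replicate.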
Here is one generated boot.xx dataset (each run gives a different random draw):
   id x y
   3 0 0.4
   3 0 1.0
   3 0 0.9
   1 0 0.5
   1 0 0.6
   5 1 2.2
   5 1 3.0
   2 1 0.4
   2 1 0.3
   1 0 0.5
   1 0 0.6
You can see that some clusters (ids) appear more than once (e.g., id 1 appears twice, giving 4 rows): because the bootstrap samples with replacement, the same cluster can be drawn several times. Thus we cannot fit a mixed-effects model to this data as it stands, because the model should treat every cluster in the new data as distinct. Instead, I would like to reorganize the data as below, relabeling the ids of the boot.xx data above so that each drawn cluster gets its own consecutive id. This is the step I need help with:
   id x y
   1 0 0.4
   1 0 1.0
   1 0 0.9
   2 0 0.5
   2 0 0.6
   3 1 2.2
   3 1 3.0
   4 1 0.4
   4 1 0.3
   5 0 0.5
   5 0 0.6
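One way to get this relabeled data is to assign the new ids while sampling, instead of repairing boot.xx afterwards. The sketch below is a modified boot.cluster, not the original function; it assumes the cluster column in x is named id and overwrites it with the draw number, so the k-th drawn cluster becomes cluster k, matching the layout of the table above. The seed is arbitrary.

```r
# Data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Cluster bootstrap that gives every drawn cluster a new id
# (assumes the cluster column of x is called "id").
boot.cluster <- function(x, id) {
  ids <- unique(id)
  # index into ids so that a single-cluster input is not misread by sample()
  boot.id <- ids[sample.int(length(ids), replace = TRUE)]
  out <- lapply(seq_along(boot.id), function(k) {
    d <- x[id == boot.id[k], , drop = FALSE]  # all rows of the k-th drawn cluster
    d$id <- k                                 # k-th draw becomes cluster k
    d
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL                       # drop the duplicated original row names
  res
}

set.seed(2018)  # arbitrary seed so the draw can be repeated
boot.xx <- boot.cluster(xx, xx$id)
boot.xx
```

Relabeling at sampling time matters: if the same cluster happens to be drawn twice in a row, its two copies sit next to each other in boot.xx with the same id, so they cannot be separated afterwards from the id column alone. The relabeled data can then be passed to a mixed-model fitter once per bootstrap replicate, for example lme4::lmer(y ~ x + (1 | id), data = boot.xx).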
Can someone help me with it? Thanks!
Lei Liu
Professor of Biostatistics
Washington University in St. Louis
    
    