[R] bootstrap sample for clustered data
Liu, Lei
Mon Sep 17 05:22:44 CEST 2018
Hi there,
I posted this message before, but there may have been some confusion in my previous post. So here is a clearer version:
I'd like to do bootstrap sampling for clustered data. Then I will run some complicated models (say mixed effects models) on the bootstrapped sample. Here id is the cluster. Note that different clusters have different numbers of observations, e.g., id 2 has 2 observations and id 3 has 3 observations.
id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 )
xx=data.frame(id, x, y)
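boot.cluster <- function(x, id){
# draw cluster ids with replacement (one draw per distinct cluster)
boot.id <- sample(unique(id), replace=TRUE)
# pull all rows of each drawn cluster, in draw order
out <- lapply(boot.id, function(i) x[id%in%i,])
return( do.call("rbind",out) )
}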
boot.xx=boot.cluster(xx, xx$id)
Here is the generated boot.xx dataset:
id x y
3 0 0.4
3 0 1.0
3 0 0.9
1 0 0.5
1 0 0.6
5 1 2.2
5 1 3.0
2 1 0.4
2 1 0.3
1 0 0.5
1 0 0.6
You can see that some clusters (ids) appear multiple times (e.g., id 1 appears in two places, 4 rows in total). Because the bootstrap samples with replacement, the same cluster can be drawn more than once. Thus, we cannot fit a mixed effects model to this data directly, since each draw of a cluster should be treated as a distinct cluster in the new data. Instead, I would like to relabel the ids as below, so that each drawn cluster gets a new id in the order it appears in the boot.xx data above. This is the step I need help with:
id x y
1 0 0.4
1 0 1.0
1 0 0.9
2 0 0.5
2 0 0.6
3 1 2.2
3 1 3.0
4 1 0.4
4 1 0.3
5 0 0.5
5 0 0.6
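To make the target concrete, here is a minimal sketch of one way the relabeling might be done. It is not a tested solution for every case. It assigns the new id inside the bootstrap function, using the position of each draw rather than the original id. The names boot.cluster2 and orig.id are made up for illustration, and the sketch assumes the data frame has a column called id.

```r
# Data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Hypothetical variant of boot.cluster (name is illustrative only).
# Each drawn cluster is relabeled by its draw position, so a cluster
# drawn twice ends up with two different ids.
# Assumption: dat contains a column named "id".
boot.cluster2 <- function(dat, id) {
  boot.id <- sample(unique(id), replace = TRUE)
  out <- lapply(seq_along(boot.id), function(k) {
    block <- dat[id == boot.id[k], , drop = FALSE]
    block$orig.id <- block$id  # keep the original label for reference
    block$id <- k              # new label: 1, 2, ... in draw order
    block
  })
  res <- do.call("rbind", out)
  rownames(res) <- NULL        # drop the row names carried over from xx
  res
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
boot.xx <- boot.cluster2(xx, xx$id)
```

Relabeling inside the function seems safer than fixing boot.xx afterwards: a run-length approach on the finished data (e.g., with rle) would merge two draws of the same cluster whenever they happen to land next to each other.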
Can someone help me with it? Thanks!
Lei Liu
Professor of Biostatistics
Washington University in St. Louis