[R] bootstrap sample for clustered data
    Jeff Newmiller 
    jdnewmil using dcn.davis.ca.us
       
    Tue Sep 18 10:04:35 CEST 2018
    
    
  
Seeing what you regard as a satisfactory solution, I think Bert's recommendation to create a factor was superior since it allows you to maintain consistent labeling of your clusters even as the set of clusters changes.
I also still think you are setting the stage for frequent failures of the analyses you plan to apply to these data, but that discussion is out of scope here.
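
A minimal sketch of the factor idea, assuming it means giving every bootstrap draw its own factor level while keeping the original id column. Bert's actual code is not reproduced here, and the names boot_cluster_factor and draw are hypothetical, chosen only for illustration:

```r
# Sketch only: boot_cluster_factor and the "draw" column are illustrative names.
boot_cluster_factor <- function(x, id) {
  ids <- unique(id)
  # Sample positions, not values, so a single numeric id is not misread as 1:id
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- x[id == drawn[k], , drop = FALSE]
    # One factor level per draw: a cluster drawn twice gets two distinct labels
    rows$draw <- factor(rep(k, nrow(rows)), levels = seq_along(drawn))
    rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Example data from the original question
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

set.seed(1)
boot.xx <- boot_cluster_factor(xx, xx$id)
table(boot.xx$draw)  # number of rows contributed by each draw
```

Keeping id next to draw lets each resampled cluster be traced back to its source, and the fixed level set keeps the labels the same from one replicate to the next. The draw column could then serve as the grouping term, for example in a formula such as y ~ x + (1 | draw).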
On September 17, 2018 8:29:48 AM PDT, "Liu, Lei" <lei.liu using wustl.edu> wrote:
>Thanks for the help. My friend helped me and here is the solution:
>
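>boot.cluster <- function(x, id){
>  # draw cluster ids with replacement
>  boot.id <- sample(unique(id), replace=TRUE)
>  # label each draw with its own newid, even when an id is drawn twice
>  out <- lapply(seq_along(boot.id),
>    function(newid){cbind(x[id%in%boot.id[newid],],newid)})
>  return( do.call("rbind",out) )
>}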
>
>Lei
>
>-----Original Message-----
>From: Jeff Newmiller [mailto:jdnewmil using dcn.davis.ca.us] 
>Sent: Monday, September 17, 2018 2:32 AM
>To: r-help using r-project.org; Liu, Lei <lei.liu using wustl.edu>;
>r-help using R-project.org
>Subject: Re: [R] bootstrap sample for clustered data
>
>You are telling us that the ID values in your data set indicate
>clusters. However you went about making that determination in the first
>place might be an obvious(?) way to do it again with your bootstrapped
>sample, ignoring the cluster assignments you have in place. This is the
>wrong place to have a discussion about which theoretical method for
>cluster identification you should use, and if you do know that then
>searching the web or using the sos package would be the appropriate way
>to find implementations of a specific clustering algorithm.
>
>I am not an ME expert, but AFAIK "complicated" analyses such as mixed
>effects models tend to have rather hefty appetites for data
>completeness, so you may have to design a special sampling plan in
>order to avoid generating data sets for which those analyses will
>break, and you will probably need a very large data set to start with
>in order to have sufficient data in each cluster. That is, you may be
>better off keeping the original cluster identification and just
>restructuring your bootstrap sampling to sample within clusters.
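
A minimal sketch of sampling within clusters, assuming that means resampling rows with replacement inside each original cluster so that every cluster keeps its id and its size. The function name boot_within is hypothetical, and this illustrates only the resampling step, not a recommendation about which bootstrap scheme suits a given mixed model:

```r
# Sketch only: boot_within is an illustrative name, not an existing function.
boot_within <- function(x, id) {
  # Row indices grouped by cluster
  groups <- split(seq_len(nrow(x)), id)
  idx <- unlist(lapply(groups, function(rows) {
    # Resample positions within the cluster; safe for one-row clusters too
    rows[sample.int(length(rows), replace = TRUE)]
  }), use.names = FALSE)
  out <- x[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Example data from the original question
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

set.seed(1)
boot.within <- boot_within(xx, xx$id)
table(boot.within$id)  # same cluster sizes as in xx
```

Because every cluster is retained with its original label, no relabeling step is needed afterwards.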
>
>The R-sig-me mailing list is probably a better venue for your
>questions. 
>
>On September 16, 2018 8:22:44 PM PDT, "Liu, Lei" <lei.liu using wustl.edu>
>wrote:
>>Hi there,
>>
>>I posted this message before but there may be some confusion in my 
>>previous post. So here is a clearer version:
>>
>>I'd like to do a bootstrap sampling for clustered data. Then I will run
>>some complicated models (say mixed effects models) on the bootstrapped
>>sample. Here id is the cluster. Note different clusters have different
>>numbers of observations, e.g., id 2 has 2 observations, id 3 has 3
>>observations.
>>
>>id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
>>y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
>>x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 )
>>
>>xx=data.frame(id, x, y)
>>
>>boot.cluster <- function(x, id){
>>
>>  boot.id <- sample(unique(id), replace=T)
>>  out <- lapply(boot.id, function(i) x[id%in%i,])
>>
>>  return( do.call("rbind",out) )
>>
>>}
>>
>>boot.xx=boot.cluster(xx, xx$id)
>>
>>Here is the generated boot.xx dataset:
>>
>>   id x y
>>   3 0 0.4
>>   3 0 1.0
>>   3 0 0.9
>>   1 0 0.5
>>   1 0 0.6
>>   5 1 2.2
>>   5 1 3.0
>>   2 1 0.4
>>   2 1 0.3
>>   1 0 0.5
>>   1 0 0.6
>>
>>You can see that some clusters (ids) appear multiple times (e.g., id 1
>>appears in two places - 4 rows). Since the bootstrap samples with
>>replacement, we could have the same cluster multiple times. Thus, we
>>cannot fit a mixed effects model to this data as is, because we should
>>treat all the clusters as different in this new data. Instead, I will
>>reorganize the data as below (id is relabeled in order of appearance in
>>the above boot.xx data). This is the step I need help with:
>>
>>  id x  y
>>   1 0 0.4
>>   1 0 1.0
>>   1 0 0.9
>>   2 0 0.5
>>   2 0 0.6
>>   3 1 2.2
>>   3 1 3.0
>>   4 1 0.4
>>   4 1 0.3
>>   5 0 0.5
>>   5 0 0.6
>>
>>Can someone help me with it? Thanks!
>>
>>Lei Liu
>>Professor of Biostatistics
>>Washington University in St. Louis
>>
>--
>Sent from my phone. Please excuse my brevity.
-- 
Sent from my phone. Please excuse my brevity.
    
    