[R] bootstrap sample for clustered data

Mon Sep 17 03:05:09 CEST 2018

Unless there is good reason not to -- which is not the case here --
*always* cc the list. I have done that here.

"Can you help me with it?"
Nope. I'm not a private consultant, and I already made an attempt to do so,
which you seem to have completely ignored. So I'm done.
By the way, "Unfortunately it couldn’t work for my case" is a completely
meaningless comment. You need to explicitly show what you did and what
error messages you received. Read the posting guide below for how to post
an intelligible question.
Finally, if you think this is a mixed model issue -- which I believe you
are confused about, but as I can't penetrate your comments, maybe I'm wrong
-- post on the r-sig-mixed-models list, not here. The same comments about
posting an intelligible question apply there.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Sep 16, 2018 at 5:34 PM Liu, Lei <lei.liu using wustl.edu> wrote:

> Hi Bert,
>
> Thanks for your help. Unfortunately it couldn’t work for my case. Please
> see my code below. Here id is the cluster. Note that different clusters have
> different numbers of subjects: some have 2, some have 3.
>
> id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
> y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
> x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 )
>
> xx=data.frame(id, x, y)
>
> boot.cluster <- function(x, id){
>   boot.id <- sample(unique(id), replace=T)
>   out <- lapply(boot.id, function(i) x[id%in%i,])
>   return( do.call("rbind",out) )
> }
>
> boot.xx=boot.cluster(xx, xx$id)
>
> Here is the boot.xx dataset:
>
>    id x   y
> 5   3 0 0.4
> 6   3 0 1.0
> 7   3 0 0.9
> 1   1 0 0.5
> 2   1 0 0.6
> 11  5 1 2.2
> 12  5 1 3.0
> 3   2 1 0.4
> 4   2 1 0.3
> 13  1 0 0.5
> 21  1 0 0.6
>
> You can see that some clusters (ids) appear multiple times (e.g., id 1
> appears in two places, 4 rows). Since the bootstrap samples *with
> replacement*, the same cluster can be drawn more than once. Thus, we
> cannot fit a mixed effects model to this data as it stands, because all
> the clusters in the new data should be treated as different. Instead, I
> want to reorganize the data as below. This is the step I need help with:
>
>    new.id x   y
> 5       1 0 0.4
> 6       1 0 1.0
> 7       1 0 0.9
> 1       2 0 0.5
> 2       2 0 0.6
> 11      3 1 2.2
> 12      3 1 3.0
> 3       4 1 0.4
> 4       4 1 0.3
> 13      5 0 0.5
> 21      5 0 0.6
>
> Can you help me with it? Thanks a lot!
>
>
>
> Lei
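One way to get the layout requested above is to number the draws while sampling, instead of trying to recover them from the stacked rows afterwards. The sketch below is a hypothetical variant, `boot.cluster2()` (not from the thread), of the function quoted above: the k-th sampled cluster gets `new.id = k`, so a cluster drawn twice contributes two distinct new ids. The seed is arbitrary, so the clusters actually drawn will differ from the listing above.

```r
# Data from the message above: 5 clusters of unequal size (2 or 3 rows)
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Hypothetical variant of boot.cluster(): resample whole clusters with
# replacement and give each draw its own id, 1..(number of draws).
boot.cluster2 <- function(x, id) {
  boot.id <- sample(unique(id), replace = TRUE)
  out <- lapply(seq_along(boot.id), function(k) {
    block <- x[id == boot.id[k], , drop = FALSE]  # all rows of the k-th drawn cluster
    block$new.id <- k                             # repeated draws still get distinct ids
    block
  })
  do.call(rbind, out)
}

set.seed(2018)  # arbitrary seed, for reproducibility only
boot.xx <- boot.cluster2(xx, xx$id)
boot.xx
```

The original `id` column is kept alongside `new.id`, so it remains possible to check which cluster each draw came from while using `new.id` as the cluster variable in later analysis.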
>
>
>
> *From:* Bert Gunter [mailto:bgunter.4567 using gmail.com]
> *Sent:* Sunday, September 16, 2018 3:36 PM
> *To:* Liu, Lei <lei.liu using wustl.edu>
> *Subject:* Re: [R] bootstrap sample for clustered data
>
> You can do a mixed effects model using the existing ids without recoding.
>
> But if you insist, is this the sort of thing you want?
>
> set.seed(-12345) # for reproducibility
> id <- factor(sample(2:5, 10, replace = TRUE))
> id
> new.id <- factor(id, labels = seq_along(levels(id)))
> new.id
>
>
>
> Note: There's a slightly slicker way to do this, but it bypasses the
> factor() API, and I prefer not to do that.
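To see why this relabelling does not by itself separate repeated draws, it can be applied to the `id` column of the `boot.pro` listing in the original post further down the thread. A `factor()` recode maps each *distinct* old id to one new label, so all three draws of cluster 3 end up sharing a single label.

```r
# id column of the boot.pro output in the original post (cluster 3 drawn three times)
old.id <- factor(c(3, 3, 3, 3, 5, 5, 3, 3, 2, 2))

# Relabel the levels 2, 3, 5 as 1, 2, 3, as in the suggestion above
new.id <- factor(old.id, labels = seq_along(levels(old.id)))
new.id
# values: 2 2 2 2 3 3 2 2 1 1 (three labels, not one per draw)
```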
>
> Cheers,
> Bert
>
> Bert Gunter
>
> On Sun, Sep 16, 2018 at 12:52 PM Liu, Lei <lei.liu using wustl.edu> wrote:
>
> Sorry for the confusion. I just want to recode the id variable to 1 to 5
> in the bootstrapped sample. This way I can do e.g., a mixed effects model
> using the new id as the cluster. Thanks!
>
> Lei
>
>
>
> *From:* Bert Gunter [mailto:bgunter.4567 using gmail.com]
> *Sent:* Sunday, September 16, 2018 2:21 PM
> *To:* Liu, Lei <lei.liu using wustl.edu>
> *Cc:* R-help <r-help using r-project.org>
> *Subject:* Re: [R] bootstrap sample for clustered data
>
>
>
> I can't make any sense of your post. Id 3 occurs 6 times, and 2 and 5
> occur twice each in your example. How do you get (1,1,2,2,3,3,4,4,5,5) out
> of that? In other words, specify the mapping of old ids to new ones.
>
>
>
> Bert
>
>
> Bert Gunter
>
> On Sun, Sep 16, 2018 at 11:51 AM Liu, Lei <lei.liu using wustl.edu> wrote:
>
> Hi there,
>
> I tried to generate bootstrap samples for clustered data. Here is some
> code I found on the web to do the work:
>
> id=c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
> y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2)
> x=c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1 )
>
> xx=data.frame(id, x, y)
>
> boot.cluster <- function(x, id){
>
>   boot.id <- sample(unique(id), replace=T)
>   out <- lapply(boot.id, function(i) x[id%in%i,])
>
>   return( do.call("rbind",out) )
>
> }
>
> boot.pro=boot.cluster(xx, xx$id)
>
> Now I have the output
>
>    id x   y
> 5   3 0 0.4
> 6   3 0 1.0
> 51  3 0 0.4
> 61  3 0 1.0
> 9   5 1 0.5
> 10  5 1 2.0
> 52  3 0 0.4
> 62  3 0 1.0
> 3   2 1 0.4
> 4   2 1 0.3
>
> However, the id variable is the original id, whereas I want the new
> id to be (1, 1, 2, 2, 3, 3, 4, 4, 5, 5) for later analysis. Can anyone show me
> how to do it? Note that the same original id may appear more than once, since
> the bootstrap sample is drawn with replacement. Thanks a lot!
>
> Lei
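If the sample has already been stacked, the draws can still be numbered afterwards under two assumptions: the rows of each draw are contiguous (true for the `do.call("rbind", ...)` output above), and every draw of a cluster contributes exactly that cluster's original number of rows. The helper `relabel_draws()` below is hypothetical, not from the thread; it splits each run of identical ids into chunks of that cluster's size. Simply numbering runs of equal ids would not work here, because cluster 3 was drawn twice in a row (rows 5, 6, 51, 61).

```r
# Hypothetical helper: number the draws of an already stacked cluster
# bootstrap sample. boot.id.col is the id column of the stacked sample;
# orig.id is the id column of the original data.
relabel_draws <- function(boot.id.col, orig.id) {
  size <- table(orig.id)                          # rows per original cluster
  runs <- rle(as.character(boot.id.col))          # blocks of consecutive equal ids
  run.size <- as.vector(size[runs$values])        # cluster size for each block
  draws.per.run <- runs$lengths / run.size        # a block may hold several draws
  draw.sizes <- rep(run.size, times = draws.per.run)
  rep(seq_along(draw.sizes), times = draw.sizes)  # new id 1, 2, ... per draw
}

orig.id <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
boot.id.col <- c(3, 3, 3, 3, 5, 5, 3, 3, 2, 2)    # id column of boot.pro above
relabel_draws(boot.id.col, orig.id)
# 1 1 2 2 3 3 4 4 5 5
```

Applied to the later `boot.xx` listing with unequal cluster sizes (ids 3 3 3 1 1 5 5 2 2 1 1), the same function gives 1 1 1 2 2 3 3 4 4 5 5, matching the `new.id` column requested there. Recording the draw number during sampling is less fragile, since it does not rely on either assumption.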
>
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
