[R] How to Un-group a grouped data set?

David L Carlson dcarlson at tamu.edu
Tue May 15 16:46:35 CEST 2012


# One row per subject: positives (AEs = 1) first, then negatives (AEs = 0)
newdats <- rbind(
  cbind(dats[rep(1:nrow(dats), dats$AEs), 1:2], AEs = 1),
  cbind(dats[rep(1:nrow(dats), dats$N - dats$AEs), 1:2], AEs = 0))

But the data will not be in the order you specified unless you add

newdats <- newdats[order(newdats$Study, -newdats$TX, -newdats$AEs),]

and you may want to reset the row names with

rownames(newdats) <- 1:nrow(newdats)
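
If you want to check the result, here is a self-contained sketch that puts the three steps together. It assumes the dats data frame from Michael's dput() output quoted below (rebuilt here with data.frame() so the block runs on its own). The stopifnot() checks and the names idx, pos, neg and chk are additions for illustration only; they simply confirm that the expanded rows reproduce the grouped counts.

```r
# Grouped data, as posted via dput() in the quoted message below
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

idx <- seq_len(nrow(dats))

# One row per subject: positives (AEs = 1), then negatives (AEs = 0)
pos <- cbind(dats[rep(idx, dats$AEs), 1:2], AEs = 1)
neg <- cbind(dats[rep(idx, dats$N - dats$AEs), 1:2], AEs = 0)
newdats <- rbind(pos, neg)

# Sort by study, treatment arm before control, positives before negatives
newdats <- newdats[order(newdats$Study, -newdats$TX, -newdats$AEs), ]
rownames(newdats) <- NULL  # reset row names to 1, 2, ..., n

# Sanity checks: total rows and per-arm positives match the grouped data
stopifnot(nrow(newdats) == sum(dats$N))
chk <- aggregate(AEs ~ Study + TX, data = newdats, FUN = sum)
chk <- chk[order(chk$Study, -chk$TX), ]
stopifnot(all(chk$AEs == dats$AEs))
```

Setting the row names to NULL has the same effect as assigning 1:nrow(newdats); either form works.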

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of R. Michael Weylandt
> Sent: Tuesday, May 15, 2012 1:09 AM
> To: Cheenghee AM Koh
> Cc: r-help at r-project.org
> Subject: Re: [R] How to Un-group a grouped data set?
> 
> do.call() is a nifty and surprisingly useful construct whenever you need to
> build a function call programmatically or pass a list as the arguments of
> a function.
> 
> R-News 2/2 has some useful tips on this and related functions in the
> Programmer's Note section if you're interested.
> 
> Best,
> Michael
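
As a small illustration of the do.call() idiom Michael describes, the sketch below builds a list of per-row matrices with lapply() and binds them in a single call. The helper name expand_row is hypothetical (it stands in for the renamed function from the original post), and dats is assumed to be the data frame from the dput() output further down.

```r
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Hypothetical helper: expand row i into N rows of (Study, TX, AEs)
expand_row <- function(i) {
  n <- dats$N[i]
  k <- dats$AEs[i]
  cbind(Study = rep(dats$Study[i], n),
        TX    = rep(dats$TX[i], n),
        AEs   = rep(1:0, c(k, n - k)))  # k positives, then n - k negatives
}

pieces <- lapply(seq_len(nrow(dats)), expand_row)  # list of 6 matrices

# do.call() passes the list elements as separate arguments, so this is
# the same as rbind(pieces[[1]], pieces[[2]], ..., pieces[[6]])
ungrouped <- do.call(rbind, pieces)
```

The result here is a matrix rather than a data frame; as.data.frame(ungrouped) converts it if a data frame is needed.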
> 
> On Tue, May 15, 2012 at 2:05 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
> > Thank you so much! I can't believe I spent the whole night not knowing
> > this one command, "do.call".
> > This is so handy!
> > Best, Koh
> >
> >
> > On Tue, May 15, 2012 at 12:52 AM, R. Michael Weylandt
> > <michael.weylandt at gmail.com> wrote:
> >>
> >> Sorry -- I missed the bit about the AE in your original post. Perhaps
> >> you can work with my bit for the repeats, but it looks like if you
> >> want to use your function, it should suffice to do something like
> >>
> >> do.call("rbind", lapply(1:6, NewFuncName))
> >>
> >> Best,
> >> Michael
> >>
> >> On Tue, May 15, 2012 at 1:50 AM, R. Michael Weylandt
> >> <michael.weylandt at gmail.com> wrote:
> >> > Don't use subset as a function name -- it's already the name of a
> >> > rather important function, as is data (but at least that one's not a
> >> > function in your use, so it's not quite so bad). Finally, use dput()
> >> > when sending data so we get a plaintext reproducible version.
> >> >
> >> > I'd try something like this:
> >> >
> >> > dats <- structure(list(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
> >> >     TX = c(1L, 0L, 1L, 0L, 1L, 0L), AEs = c(3L, 2L, 1L, 2L, 1L, 1L),
> >> >     N = c(5L, 7L, 10L, 7L, 8L, 4L)), .Names = c("Study", "TX", "AEs", "N"),
> >> >     class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
> >> >
> >> > # See how handy dput can be :-)
> >> >
> >> > dats[unlist(mapply(FUN = function(x,y) rep(x, y), 1:NROW(dats),
> >> > dats$N)), -4]
> >> >
> >> > which isn't super elegant, but others might have something better.
> >> >
> >> > Best,
> >> > Michael
> >> >
> >> > On Tue, May 15, 2012 at 1:24 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
> >> >> Hello, R-fellows,
> >> >>
> >> >> I have a question that I really don't know how to solve. I have spent
> >> >> hours online searching for possible solutions, but in vain. If anyone
> >> >> could help me handle this issue, it would be much appreciated!
> >> >>
> >> >> I have a "grouped" dataset like this:
> >> >>
> >> >> > data
> >> >>   Study TX AEs  N
> >> >> 1     1  1   3  5
> >> >> 2     1  0   2  7
> >> >> 3     2  1   1 10
> >> >> 4     2  0   2  7
> >> >> 5     3  1   1  8
> >> >> 6     3  0   1  4
> >> >>
> >> >> where Study is the study id, TX is treatment, AEs is how many people
> >> >> in this trial arm are positive, and N is the number of subjects.
> >> >> Therefore, row 1 stands for the treatment arm of study 1, where there
> >> >> are 5 subjects and 3 of them are positive. Row 2 stands for the control
> >> >> arm of study 1, where there are 7 subjects and 2 of them are positive.
> >> >>
> >> >> Now I would like to "un-group" them, so that the data look like:
> >> >>
> >> >> Study  TX  AEs
> >> >>   1     1    1
> >> >>   1     1    1
> >> >>   1     1    1
> >> >>   1     1    0
> >> >>   1     1    0
> >> >>   1     0    1
> >> >>   1     0    1
> >> >>   1     0    0
> >> >>   1     0    0
> >> >>   1     0    0
> >> >>   1     0    0
> >> >>   1     0    0
> >> >>   2     1    1
> >> >>   ...
> >> >>
> >> >>
> >> >> But I wasn't able to do it. In fact, I wrote a small function and used
> >> >> "lapply" to get what I want. It worked well and did give me what I
> >> >> wanted, but I wasn't able to collapse all the returned pieces into one
> >> >> single data frame for subsequent analysis.
> >> >>
> >> >> The function I wrote:
> >> >>
> >> >> subset = function(i){
> >> >> d = c(rep(data[i,1], data[i,4]), rep(data[i,2], data[i,4]),
> >> >>       rep(0:1, c(data[i,4] - data[i,3], data[i,3])))
> >> >> d = matrix(d, data[i,4],3)
> >> >> d
> >> >> }
> >> >>
> >> >> then:
> >> >>
> >> >> Data = lapply(1:6, subset)
> >> >> Data
> >> >>
> >> >> Therefore, I tried to write a loop. But no matter how I tried, I
> >> >> couldn't get what I want.
> >> >>
> >> >> Any idea?
> >> >>
> >> >> Thank you so much!
> >> >>
> >> >> Best,
> >> >>
> >> >>
> >> >> --
> >> >> Cheenghee Masaki Koh, MSW, MS(c), PhD Student
> >> >> School of Social Service Administration
> >> >> Department of Health Studies, Division of Biological Science
> >> >> University of Chicago
> >> >>
> >> >> ______________________________________________
> >> >> R-help at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >
> 


