[R] How to Un-group a grouped data set?
R. Michael Weylandt
michael.weylandt at gmail.com
Tue May 15 07:50:11 CEST 2012
Don't use subset as a function name -- it's already the name of a
rather important function, as is data (though at least you aren't
using that one as a function, so it's not quite so bad). Also, use
dput() when sending data so we get a plain-text, reproducible version.
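For instance, here is a minimal sketch of that round trip. The data frame example_df is made up purely for illustration; it is not the poster's data.

```r
# Hypothetical example data, used only to illustrate dput()
example_df <- data.frame(id = 1:3, group = c("a", "b", "a"),
                         stringsAsFactors = FALSE)

# dput() prints an R expression that rebuilds the object;
# that printed text is what you paste into the email
dput(example_df)

# The recipient evaluates the pasted text to get the object back.
# Here the round trip is simulated by capturing the printed text.
txt <- capture.output(dput(example_df))
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))
identical(rebuilt, example_df)
```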
I'd try something like this:
dats <- structure(list(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                       TX = c(1L, 0L, 1L, 0L, 1L, 0L),
                       AEs = c(3L, 2L, 1L, 2L, 1L, 1L),
                       N = c(5L, 7L, 10L, 7L, 8L, 4L)),
                  .Names = c("Study", "TX", "AEs", "N"),
                  class = "data.frame",
                  row.names = c("1", "2", "3", "4", "5", "6"))
# See how handy dput can be :-)
dats[unlist(mapply(FUN = function(x,y) rep(x, y), 1:NROW(dats), dats$N)), -4]
which isn't super elegant, but others might have something better.
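One caveat: the line above repeats each row N times but leaves AEs as the per-arm count, whereas the layout in the question wants AEs as a 0/1 indicator per subject. Below is a rough sketch of one way to get that, assuming positives are listed first within each arm as in the question's example. The names idx, out, pieces and out2 are made up here, and the data are rebuilt so the snippet runs on its own. The second half shows how a list of per-row pieces, like the one lapply() returns in the question, might be collapsed with do.call(rbind, ...).

```r
# Same data as dats above, rebuilt so this snippet is self-contained
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Repeat row i N[i] times; rep() accepts a vector of 'times' directly
idx <- rep(seq_len(nrow(dats)), times = dats$N)
out <- dats[idx, c("Study", "TX")]

# Per arm: AEs ones followed by (N - AEs) zeros, concatenated in row order
out$AEs <- unlist(mapply(function(pos, n) rep(c(1L, 0L), times = c(pos, n - pos)),
                         dats$AEs, dats$N, SIMPLIFY = FALSE))
rownames(out) <- NULL

head(out, 12)

# Alternative: build one small data frame per row with lapply(),
# then stack the resulting list into a single data frame
pieces <- lapply(seq_len(nrow(dats)), function(i) {
  data.frame(Study = rep(dats$Study[i], dats$N[i]),
             TX    = rep(dats$TX[i], dats$N[i]),
             AEs   = rep(c(1L, 0L),
                         times = c(dats$AEs[i], dats$N[i] - dats$AEs[i])))
})
out2 <- do.call(rbind, pieces)
```

Both routes should give one row per subject; the first avoids building intermediate data frames, while the second stays closer to the lapply() approach in the original post.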
Best,
Michael
On Tue, May 15, 2012 at 1:24 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
> Hello, R-fellows,
>
> I have a question that I really don't know how to solve. I have spent hours
> online searching for possible solutions, but in vain. If anyone could
> help me handle this issue, I would be so grateful!
>
> I have a "grouped" dataset like this:
>
>> data
>   Study TX AEs  N
> 1     1  1   3  5
> 2     1  0   2  7
> 3     2  1   1 10
> 4     2  0   2  7
> 5     3  1   1  8
> 6     3  0   1  4
>
> where Study is the study id, TX is the treatment indicator, AEs is the
> number of subjects in that arm who are positive, and N is the number of
> subjects in that arm. So row 1 is the treatment arm of study 1, which has
> 5 subjects, 3 of whom are positive. Row 2 is the control arm of study 1,
> which has 7 subjects, 2 of whom are positive.
>
> Now I would like to "un-group" them, so that the data look like this:
>
> Study TX AEs
>     1  1   1
>     1  1   1
>     1  1   1
>     1  1   0
>     1  1   0
>     1  0   1
>     1  0   1
>     1  0   0
>     1  0   0
>     1  0   0
>     1  0   0
>     1  0   0
>     2  1   1
> .....................
> .....................
>
>
> But I wasn't able to do it. I wrote a small function and used "lapply"
> to apply it to each row. That worked, and gave me what I wanted for each
> row, but I wasn't able to collapse all the returned matrices into one
> single data frame for subsequent analysis.
>
> The function I wrote:
>
> subset = function(i){
>   d = c(rep(data[i,1], data[i,4]), rep(data[i,2], data[i,4]),
>         rep(0:1, c(data[i,4] - data[i,3], data[i,3])))
>   d = matrix(d, data[i,4], 3)
>   d
> }
>
> then:
>
> Data = lapply(1:6, subset)
> Data
>
> So I tried to write a loop instead, but no matter what I tried, I
> couldn't get what I wanted.
>
> Any idea?
>
> Thank you so much!
>
> Best,
>
>
> --
> Cheenghee Masaki Koh, MSW, MS(c), PhD Student
> School of Social Service Administration
> Department of Health Studies, Division of Biological Science
> University of Chicago
>