[R] How to Un-group a grouped data set?

R. Michael Weylandt michael.weylandt at gmail.com
Tue May 15 07:50:11 CEST 2012


Don't use subset as a function name: it's already the name of a
rather important function. The same goes for data, though at least
you aren't using that one as a function, so it's not quite so bad.
Finally, use dput() when sending data so we get a plain-text,
reproducible version.

I'd try something like this:

dats <- structure(list(Study = c(1L, 1L, 2L, 2L, 3L, 3L), TX = c(1L,
0L, 1L, 0L, 1L, 0L), AEs = c(3L, 2L, 1L, 2L, 1L, 1L), N = c(5L,
7L, 10L, 7L, 8L, 4L)), .Names = c("Study", "TX", "AEs", "N"), class =
"data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))

# See how handy dput can be :-)

# Repeat each row index N times, then drop the N column (column 4)
dats[unlist(mapply(FUN = function(x,y) rep(x, y), 1:NROW(dats), dats$N)), -4]

which isn't super elegant, but others might have something better.
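
One thing to watch: the one-liner above repeats each row N times but
leaves AEs as the per-arm count, whereas the requested output has a 0/1
indicator per subject. Below is a minimal sketch of one way to get
that, assuming positives should be listed first within each arm (as in
the example output) and that 0 <= AEs <= N in every row. The names idx
and long are illustrative only.

```r
# Same example data as above, rebuilt so the snippet runs on its own.
dats <- data.frame(
  Study = c(1L, 1L, 2L, 2L, 3L, 3L),
  TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
  AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
  N     = c(5L, 7L, 10L, 7L, 8L, 4L)
)

# Repeat each row index N times: one output row per subject.
idx  <- rep(seq_len(nrow(dats)), times = dats$N)
long <- dats[idx, c("Study", "TX")]

# Per arm: AEs ones followed by (N - AEs) zeros, in the same order as idx.
# Assumes 0 <= AEs <= N for every row.
long$AEs <- unlist(Map(function(a, n) rep(c(1L, 0L), times = c(a, n - a)),
                       dats$AEs, dats$N))
rownames(long) <- NULL

head(long, 12)
```

If you would rather keep the lapply() approach, do.call(rbind, Data)
should stack the list of matrices into a single matrix, which
as.data.frame() can then convert.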

Best,
Michael

On Tue, May 15, 2012 at 1:24 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
> Hello, R-fellows,
>
> I have a question that I really don't know how to solve. I have spent hours
> searching online for possible solutions, but in vain. If anyone could help
> me handle this issue, it would be much appreciated!
>
> I have a "grouped" dataset like this:
>
>> data
>   Study TX AEs  N
> 1     1  1   3  5
> 2     1  0   2  7
> 3     2  1   1 10
> 4     2  0   2  7
> 5     3  1   1  8
> 6     3  0   1  4
>
> where Study is the study id, TX is the treatment indicator, AEs is how many
> people in that arm of the trial are positive, and N is the number of
> subjects. So row 1 is the treatment arm of study 1, where there are 5
> subjects and 3 of them are positive. Row 2 is the control arm of study 1,
> where there are 7 subjects and 2 of them are positive.
>
> Now I would like to "un-group" them, so that the data look like this:
>
> Study  TX  AEs
>     1   1    1
>     1   1    1
>     1   1    1
>     1   1    0
>     1   1    0
>     1   0    1
>     1   0    1
>     1   0    0
>     1   0    0
>     1   0    0
>     1   0    0
>     1   0    0
>     2   1    1
>   .....................
>  .....................
>
>
> But I wasn't able to do it. I wrote a small function and used "lapply",
> which worked well and did give me the pieces I want, but I wasn't able to
> collapse all the returned values into one single data frame for subsequent
> analysis.
>
> The function I wrote:
>
> subset = function(i){
> d = c(rep(data[i,1], data[i,4]), rep(data[i,2], data[i,4]), rep(0:1,
> c(data[i,4] - data[i,3],data[i,3])))
> d = matrix(d, data[i,4],3)
> d
> }
>
> then:
>
> Data = lapply(1:6, subset)
> Data
>
> So I tried to write a loop, but no matter what I tried, I couldn't get
> what I want.
>
> Any idea?
>
> Thank you so much!
>
> Best,
>
>
> --
> Cheenghee Masaki Koh, MSW, MS(c), PhD Student
> School of Social Service Administration
> Department of Health Studies, Division of Biological Science
> University of Chicago