[R] best way to aggregate / rearrange data.frame with different data types
Dennis Murphy
djmuser at gmail.com
Mon Jul 11 19:06:42 CEST 2011
Hi:
Here's another approach using the plyr and reshape packages. (There
are multiple ways to do this, BTW.)
## (1)
# 'dat' is the data frame from the original post (named 'data' there)
library(plyr)
ddply(dat, .(Subject, gender, comment), summarise, mean_y = mean(y))
  Subject gender   comment   mean_y
1       1      w comment A 3.881864
2       2      m comment B 2.213656
3       3      w comment C 2.568794
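For comparison, the same aggregation can be done without plyr. This is only a sketch with base R's aggregate(); the seed and the rebuilt 'dat' are my own assumptions for illustration, so the means will not match the output above.

```r
# Rebuild the example; y is random, so the means differ from those shown above
set.seed(42)  # arbitrary seed, only for repeatability
dat <- data.frame(
  Subject = rep(1:3, each = 4),
  y       = rnorm(12, 3, 2),
  gender  = rep(c("w", "m", "w"), each = 4),
  comment = rep(c("comment A", "comment B", "comment C"), each = 4)
)

# One row per Subject/gender/comment combination, with y replaced by its mean
agg <- aggregate(y ~ Subject + gender + comment, data = dat, FUN = mean)
names(agg)[names(agg) == "y"] <- "mean_y"
agg
```

The grouping variables go on the right-hand side of the formula, so constant per-subject columns such as gender and comment are carried along rather than averaged.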
## (2)
library(reshape)
# Add a 'time' variable holding names to associate to new columns
# (assumes exactly four measurements per subject, in order)
dat$time <- paste('y', rep(1:4, 3), sep = '')
# reshape from 'long' to 'wide' form
cast(dat, Subject + gender + comment ~ time, value = 'y')
  Subject gender   comment        y1       y2       y3       y4
1       1      w comment A 5.9299385 3.268402 2.634573 3.694540
2       2      m comment B 2.0663910 1.475625 1.960885 3.351722
3       3      w comment C 0.6656096 3.044818 4.833166 1.731582
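The long-to-wide step can also be sketched in base R with reshape(). The example below is an illustration under my own assumptions (arbitrary seed, rebuilt 'dat'); it numbers the measurements within each subject using ave(), so it does not rely on every subject having the same number of rows.

```r
# Rebuild the example; y is random, so values differ from those shown above
set.seed(42)  # arbitrary seed, only for repeatability
dat <- data.frame(
  Subject = rep(1:3, each = 4),
  y       = rnorm(12, 3, 2),
  gender  = rep(c("w", "m", "w"), each = 4),
  comment = rep(c("comment A", "comment B", "comment C"), each = 4)
)

# Number the repeated measurements within each subject: 1, 2, 3, ...
dat$time <- ave(seq_along(dat$Subject), dat$Subject, FUN = seq_along)

# Long -> wide: one row per subject, one y column per measurement
wide <- reshape(dat,
                idvar     = c("Subject", "gender", "comment"),
                timevar   = "time",
                direction = "wide",
                sep       = "")  # gives y1, y2, ... instead of y.1, y.2, ...
wide
```

If some subjects had fewer measurements, the missing cells would simply be NA in the wide result.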
HTH,
Dennis
On Mon, Jul 11, 2011 at 8:55 AM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
>
> I have a data.frame that looks like this:
>
>
> Subject <- c(rep(1,4), rep(2,4), rep(3,4))
> y <- rnorm(12, 3, 2)
> gender <- c(rep("w",4), rep("m",4), rep("w",4))
> comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
>
> data <- data.frame(Subject,y,gender,comment)
> data
>
>    Subject           y gender   comment
> 1        1  2.86495339      w comment A
> 2        1  3.33758993      w comment A
> 3        1  7.00301094      w comment A
> 4        1  3.81585998      w comment A
> 5        2  2.50300460      m comment B
> 6        2  4.93830489      m comment B
> 7        2  5.08184289      m comment B
> 8        2  4.00552691      m comment B
> 9        3  3.16131181      w comment C
> 10       3  4.61620021      w comment C
> 11       3  3.68288799      w comment C
> 12       3 -0.05049953      w comment C
>
>
>
> So I have multiple lines for one subject because of a repeated measurement of variable y
> (the rest of the variables stay the same, like gender).
>
>
> Now I would like to transform this data.frame in two ways:
>
> 1. an aggregated form,
> where I only have one row left for each subject - for numerical variables within the data.frame (like y) a mean should be calculated.
>
>
> 2. a restructured form,
> where I only have one row for each subject, but four different y-columns (y1, y2, y3, y4).
>
>
> What is the easiest way to do this?
> Are there any functions that do this kind of data-frame rearranging in one step?
>