[R] best way to aggregate / rearrange data.frame with different data types

Mon Jul 11 18:25:45 CEST 2011

On Jul 11, 2011, at 11:55 AM, Martin Batholdy wrote:

> Hi,
>
>
> I have a data.frame that looks like this:
>
>
> Subject <- c(rep(1,4), rep(2,4), rep(3,4))
> y <- rnorm(12, 3, 2)
> gender <- c(rep("w",4), rep("m",4), rep("w",4))
> comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
>
> data <- data.frame(Subject,y,gender,comment)
> data
>
>   Subject           y gender   comment
> 1        1  2.86495339      w comment A
> 2        1  3.33758993      w comment A
> 3        1  7.00301094      w comment A
> 4        1  3.81585998      w comment A
> 5        2  2.50300460      m comment B
> 6        2  4.93830489      m comment B
> 7        2  5.08184289      m comment B
> 8        2  4.00552691      m comment B
> 9        3  3.16131181      w comment C
> 10       3  4.61620021      w comment C
> 11       3  3.68288799      w comment C
> 12       3 -0.05049953      w comment C
>
>
>
> So I have multiple lines for one subject because of a repeated  
> measurement of variable y
> (the rest of the variables stay the same, like gender).
>
>
> Now I would like to transform this data.frame in two ways:
>
> 1. an aggregated form,
> where I only have one row left for each subject - for numerical  
> variables within the data.frame (like y) a mean should be calculated.

?aggregate     # seems that you _should_ have already looked here.
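
For instance, a minimal sketch using the formula interface of aggregate.
It assumes the constant columns (gender, comment) can simply be used as
extra grouping variables so that they survive the aggregation; the
set.seed() call is an addition made only so the example is reproducible.

```r
set.seed(1)  # assumption: not in the original post, added for reproducibility
Subject <- c(rep(1,4), rep(2,4), rep(3,4))
y <- rnorm(12, 3, 2)
gender <- c(rep("w",4), rep("m",4), rep("w",4))
comment <- c(rep("comment A",4), rep("comment B",4), rep("comment C",4))
data <- data.frame(Subject, y, gender, comment)

# Mean of y for each combination of the grouping variables.
# gender and comment are constant within Subject, so this yields
# one row per subject.
agg <- aggregate(y ~ Subject + gender + comment, data = data, FUN = mean)
agg <- agg[order(agg$Subject), ]  # sort by Subject for readability
agg
```

If the constant columns could differ within a subject, this would give
more than one row per subject, so it is worth checking nrow(agg).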

>
>
> 2. a restructured form,
> where I only have one row for each subject, but four different
> y-columns (y1, y2, y3, y4).

You can use xtabs:
  data$seqvar <- ave(data$y, data$Subject, FUN=seq_along)  # 1..4 within each Subject
  xtabs(y ~ Subject + seqvar, data=data)
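
A self-contained sketch of the xtabs route, in case a plain data frame
with columns y1..y4 is wanted rather than a table. The column renaming,
the conversion with as.data.frame.matrix, and the set.seed() call are
illustrative additions, not part of the suggestion above.

```r
set.seed(1)  # assumption: added only for reproducibility
data <- data.frame(
  Subject = rep(1:3, each = 4),
  y       = rnorm(12, 3, 2),
  gender  = rep(c("w", "m", "w"), each = 4),
  comment = rep(c("comment A", "comment B", "comment C"), each = 4)
)

# Number the repeated measurements 1..4 within each subject
data$seqvar <- ave(data$y, data$Subject, FUN = seq_along)

# One row per Subject, one column per measurement occasion
xt <- xtabs(y ~ Subject + seqvar, data = data)

# Convert the 2-d table to a data frame and name the columns y1..y4
wide <- as.data.frame.matrix(xt)
names(wide) <- paste0("y", names(wide))
wide <- cbind(Subject = as.numeric(rownames(wide)), wide)
wide
```

Note that xtabs sums y within each cell, so it reproduces the raw values
only when every Subject/seqvar pair occurs exactly once; a missing
combination shows up as 0 rather than NA.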

or use reshape (the base function):
  data$seqvar <- ave(data$y, data$Subject, FUN=seq_along)
  reshape(data, idvar=c("Subject", "gender", "comment"),
          timevar="seqvar", direction="wide")
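
A runnable sketch of the reshape call on the example data. By default
the new columns are named y.1 ... y.4; passing sep="" is an optional
tweak that gives y1 ... y4 as asked for in the question. The set.seed()
call is an addition made only so the example is reproducible.

```r
set.seed(1)  # assumption: added only for reproducibility
data <- data.frame(
  Subject = rep(1:3, each = 4),
  y       = rnorm(12, 3, 2),
  gender  = rep(c("w", "m", "w"), each = 4),
  comment = rep(c("comment A", "comment B", "comment C"), each = 4)
)

# Measurement index within each subject, used as the "time" variable
data$seqvar <- ave(data$y, data$Subject, FUN = seq_along)

# Long -> wide: one row per subject, the constant columns kept as ids
wide <- reshape(data,
                idvar     = c("Subject", "gender", "comment"),
                timevar   = "seqvar",
                direction = "wide",
                sep       = "")
wide
```

Unlike the xtabs approach, this keeps gender and comment in the result
and leaves NA for any measurement a subject is missing.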

or use the reshape or reshape2 packages, which are easier to understand.

-- 
David Winsemius, MD
West Hartford, CT