[R] By processing on two variables at once?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Thu Nov 12 04:11:39 CET 2009
Hi,
On Wed, Nov 11, 2009 at 8:51 PM, zwarren <zack.warren at yahoo.com> wrote:
>
> Hello!
>
> I'm trying to run stats on two vars at a time in a big data frame. I knew
> how to do this in SAS many years ago, but have half-forgotten that as well!
>
> I need, for instance, mean(value) by x-y combination:
> x y z value
> 1 1 1 10
> 1 1 2 20
> 1 2 1 30
>
> with results:
> x y mean(value)
> 1 1 15
> 1 2 30
What happened to your "z" column?
Anyway, there are a few ways you can do this.
1. If you just want to use the standard library, try the aggregate
function. Roughly:
R> df <- data.frame(x=c(1,1,1), y=c(1,1,2), z=c(1,2,1), value=c(10,20,30))
R> aggregate(df, by=list(df$x, df$y), mean)
  Group.1 Group.2 x y   z value
1       1       1 1 1 1.5    15
2       1       2 1 2 1.0    30
2. You can try using the plyr library:
R> library(plyr)
R> ddply(df, .(x, y), summarise, z=mean(z), value=mean(value))
  x y   z value
1 1 1 1.5    15
2 1 2 1.0    30
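If you only want mean(value) for each x-y pair and don't care about z, here is a minimal sketch using the formula interface of aggregate. It assumes your version of R is recent enough to have that interface, and the name `res` is just something I made up for illustration:

```r
# Same toy data as in the examples above
df <- data.frame(x = c(1, 1, 1), y = c(1, 1, 2), z = c(1, 2, 1),
                 value = c(10, 20, 30))

# Mean of value for each x-y combination.
# z does not appear in the formula, so it is left out of the result.
res <- aggregate(value ~ x + y, data = df, FUN = mean)
print(res)
```

The result should have one row per x-y combination, with columns x, y and value.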
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact