[Rd] by() processing on a dataframe
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Sep 30 20:47:37 CEST 2005
And here is one more approach using the reshape package:
library(reshape)
dataset.d <- melt(dataset, id = 1:2)
cast(dataset.d, gp1 + gp2 ~ variable, mean)
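
For comparison, here is a minimal base R sketch of the same grouped mean using aggregate(). It is not from the original thread. It rebuilds the example data frame from the quoted message below. The seed and the renaming of the result column to "statistic" are assumptions made for illustration. In current versions of R, aggregate() returns rows only for the gp1/gp2 combinations that occur in the data. Whether it avoids the memory problem on a very sparse real data set is untested here.

```r
# Rebuild the example data from the quoted message.
# The seed is arbitrary and only makes the sketch reproducible.
set.seed(1)
dataset <- data.frame(gp1 = rep(1:2, c(4, 4)),
                      gp2 = rep(1:4, c(2, 2, 2, 2)),
                      value = rnorm(8))

# Mean of value within each observed (gp1, gp2) combination.
result <- aggregate(dataset["value"],
                    by = dataset[c("gp1", "gp2")],
                    FUN = mean)

# Rename the summary column to match the by() example.
names(result)[3] <- "statistic"
result
```

The result is an ordinary data frame with one row per observed group, so no do.call('rbind', ...) step is needed.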
On 9/30/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Check out summaryBy in the doBy package at:
>
> http://genetics.agrsci.dk/~sorenh/misc
>
> e.g.
>
> summaryBy(value ~ gp1 + gp2, data = dataset)
>
>
>
> On 9/30/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> > I want to calculate a statistic on a number of subgroups of a dataframe,
> > then put the results into a dataframe. (What SAS PROC MEANS does, I
> > think, though it's been years since I used it.)
> >
> > This is possible using by(), but it seems cumbersome and fragile. Is
> > there a more straightforward way than this?
> >
> > Here's a simple example showing my current strategy:
> >
> > > dataset <- data.frame(gp1 = rep(1:2, c(4,4)),
> > +     gp2 = rep(1:4, c(2,2,2,2)), value = rnorm(8))
> > > dataset
> >   gp1 gp2      value
> > 1   1   1  0.9493232
> > 2   1   1 -0.0474712
> > 3   1   2 -0.6808021
> > 4   1   2  1.9894999
> > 5   2   3  2.0154786
> > 6   2   3  0.4333056
> > 7   2   4 -0.4746228
> > 8   2   4  0.6017522
> > >
> > > handleonegroup <- function(subset) data.frame(gp1 = subset$gp1[1],
> > + gp2 = subset$gp2[1], statistic = mean(subset$value))
> > >
> > > bylist <- by(dataset, list(dataset$gp1, dataset$gp2), handleonegroup)
> > >
> > > result <- do.call('rbind', bylist)
> > > result
> >    gp1 gp2  statistic
> > 1    1   1 0.45092598
> > 11   1   2 0.65434890
> > 12   2   3 1.22439210
> > 13   2   4 0.06356469
> >
> > tapply() is inappropriate because I don't have all possible combinations
> > of gp1 and gp2 values, only some of them:
> >
> > > tapply(dataset$value, list(dataset$gp1, dataset$gp2), mean)
> >          1         2        3          4
> > 1 0.450926 0.6543489       NA         NA
> > 2       NA        NA 1.224392 0.06356469
> >
> >
> >
> > In the real case, I only have a very sparse subset of all the
> > combinations, and tapply() and by() both die for lack of memory.
> >
> > Any suggestions on how to do what I want, without using SAS?
> >
> > Duncan Murdoch
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>