[R] Re: Is the aggregate function the best way to do this?
Petr PIKAL
petr.pikal at precheza.cz
Wed Feb 17 10:18:20 CET 2010
Hi
r-help-bounces at r-project.org wrote on 17.02.2010 09:36:45:
> Hi,
>
>
>
> I have a data frame 'Subset1' with a number of factor variables and 160
> numerical variables.
>
> Now I want to compute sums over all rows that have the same values for the
> different factor variables, except for the factor variables VAR1, VAR2, VAR3,
> which may have the same values.
>
> With the formula given below this works great, but in a situation with 15000
> rows and 13 factor variables the calculation takes more than 2 minutes.
>
> So my question is: does anyone know whether a faster alternative exists?
I believe the plyr package has optimised code for such aggregations, but I do
not use it much myself, so I am not sure.
You could probably speed things up by avoiding ncol(Subset1) inside aggregate().
Either use column numbers or do
selection <- Subset1[,(ncol(Subset1)-159):ncol(Subset1)]
and avoid the unnecessary coercion to data.frame. aggregate() performs it for
you :-)
Subset1.AGG <- aggregate(selection,
    list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3),
    FUN = sum)
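To make this easier to try, here is a minimal sketch on made-up data. The toy data frame, the column names N1..N3 and the helper variable n.num are assumptions for illustration only; the real Subset1 has 160 numeric columns and more factor variables.

```r
# Toy stand-in for Subset1: three grouping factors followed by three
# numeric columns (hypothetical names; the original has 160 of them).
Subset1 <- data.frame(
  VAR1 = factor(c("a", "a", "b", "b", "a", "b")),
  VAR2 = factor(c("x", "x", "y", "y", "x", "y")),
  VAR3 = factor(c("p", "q", "p", "p", "p", "p")),
  N1 = 1:6,
  N2 = c(10, 20, 30, 40, 50, 60),
  N3 = rep(0.5, 6)
)

n.num <- 3  # number of trailing numeric columns (160 in the original post)

# Pick out the numeric block once, outside the aggregate() call.
selection <- Subset1[, (ncol(Subset1) - n.num + 1):ncol(Subset1)]

# aggregate() on a data frame already returns a data frame,
# so no as.data.frame() wrapper is needed.
Subset1.AGG <- aggregate(selection,
                         list(VAR1 = Subset1$VAR1,
                              VAR2 = Subset1$VAR2,
                              VAR3 = Subset1$VAR3),
                         FUN = sum)
print(Subset1.AGG)
```

Only combinations of VAR1, VAR2 and VAR3 that actually occur in the data end up as rows in the result.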
Regards
Petr
>
>
>
> Subset1.AGG <-
> as.data.frame(aggregate(Subset1[,(ncol(Subset1)-159):ncol(Subset1)],
> list(VAR1 = Subset1$VAR1, VAR2 = Subset1$VAR2, VAR3 = Subset1$VAR3),
> FUN = sum))
>
>
>
> Thank you very much for helping me out,
>
> Bert