[R] Subtraction of group means using AGGREGATE and MERGE
Joris Meys
jorismeys at gmail.com
Thu Jun 17 11:15:18 CEST 2010
Funny, I couldn't run your code using R 2.10.1 (aggregate required its by
argument to be a list). That said, take a look at the function ave():
> X <- rep(1:4)
> Y <- rep(letters[1:2],each=2)
> Z <- data.frame(X,Y)
> system.time(replicate(1000,{
+ A <- aggregate(Z$X, by=list(Y=Z$Y), FUN=mean)
+ M <- merge(Z,A,by="Y")[,3]
+ Result <- X - M
+ }))
   user  system elapsed
   3.57    0.01    3.58
> system.time(replicate(1000,{
+ Result <- Z$X - ave(Z$X,Z$Y)
+ }))
   user  system elapsed
   0.25    0.00    0.25
>
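Applied to the data from your question (Y taking the values 4 and 5 instead of letters), ave() gives the requested vector directly. A minimal sketch: ave() uses FUN = mean by default and returns one value per row, in the original row order, so no merge step is needed.

```r
# Data from the original question: values X, group labels Y
Z <- data.frame(X = 1:4, Y = c(4, 4, 5, 5))

# ave() returns, for each row, the mean of X within that row's group,
# in the same order as the rows of Z
Result <- Z$X - ave(Z$X, Z$Y)
print(Result)
# [1] -0.5  0.5 -0.5  0.5
```
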
Cheers
Joris
On Thu, Jun 17, 2010 at 9:22 AM, Ben Cocker <b.cocker at ucl.ac.uk> wrote:
> Hi all,
>
> This is my first ever post, so please forgive me and let me know if my
> etiquette falls short.
>
> I am searching for a faster way of subtracting group means within a
> data frame than the solution I've found so far, which uses AGGREGATE
> and MERGE.
>
> I'll flesh my question out using a trivial example: I have a data
> frame Z with two columns - one X of values and one Y of labels:
>
>> Z
>   X Y
> 1 1 4
> 2 2 4
> 3 3 5
> 4 4 5
>
> I want to take the group means (for the two groups Y=4 and Y=5) and
> subtract them from X, resulting in the vector Result = (-0.5, 0.5,
> -0.5, 0.5). I have found a (slow) way of achieving this, using the
> AGGREGATE function to get the group means and then MERGE to construct
> an appropriate vector of these values, M:
>
>> A <- aggregate(Z$X, by=Z$Y, FUN=mean)
>> A
>   Y   X
> 1 4 1.5
> 2 5 3.5
>
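For reference, a version of the aggregate() call above that runs: by has to be a list of grouping vectors. A minimal sketch on the same data; with this call form the value column of the result is named x (lower case), and naming the list element sets the name of the grouping column.

```r
Z <- data.frame(X = 1:4, Y = c(4, 4, 5, 5))

# 'by' must be a list; the element name "Y" becomes the grouping column name
A <- aggregate(Z$X, by = list(Y = Z$Y), FUN = mean)

# A has one row per group: column Y holds the labels, column x the means
print(A)
```
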
>> M <- merge(Z,A,by="Y")[,3]
>> M
> [1] 1.5 1.5 3.5 3.5
>
>> Result <- X - M
>> Result
>      X
> 1 -0.5
> 2  0.5
> 3 -0.5
> 4  0.5
>
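One thing to watch with the merge() approach: merge() sorts its result on the by column by default (sort = TRUE), so X - M only lines up here because Z happens to be sorted by Y. A small sketch with the rows shuffled (hypothetical data) shows the mismatch, while ave() keeps the original row order:

```r
# Same kind of data, but rows are NOT sorted by the group label Y
Z <- data.frame(X = c(1, 3, 2, 4), Y = c(4, 5, 4, 5))
A <- aggregate(Z$X, by = list(Y = Z$Y), FUN = mean)

# merge() returns rows sorted by Y, not in the original order of Z,
# so M holds the group means as 1.5, 1.5, 3.5, 3.5
M <- merge(Z, A, by = "Y")[, 3]

# Subtracting pairs some rows of Z with the wrong group mean
wrong <- Z$X - M

# ave() aligns each mean with its own row
right <- Z$X - ave(Z$X, Z$Y)
```
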
> My problem: for lots of records, while AGGREGATE is very fast, MERGE
> is very slow - in real life I need to call this routine many times
> over a very large dataset. Could anyone help me find a faster way of
> achieving the same goal?
>
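Since aggregate() is already fast, one way to drop merge() is to look up each row's group mean with match(), which returns the position of each Z$Y in A$Y. A minimal sketch on hypothetical unsorted data; the second variant skips aggregate() as well, computing group sums with rowsum() and group sizes with tabulate(). Both keep the original row order; whether they are fast enough for your dataset is something you would need to time yourself.

```r
Z <- data.frame(X = c(1, 3, 2, 4), Y = c(4, 5, 4, 5))

# Variant 1: aggregate() for the means, match() instead of merge()
A <- aggregate(Z$X, by = list(Y = Z$Y), FUN = mean)
# For each row of Z, find the row of A holding its group, then expand
# the means back to one value per row of Z, in Z's original order
M <- A$x[match(Z$Y, A$Y)]
Result <- Z$X - M

# Variant 2: no aggregate() either
g <- as.integer(factor(Z$Y))                      # group codes 1..k
means <- as.vector(rowsum(Z$X, g)) / tabulate(g)  # group sum / group size
Result2 <- Z$X - means[g]                         # index means by group code
```
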
> Many thanks,
>
> Ben Cocker
> MSc Statistics at UCL, London, UK
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
Joris.Meys at Ugent.be