[R] Subtraction of group means using AGGREGATE and MERGE
Ben Cocker
b.cocker at ucl.ac.uk
Thu Jun 17 09:22:15 CEST 2010
Hi all,
This is my first ever post, so forgive me and let me know if my
etiquette is less than that required.
I am searching for a faster way of subtracting group means within a
data frame than the solution I've found so far, using AGGREGATE and
MERGE.
I'll flesh my question out using a trivial example: I have a data
frame Z with two columns - one X of values and one Y of labels:
> Z
  X Y
1 1 4
2 2 4
3 3 5
4 4 5
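For anyone who wants to reproduce this, Z can be rebuilt roughly as below. This is only a minimal sketch: I am assuming both columns are plain numeric vectors, which the printout above does not actually show.

```r
# Rebuild the example data frame from the printout above.
# Assumption: X and Y are both plain numeric columns.
Z <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 4, 5, 5))
print(Z)
```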
I want to take the group means (for the two groups Y=4 and Y=5) and
subtract them from X resulting in the vector Result = t(-0.5 0.5 -0.5
0.5). I have found a (slow) way of achieving this, using the
AGGREGATE function to get the group means and then MERGE to construct
an appropriate vector of these values, M:
> A <- aggregate(Z$X, by=list(Y=Z$Y), FUN=mean)
> A
  Y   x
1 4 1.5
2 5 3.5
> M <- merge(Z,A,by="Y")[,3]
> M
[1] 1.5 1.5 3.5 3.5
> Result <- Z["X"] - M
> Result
     X
1 -0.5
2  0.5
3 -0.5
4  0.5
My problem: for lots of records, while AGGREGATE is very fast, MERGE
is very slow - in real life I need to call this routine many times
over a very large dataset. Could anyone help me find a faster way of
achieving the same goal?
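To make the timing difference easier to reproduce, something along the lines of the sketch below should do. The data are simulated, and the sizes n and k and the names big, A_big and M_big are arbitrary choices for illustration, not my real dataset. I have left the timings out because they will depend on the machine.

```r
# Hypothetical test data: n rows spread over k groups (sizes are arbitrary).
set.seed(1)
n <- 1e5
k <- 1e3
big <- data.frame(X = rnorm(n), Y = sample(k, n, replace = TRUE))

# Step 1: group means via aggregate()
t_agg <- system.time(
  A_big <- aggregate(big$X, by = list(Y = big$Y), FUN = mean)
)

# Step 2: expand the means back to one value per row via merge()
t_mrg <- system.time(
  M_big <- merge(big, A_big, by = "Y")[, 3]
)

# Compare the elapsed times of the two steps.
print(t_agg)
print(t_mrg)
```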
Many thanks,
Ben Cocker
MSc Statistics at UCL, London, UK