[R] summing a large, partitioned data frame
james.foadi at diamond.ac.uk
Mon Jan 25 17:07:16 CET 2010
Dear R community,
I'm trying to develop a fast way of summing specific rows of a large data frame.
Here is an example of the kind of data frames I'm dealing with:
> refls
      H K L M/ISYM BATCH          I     SIGI
43247 1 0 5     21    79   61.44117  2.20553
1040  1 0 5    257     6   15.16316  0.54431
2324  1 0 5    257     5   46.76152  1.67858
31515 1 0 5    259    60   57.97305  2.08104
35158 1 0 5    259    61    3.15614  0.11329
51575 1 0 6    259    88  380.04477  8.08878
51846 1 0 6    259    89  624.90802 13.30038
28946 1 1 4      1    42 2517.79492 55.37144
23199 1 1 4      5    31 2525.67407 55.54472
23198 1 1 4     21    39 2519.44653 55.40777
............................................
............................................
I need to add up all I values that share the same H, K, L and M/ISYM.
For the rows shown above, the data frame resulting from this partial summing should look like:
      H K L M/ISYM BATCH          I     SIGI
43247 1 0 5     21    79   61.44117  2.20553
1040  1 0 5    257     6   61.92468  0.54431
31515 1 0 5    259    60   61.12919  2.08104
51575 1 0 6    259    88 1004.95279  8.08878
28946 1 1 4      1    42 2517.79492 55.37144
23199 1 1 4      5    31 2525.67407 55.54472
23198 1 1 4     21    39 2519.44653 55.40777
............................................
............................................
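Essentially, I add only those I values that have the same H, K, L and M/ISYM, and place the sum
in a single row of the new data frame. In other words, the rows are first partitioned into groups
and each group is then summed. In the example above, the row name, BATCH and SIGI of each new row
are those of the first row in its group.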
I have tried a for loop, but it takes far too long.
Does anyone know of a better and faster way of doing this operation?
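For reference, the partition-then-sum described above can be sketched without an explicit loop. This is only a minimal sketch under stated assumptions, not a definitive solution: the ten example rows are retyped by hand, the column M/ISYM is renamed M_ISYM purely to get a syntactic R name, and the names key, totals and summed are hypothetical. It uses base R's paste(), rowsum() and duplicated().

```r
# Rebuild the ten example rows shown above.
# Assumption: M/ISYM is called M_ISYM here so the name is syntactic in R.
refls <- data.frame(
  H      = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  K      = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
  L      = c(5, 5, 5, 5, 5, 6, 6, 4, 4, 4),
  M_ISYM = c(21, 257, 257, 259, 259, 259, 259, 1, 5, 21),
  BATCH  = c(79, 6, 5, 60, 61, 88, 89, 42, 31, 39),
  I      = c(61.44117, 15.16316, 46.76152, 57.97305, 3.15614,
             380.04477, 624.90802, 2517.79492, 2525.67407, 2519.44653),
  SIGI   = c(2.20553, 0.54431, 1.67858, 2.08104, 0.11329,
             8.08878, 13.30038, 55.37144, 55.54472, 55.40777),
  row.names = c("43247", "1040", "2324", "31515", "35158",
                "51575", "51846", "28946", "23199", "23198")
)

# Partition: one character key per row, built from the four grouping columns.
key <- paste(refls$H, refls$K, refls$L, refls$M_ISYM, sep = "_")

# Sum: rowsum() adds up I within each key and returns a one-column matrix
# whose row names are the distinct keys.
totals <- rowsum(refls$I, key, reorder = FALSE)

# Keep the first row of every group and overwrite its I with the group total.
# BATCH and SIGI are left as they are in that first row.
first_in_group <- !duplicated(key)
summed <- refls[first_in_group, ]
summed$I <- unname(totals[key[first_in_group], 1])

print(summed)
```

Looking the totals up by key, rather than relying on the row order of the rowsum() result, keeps the sketch correct whichever value of reorder is used. Whether this is fast enough for the full data frame has not been measured here.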
J
Dr James Foadi PhD
Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harewell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE
Email : james.foadi at diamond.ac.uk
Alt Email: j.foadi at imperial.ac.uk