[R] vectorization of a data-aggregation loop
Marc Schwartz
MSchwartz at MedAnalytics.com
Wed Feb 2 00:18:16 CET 2005
On Tue, 2005-02-01 at 23:28 +0100, Christoph Lehmann wrote:
> Hi
> I have a simple question:
>
> the following data.frame
>
>    id iwv type
> 1   1   1    a
> 2   1   2    b
> 3   1  11    b
> 4   1   5    a
> 5   1   6    c
> 6   2   4    c
> 7   2   3    c
> 8   2  10    a
> 9   3   6    b
> 10  3   9    a
> 11  3   8    b
> 12  3   7    c
>
> shall be aggregated into the form:
>
>   id t.a t.b t.c
> 1  1   6  13   6
> 6  2  10   0   7
> 9  3   9  14   7
>
> That is, for each 'type' (a, b, c) a new column is introduced which
> holds the sum of iwv for the observations with the respective 'id'.
>
> Of course I can do this transformation/aggregation in a loop (see
> below), but is there a way to do this more efficiently, e.g. using
> tapply (or something similar), since I have a great many rows?
>
> thanks for a hint
Well, I'll get you started using the sample data you have above.
Presuming that your data is in a data frame called 'df':
# Use aggregate() to sum iwv by id and type
> df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)
# Show the results
> df.a
  Group.1 Group.2  x
1       1       a  6
2       2       a 10
3       3       a  9
4       1       b 13
5       3       b 14
6       1       c  6
7       2       c  7
8       3       c  7
# Now use xtabs() to create a contingency table from df.a
> xtabs(x ~ Group.1 + Group.2, data = df.a)
       Group.2
Group.1  a  b  c
      1  6 13  6
      2 10  0  7
      3  9 14  7
You can now use colnames() to rename the columns in the result of the
xtabs step as you desire (for example to t.a, t.b and t.c).
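As a rough sketch of that renaming step, assuming the sample data from
the original post has been re-entered by hand as 'df' (the object names
'tab', 'm' and 'res' are only illustrative):

```r
# Sample data from the original post, re-entered by hand
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# The two steps from above: sum iwv by id and type, then cross-tabulate
df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)
tab  <- xtabs(x ~ Group.1 + Group.2, data = df.a)

# Prefix the type columns with "t." as in the requested layout
colnames(tab) <- paste("t", colnames(tab), sep = ".")

# Optional: turn the table into an ordinary data frame with an id column
m <- unclass(tab)          # plain numeric matrix
attr(m, "call") <- NULL    # drop the attribute left behind by xtabs()
res <- data.frame(id = rownames(m), as.data.frame(m), row.names = NULL)
res
```

Note that 'id' ends up as a character column here, since it is taken
from the row names; wrap it in as.numeric() if you need numbers.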
It's a little easier in two steps. See ?aggregate and ?xtabs for more
information.
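For comparison only, here is a sketch of two one-step variants, again
assuming the sample data has been re-entered as 'df'. xtabs() sums the
left-hand side of its formula itself, and tapply() (which the original
question mentions) gives a similar matrix. The names 'one.step' and
'tap' are just illustrative.

```r
# Sample data from the original post, re-entered by hand
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# xtabs() with iwv on the left-hand side sums iwv within each id/type cell;
# combinations that do not occur (id 2, type b) come out as 0
one.step <- xtabs(iwv ~ id + type, data = df)
one.step

# tapply() builds the same layout, but leaves NA for empty combinations
tap <- tapply(df$iwv, list(df$id, df$type), sum)
tap
```

The main difference to watch for is the empty cell: 0 from xtabs()
versus NA from tapply(), so pick whichever suits the later analysis.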
HTH,
Marc Schwartz