[R] vectorization of a data-aggregation loop

Wed Feb 2 00:18:16 CET 2005

On Tue, 2005-02-01 at 23:28 +0100, Christoph Lehmann wrote:
> Hi
> I have a simple question:
> 
> The following data.frame
> 
>     id iwv type
> 1   1   1    a
> 2   1   2    b
> 3   1  11    b
> 4   1   5    a
> 5   1   6    c
> 6   2   4    c
> 7   2   3    c
> 8   2  10    a
> 9   3   6    b
> 10  3   9    a
> 11  3   8    b
> 12  3   7    c
> 
> should be aggregated into this form:
> 
>    id t.a t.b t.c
> 1  1   6  13   6
> 6  2  10   0   7
> 9  3   9  14   7
> 
> That is, for each 'type' (a, b, c) a new column is introduced that
> holds the sum of iwv over the observations of the respective 'id'.
> 
> Of course I can do this transformation/aggregation in a loop (see
> below), but is there a way to do it more efficiently, e.g. using
> tapply() (or something similar), since I have a lot of rows?
> 
> Thanks for any hints.

Well, I'll get you started using the sample data you have above.

Presuming that your data is in a data frame called 'df':
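To make the session reproducible, 'df' can be typed in from the sample table in the question. This is only a sketch: plain numeric ids and a character 'type' column are assumed here, which may differ from how your real data is stored.

```r
# Sample data from the question, entered as a data frame named 'df'
# (assumption: numeric ids, character 'type')
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)
```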

# Use aggregate() to sum iwv by id and type
> df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)

# Show the results
> df.a
  Group.1 Group.2  x
1       1       a  6
2       2       a 10
3       3       a  9
4       1       b 13
5       3       b 14
6       1       c  6
7       2       c  7
8       3       c  7

# Now use xtabs() to create a contingency table from df.a

> xtabs(x ~ Group.1 + Group.2, data = df.a)
       Group.2
Group.1  a  b  c
      1  6 13  6
      2 10  0  7
      3  9 14  7

You can now rename the columns of the xtabs result as you desire
(e.g. to t.a, t.b and t.c).
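One possible way to get from the xtabs table to the exact layout asked for (an 'id' column plus t.a, t.b, t.c) is sketched below. It relies on as.data.frame.matrix() to turn the two-way table into a wide data frame. The intermediate names 'tab' and 'res' are just illustrative choices, and the sample data is re-entered so the snippet runs on its own.

```r
# Sample data from the question
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# The two steps from above: sum by id/type, then cross-tabulate
df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)
tab  <- xtabs(x ~ Group.1 + Group.2, data = df.a)

# Convert the table into a wide data frame, one column per type
res <- as.data.frame.matrix(tab)

# Rename the columns to t.a, t.b, t.c and turn the row names into an id column
colnames(res) <- paste("t", colnames(res), sep = ".")
res <- data.frame(id = as.numeric(rownames(res)), res, row.names = NULL)
res
```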

It's a little easier in two steps. See ?aggregate and ?xtabs for more
information.
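Since the question specifically mentions tapply(), here is a sketch of that route as well. tapply() with a list of two grouping factors returns an id x type matrix of sums directly. Cells with no observations (here id 2 has no type "b") come back as NA, so they are set to 0 afterwards. The name 'tab2' is only illustrative.

```r
# Sample data from the question
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Sum iwv within every id x type cell; the result is a matrix
tab2 <- tapply(df$iwv, list(df$id, df$type), sum)

# Empty cells (no rows for that id/type) are NA; treat them as a sum of 0
tab2[is.na(tab2)] <- 0
tab2
```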

HTH,

Marc Schwartz