[R] summarizing a data frame i.e. count -> group by

Giovanni Azua bravegag at gmail.com
Sun Oct 23 19:29:40 CEST 2011


Hello,

One problem at a time :)

I have a data frame df that looks like this:

  time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937

And if I could use SQL (oh, I really wish I could!), I would do exactly this:

insert into throughput
  select time, partitioning_mode, count(*)
  from df
  group by time, partitioning_mode
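A base-R equivalent of that query, sketched under the assumption that counting rows is all that is needed: aggregate() with a formula groups by the right-hand-side columns, and length() as FUN plays the role of count(*). Which column sits on the left of the `~` does not matter, since only its length per group is used. The data frame is rebuilt from the printout above.

```r
# Rebuild the example data frame from the post (time is always 1 here)
df <- data.frame(
  time = rep(1, 15),
  partitioning_mode = rep(c("sharding", "replication"), times = c(11, 4)),
  workload = rep(c("query", "refresh", "query"), times = c(7, 4, 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98, 2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# Roughly: SELECT time, partitioning_mode, count(*) ... GROUP BY time, partitioning_mode
# length() returns the number of rows that fall into each group.
throughput <- aggregate(runtime ~ time + partitioning_mode, data = df,
                        FUN = length)

# Rename the counted column so the result reads like the SQL output
names(throughput)[names(throughput) == "runtime"] <- "count"
print(throughput)
```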

My attempts in R are wrong and produce very cryptic error messages:

> throughput <- aggregate(x=df[,c("time", "partitioning_mode")], by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions

> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions
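Base R has no function named count, so the one called above presumably comes from an add-on package (plyr, for example, exports a count() that works on whole data frames), which would explain the `[.default` indexing error: aggregate() hands FUN one plain vector per column and group. A sketch of the same call shape, assuming length() is an acceptable per-group counter and passing a single column as x so that only one count column comes back:

```r
df <- data.frame(
  time = rep(1, 15),
  partitioning_mode = rep(c("sharding", "replication"), times = c(11, 4)),
  workload = rep(c("query", "refresh", "query"), times = c(7, 4, 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98, 2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# FUN is applied to every column of x within each group, so x holds one
# column only; naming the list elements sets the output column names.
throughput <- aggregate(x = list(count = df$runtime),
                        by = list(time = df$time,
                                  partitioning_mode = df$partitioning_mode),
                        FUN = length)
print(throughput)
```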

> throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning_mode), FUN=count)
I can't make sense of what comes out of this one... :(
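What tapply() returns is an array rather than a data frame: one dimension per INDEX factor, here a matrix with rows labelled by time and columns by partitioning_mode, each cell holding FUN applied to that group. A sketch assuming length() as FUN, followed by a base-R alternative using table(), which counts directly and can be flattened into the long shape the SQL query would produce:

```r
df <- data.frame(
  time = rep(1, 15),
  partitioning_mode = rep(c("sharding", "replication"), times = c(11, 4)),
  workload = rep(c("query", "refresh", "query"), times = c(7, 4, 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98, 2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# Matrix of counts: rows are time values, columns are partitioning modes
counts <- tapply(df$runtime, INDEX = list(df$time, df$partitioning_mode),
                 FUN = length)
print(counts)

# table() counts every combination of the two variables; as.data.frame()
# turns that into one row per combination, with the count in "count".
throughput <- as.data.frame(table(time = df$time,
                                  partitioning_mode = df$partitioning_mode),
                            responseName = "count")
print(throughput)
```

One difference worth knowing: for a combination that never occurs, the tapply() matrix shows NA, while the table() version lists it with a count of 0.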

And I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni

