[R] summarizing a data frame i.e. count -> group by
Giovanni Azua
bravegag at gmail.com
Sun Oct 23 19:29:40 CEST 2011
Hello,
This is one problem at a time :)
I have a data frame df that looks like this:
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
and if I could use SQL ... omg! I really wish I could! I would do exactly this:
insert into throughput
select time, partitioning_mode, count(*)
from data.frame
group by time, partitioning_mode
My attempted R versions are wrong and produce very cryptic error messages:
> throughput <- aggregate(x=df[,c("time", "partitioning_mode")], by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
incorrect number of dimensions
> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
incorrect number of dimensions
> throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), FUN=count)
I can't comprehend what comes out of this one ... :(
and I thought C++ template errors were the most cryptic ;P
Many many thanks in advance,
Best regards,
Giovanni
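
For reference, the SQL above (count(*) grouped by time and partitioning_mode) could be sketched in base R roughly as follows. This is only a minimal sketch, not a definitive answer: the data frame is rebuilt by hand from the 15 rows printed in the post, the output column name "count" is an arbitrary choice, and length() stands in for the count used in the attempts (base R has no function called count). aggregate() hands FUN the values of each group as a plain vector, so a function that returns one number per vector fits.

```r
# Rebuild the data frame from the post (values copied from the printed rows).
df <- data.frame(
  time = rep(1, 15),
  partitioning_mode = c(rep("sharding", 11), rep("replication", 4)),
  workload = c(rep("query", 7), rep("refresh", 4), rep("query", 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98,
              2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# select time, partitioning_mode, count(*) ... group by time, partitioning_mode
# aggregate() applies length() to the runtime values within each group.
throughput <- aggregate(runtime ~ time + partitioning_mode, data = df, FUN = length)

# The result column keeps the name "runtime"; rename it (name is an assumption).
names(throughput)[names(throughput) == "runtime"] <- "count"
print(throughput)

# Alternative: cross-tabulate the two grouping columns and flatten to long format.
# Note that table() also lists combinations that occur zero times.
counts <- as.data.frame(
  table(time = df$time, partitioning_mode = df$partitioning_mode),
  responseName = "count"
)
print(counts)
```

Both variants return one row per (time, partitioning_mode) combination. The aggregate() form only lists combinations that actually occur, which is closest to the SQL group by.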