[R] Failure to aggregate

Gabor Grothendieck ggrothendieck at gmail.com
Mon Sep 6 20:15:07 CEST 2010


On Mon, Sep 6, 2010 at 12:15 PM, Dimitri Shvorob
<dimitri.shvorob at gmail.com> wrote:
>
> I have a (very big - 1.5 rows) dataframe with a (POSIXt"  "POSIXlt") column h
> (hour). Surprisingly, I cannot calculate a simple aggregate over the
> dataframe.
>
>> n.h1 = sqldf("select distinct h, count(*) from x group by h")
> Error in sqliteExecStatement(con, statement, bind.data) :
>  RS-DBI driver: (error in statement: no such table: x)
> In addition: Warning message:
> In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'
>
>> n.h2 = aggregate(x$price, by = x$h, FUN = nrow)
> Error in names(y) <- c(names(by), names(x)) :
>  'names' attribute [10] must be the same length as the vector [2]


Since you are using group by you don't want "distinct".

In aggregate use x["price"] and x["h"] rather than
x$price and x$h or use a formula.

Also use nrow or length in place of NROW.

library(sqldf)
x <- data.frame(price = 1:4, h = c(1, 1, 2, 3))
sqldf("select h, count(*) from x group by h")

aggregate(x["price"], by = x["h"], FUN = NROW)
aggregate(x["price"], by = x["h"], FUN = length)
aggregate(price ~ h, x, FUN = length)







-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list