[R] aggregate slow with variables of type 'dates' - how to solve

Gabor Grothendieck ggrothendieck at gmail.com
Sat Apr 16 06:07:59 CEST 2005


On 4/15/05, Christoph Lehmann <christoph.lehmann at gmx.ch> wrote:
> Dear all
> I use aggregate() with variables of type numeric and 'dates' (from the
> chron package). For numeric variables, functions such as sum() are very
> fast, but similarly simple functions such as min() are much slower for
> variables of type 'dates'. The difference grows with the number of
> distinct values of the 'id' variable; see this sample code:
> 
> library(chron)  # provides dates() and chron()
> dts <- dates(c("02/27/92", "02/27/92", "01/14/92",
>               "02/28/92", "02/01/92"))
> ntimes <- 700000
> dts <- data.frame(rep(c(1:40), ntimes/8),
>                  chron(rep(dts, ntimes), format = c(dates = "m/d/y")),
>                  rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
> names(dts) <- c("id", "date", "tbs")
> 
> date()
> dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x
> dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
> dat.1st
> date() #82 seconds
> 
> date()
> tbs.s <- aggregate(as.numeric(dts$tbs),list(id = dts$id), sum)
> tbs.s
> date() #17 seconds
> 
> Is this a problem with the data type 'dates'? If so, is there a way
> around it? For huge data sets this can be a problem.
> 
> As mentioned, if the variable 'id' has just 5 levels, the two timings
> are roughly the same, but with 40 different ids we see this big
> difference.
> 
> thanks a lot
> 
> Christoph
> 
>
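
One possible workaround, offered as a sketch and not as a benchmarked fix: do the grouping on the plain numeric values underlying the 'dates' object and convert back to 'dates' only once at the end. The assumption here is that the extra time comes from class-specific handling of 'dates' objects within each group; that has not been measured. The data size below is deliberately smaller than in the question so that the example runs quickly, and the names base_dates, first_num and dat.1st are only illustrative.

```r
library(chron)  # provides dates() and chron()

# Same layout as the data in the question, but smaller (assumed size).
base_dates <- dates(c("02/27/92", "02/27/92", "01/14/92",
                      "02/28/92", "02/01/92"))
ntimes <- 8000
dts <- data.frame(id   = rep(1:40, ntimes / 8),
                  date = chron(rep(base_dates, ntimes),
                               format = c(dates = "m/d/y")),
                  tbs  = rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))

# Strip the class first, so min() sees ordinary doubles in every group.
first_num <- tapply(as.numeric(dts$date), dts$id, min)

# Convert the 40 group minima back to 'dates' in a single call.
dat.1st <- chron(as.vector(first_num), format = c(dates = "m/d/y"))
dat.1st
```

Wrapping the tapply() call in system.time() would show whether this actually helps on the full-size data.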



