[R] Selecting ranges of dates from a dataframe
Benjamin Stier
benjamin.stier at ub.uni-tuebingen.de
Fri Mar 11 14:41:12 CET 2011
Hi Francisco,
Thanks for your solution. It runs much faster than my for loop. Here
is a comparison using system.time():
system.time(splitVals <- by(serv, dates, aggregateDf))
   user  system elapsed
  1.129   0.218   1.348
system.time(... my long for loop...)
   user  system elapsed
276.987   1.544 278.698
I also tried David's solution with "aggregate", but I can't get it to work,
because I have to put as.numeric() inside the sum(), since the values are very large.
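For comparison, here is a minimal sketch of the aggregate() route with the as.numeric() conversion moved into a small summary function, so that aggregate() itself needs no special handling. The toy data frame `serv` and the helper `sumNum` are hypothetical. They only mirror the column names used in this thread (datum, write, read). Converting to double before summing avoids the integer-overflow problem that large integer columns can cause with sum().

```r
# Hypothetical toy data mirroring the column names used in this thread
serv <- data.frame(
  datum = as.POSIXct(c("2011-03-01 10:00:00", "2011-03-01 18:30:00",
                       "2011-03-02 09:15:00"), tz = "UTC"),
  write = c(2000000000L, 2000000000L, 5L),
  read  = c(1L, 2L, 3L)
)

# Day component of each timestamp, used as the grouping variable
serv$day <- format(serv$datum, "%Y-%m-%d")

# Hypothetical helper: sum in double precision so large totals stay exact
sumNum <- function(v) sum(as.numeric(v))

# One row per day, with write and read summed within each day
values <- aggregate(cbind(write, read) ~ day, data = serv, FUN = sumNum)
print(values)
```

The formula interface groups by `day` and applies `sumNum` to each of the `write` and `read` columns separately.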
I will now try to understand how the by() function works.
Thanks again for helping me!
Regards,
Benjamin
On Thu, Mar 10, 2011 at 04:26:57PM +0000, Francisco Gochez wrote:
> Benjamin,
>
> A more elegant "R-style" solution would be to use one of R's "apply"/
> aggregation routines, of which there are many. For example, the "by" function
> can split a data.frame by some factor/categorical variable(s), and then apply a
> function to each "slice". The result can then be pieced back together. See
> below for an example in which this factor is simply a parallel vector of pure
> dates:
>
> # extract pure date component of time and date
> dates <- format(serv$datum, "%Y-%m-%d")
>
> # write auxiliary function to aggregate a "slice" of the data.frame
> # x will be a "slice" of data from a single day
> aggregateDf <- function(x)
> {
>   # return a one-row data.frame
>   data.frame(datum = format(x$datum[1], "%Y-%m-%d"), write = sum(x$write),
>              read = sum(x$read))
> }
>
> # now process each "slice" of the serv data.frame using "by"
> splitVals <- by(serv, dates, aggregateDf )
>
> # bind back into a single data.frame
> values <- do.call(rbind, splitVals)
>
>
> The difference in execution speed is negligible on my machine, so this is a
> more concise solution, but I don't know whether it is much faster.
>
> HTH,
>
> Francisco
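A self-contained version of the by() approach quoted above, run on a small hypothetical `serv` data frame, since the real data are not shown in the thread. One deliberate change from the quoted code: the sums are wrapped in as.numeric(), following Benjamin's note about large values. by() splits the rows by the `dates` vector, calls aggregateDf() on each per-day slice, and do.call(rbind, ...) stacks the one-row results.

```r
# Hypothetical toy data: three observations spread over two days
serv <- data.frame(
  datum = as.POSIXct(c("2011-03-01 10:00:00", "2011-03-01 18:30:00",
                       "2011-03-02 09:15:00"), tz = "UTC"),
  write = c(10L, 20L, 5L),
  read  = c(1L, 2L, 3L)
)

# Day component of each timestamp; by() uses it as the grouping factor
dates <- format(serv$datum, "%Y-%m-%d")

# Collapse one day's slice into a one-row data.frame
aggregateDf <- function(x) {
  data.frame(datum = format(x$datum[1], "%Y-%m-%d"),
             write = sum(as.numeric(x$write)),
             read  = sum(as.numeric(x$read)))
}

# by() splits serv into per-day slices and applies aggregateDf to each
splitVals <- by(serv, dates, aggregateDf)

# Stack the one-row results back into a single data.frame
values <- do.call(rbind, splitVals)
print(values)
```

The rows of `values` come out in sorted day order because by() converts `dates` to a factor, whose levels are sorted.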