[R] How to speed up grouping time series, help please
Den Alpin
den.alpin at gmail.com
Thu Apr 7 12:53:05 CEST 2011
I found an implementation that is faster (by an order of magnitude in
my tests) than the one using xts, split and merge (from Joshua).
I report the two fastest solutions below, with code to generate a test
case; some work remains to be done on column order and naming.
The test case has grown since my previous post to give more realistic timings.
Any comments or ideas on how to further speed up the creation of
multivariate time series of class xts or timeSeries from a data.frame
like the one shown here are welcome.
Best regards,
Den
A data.frame example (the code below generates it):
   ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706
And I want to get a timeSeries object or xts object like this:
                            1         2         3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01        NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04        NA        NA 0.1029247
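To make the long-to-wide mapping concrete, here is a minimal base-R sketch using matrix indexing. It is only an illustration and not one of the timed solutions: the twelve rows are typed in by hand from the printout above, and the names small, dates, ids and W are made up for this example.

```r
# The twelve rows printed above, typed in by hand (long format).
small <- data.frame(
  ID    = c(3, 1, 1, 3, 3, 2, 2, 2, 3, 2, 1, 3),
  DATE  = sprintf("2000-01-01 00:00:%02d",
                  c(3, 3, 0, 4, 0, 3, 1, 0, 1, 2, 2, 2)),
  VALUE = c(0.5726334, 0.8830174, 0.2875775, 0.1029247, 0.9568333,
            0.5514350, 0.5281055, 0.0455565, 0.4533342, 0.8924190,
            0.4089769, 0.6775706),
  stringsAsFactors = FALSE
)

# Sorted unique timestamps become rows, sorted unique IDs become columns.
dates <- sort(unique(small$DATE))
ids   <- sort(unique(small$ID))

# Start from an all-NA matrix so that missing (DATE, ID) pairs stay NA.
W <- matrix(NA_real_, nrow = length(dates), ncol = length(ids),
            dimnames = list(dates, as.character(ids)))

# Put each VALUE at its (row, column) position via a two-column index matrix.
W[cbind(match(small$DATE, dates), match(small$ID, ids))] <- small$VALUE

print(W)
```

The resulting matrix has the same layout as the table above; I would expect it could then be handed to xts() with as.POSIXct(rownames(W)) as the index, but I have not timed that against the two functions below.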
# CODE:
set.seed(123)
# set N to 5 to reproduce above data.frame
N <- 1000
# set K to 3 to reproduce above data.frame
K <- 10
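X <- data.frame(
  ID    = rep(1:K, each = N),
  # format() with an explicit pattern keeps the time part on every row
  DATE  = format(rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N-1), K),
                 "%Y-%m-%d %H:%M:%S"),
  VALUE = runif(N*K),
  stringsAsFactors = FALSE)
# shuffle the rows
X <- X[sample(1:(N*K), N*K),]
# drop a random 20% of the rows to create gaps
X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),]
str(X)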
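# Solution 1: build one xts, split VALUE by ID, merge the pieces
xtsSplit <- function(x)
{
  library(xts)
  x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"]))
  # one single-column xts per ID, merged on the time index
  return(do.call(merge, split(x$VALUE,x$ID)))
}
# median user CPU time over 50 runs
xtsSplitTime <- replicate(50,
  system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)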
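# Solution 2: reshape() to wide format first, then convert to xts
xtsReshape <- function(x)
{
  library(xts)
  # one row per DATE, one VALUE.<ID> column per ID
  x <- reshape(x, idvar = "DATE", timevar = "ID", direction = "wide")
  # column 1 is DATE; the remaining columns are the series
  x <- xts(x[,-1], as.POSIXct(x[,1]))
  return(x)
}
# median user CPU time over 50 runs
xtsReshapeTime <- replicate(50,
  system.time(xtsReshape(X))[[1]])
median(xtsReshapeTime)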
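
For the column order and naming that still need work, a minimal sketch could look like the following. It assumes the wide data.frame keeps the default VALUE.<ID> column names that reshape() produces; the toy data frame long and the name idCols are made up for illustration, and only base R is used so the step can be checked without xts.

```r
# Toy long-format input with the same columns as X (ID, DATE, VALUE).
long <- data.frame(
  ID    = c(3, 1, 2, 1),
  DATE  = c("2000-01-01 00:00:01", "2000-01-01 00:00:00",
            "2000-01-01 00:00:00", "2000-01-01 00:00:01"),
  VALUE = c(0.3, 0.1, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Same reshape() call as in xtsReshape(); columns come out as
# DATE, VALUE.3, VALUE.1, VALUE.2 (IDs in order of first appearance).
wide <- reshape(long, idvar = "DATE", timevar = "ID", direction = "wide")

# Strip the "VALUE." prefix so each column is named after its ID.
names(wide) <- sub("^VALUE\\.", "", names(wide))

# Put the ID columns in numeric order and the rows in date order.
idCols <- names(wide)[-1]
idCols <- idCols[order(as.numeric(idCols))]
wide   <- wide[order(wide$DATE), c("DATE", idCols)]

print(wide)
```

The same renaming and reordering could be applied inside xtsReshape() just before the call to xts(); I have not measured how much that adds to the timings.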