[R] importing and merging many time series
Anton Lebedevich
mabrek at gmail.com
Sun Apr 7 14:40:33 CEST 2013
Hello.
I've got many (5-20k) files with time series in a text format like this:
1359635460 2.006747
1359635520 1.886745
1359635580 3.066988
1359635640 3.633578
1359635700 2.140082
1359635760 2.033564
1359635820 1.980123
1359635880 2.060131
1359635940 2.113416
1359636000 2.440172
The first field is a unix timestamp, the second is a float. It's a text
export of http://graphite.readthedocs.org/en/latest/whisper.html
databases. The time series can have different resolutions, different
start/end times, and possibly gaps inside.
This is my current way of importing them:
read.file <- function(file.name) {
  # Read one whisper export: first column is a unix timestamp, second
  # is the value, with "None" marking missing points. The value column
  # is named after the file so columns stay distinguishable after merging.
  read.zoo(
    file.name,
    na.strings = "None",
    colClasses = c("integer", "numeric"),
    col.names = c("time", basename(file.name)),
    FUN = function(t) as.POSIXct(t, origin = "1970-01-01 00:00:00", tz = "UTC"),
    drop = FALSE)
}
load.metrics <- function(path = ".") {
  # Read every file under `path` and merge all series into one wide zoo
  # object in a single merge.zoo call.
  do.call(merge.zoo, lapply(list.files(path, full.names = TRUE), read.file))
}
It works for 6k time series with 2k points each, but fails with an out
of memory error on a 16 GB box when I try to import 10k time series with
10k points each.
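A back-of-envelope calculation (a sketch; the counts are the ones above,
and the size of the union index is an assumption) shows why the wide
result is already large before counting the intermediate copies that
merge.zoo makes:

n.series <- 10000
n.points <- 10000
# If resolutions and offsets differ, the union of all timestamps can be
# far larger than any single series; assume (optimistically) it stays
# around 10k distinct values.
n.times <- 10000
n.series * n.times * 8 / 2^30  # ~0.75 GiB for one copy of the final
                               # matrix, mostly NA padding; every
                               # intermediate merge allocates comparable
                               # temporary copies on top of that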
I've tried to make the merging incremental by using Reduce, but the
import speed became unacceptable:
load.metrics <- function(path = ".") {
  Reduce(
    function(a, b) {
      # The first element Reduce passes in is still a file name;
      # read it before the first merge.
      if (is.character(a)) {
        a <- read.file(a)
      }
      merge.zoo(a, read.file(b))
    },
    list.files(path, full.names = TRUE))
}
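Part of the slowness looks structural: the linear Reduce re-merges the
ever-growing accumulated object once per file, so each series gets
copied O(n) times. A pairwise (tournament) merge copies each series
only O(log n) times. A sketch of that idea (the names merge.pairwise
and load.metrics.balanced are mine, it is untested at this scale, and
it addresses speed rather than peak memory):

merge.pairwise <- function(zoos) {
  # Merge neighbours in rounds until one object remains; an odd
  # trailing element is carried over to the next round unchanged.
  while (length(zoos) > 1) {
    odd <- seq(1, length(zoos), by = 2)
    zoos <- lapply(odd, function(i) {
      if (i < length(zoos)) merge.zoo(zoos[[i]], zoos[[i + 1]])
      else zoos[[i]]
    })
  }
  zoos[[1]]
}

load.metrics.balanced <- function(path = ".") {
  merge.pairwise(lapply(list.files(path, full.names = TRUE), read.file))
}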
Is there a faster and less memory-consuming way to import and merge this
many time series?
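For comparison, a long-format approach avoids the NA padding entirely,
since it stores only the points that actually exist (a sketch, assuming
the data.table package; load.metrics.long is a made-up name):

library(data.table)

load.metrics.long <- function(path = ".") {
  files <- list.files(path, full.names = TRUE)
  long <- rbindlist(lapply(files, function(f) {
    dt <- fread(f, sep = " ", na.strings = "None",
                col.names = c("time", "value"))
    dt[, metric := basename(f)]  # remember which series the rows belong to
    dt
  }))
  long[, time := as.POSIXct(time, origin = "1970-01-01", tz = "UTC")]
  long
}

# Wide layout on demand, one column per metric:
# wide <- dcast(load.metrics.long("."), time ~ metric, value.var = "value")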
Regards,
Anton Lebedevich.