[R] importing and merging many time series

Joshua Ulrich josh.m.ulrich at gmail.com
Mon Apr 15 14:34:07 CEST 2013


On Sun, Apr 7, 2013 at 7:40 AM, Anton Lebedevich <mabrek at gmail.com> wrote:
> Hello.
>
> I've got many (5-20k) files with time series in a text format like this:
>
> 1359635460      2.006747
> 1359635520      1.886745
> 1359635580      3.066988
> 1359635640      3.633578
> 1359635700      2.140082
> 1359635760      2.033564
> 1359635820      1.980123
> 1359635880      2.060131
> 1359635940      2.113416
> 1359636000      2.440172
>
> First field is a Unix timestamp, second is a float number. It's a text
> export of http://graphite.readthedocs.org/en/latest/whisper.html
> databases. Time series could have different resolutions, start/end
> times, and possibly gaps inside.
>
> Current way of importing them:
>
> read.file <- function(file.name) {
>   read.zoo(
>     file.name,
>     na.strings="None",
>     colClasses=c("integer", "numeric"),
>     col.names=c("time", basename(file.name)),
>     FUN=function(t) {as.POSIXct(t, origin="1970-01-01 00:00.00", tz="UTC")},
>     drop=FALSE)
> }
>
> load.metrics <- function(path=".") {
>   do.call(merge.zoo, lapply(list.files(path, full.names=TRUE), read.file))
> }
>
> It works for 6k time series with 2k points in each, but fails with an
> out-of-memory error on a 16 GB box when I try to import 10k time series
> with 10k points.
>
You're trying to merge 10,000 objects in a single call.  I'm not
surprised you run out of RAM.

> I've tried to make merging incremental by using Reduce, but the import
> became unacceptably slow:
>
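This is similar to growing an object in a for loop, which is also slow:
every merge copies the whole accumulated object, so the total work grows
roughly quadratically with the number of files.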

> load.metrics <- function(path=".") {
>   Reduce(
>     function(a, b) {
>       if (is.character(a)) {
>         a <- read.file(a)
>       }
>       merge.zoo(a, read.file(b))
>     },
>     list.files(path, full.names=TRUE))
> }
>
> Is there a faster and less memory-consuming way to import and merge a
> lot of time series?
>
Try something in between the two extremes (merging all objects at
once, versus merging every new object with the accumulated object).
For example, try merging 100-1000 objects at a time.
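
A rough sketch of the in-between approach, in case it helps. The helper
name merge.in.chunks, the chunk size of 500, and the toy series are all
made up for illustration; only zoo's merge.zoo is assumed, and the best
chunk size for your data is something you would have to find by trying.

```r
library(zoo)

# Hypothetical helper: merge a list of zoo objects in chunks of
# `chunk.size`, then merge the per-chunk results in one final call.
merge.in.chunks <- function(series, chunk.size = 500) {
  # Assign consecutive series to groups 1, 1, ..., 2, 2, ...
  groups <- split(series, ceiling(seq_along(series) / chunk.size))
  merged <- lapply(groups, function(g) do.call(merge.zoo, g))
  if (length(merged) == 1) return(merged[[1]])
  do.call(merge.zoo, unname(merged))
}

# Toy data (illustrative only): ten one-column series with shifted,
# partly overlapping one-minute timestamps.
make.series <- function(i) {
  idx <- as.POSIXct(60 * (i + 0:4), origin = "1970-01-01", tz = "UTC")
  zoo(matrix(as.numeric(i), nrow = 5,
             dimnames = list(NULL, paste0("s", i))),
      order.by = idx)
}
series <- lapply(1:10, make.series)

merged <- merge.in.chunks(series, chunk.size = 4)
```

In your load.metrics you would pass lapply(list.files(path,
full.names=TRUE), read.file) as the series argument. Note this still
holds every individual series in memory before merging; reading and
merging one chunk of files at a time would lower the peak further.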

You might also benefit from converting your objects to xts, so you can
use xts' optimized merge.  You can always convert the final object
back to zoo.
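
A minimal sketch of the round trip through xts, assuming the xts
package is installed. The three toy series and their column names are
invented for the example; as.xts, merge (which dispatches to xts' merge
method here) and as.zoo are the only calls relied on.

```r
library(xts)  # loading xts also attaches zoo

# Toy one-column zoo series (illustrative only) with different
# start times, so the merged index is the union of all timestamps.
make.series <- function(name, start) {
  idx <- as.POSIXct(60 * (start + 0:2), origin = "1970-01-01", tz = "UTC")
  zoo(matrix(c(1, 2, 3), ncol = 1, dimnames = list(NULL, name)),
      order.by = idx)
}
zoo.list <- list(make.series("a", 1),
                 make.series("b", 2),
                 make.series("c", 3))

# Convert each series to xts and merge with xts' merge method.
x <- do.call(merge, lapply(zoo.list, as.xts))

# Convert the final result back to zoo if the rest of the code expects it.
z <- as.zoo(x)
```

This can be combined with merging in chunks: convert to xts inside
read.file, merge the chunks as xts, and call as.zoo only once at the
end.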

> Regards,
> Anton Lebedevich.
>

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com

R/Finance 2013: Applied Finance with R  | www.RinFinance.com


