[R-SIG-Finance] Discretising intra-day data -- how to get by with less memory?
Brian G. Peterson
brian at braverock.com
Fri Nov 27 14:00:23 CET 2009
Brian G. Peterson wrote:
> Ajay Shah wrote:
>> I'm using this function to convert intra-day data into a grid with an
>> observation each N seconds:
>>
>> # This function consumes "z", a zoo object where timestamps are
>> # intraday, and a period for discretisation, Nseconds.
>> # The key ideas are from this thread:
>> # https://stat.ethz.ch/pipermail/r-sig-finance/2009q4/005144.html
>> intraday.discretise <- function(z, Nseconds) {
>>   toNsec <- function(x)
>>     as.POSIXct(Nseconds*ceiling(as.numeric(x)/Nseconds),
>>                origin = "1970-01-01")
>>   d <- aggregate(z, toNsec, tail, 1)
>>   # At this point there is one problem: NA records are not created
>>   # for blocks of time in which there were no records.
>>
>>   # To solve this:
>>   dreg <- as.zoo(as.ts(d))
>>   class(time(dreg)) <- class(time(d))
>>
>>   dreg
>> }
>>
>> This works correctly but it's incredibly memory-intensive. I'm running
>> out of core in running this for some problems.
>>
>> Is there a way to write this which would use less RAM?
>>
>>
> Jeff Ryan, Abe Winter, and I came up with an align.time function a few
> months back:
>
> align.time <- function(x, n=30) {
>   structure(unclass(x) + (n - unclass(x) %% n),
>             class=c("POSIXt","POSIXct"))
> }
>
> x is xts data
> n is seconds
>
> Regards,
>
> - Brian
>
Or, an earlier, slower version. This works well enough to generate a
new index on the output of to.period:

# stamp is POSIXct object, like index(x) of an xts object
# n is number of seconds to round to, so n=k in to.period
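even_seconds = function(stamp, n=60)
{
  tzone = attr(stamp, "tzone")
  if (is.null(tzone)) { tzone = "" }
  # midnight of each stamp's own day, in the stamp's time zone
  base = as.POSIXct(strptime(format(stamp, "%Y%m%d"), "%Y%m%d"), tz=tzone)
  # seconds elapsed since midnight
  i = as.numeric(stamp) - as.numeric(base)
  # round up to the next multiple of n seconds
  i = base + n*ceiling(i/n)
  i
}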
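
The two rounding rules in this message differ in one edge case, which a
small base-R sketch can make visible. The sample timestamps and the names
stamps, up_next and up_ceil are made up for illustration; only the
arithmetic comes from the functions above. The x + (n - x %% n) form used
in align.time pushes a stamp that already sits on a boundary to the next
boundary, while the n*ceiling(x/n) form used in even_seconds leaves it
where it is.

```r
# Hypothetical sample stamps; the middle one sits exactly on a 30-second mark.
stamps <- as.POSIXct(c("2009-11-27 09:15:07",
                       "2009-11-27 09:15:30",
                       "2009-11-27 09:15:59"), tz = "UTC")
n <- 30
secs <- as.numeric(stamps)   # seconds since 1970-01-01 UTC

# Same arithmetic as the align.time one-liner: always moves forward.
up_next <- as.POSIXct(secs + (n - secs %% n),
                      origin = "1970-01-01", tz = "UTC")

# Same arithmetic as even_seconds: stamps on a boundary stay put.
up_ceil <- as.POSIXct(n * ceiling(secs / n),
                      origin = "1970-01-01", tz = "UTC")

format(up_next, "%H:%M:%S")  # "09:15:30" "09:16:00" "09:16:00"
format(up_ceil, "%H:%M:%S")  # "09:15:30" "09:15:30" "09:16:00"
```

Which behaviour is wanted presumably depends on whether a bar labelled
09:15:30 should include a tick stamped exactly 09:15:30.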
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock