[R-SIG-Finance] Discretising intra-day data using zoo?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Nov 8 15:13:37 CET 2009


On Sun, Nov 8, 2009 at 7:58 AM, Ajay Shah <ajayshah at mayin.org> wrote:
> library(zoo)
> print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
> options("digits.secs"=6)
> head(demo)
> tail(demo)
>
> On Sun, Nov 08, 2009 at 07:20:02AM -0500, Gabor Grothendieck wrote:
>> See the aggregate.zoo example in vignette("zoo-quickref") but round up
>> to the next 4 seconds instead of next Friday:
>>
>> > to4sec <- function(x) as.POSIXct(4*ceiling(as.numeric(x)/4), origin = "1970-01-01")
>> > aggregate(demo, to4sec, tail, 1)
>>                     spread    ltp
>> 2009-02-16 05:00:04 0.0050 48.715
>> 2009-02-16 05:00:08 0.0025 48.715
>> 2009-02-16 05:00:12 0.0025 48.715
>> 2009-02-16 05:00:16 0.0025 48.715
>
> Gabor, thanks! I am not as fluent with as.POSIXct() as I should be.
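
The sketch below walks through what to4sec does, using three made-up timestamps (hypothetical, not taken from demo). The explicit tz = "UTC" is an assumption added so that the printed result does not depend on the local time zone. as.numeric() turns a POSIXct value into seconds since 1970-01-01 UTC. 4 * ceiling(x / 4) rounds that number up to a multiple of 4. as.POSIXct(..., origin = "1970-01-01") then turns the number back into a date-time.

```r
# Three made-up timestamps (hypothetical, not from demo). tz = "UTC" is an
# assumption so the output does not depend on the local time zone.
x <- as.POSIXct(c("2009-02-16 05:00:01.5",
                  "2009-02-16 05:00:04",
                  "2009-02-16 05:00:04.2"), tz = "UTC")

secs <- as.numeric(x)          # seconds since 1970-01-01 00:00:00 UTC
up   <- 4 * ceiling(secs / 4)  # round up to the next multiple of 4 seconds
rounded <- as.POSIXct(up, origin = "1970-01-01", tz = "UTC")  # back to POSIXct

print(format(rounded, "%H:%M:%S"))
# [1] "05:00:04" "05:00:04" "05:00:08"
```

A timestamp that already falls on a 4-second boundary (05:00:04) stays where it is, because ceiling() leaves whole numbers unchanged. Each bucket (t - 4, t] is therefore labelled by its right-hand end point t.
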
>
> And, to continue with my original question:
>
>> > Suppose there is not a single record in the raw data from 10:30:04 to
>> > 10:30:09. Despite this, the resulting object should contain a record
>> > for 10:30:08 with NA values (which can then be filled out e.g. using
>> > na.locf()). How would we do this? This problem is not present in this
>> > data, where records are plentiful. But discretisation code should be
>> > general and handle this case right.
>
> How would we do this? To illustrate:
>
>  demo2 <- demo[-300:-700,]
>  plot(index(demo2), seq_len(nrow(demo2)), type="l")   # no records between the
>                                                        # 5th and 10th second
>  to5sec <- function(x) as.POSIXct(5*ceiling(as.numeric(x)/5), origin = "1970-01-01")
>
>
> Now:
>
>> aggregate(demo, to5sec, tail, 1)
>                     spread    ltp
> 2009-02-16 05:00:05 0.0050 48.715
> 2009-02-16 05:00:10 0.0025 48.715
> 2009-02-16 05:00:15 0.0025 48.715
> 2009-02-16 05:00:20 0.0025 48.715
>> aggregate(demo2, to5sec, tail, 1)
>                     spread    ltp
> 2009-02-16 05:00:05 0.0050 48.715
> 2009-02-16 05:00:15 0.0025 48.715
> 2009-02-16 05:00:20 0.0025 48.715
>
> We should get:
>
>                     spread    ltp
> 2009-02-16 05:00:05 0.0050 48.715
> 2009-02-16 05:00:10     NA     NA
> 2009-02-16 05:00:15 0.0025 48.715
> 2009-02-16 05:00:20 0.0025 48.715
>

The trick is that converting to ts makes the series regular, because
equally spaced times are the only thing a ts object can represent, so the
missing 05:00:10 point comes back filled with NA. Just convert to ts and
then back to zoo. Since ts cannot represent POSIXct times, the result will
not have the POSIXct class attribute set, so set it yourself.

> # aggregate to 5 seconds
> ag <- aggregate(demo2, to5sec, tail, 1)
>
> # make regular (this will strip class from time)
> ag.fill <- as.zoo(as.ts(ag))
>
> # put class back on time
> time(ag.fill) <- structure(time(ag.fill), class = class(time(ag)))
> ag.fill
                    spread    ltp
2009-02-16 05:00:05 0.0050 48.715
2009-02-16 05:00:10     NA     NA
2009-02-16 05:00:15 0.0025 48.715
2009-02-16 05:00:20 0.0025 48.715
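
The sketch below takes a different route to the same result. It builds the full 5-second grid with seq() and then calls merge() to combine the aggregated series with a zero-width zoo series on that grid, which inserts NA rows for the empty buckets. This is a swapped-in technique, not the as.ts() round trip shown above. The toy ticks, their values, and the tz = "UTC" argument are assumptions made for illustration and do not come from demo. The last step, na.locf(), carries the previous values forward, as the original question suggests.

```r
library(zoo)

# Same rounding rule as to5sec in the thread; tz = "UTC" is an added assumption
# so printed times do not depend on the local time zone.
to5sec <- function(x) as.POSIXct(5 * ceiling(as.numeric(x) / 5),
                                  origin = "1970-01-01", tz = "UTC")

# Made-up ticks: nothing arrives between 05:00:03 and 05:00:11,
# so the 05:00:10 bucket is empty after aggregation.
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
tt <- t0 + c(1, 2.5, 3, 11, 12, 14.5, 16, 19)
z  <- zoo(cbind(spread = rep(c(0.005, 0.0025), c(3, 5)),
                ltp    = c(48.70, 48.71, 48.72, 48.73, 48.74, 48.75, 48.76, 48.77)),
          tt)

# Last observation in each 5-second bucket: rows for 05:00:05, :15 and :20 only
ag <- aggregate(z, to5sec, tail, 1)

# Full regular grid from the first to the last bucket (by = 5 means 5 seconds)
grid <- seq(start(ag), end(ag), by = 5)

# Merging with a zero-width series on the grid adds NA rows for empty buckets
ag.fill <- merge(ag, zoo(, grid))
colnames(ag.fill) <- colnames(ag)  # restore column names in case merge altered them

# Carry the last values forward into the empty bucket
ag.locf <- na.locf(ag.fill)
print(ag.locf)
```

Both inputs to merge() are indexed by POSIXct, so the result keeps that class and no class attribute has to be restored by hand. The grid only spans start(ag) to end(ag), so empty buckets before the first or after the last observation are not added.
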


