[R-SIG-Finance] Discretising intra-day data using zoo?
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Nov 8 15:25:54 CET 2009
On Sun, Nov 8, 2009 at 9:13 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Sun, Nov 8, 2009 at 7:58 AM, Ajay Shah <ajayshah at mayin.org> wrote:
>> library(zoo)
>> print(load(url("http://www.mayin.org/ajayshah/tmp/demo.rda")))
>> options("digits.secs"=6)
>> head(demo)
>> tail(demo)
>>
>> On Sun, Nov 08, 2009 at 07:20:02AM -0500, Gabor Grothendieck wrote:
>>> See the aggregate.zoo example in vignette("zoo-quickref") but round up
>>> to the next 4-second boundary instead of to the next Friday:
>>>
>>> > to4sec <- function(x) as.POSIXct(4*ceiling(as.numeric(x)/4), origin = "1970-01-01")
>>> > aggregate(demo, to4sec, tail, 1)
>>>                     spread    ltp
>>> 2009-02-16 05:00:04 0.0050 48.715
>>> 2009-02-16 05:00:08 0.0025 48.715
>>> 2009-02-16 05:00:12 0.0025 48.715
>>> 2009-02-16 05:00:16 0.0025 48.715
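The to4sec function above hard-codes the 4-second width. A small factory can build the same kind of round-up function for any width. This is a minimal sketch, assuming a hypothetical helper ceiling_to() (not part of zoo or of this thread) and a made-up series ticks standing in for demo. It passes tz = "UTC" so that the bucket labels do not depend on the local time zone.

```r
library(zoo)

# Hypothetical helper (not from the thread): returns a function that rounds
# POSIXct times *up* to the next multiple of `secs` seconds, like to4sec.
ceiling_to <- function(secs) {
  function(x) as.POSIXct(secs * ceiling(as.numeric(x) / secs),
                         origin = "1970-01-01", tz = "UTC")
}

# Synthetic tick data standing in for `demo` (values are made up).
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
ticks <- zoo(cbind(spread = c(0.0050, 0.0050, 0.0025, 0.0025),
                   ltp    = c(48.700, 48.715, 48.715, 48.720)),
             t0 + c(0.5, 3.2, 6.1, 7.9))

# Last observation in each 4-second bucket, labelled by the bucket's end time.
bars <- aggregate(ticks, ceiling_to(4), tail, 1)
bars
```

Calling ceiling_to(60) in the same way would give one-minute bars.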
>>
>> Gabor, thanks! I am not as fluent with as.POSIXct() as I should be.
>>
>> And, to continue with my original question:
>>
>>> > Suppose there is not a single record in the raw data from 10:30:04 to
>>> > 10:30:09. Despite this, the resulting object should contain a record
>>> > for 10:30:08 with NA values (which can then be filled out e.g. using
>>> > na.locf()). How would we do this? This problem is not present in this
>>> > data, where records are plentiful. But discretisation code should be
>>> > general and handle this case right.
>>
>> How would we do this? To illustrate:
>>
>> demo2 <- demo[-300:-700,]
>> # the plot shows that the ticks from the 5th to the 10th second are gone
>> plot(index(demo2), seq_len(nrow(demo2)), type = "l")
>> to5sec <- function(x) as.POSIXct(5*ceiling(as.numeric(x)/5), origin = "1970-01-01")
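Instead of spotting the hole on a plot, the largest spacing between consecutive ticks can be found numerically with diff() on the index. The following is a minimal sketch on made-up tick times; the names t0, ticks and gaps are illustrative only.

```r
library(zoo)

# Synthetic irregular tick times (made up) with a hole between ~5 s and ~10 s.
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
ticks <- zoo(seq(48.70, by = 0.005, length.out = 8),
             t0 + c(0.4, 1.9, 3.3, 4.8, 10.6, 12.0, 13.7, 16.2))

# Seconds elapsed between consecutive ticks.
gaps <- diff(as.numeric(index(ticks)))

# Position and length of the largest gap.
i <- which.max(gaps)
index(ticks)[c(i, i + 1)]  # the ticks on either side of the hole
gaps[i]                    # length of the hole in seconds
```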
>>
>>
>> Now:
>>
>>> aggregate(demo, to5sec, tail, 1)
>>                     spread    ltp
>> 2009-02-16 05:00:05 0.0050 48.715
>> 2009-02-16 05:00:10 0.0025 48.715
>> 2009-02-16 05:00:15 0.0025 48.715
>> 2009-02-16 05:00:20 0.0025 48.715
>>> aggregate(demo2, to5sec, tail, 1)
>>                     spread    ltp
>> 2009-02-16 05:00:05 0.0050 48.715
>> 2009-02-16 05:00:15 0.0025 48.715
>> 2009-02-16 05:00:20 0.0025 48.715
>>
>> We should get:
>>
>>                     spread    ltp
>> 2009-02-16 05:00:05 0.0050 48.715
>> 2009-02-16 05:00:10     NA     NA
>> 2009-02-16 05:00:15 0.0025 48.715
>> 2009-02-16 05:00:20 0.0025 48.715
>>
>
> The trick is that converting to ts makes the series regular (an equally
> spaced series is the only thing ts can represent), so any missing time
> points on the grid come back as NA rows. Just convert it to ts and then
> back to zoo. Since ts cannot represent POSIXct times, what you get back
> will have plain numeric times without the POSIXct class attribute, so
> just set the class yourself.
>
>> # aggregate to 5 seconds
>> ag <- aggregate(demo2, to5sec, tail, 1)
>>
>> # make regular (this will strip class from time)
>> ag.fill <- as.zoo(as.ts(ag))
>>
>> # put class back on time
>> time(ag.fill) <- structure(time(ag.fill), class = class(time(ag)))
>> ag.fill
>                     spread    ltp
> 2009-02-16 05:00:05 0.0050 48.715
> 2009-02-16 05:00:10     NA     NA
> 2009-02-16 05:00:15 0.0025 48.715
> 2009-02-16 05:00:20 0.0025 48.715
>
A slightly shorter alternative to the time(ag.fill) <- structure(...) line above is:
class(time(ag.fill)) <- class(time(ag))
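A different way to get the same regular grid is to merge the aggregated bars with a zero-width zoo series whose index is the full 5-second grid. This is swapped in here as an alternative, not the approach used above. Because the grid is built with seq() on POSIXct times, the index keeps its POSIXct class and nothing has to be restored afterwards. This is a sketch under stated assumptions: synthetic bars stand in for ag, and the names t0, grid and ag.fill2 are illustrative.

```r
library(zoo)

# Synthetic 5-second bars with the 05:00:10 bar missing, mimicking
# aggregate(demo2, to5sec, tail, 1) from the thread (values made up).
t0 <- as.POSIXct("2009-02-16 05:00:00", tz = "UTC")
ag <- zoo(cbind(spread = c(0.0050, 0.0025, 0.0025),
                ltp    = c(48.715, 48.715, 48.715)),
          t0 + c(5, 15, 20))

# Full regular grid from the first to the last bar, one point every 5 seconds.
grid <- seq(start(ag), end(ag), by = 5)

# Merging with a zero-width zoo series on that grid inserts NA rows for
# grid times that have no bar; the POSIXct index class is preserved.
ag.fill2 <- merge(ag, zoo(, grid))
ag.fill2

# Carry the last observation forward into the empty bars.
na.locf(ag.fill2)
```

Here na.locf() fills the empty 05:00:10 bar with the previous bar's values, which is the follow-up step mentioned in the original question.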