[R-SIG-Finance] Speed optimization on minutes distribution calculation

Jeff Ryan jeff.a.ryan at gmail.com
Tue Jun 16 05:24:39 CEST 2009


I think you want something like ?aggregate.zoo

I didn't pull actual volume data, but here is an example that will
show what you can do:

library(xts)  ## only used for the sequence and to leverage
              ## aggregate.zoo internally.

## generate a sequence of POSIXct 1 mo @ 1min
x <- timeBasedSeq('20090515/20090615 12:00')

## convert to POSIXlt and turn into HHMM numeric format
hm <- as.POSIXlt(x)$min + as.POSIXlt(x)$hour * 100

##  your original "Volume" column (here a simple xts object with each
##  min having Vol=1000)
##  There are 32 observations at each minute in 00:00--12:00 and 31
##  for 12:01--23:59
xx <- xts(rep(1000,length(x)), x)

##  using 'aggregate' to apply sum to the matching times
ax <- aggregate(xx, as.factor(hm), sum)

> head(ax)

0 32000
1 32000
2 32000
3 32000
4 32000
5 32000
> tail(ax)

2354 31000
2355 31000
2356 31000
2357 31000
2358 31000
2359 31000
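
The keys in the index above (0, 1, ..., 2359) are the HHMM values, so they can be turned back into "HH:MM" labels for an axis without any per-row string matching. A minimal sketch, assuming the keys may come back as a factor, as character, or as plain numbers; hhmm_label is a hypothetical helper name, not part of xts or zoo:

```r
## hypothetical helper (not from xts/zoo): HHMM key -> "HH:MM" label
hhmm_label <- function(hm) {
  ## go through character first so factor input yields the key values,
  ## not the internal factor codes
  hm <- as.integer(as.character(hm))
  sprintf("%02d:%02d", hm %/% 100L, hm %% 100L)
}

## a few keys written out by hand for illustration
keys <- c(0, 5, 130, 2359)
hhmm_label(keys)
```

With the aggregated object from above, something like hhmm_label(index(ax)) should give the 1440 labels, computed once rather than once per observation.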

I haven't had a chance to actually test this, but at the very least it
should provide a start for you.
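
Since the original question computed the mean per minute rather than the sum, the same call can be given mean as the function. A sketch under the same assumptions as above; the volumes here are random stand-ins (not real CLN9 data), and with the real data xx would be something like hqm[,5]:

```r
library(xts)  ## also attaches zoo, which supplies aggregate.zoo

## one month of 1-minute timestamps, as in the example above
x  <- timeBasedSeq('20090515/20090615 12:00')

## synthetic volumes: an assumption standing in for the real volume column
set.seed(42)
vol <- sample(1:1000, length(x), replace = TRUE)
xx  <- xts(vol, x)

## HHMM key from a single POSIXlt conversion
lt <- as.POSIXlt(x)
hm <- lt$hour * 100 + lt$min

## mean volume for each minute of the day
am <- aggregate(xx, as.factor(hm), mean)
head(am)
```

Converting to POSIXlt once and reusing it avoids doing the conversion twice, which matters a little more as the series grows.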

And the above is very fast:

> system.time(ax <- aggregate(xx, as.factor(hm), sum))
   user  system elapsed
  0.058   0.015   0.073
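
As a sanity check that grouping on the numeric HHMM key gives the same answer as the format()-based selection from the original post, here is a small base-R sketch. It uses tapply instead of aggregate so that it runs without xts; the two days of timestamps and the random volumes are made up for illustration:

```r
## two days of 1-minute timestamps, in UTC to keep DST out of the picture
tt <- seq(as.POSIXct("2009-05-15 00:00", tz = "UTC"),
          by = "1 min", length.out = 2 * 1440)

## made-up volumes
set.seed(1)
vol <- sample(1:1000, length(tt), replace = TRUE)

## fast path: one pass, grouped on the numeric HHMM key
lt   <- as.POSIXlt(tt)
hm   <- lt$hour * 100 + lt$min
fast <- tapply(vol, hm, mean)

## slow path: string comparison per minute, checked for three minutes only
mins <- c("00:00", "12:34", "23:59")
slow <- sapply(mins, function(m) mean(vol[format(tt, "%H:%M") == m]))

## compare the two paths on the matching keys
agree <- all.equal(as.numeric(slow),
                   as.numeric(fast[c("0", "1234", "2359")]))
agree
```

The slow path formats every timestamp again for each of the 1440 minutes, which is where the time goes in the original sapply() version.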

HTH
Jeff
On Mon, Jun 15, 2009 at 9:27 PM, Wind<windspeedo99 at gmail.com> wrote:
> The periodicity() function in xts is a good tool for axis manipulation.
>
> Maybe I should not use character string methods to compile the
> distribution of minute volume, as Brian suggested.  But what
> function should be used for such a task in R?  I've tried it in kdb+;
> it is fairly simple and quick enough with the select and xbar functions.
> But I am not familiar with R.  Maybe there are some functions for this
> specific task that I don't know about.
>
> Thanks Brian.
>
>
> On Tue, Jun 16, 2009 at 8:00 AM, Brian G. Peterson<brian at braverock.com> wrote:
>> It seems that the slow part is all the character string manipulation.  This
>> would be slow in almost any programming language.   Honestly, I am always
>> annoyed by useless axes in charts that simply count from 1 to n.  A time
>> axis at least has some real meaning, and avoids the useless rewriting of
>> character strings.
>>
>> You should be able to get a meaningful, readable axis using the
>> periodicity() function in xts without the string manipulation.
>>
>> Regards,
>>
>>   - Brian
>>
>> Wind wrote:
>>>
>>> I want to plot the distribution of volume of the CLN9 future along
>>> the 24-hour axis.  The following code completes the task, but the
>>> step sapply(mins,function(x)
>>> {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])})
>>> is very time consuming.
>>> Any suggestions for code with better performance would be highly
>>> appreciated.
>>>
>>>
>>> The data hqm has been retrieved from IB via IBrokers.
>>>
>>>
>>>>
>>>> head(hqm[,5])
>>>>
>>>
>>>                    CLN9.Volume
>>> 2009-05-25 06:00:00          17
>>> 2009-05-25 06:01:00           2
>>> 2009-05-25 06:02:00          11
>>> 2009-05-25 06:03:00          26
>>> 2009-05-25 06:04:00          20
>>> 2009-05-25 06:05:00           5
>>>
>>>>
>>>> tail(hqm[,5])
>>>>
>>>
>>>                    CLN9.Volume
>>> 2009-06-15 21:51:00        1050
>>> 2009-06-15 21:52:00         807
>>> 2009-06-15 21:53:00         782
>>> 2009-06-15 21:54:00         385
>>> 2009-06-15 21:55:00         562
>>> 2009-06-15 21:56:00         423
>>>
>>>>
>>>>
>>>> mins<-unlist(lapply(0:23,function(h){sapply(0:59,function(m){paste(sprintf("%02d",h),sprintf("%02d",m),sep=":")})}))
>>>> head(mins)
>>>>
>>>
>>> [1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05"
>>>
>>>>
>>>> tail(mins)
>>>>
>>>
>>> [1] "23:54" "23:55" "23:56" "23:57" "23:58" "23:59"
>>>
>>>
>>>>
>>>> temp<-sapply(mins,function(x)
>>>> {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])})
>>>> head(temp)
>>>>
>>>
>>>   00:00    00:01    00:02    00:03    00:04    00:05
>>> 279.1333 284.9333 247.8667 176.3333 278.8667 179.0667
>>>
>>>>
>>>> tail(temp)
>>>>
>>>
>>>   23:54    23:55    23:56    23:57    23:58    23:59
>>> 250.2667 312.7333 318.9333 210.8000 258.2000 232.8667
>>>
>>>>
>>>> plot(temp)
>>>>
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>>>
>>
>>
>> --
>> Brian G. Peterson
>> http://braverock.com/brian/
>> Ph: 773-459-4973
>> IM: bgpbraverock
>>
>>
>>
>
>



-- 
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com


