[R-SIG-Finance] Speed optimization on minutes distribution calculation
Jeff Ryan
jeff.a.ryan at gmail.com
Tue Jun 16 05:24:39 CEST 2009
I think you want something like ?aggregate.zoo
I didn't pull actual volume data, but here is an example that will
show what you can do:
library(xts) ## only used for the sequence and to leverage
             ## aggregate.zoo internally.
## generate a sequence of POSIXct 1 mo @ 1min
x <- timeBasedSeq('20090515/20090615 12:00')
## convert to POSIXlt and turn into HHMM numeric format
hm <- as.POSIXlt(x)$min + as.POSIXlt(x)$hour * 100
## your original "Volume" column (here a simple xts object with each
## min having Vol=1000)
## There are 32 observations at each minute in 00:00--12:00 and 31
## for 12:01--23:59
xx <- xts(rep(1000,length(x)), x)
## using 'aggregate' to apply sum to the matching times
ax <- aggregate(xx, as.factor(hm), sum)
> head(ax)
0 32000
1 32000
2 32000
3 32000
4 32000
5 32000
> tail(ax)
2354 31000
2355 31000
2356 31000
2357 31000
2358 31000
2359 31000
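Note that the original question averaged the volume at each minute of day, whereas the example above sums it. A minimal sketch of the mean variant, assuming the same aggregate() call works with mean in place of sum. The one week of random volumes and the names idx, vol, key and avg are made up for illustration and are not from the thread:

```r
library(xts)

set.seed(1)
## made-up data: one week of 1-minute timestamps in UTC
idx <- seq(as.POSIXct("2009-06-01 00:00", tz = "UTC"),
           as.POSIXct("2009-06-07 23:59", tz = "UTC"), by = "min")
vol <- xts(rpois(length(idx), lambda = 200), idx)

## HHMM numeric key, built the same way as 'hm' above
lt  <- as.POSIXlt(index(vol))
key <- lt$hour * 100 + lt$min

## mean (instead of sum) of the volume at each minute of day
avg <- aggregate(vol, as.factor(key), mean)
head(avg)
```

With 7 days of data, each of the 1440 entries of avg is the mean of 7 observations. On real data, hqm[,5] would take the place of vol.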
I haven't had a chance to test this on actual volume data, but at the very least it
should provide a start for you.
And the above is very fast:
> system.time(ax <- aggregate(xx, as.factor(hm), sum))
   user  system elapsed
  0.058   0.015   0.073
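If you would rather not go through aggregate.zoo, the same grouping can be sketched in base R with tapply() on a plain numeric vector. This is a swapped-in alternative, not something from the thread, and the data (3 days at a constant 1000 per minute) and the names idx, vol, tot and avg are assumptions for illustration. No timing comparison is implied:

```r
## made-up data: 3 days of 1-minute timestamps, constant volume of 1000
idx <- seq(as.POSIXct("2009-06-01 00:00", tz = "UTC"),
           by = "min", length.out = 3 * 24 * 60)
vol <- rep(1000, length(idx))

## HHMM numeric key from the broken-down time (e.g. 14:05 becomes 1405)
lt <- as.POSIXlt(idx, tz = "UTC")
hm <- lt$hour * 100 + lt$min

## total and mean volume for each minute of day
tot <- tapply(vol, hm, sum)
avg <- tapply(vol, hm, mean)

head(tot)
```

Both results are named vectors with one entry per minute of day, the names being the HHMM keys as character strings.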
HTH
Jeff
On Mon, Jun 15, 2009 at 9:27 PM, Wind <windspeedo99 at gmail.com> wrote:
> The periodicity() function in xts is a good tool for axis manipulation.
>
> Maybe I should not use character string methods to compile the
> distribution of per-minute volume, as Brian suggested. But what
> function should be used for such a task in R? I've tried it in kdb+,
> where it is simple and quick enough with the select and xbar functions.
> But I am not familiar with R. Maybe there are functions for this
> specific task that I don't know about.
>
> Thanks Brian.
>
>
> On Tue, Jun 16, 2009 at 8:00 AM, Brian G. Peterson <brian at braverock.com> wrote:
>> It seems that the slow part is all the character string manipulation. This
>> would be slow in almost any programming language. Honestly, I am always
>> annoyed by useless axes in charts that simply count from 1 to n. A time
>> axis at least has some real meaning, and avoids the useless rewriting of
>> character strings.
>>
>> You should be able to get a meaningful, readable axis using the
>> periodicity() function in xts without the string manipulation.
>>
>> Regards,
>>
>> - Brian
>>
>> Wind wrote:
>>>
>>> I want to plot the distribution of volume of the future CLN9 along
>>> a 24-hour axis. The following code completes the task, but
>>> it is very time consuming at the step sapply(mins,function(x)
>>> {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])}).
>>> Any suggestions for code with better performance would be highly
>>> appreciated.
>>>
>>>
>>> The data hqm has been retrieved from IB via IBrokers.
>>>
>>>
>>> > head(hqm[,5])
>>>                     CLN9.Volume
>>> 2009-05-25 06:00:00          17
>>> 2009-05-25 06:01:00           2
>>> 2009-05-25 06:02:00          11
>>> 2009-05-25 06:03:00          26
>>> 2009-05-25 06:04:00          20
>>> 2009-05-25 06:05:00           5
>>>
>>> > tail(hqm[,5])
>>>                     CLN9.Volume
>>> 2009-06-15 21:51:00        1050
>>> 2009-06-15 21:52:00         807
>>> 2009-06-15 21:53:00         782
>>> 2009-06-15 21:54:00         385
>>> 2009-06-15 21:55:00         562
>>> 2009-06-15 21:56:00         423
>>>
>>> > mins<-unlist(lapply(0:23,function(h){sapply(0:59,function(m){paste(sprintf("%02d",h),sprintf("%02d",m),sep=":")})}))
>>> > head(mins)
>>> [1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05"
>>>
>>> > tail(mins)
>>> [1] "23:54" "23:55" "23:56" "23:57" "23:58" "23:59"
>>>
>>>
>>> > temp<-sapply(mins,function(x)
>>> + {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])})
>>> > head(temp)
>>>    00:00    00:01    00:02    00:03    00:04    00:05
>>> 279.1333 284.9333 247.8667 176.3333 278.8667 179.0667
>>>
>>> > tail(temp)
>>>    23:54    23:55    23:56    23:57    23:58    23:59
>>> 250.2667 312.7333 318.9333 210.8000 258.2000 232.8667
>>>
>>> > plot(temp)
>>>
>>> _______________________________________________
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>>>
>>
>>
>> --
>> Brian G. Peterson
>> http://braverock.com/brian/
>> Ph: 773-459-4973
>> IM: bgpbraverock
>>
>>
>>
>
--
Jeffrey Ryan
jeffrey.ryan at insightalgo.com
ia: insight algorithmics
www.insightalgo.com