[R-SIG-Finance] Speed optimization on minutes distribution calculation

Wind windspeedo99 at gmail.com
Tue Jun 16 09:12:16 CEST 2009


> nqi <- index(NQ)
> hm <- as.POSIXlt(nqi)$min + as.POSIXlt(nqi)$hour*100
> NQV <- aggregate(Vo(NQ), as.factor(hm), sum)

That's exactly what I need.
The speed is amazing. By my subjective judgment it is as quick as kdb+.
Frankly speaking, I never imagined the speed of R could match kdb+.
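
A minimal sketch of the same grouping idea, for anyone following along. It uses made-up one-minute data and base R's tapply() as a stand-in for aggregate() on an xts object, so it runs without IBrokers or xts. The names idx, vol, fast, slow and labels are only illustrative. It computes the mean per minute (what the original question asked for) rather than the sum, and checks the result against the string-matching approach. On real data, swapping sum for mean in the aggregate() call should presumably do the same job.

```r
## Synthetic one-minute volume data (made up for illustration): 3 full days.
idx <- seq(as.POSIXct("2009-06-09 00:00:00", tz = "UTC"),
           by = "1 min", length.out = 3 * 24 * 60)
set.seed(1)
vol <- rpois(length(idx), lambda = 200)

## Minute-of-day key as an HHMM number, as in the thread.
lt <- as.POSIXlt(idx)
hm <- lt$hour * 100 + lt$min

## Grouped mean in a single pass over the data.
fast <- tapply(vol, hm, mean)

## The original approach: one full string comparison per minute of the day.
mins <- sprintf("%02d:%02d", rep(0:23, each = 60), rep(0:59, times = 24))
key  <- format(idx, "%H:%M")
slow <- sapply(mins, function(m) mean(vol[key == m]))

## Both approaches should agree, minute by minute.
stopifnot(isTRUE(all.equal(as.vector(fast), unname(slow))))

## Readable "HH:MM" labels rebuilt from the numeric keys, e.g. for barplot().
keys   <- as.integer(names(fast))
labels <- sprintf("%02d:%02d", keys %/% 100L, keys %% 100L)
```

The grouped version builds the key once and lets R split the data, while the sapply() version rescans every timestamp for each of the 1440 minutes, which is where the time goes.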

By the way, building a demand-based database system around getSymbols,
as mentioned in your presentation at RFin 2009, is also a great idea:
something like a hybrid of cloud computing and a very fast database.
Until now I have sometimes stored the data in a database like kdb+ and
retrieved it into R for further analysis.

Thanks Jeff and Brian.


On Tue, Jun 16, 2009 at 11:45 AM, Jeff Ryan<jeff.a.ryan at gmail.com> wrote:
> An actual example using IBrokers/IB
>
> NQ <- reqHistoricalData(tws,
>          twsFUT("NQ","GLOBEX","200909"),
>          useRTH="0", bar="1 min", dur="5 D")
>
> str(NQ)
> An 'xts' object from 2009-06-09 15:30:00 to 2009-06-15 22:33:00 containing:
>  Data: num [1:5910, 1:8] 1510 1510 1510 1510 1510 1510 1510 1510 1510 1510 ...
>  - attr(*, "dimnames")=List of 2
>  ..$ : NULL
>  ..$ : chr [1:8] "NQU9.Open" "NQU9.High" "NQU9.Low" "NQU9.Close" ...
>  Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
>  xts Attributes:
> List of 4
>  $ from   : chr "20090611  04:33:46"
>  $ to     : chr "20090616  04:33:46"
>  $ src    : chr "IB"
>  $ updated: POSIXct[1:1], format: "2009-06-15 22:33:46.46141"
>
> nqi <- index(NQ)
> hm <- as.POSIXlt(nqi)$min + as.POSIXlt(nqi)$hour*100
> NQV <- aggregate(Vo(NQ), as.factor(hm), sum)
>
> barplot(NQV)
>
> The axis/chart leaves a lot to be desired, but once again that should
> be enough to set you on the right path.
>
> HTH
> Jeff
>
> On Mon, Jun 15, 2009 at 10:24 PM, Jeff Ryan<jeff.a.ryan at gmail.com> wrote:
>> I think you want something like ?aggregate.zoo
>>
>> I didn't pull actual volume data, but here is an example that will
>> show what you can do:
>>
>> library(xts)  ## only used for the sequence and to leverage aggregate.zoo internally
>>
>> ## generate a sequence of POSIXct 1 mo @ 1min
>> x <- timeBasedSeq('20090515/20090615 12:00')
>>
>> ## convert to POSIXlt and turn into HHMM numeric format
>> hm <- as.POSIXlt(x)$min + as.POSIXlt(x)$hour * 100
>>
>> ##  your original "Volume" column (here a simple xts object with each
>> ##  minute having Vol=1000)
>> ##  There are 32 observations at each minute in 00:00--12:00 and 31
>> ##  for 12:01--23:59
>> xx <- xts(rep(1000,length(x)), x)
>>
>> ##  using 'aggregate' to apply sum to the matching times
>> ax <- aggregate(xx, as.factor(hm), sum)
>>
>> head(ax)
>>
>> 0 32000
>> 1 32000
>> 2 32000
>> 3 32000
>> 4 32000
>> 5 32000
>> tail(ax)
>>
>> 2354 31000
>> 2355 31000
>> 2356 31000
>> 2357 31000
>> 2358 31000
>> 2359 31000
>>
>> I haven't had a chance to actually test this, but at the very least it
>> should provide a start for you.
>>
>> And the above is very fast:
>>
>>  system.time(ax <- aggregate(xx, as.factor(hm), sum))
>>   user  system elapsed
>>  0.058   0.015   0.073
>>
>> HTH
>> Jeff
>> On Mon, Jun 15, 2009 at 9:27 PM, Wind<windspeedo99 at gmail.com> wrote:
>>> The periodicity() function in xts is a good tool for axis manipulation.
>>>
>>> Maybe I should not use character string methods to compile the
>>> distribution of per-minute volume, as Brian suggested. But what
>>> function should be used for such a task in R? I have tried it in kdb+,
>>> where it is fairly simple and quick enough with the select and xbar
>>> functions. But I am not familiar with R. Maybe there are some functions
>>> for this specific task that I don't know about.
>>>
>>> Thanks Brian.
>>>
>>>
>>> On Tue, Jun 16, 2009 at 8:00 AM, Brian G. Peterson<brian at braverock.com> wrote:
>>>> It seems that the slow part is all the character string manipulation.  This
>>>> would be slow in almost any programming language.   Honestly, I am always
>>>> annoyed by useless axes in charts that simply count from 1 to n.  A time
>>>> axis at least has some real meaning, and avoids the useless rewriting of
>>>> character strings.
>>>>
>>>> You should be able to get a meaningful, readable axis using the
>>>> periodicity() function in xts without the string manipulation.
>>>>
>>>> Regards,
>>>>
>>>>   - Brian
>>>>
>>>> Wind wrote:
>>>>>
>>>>> I want to plot the distribution of volume for the CLN9 future along
>>>>> a 24-hour axis. The code below completes the task, but the step
>>>>> sapply(mins,function(x)
>>>>> {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])})
>>>>> is very time consuming.
>>>>> Any suggestions for code with better performance would be highly
>>>>> appreciated.
>>>>>
>>>>>
>>>>> The data hqm has been retrieved from IB via IBrokers.
>>>>>
>>>>>
>>>>> > head(hqm[,5])
>>>>>                    CLN9.Volume
>>>>> 2009-05-25 06:00:00          17
>>>>> 2009-05-25 06:01:00           2
>>>>> 2009-05-25 06:02:00          11
>>>>> 2009-05-25 06:03:00          26
>>>>> 2009-05-25 06:04:00          20
>>>>> 2009-05-25 06:05:00           5
>>>>>
>>>>> > tail(hqm[,5])
>>>>>                    CLN9.Volume
>>>>> 2009-06-15 21:51:00        1050
>>>>> 2009-06-15 21:52:00         807
>>>>> 2009-06-15 21:53:00         782
>>>>> 2009-06-15 21:54:00         385
>>>>> 2009-06-15 21:55:00         562
>>>>> 2009-06-15 21:56:00         423
>>>>>
>>>>> > mins<-unlist(lapply(0:23,function(h){sapply(0:59,function(m){paste(sprintf("%02d",h),sprintf("%02d",m),sep=":")})}))
>>>>> > head(mins)
>>>>> [1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05"
>>>>>
>>>>> > tail(mins)
>>>>> [1] "23:54" "23:55" "23:56" "23:57" "23:58" "23:59"
>>>>>
>>>>>
>>>>> > temp<-sapply(mins,function(x) {mean(hqm[which(format(index(hqm),"%H:%M")==x),5])})
>>>>> > head(temp)
>>>>>   00:00    00:01    00:02    00:03    00:04    00:05
>>>>> 279.1333 284.9333 247.8667 176.3333 278.8667 179.0667
>>>>>
>>>>> > tail(temp)
>>>>>   23:54    23:55    23:56    23:57    23:58    23:59
>>>>> 250.2667 312.7333 318.9333 210.8000 258.2000 232.8667
>>>>>
>>>>> > plot(temp)
>>>>>
>>>>> _______________________________________________
>>>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>>> -- Subscriber-posting only.
>>>>> -- If you want to post, subscribe first.
>>>>>
>>>>
>>>>
>>>> --
>>>> Brian G. Peterson
>>>> http://braverock.com/brian/
>>>> Ph: 773-459-4973
>>>> IM: bgpbraverock
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Jeffrey Ryan
>> jeffrey.ryan at insightalgo.com
>>
>> ia: insight algorithmics
>> www.insightalgo.com
>>
>
>
>
>


