[R-SIG-Finance] high frequency data analysis in R

Hae Kyung Im hakyim at gmail.com
Thu May 21 19:02:20 CEST 2009


Jeff,

This is very impressive. Even on my MacBook Air it takes less than 0.2
seconds total.

> x <- .xts(1:1e6, 1:1e6)
> system.time(merge(x,x))
   user  system elapsed
  0.093   0.021   0.198


> quantmod now has (devel) an attachSymbols function that makes
> lazy-loading data very easy, so all your data can be stored as xts
> objects and read in on-demand.

When you say stored, do you mean on disk or in memory?

> xts is also getting the ability to query subsets of data on disk, by
> time.  This will have no practical limit.

This would be great! Will we be able to append data to an xts object stored on disk?


Thanks
Haky



On Thu, May 21, 2009 at 11:23 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
> Not to distract from the underlying processing question, but to answer
> the 'data' one:
>
> The data in R shouldn't be too much of an issue, at least from a size perspective.
>
> xts objects on the order of millions of observations are still fast
> and memory friendly with respect to copying operations internal to
> many xts calls (merge, subset, etc).
>
>> x <- .xts(1:1e6, 1:1e6)
>> system.time(merge(x,x))
>   user  system elapsed
>  0.037   0.015   0.053
>
>
> 7 million obs of a single-column xts is ~54 Mb.  Certainly you can
> handle quite a bit of data if you have anything more than a trivial
> amount of RAM.
>
> quantmod now has (devel) an attachSymbols function that makes
> lazy-loading data very easy, so all your data can be stored as xts
> objects and read in on-demand.
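A minimal usage sketch of the above, assuming the devel version of quantmod; DDB_Yahoo() is quantmod's built-in demo database definition, and the symbol shown is just an example:

```r
# Lazy-loading sketch: attached symbols become promises that are only
# fetched (and then cached) the first time they are touched.
library(quantmod)

attachSymbols(DB = DDB_Yahoo())  # attach the demo Yahoo-backed "database"
head(SBUX)                       # data is downloaded on first access
```

Note this demo fetches over the network; a local file-backed DB definition would behave the same way without the download.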
>
> xts is also getting the ability to query subsets of data on disk, by
> time.  This will have no practical limit.
>
> For current data solutions, xts, fts (C++), data.table, and some other
> packages should mitigate your problems, if not solve the 'data' side
> altogether.
>
>
> HTH
> Jeff
>
>
>
> On Thu, May 21, 2009 at 11:13 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
>> I think in general you would need some sort of pre-processing before using R.
>>
>> You can use periodic sampling of prices, but you may be throwing away
>> a lot of information. This method was recommended more than five years
>> ago to mitigate the effect of market noise, at least in the context of
>> volatility estimation.
>>
>> Here is my experience with tick data:
>>
>> I used FX data to calculate estimated daily volatility using TSRV
>> (Zhang et al. 2005,
>> http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the
>> time series of estimated daily volatilities, I forecasted volatilities
>> for 1 day up to 1 year ahead. The tick data was in a Quantitative
>> Analytics database. I used their C++ API to query daily data, computed
>> the TSRV estimator in C++, and saved the results in a text file. Then I
>> used R to read the estimated volatilities and used FARIMA to forecast
>> volatility. An interesting thing about this type of series is that the
>> fractional coefficient is approximately 0.4 in many instances.
>> Bollerslev has a paper commenting on this fact.
>>
>> In another project, I had treasury futures market depth data. The data
>> came in plain text format, with one file per day. Each day had more
>> than 1 million entries. I don't think I could have handled that with R. To
>> get started I decided to use only actual trades, which I filtered out
>> with Python. That brought it down to ~60K entries per day, which I
>> could handle with R. I used to.period from the xts package to aggregate
>> the data.
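A sketch of that last aggregation step, assuming the xts package; the simulated tick series below stands in for the ~60K filtered trades:

```r
library(xts)

set.seed(1)
# ~1,000 irregularly spaced trades over one 6.5-hour session (simulated)
tt     <- as.POSIXct("2009-05-21 09:30:00") + sort(runif(1000, 0, 6.5 * 3600))
trades <- xts(100 + cumsum(rnorm(1000, sd = 0.01)), order.by = tt)

# Aggregate the irregular ticks into regular 5-minute OHLC bars
bars <- to.period(trades, period = "minutes", k = 5)
head(bars)
```

Each bar summarizes all ticks in its interval, so the output is far shorter than the input and suitable for standard time-series tools.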
>>
>> In order to handle market depth data, we need some efficient way to
>> access (query) this huge database. I looked a little into kdb, but
>> you have to pay ~25K for a single-processor license. I haven't
>> been able to look into it further for now.
>>
>> Haky
>>
>>
>>
>>
>> On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
>>> Not my domain, but you will more than likely have to aggregate to some
>>> sort of regular/homogeneous series for most traditional tools
>>> to work.
>>>
>>> xts has to.period to aggregate tick-level data up to a lower
>>> frequency. Coupled with something like na.locf, you can make yourself
>>> some high-frequency 'regular' data from 'irregular' data.
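A small sketch of that na.locf idiom, assuming xts; the one-minute grid and simulated observations are arbitrary choices:

```r
library(xts)

set.seed(1)
# Irregular observations scattered through one hour
obs <- xts(rnorm(40),
           order.by = as.POSIXct("2009-05-21 10:00:00") + sort(sample(3600, 40)))

# Build an empty series on a regular 1-minute grid, merge it in,
# carry the last observation forward, and keep only the grid times
grid <- seq(start(obs), end(obs), by = "1 min")
reg  <- na.locf(merge(obs, xts(, order.by = grid)))[grid]
```

The merge inserts NA rows at the grid timestamps, and na.locf fills each with the most recent observed value, which is the usual 'previous-tick' interpolation.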
>>>
>>> Regular and irregular of course depend on what you are looking at
>>> (weekends missing in daily data can still be 'regular').
>>>
>>> I'd be interested in hearing thoughts from those who actually tread in
>>> the high-freq domain...
>>>
>>> A wealth of information can be found here:
>>>
>>>  http://www.olsen.ch/publications/working-papers/
>>>
>>> Jeff
>>>
>>> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I am wondering if there are some special toolboxes to handle high
>>>> frequency data in R?
>>>>
>>>> I have some high frequency data and was wondering what meaningful
>>>> experiments can I run on these high frequency data.
>>>>
>>>> I'm not sure whether the data analysis tools from normal (low-frequency)
>>>> financial time series textbooks will work for high frequency data.
>>>>
>>>> Let's say I compute a correlation between two stocks using the high
>>>> frequency data, or fit an ARMA model to one stock; will the results
>>>> be meaningful?
>>>>
>>>> Could anybody point me to a classroom-style treatment or lab-tutorial
>>>> type of document showing what meaningful experiments/tests I can run
>>>> on high frequency data?
>>>>
>>>> Thanks a lot!
>>>>
>>>> _______________________________________________
>>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>> -- Subscriber-posting only.
>>>> -- If you want to post, subscribe first.
>>>>
>>>
>>>
>>>
>>> --
>>> Jeffrey Ryan
>>> jeffrey.ryan at insightalgo.com
>>>
>>> ia: insight algorithmics
>>> www.insightalgo.com
>>>
>>>
>>
>
>
>
> --
> Jeffrey Ryan
> jeffrey.ryan at insightalgo.com
>
> ia: insight algorithmics
> www.insightalgo.com
>


