[R-SIG-Finance] high frequency data analysis in R

Jeff Ryan jeff.a.ryan at gmail.com
Thu May 21 18:23:32 CEST 2009


Not to distract from the underlying processing question, but to answer
the 'data' one:

The data shouldn't be too much of an issue in R, at least from a size perspective.

xts objects on the order of millions of observations are still fast
and memory friendly, even accounting for the copying that happens
inside many xts calls (merge, subset, etc.):

> x <- .xts(1:1e6, 1:1e6)
> system.time(merge(x,x))
   user  system elapsed
  0.037   0.015   0.053


A single-column xts of 7 million observations is ~54 Mb, so you can
handle quite a bit of data with anything more than a trivial amount
of RAM.
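
A quick check of the footprint on your own machine (the exact size
will vary a bit with the storage type of the data and the index):

> x7 <- .xts(1:7e6, 1:7e6)             # integer data and index
> print(object.size(x7), units = "Mb")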

quantmod now has (in the development version) an attachSymbols
function that makes lazy-loading data very easy, so all your data can
be stored as xts objects and read in on demand.
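
A minimal sketch, assuming the development version (details may still
shift); DDB_Yahoo() is the built-in demand database, and SBUX stands
in for any symbol it serves:

> library(quantmod)
> attachSymbols(DB = DDB_Yahoo(), pos = 2)
> head(SBUX)   # loaded from the DB only when first referenced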

xts is also getting the ability to query subsets of on-disk data by
time. This will have no practical size limit.
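
In the meantime, the same kind of query already works in memory via
xts's ISO-8601 subsetting, which is what the on-disk support extends
(here x is any POSIXct-indexed xts object):

> x["2009-05-21 10:00/2009-05-21 11:30"]   # one specific time window
> x["T09:30/T16:00"]                       # that time of day, every day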

As for current data solutions: xts, fts (C++-backed), data.table, and
a few others should mitigate your problems, if not solve the 'data'
side altogether.


HTH
Jeff



On Thu, May 21, 2009 at 11:13 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
> I think in general you would need some sort of pre-processing before using R.
>
> You can use periodic sampling of prices, but you may be throwing away
> a lot of information. That method used to be recommended, more than
> five years ago, to mitigate the effect of market noise, at least in
> the context of volatility estimation.
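>
> In xts terms, periodic sampling is just keeping the last tick in
> each interval; a sketch, with `ticks` standing for a tick-level xts
> of prices:
>
> sampled <- ticks[endpoints(ticks, on = "minutes", k = 5)]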
>
> Here is my experience with tick data:
>
> I used FX data to calculate estimated daily volatility using TSRV
> (Zhang et al. 2005,
> http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the
> time series of estimated daily volatilities, I forecast volatility
> from 1 day up to 1 year ahead. The tick data was in a Quantitative
> Analytics database; I used their C++ API to query daily data, computed
> the TSRV estimator in C++, and saved the results to a text file. Then
> I used R to read in the estimated volatilities and fit a FARIMA model
> to forecast volatility. An interesting thing about this type of series
> is that the fractional coefficient is approximately 0.4 in many
> instances; Bollerslev has a paper commenting on this.
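>
> The estimator itself is small enough to sketch in R (a sketch, not
> the C++ code I actually used; p is a vector of intraday log prices
> and K the slow-scale subsampling gap):
>
> tsrv <- function(p, K = 300) {
>   n <- length(p) - 1
>   r <- diff(p)                      # tick-level log returns
>   rv_all <- sum(r^2)                # fast-scale RV, all ticks
>   # slow scale: average RV over the K offset subgrids
>   rv_sub <- mean(sapply(seq_len(K), function(k) {
>     sum(diff(p[seq(k, length(p), by = K)])^2)
>   }))
>   nbar <- (n - K + 1) / K           # average obs per subgrid
>   rv_sub - (nbar / n) * rv_all      # bias-corrected TSRV
> }
>
> For the FARIMA step, fracdiff::fracdiff() on the daily series gives
> the fractional coefficient directly (fit$d, the ~0.4 I mentioned).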
>
> In another project, I had treasury futures market depth data. It came
> in plain text format, one file per day, and each day had more than
> 1 million entries. I don't think I could have handled that directly
> in R, so to get started I used Python to filter out just the actual
> trades. That brought it down to ~60K entries per day, which R handled
> fine. I used to.period from the xts package to aggregate the data.
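>
> The aggregation step looks roughly like this (`trades` standing for
> an xts of trade prices, one row per trade):
>
> library(xts)
> bars <- to.period(trades, period = "minutes", k = 5)  # 5-min OHLC bars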
>
> To handle full market depth data, though, we need some efficient way
> to access (query) this huge database. I looked a little into kdb, but
> you have to pay ~25K for a one-processor license, and I haven't been
> able to look into it further for now.
>
> Haky
>
>
>
>
> On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
>> Not my domain, but you will more than likely have to aggregate to some
>> sort of regular/homogeneous series for most traditional tools
>> to work.
>>
>> xts has to.period to aggregate tick-level data up to a lower
>> frequency. Coupled with something like na.locf, you can make yourself
>> 'regular' high frequency data from 'irregular' data.
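>>
>> Roughly (a sketch; ticks is an irregular tick-level xts):
>>
>> library(xts)
>> # force the ticks onto a regular 1-minute grid, carrying the last
>> # observation forward into otherwise-empty minutes
>> grid <- seq(start(ticks), end(ticks), by = "1 min")
>> reg  <- na.locf(merge(ticks, xts(, order.by = grid)))[grid]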
>>
>> Regular and irregular of course depend on what you are looking at
>> (weekends missing in daily data can still be 'regular').
>>
>> I'd be interested in hearing thoughts from those who actually tread in
>> the high-freq domain...
>>
>> A wealth of information can be found here:
>>
>>  http://www.olsen.ch/publications/working-papers/
>>
>> Jeff
>>
>> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
>>> Hi all,
>>>
>>> I am wondering if there are some special toolboxes to handle high
>>> frequency data in R?
>>>
>>> I have some high frequency data and was wondering what meaningful
>>> experiments I can run on it.
>>>
>>> I'm not sure whether normal (low frequency) financial time series
>>> textbook analysis tools will work on high frequency data.
>>>
>>> Let's say I run a correlation between two stocks using the high
>>> frequency data, or fit an ARMA model to one stock: will the results
>>> be meaningful?
>>>
>>> Could anybody point me to a classroom-style treatment or a lab
>>> tutorial showing what meaningful experiments/tests I can run on
>>> high frequency data?
>>>
>>> Thanks a lot!
>>>
>>
>



-- 
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com


