[R-SIG-Finance] high frequency data analysis in R

Hae Kyung Im hakyim at gmail.com
Thu May 21 18:13:49 CEST 2009

I think in general you would need some sort of pre-processing before using R.

You can sample prices periodically, but you may be throwing away
a lot of information. This method used to be recommended, more than
5 years ago, to mitigate the effect of market noise, at least in the
context of volatility estimation.
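
As a rough illustration of periodic sampling, here is a minimal Python
sketch (not taken from any package; the function names, tick times and
prices are made up for the example). It takes the last observed price at
or before each point of a regular grid and computes realized variance
from the sampled series. Ticks that fall between grid points are simply
dropped, which is the information loss mentioned above.

```python
import math
from bisect import bisect_right

def previous_tick_sample(times, prices, start, end, step):
    """Last observed price at or before each grid point (times must be sorted)."""
    sampled = []
    t = start
    while t <= end:
        i = bisect_right(times, t) - 1  # index of the last tick with time <= t
        if i >= 0:                      # skip grid points before the first tick
            sampled.append(prices[i])
        t += step
    return sampled

def realized_variance(prices):
    """Sum of squared log returns of a price series."""
    logp = [math.log(p) for p in prices]
    return sum((b - a) ** 2 for a, b in zip(logp, logp[1:]))

# Hypothetical irregular ticks: time in seconds, price in arbitrary units.
times = [0.0, 0.4, 1.1, 1.5, 2.7, 3.0, 4.2, 5.0]
prices = [100.0, 100.2, 100.1, 100.3, 100.2, 100.4, 100.3, 100.5]

sampled = previous_tick_sample(times, prices, start=0.0, end=5.0, step=1.0)
print(len(prices), "ticks ->", len(sampled), "sampled prices")
print("realized variance, all ticks:", realized_variance(prices))
print("realized variance, sampled  :", realized_variance(sampled))
```

A coarser step removes more of the noise-driven price changes but also
uses fewer observations, which is the trade-off behind this approach.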

Here is my experience with tick data:

I used FX data to estimate daily volatility using TSRV
(Zhang et al. 2005,
http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the
time series of estimated daily volatilities, I forecast volatilities
for horizons from 1 day up to 1 year ahead. The tick data was in a
Quantitative Analytics database. I used their C++ API to query daily
data, computed the TSRV estimator in C++ and saved the result in a text
file. Then I used R to read the estimated volatilities and used FARIMA
to forecast volatility. An interesting thing about this type of series
is that the fractional differencing coefficient (d) is approximately
0.4 in many instances. Bollerslev has a paper commenting on this fact.
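
For readers who have not seen TSRV, here is a minimal Python sketch of
the basic two-scale estimator as I understand it from Zhang et al. 2005:
the average of K subsampled realized variances, minus a bias correction
based on the realized variance of all ticks. It is not the C++ code
described above, it omits the small-sample adjustment, and the
simulation parameters (tick volatility, noise level, K = 50) are
arbitrary assumptions chosen only for the demonstration.

```python
import math
import random

def realized_variance(log_prices, lag=1):
    """Sum of squared lag-step log returns over every starting offset."""
    return sum((log_prices[i] - log_prices[i - lag]) ** 2
               for i in range(lag, len(log_prices)))

def tsrv(prices, K):
    """Two-scale realized variance: slow-scale average minus noise correction."""
    y = [math.log(p) for p in prices]
    n = len(y) - 1                        # number of tick-to-tick returns
    if K < 2 or n < K:
        raise ValueError("need 2 <= K <= number of returns")
    rv_all = realized_variance(y, 1)      # fast scale: dominated by noise
    rv_avg = realized_variance(y, K) / K  # average over the K subgrids
    n_bar = (n - K + 1) / K               # average subgrid sample size
    return rv_avg - (n_bar / n) * rv_all

def simulate_noisy_ticks(n, sigma, noise_sd, seed=1):
    """Random-walk log price observed with i.i.d. microstructure noise."""
    rng = random.Random(seed)
    x = math.log(100.0)
    true_log = [x]
    observed = [math.exp(x + rng.gauss(0.0, noise_sd))]
    for _ in range(n):
        x += rng.gauss(0.0, sigma)
        true_log.append(x)
        observed.append(math.exp(x + rng.gauss(0.0, noise_sd)))
    return true_log, observed

true_log, observed = simulate_noisy_ticks(n=5000, sigma=0.0002, noise_sd=0.0005)
true_var = realized_variance(true_log)  # variance of the noise-free path
naive = realized_variance([math.log(p) for p in observed])
two_scale = tsrv(observed, K=50)
print("true:", true_var, "naive RV:", naive, "TSRV:", two_scale)
```

In this simulated setting the naive realized variance on all ticks is
inflated by the noise, while the two-scale estimate should land much
closer to the variance of the noise-free path.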

In another project, I had treasury futures market depth data. The data
came in plain text format, with one file per day. Each day had more
than 1 million entries. I don't think I could handle this with R. To
get started I decided to use only actual trades. I used Python to
extract just the trades, which brought this down to ~60K entries per
day. This I could handle with R. I used to.period from the xts package
to aggregate the data.
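
The two steps above (keep only trades, then aggregate to regular bars)
can be sketched in Python as follows. The file layout is an assumption
made for the example: a CSV with hypothetical columns time, type, price
and size, where type "T" marks a trade. The real depth files will look
different. The bar function is only a stand-in for the kind of
aggregation to.period performs, not a reimplementation of it.

```python
import csv

# Hypothetical depth-file excerpt: "T" rows are trades, "Q" rows are quote updates.
SAMPLE = """time,type,price,size
0.5,Q,110.00,10
1.2,T,110.02,5
30.0,T,110.05,2
59.9,Q,110.04,7
61.0,T,110.01,3
119.0,T,110.03,1
"""

def iter_trades(lines):
    """Yield (time, price, size) for trade rows only, one row at a time."""
    for row in csv.DictReader(lines):
        if row["type"] == "T":
            yield float(row["time"]), float(row["price"]), int(row["size"])

def ohlc_bars(trades, seconds):
    """Aggregate time-ordered trades into open/high/low/close/volume bars."""
    bars = {}
    for t, price, size in trades:
        key = int(t // seconds) * seconds  # start time of the bar holding t
        bar = bars.get(key)
        if bar is None:
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars

bars = ohlc_bars(iter_trades(SAMPLE.splitlines()), seconds=60)
for start, bar in bars.items():
    print(start, bar)
```

Because iter_trades is a generator, a large daily file can be streamed
by passing an open file object instead of SAMPLE.splitlines(), so the
full day never has to sit in memory.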

In order to handle market depth data, we need some efficient way to
access (query) this huge database. I looked a little bit into kdb but
you have to pay ~25K to buy the software for one processor. I haven't
been able to look more into this for now.


On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
> Not my domain, but you will more than likely have to aggregate to some
> sort of regular/homogeneous type of series for most traditional tools
> to work.
> xts has to.period to aggregate up to a lower frequency from tick-level
> data. Coupled with something like na.locf you can make yourself some
> high frequency 'regular' data from 'irregular' data.
> Regular and irregular of course depend on what you are looking at
> (weekends missing in daily data can still be 'regular').
> I'd be interested in hearing thoughts from those who actually tread in
> the high-freq domain...
> A wealth of information can be found here:
>  http://www.olsen.ch/publications/working-papers/
> Jeff
> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
>> Hi all,
>> I am wondering if there are some special toolboxes to handle high
>> frequency data in R?
>> I have some high frequency data and was wondering what meaningful
>> experiments I can run on it.
>> Not sure if normal (low frequency) financial time series textbook data
>> analysis tools will work for high frequency data?
>> Let's say I run a correlation between two stocks using the high
>> frequency data, or run an ARMA model on one stock, will the results be
>> meaningful?
>> Could anybody point me to a classroom-type treatment or lab-tutorial
>> type of document that shows what meaningful
>> experiments/tests I can run on high frequency data?
>> Thanks a lot!
>> _______________________________________________
>> R-SIG-Finance at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only.
>> -- If you want to post, subscribe first.
> --
> Jeffrey Ryan
> jeffrey.ryan at insightalgo.com
> ia: insight algorithmics
> www.insightalgo.com
