[R-SIG-Finance] high frequency data analysis in R
jeff.a.ryan at gmail.com
Thu May 21 18:23:32 CEST 2009
Not to distract from the underlying processing question, but to answer
the 'data' one:
The data in R shouldn't be too much of an issue, at least from a size perspective.
xts objects on the order of millions of observations are still fast
and memory friendly with respect to copying operations internal to
many xts calls (merge, subset, etc).
> system.time(x <- .xts(1:1e6, 1:1e6))
   user  system elapsed
  0.037   0.015   0.053
A 7-million-observation, single-column xts object is ~54 Mb, so you can
certainly handle quite a bit of data with anything more than a trivial
amount of RAM.
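As a rough check of the memory footprint (a sketch; exact sizes vary with R version and how the index is stored):

```r
library(xts)

# 7 million observations, one integer column, integer index
x <- .xts(1:7e6, 1:7e6)
print(object.size(x), units = "Mb")   # on the order of tens of Mb
```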
quantmod now has (devel) an attachSymbols function that makes
lazy-loading data very easy, so all your data can be stored as xts
objects and read in on-demand.
xts is also getting the ability to query subsets of data on disk, by
time. This will have no practical limit.
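In-memory xts objects already support this style of time-based querying via ISO-8601 range strings; presumably (an assumption about the devel feature) the on-disk version would expose the same interface:

```r
library(xts)

# one trading day of 1-second observations
idx <- as.POSIXct("2009-05-21 09:30:00", tz = "GMT") + 0:23399
x <- xts(rnorm(23400), order.by = idx)

# subset by time using an ISO-8601 style range string
morning <- x["2009-05-21 09:30/2009-05-21 12:00"]
nrow(morning)
```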
For current data solutions, xts, fts (C++-backed), data.table, and others
should mitigate your problems, if not solve the 'data' side entirely.
On Thu, May 21, 2009 at 11:13 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
> I think in general you would need some sort of pre-processing before using R.
> You can use periodic sampling of prices, but you may be throwing away
> a lot of information. That method was recommended more than five years
> ago to mitigate the effect of market noise, at least in the context of
> volatility estimation.
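The periodic sampling mentioned above might be sketched like this with xts (illustrative only; the simulated tick series and the 5-minute grid are my assumptions):

```r
library(xts)

# simulate an irregular "tick" series over one trading day
set.seed(1)
idx <- as.POSIXct("2009-05-21 09:30:00", tz = "GMT") +
  sort(runif(5000, 0, 23400))
px <- xts(100 + cumsum(rnorm(5000, sd = 0.01)), order.by = idx)

# previous-tick sampling: last observation in each 5-minute bucket,
# with timestamps snapped to the 5-minute grid
ep <- endpoints(px, on = "minutes", k = 5)
sampled <- align.time(px[ep], n = 60 * 5)
```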
> Here is my experience with tick data:
> I used FX data to calculate estimated daily volatility using TSRV
> (Zhang et al 2005
> http://galton.uchicago.edu/~mykland/paperlinks/p1394.pdf). Using the
> time series of estimated daily volatilities, I forecasted volatilities
> for 1 day up to 1 year ahead. The tick data was in Quantitative
> Analytics database. I used their C++ API to query daily data, computed
> the TSRV estimator in C++ and saved the result in text file. Then I
> used R to read the estimated volatilities and used FARIMA to forecast
> volatility. An interesting thing about this type of series is that the
> fractional coefficient is approximately 0.4 in many instances.
> Bollerslev has a paper commenting on this fact.
> In another project, I had treasury futures market depth data. The data
> came in plain text format, with one file per day. Each day had more
> than 1 million entries. I don't think I could handle this with R. To
> get started I decided to use only actual trades. I used Python to
> filter out the trades. So this came down to ~60K entries per day. This
> I could handle with R. I used to.period from xts package to aggregate
> the data.
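The to.period aggregation described above looks roughly like this (a sketch; the simulated trade series stands in for the real ~60K daily entries):

```r
library(xts)

# simulate ~60K trades in one day at irregular timestamps
set.seed(42)
idx <- as.POSIXct("2009-05-21 08:00:00", tz = "GMT") +
  sort(runif(60000, 0, 8 * 3600))
trades <- xts(100 + cumsum(rnorm(60000, sd = 0.005)), order.by = idx)

# aggregate the tick data to 1-minute OHLC bars
bars <- to.period(trades, period = "minutes", k = 1)
head(bars)
```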
> In order to handle market depth data, we need some efficient way to
> access (query) this huge database. I looked a little bit into kdb but
> you have to pay ~25K to buy the software for one processor. I haven't
> been able to look more into this for now.
> On Thu, May 21, 2009 at 10:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
>> Not my domain, but you will more than likely have to aggregate to some
>> sort of regular/homogenous type of series for most traditional tools
>> to work.
>> xts has to.period to aggregate tick-level data up to a lower
>> frequency. Coupled with something like na.locf, you can make yourself
>> some high frequency 'regular' data from 'irregular' data.
>> Regular and irregular of course depend on what you are looking at
>> (weekends missing in daily data can still be 'regular').
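The to.period/na.locf regularization described above can be sketched as follows (the 1-minute grid is my choice for illustration):

```r
library(xts)

# irregular observations within one hour
set.seed(7)
idx <- as.POSIXct("2009-05-21 09:30:00", tz = "GMT") +
  sort(sample(0:3599, 200))
x <- xts(cumsum(rnorm(200)), order.by = idx)

# a regular 1-minute grid spanning the same interval
grid <- seq(start(x), end(x), by = "1 min")
reg <- merge(x, xts(, order.by = grid))  # union of both indexes
reg <- na.locf(reg)                      # carry last observation forward
reg <- reg[grid]                         # keep only the regular timestamps
```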
>> I'd be interested in hearing thoughts from those who actually tread in
>> the high-freq domain...
>> A wealth of information can be found here:
>> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
>>> Hi all,
>>> I am wondering if there are some special toolboxes to handle high
>>> frequency data in R?
>>> I have some high frequency data and was wondering what meaningful
>>> experiments can I run on these high frequency data.
>>> I'm not sure whether the usual (low frequency) textbook financial
>>> time series analysis tools will work for high frequency data.
>>> Let's say I run a correlation between two stocks using the high
>>> frequency data, or run an ARMA model on one stock: will the results
>>> be meaningful?
>>> Could anybody point me to a classroom-style treatment or a lab
>>> tutorial showing what meaningful experiments/tests I can run on
>>> high frequency data?
>>> Thanks a lot!
>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>> -- Subscriber-posting only.
>>> -- If you want to post, subscribe first.
>> Jeffrey Ryan
>> jeffrey.ryan at insightalgo.com
>> ia: insight algorithmics
jeffrey.ryan at insightalgo.com
ia: insight algorithmics