# [R-SIG-Finance] high frequency data analysis in R

Michael comtech.usa at gmail.com
Thu May 21 22:45:34 CEST 2009

```
I want to see what statistical experiments I can run on my data.
The very first thing that came to my mind was "correlation" ...
But I am not sure whether the usual concept of "correlation" is directly
applicable after I resample the data into regularly spaced data. Then
again, another question is what a good resampling period would be. Maybe
"correlation" is sensitive to the resampling period...

On Thu, May 21, 2009 at 1:37 PM,  <markleeds at verizon.net> wrote:
> In that case, it raises the question of why you want to regularly space
> your data.
> All the info is there, so why reduce the amount of it by regularly spacing?
>
>
>
>
> On May 21, 2009, Michael <comtech.usa at gmail.com> wrote:
>
> In fact, I have the whole jump processes of the best bid and best ask in
> continuous time (in the sense of time-stamped arrival data), and also the
> jump process of the last trade price, recorded the same way. Any more
> thoughts?
>
>
> On Thu, May 21, 2009 at 9:51 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
>> Regarding the approach that turns irregular data into regular data,
>> I guess it's a complex question, and how you approach it will depend on
>> the specific problem.
>>
>> With your method, you would assume that the price is equal to the last
>> traded price or something like that. If there is no trade for some
>> time, would it make sense to say that the price is the last traded
>> price? If you wanted to actually buy/sell at that price, it's not
>> obvious that you'll be able to do so.
>>
>> Also, if you only look at the time series of instantaneous prices, you
>> would be losing a lot of information about what happened in between
>> the time points. It makes more sense to aggregate and keep, for
>> example, open, high, low and close. Or some statistics on the
>> distribution of the prices between the endpoints.
>>
>> If what you need to calculate is correlations, then I would look at
>> the papers that Liviu suggested. It seems that synchronicity is
>> critical. I heard there is an extension of TSRV (two-scales realized
>> volatility) to correlations.
>>
>> If you only need to look at univariate time series, you may be able to
>> get away with your method more easily. It may not be statistically
>> efficient but may give you a good enough answer in some cases.
>>
>>
>> HTH
>> Haky
>>
>>
>>
>> On Thu, May 21, 2009 at 10:38 AM, Michael <comtech.usa at gmail.com> wrote:
>>> My data are price change arrivals, irregularly spaced. But when there
>>> is no price change, the price stays constant. Therefore, for any time
>>> instant you give me, I can give you the price at that very instant.
>>> So irregularly spaced data can easily be sampled into regularly spaced
>>> data.
>>> What do you think of this approach?
>>>
>>> On Thu, May 21, 2009 at 8:21 AM, Michael <comtech.usa at gmail.com> wrote:
>>>> Thanks Jeff.
>>>>
>>>> By high frequency I really mean tick data. For example, during
>>>> peak times, price events can arrive at a rate of hundreds to
>>>> thousands per second, irregularly spaced.
>>>>
>>>> I've heard that forcing irregularly spaced data into regularly spaced
>>>> data (e.g. through interpolation) will lose information. How so?
>>>>
>>>> Thanks!
>>>>
>>>> On Thu, May 21, 2009 at 8:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com>
>>>> wrote:
>>>>> Not my domain, but you will more than likely have to aggregate to some
>>>>> sort of regular/homogeneous type of series for most traditional tools
>>>>> to work.
>>>>>
>>>>> xts has to.period to aggregate up to a lower frequency from tick-level
>>>>> data. Coupled with something like na.locf you can make yourself some
>>>>> high frequency 'regular' data from 'irregular'
>>>>>
>>>>> Regular and irregular of course depend on what you are looking at
>>>>> (weekends missing in daily data can still be 'regular').
>>>>>
>>>>> I'd be interested in hearing thoughts from those who actually tread in
>>>>> the high-freq domain...
>>>>>
>>>>> A wealth of information can be found here:
>>>>>
>>>>>  http://www.olsen.ch/publications/working-papers/
>>>>>
>>>>> Jeff
>>>>>
>>>>> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com>
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I am wondering if there are some special toolboxes to handle high
>>>>>> frequency data in R?
>>>>>>
>>>>>> I have some high frequency data and was wondering what meaningful
>>>>>> experiments I can run on it.
>>>>>>
>>>>>> I'm not sure whether the normal (low frequency) textbook tools for
>>>>>> financial time series analysis will work for high frequency data.
>>>>>>
>>>>>> Let's say I run a correlation between two stocks using the high
>>>>>> frequency data, or fit an ARMA model to one stock. Will the results be
>>>>>> meaningful?
>>>>>>
>>>>>> Could anybody point me to a classroom-style treatment or a lab
>>>>>> tutorial-style document that shows what meaningful
>>>>>> experiments/tests I can run on high frequency data?
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-SIG-Finance at stat.math.ethz.ch mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>>>>>> -- Subscriber-posting only.
>>>>>> -- If you want to post, subscribe first.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jeffrey Ryan
>>>>> jeffrey.ryan at insightalgo.com
>>>>>
>>>>> ia: insight algorithmics
>>>>> www.insightalgo.com
>>>>>
>>>>
>>>
>>
>

```
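
The ideas discussed in the thread can be sketched in a few lines of code. What follows is a minimal illustration in Python (standard library only), not the xts implementation and not anyone's recommended method. `previous_tick` samples a piecewise-constant price on a regular grid, which is the approach Michael describes; `ohlc_bars` keeps open, high, low and close per interval, as Haky suggests; `change_correlation` computes the correlation of price changes for a chosen sampling period, so that its sensitivity to that period can be inspected. All function names, the integer millisecond timestamps, and the toy ticks are assumptions made for this example.

```python
from bisect import bisect_right
from math import sqrt


def previous_tick(times, prices, grid):
    """Last observed price at or before each grid time (None before the first tick).

    `times` must be sorted in ascending order.
    """
    sampled = []
    for t in grid:
        i = bisect_right(times, t)  # number of ticks with timestamp <= t
        sampled.append(prices[i - 1] if i > 0 else None)
    return sampled


def ohlc_bars(times, prices, start, end, step):
    """Open/high/low/close per bar over [start, end); None for bars with no ticks."""
    n_bars = -(-(end - start) // step)  # ceiling division
    buckets = [[] for _ in range(n_bars)]
    for t, p in zip(times, prices):
        if start <= t < end:
            buckets[(t - start) // step].append(p)
    return [(b[0], max(b), min(b), b[-1]) if b else None for b in buckets]


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def change_correlation(ticks_a, ticks_b, start, end, step):
    """Correlation of price changes after previous-tick sampling every `step`.

    `start` must not precede the first tick of either series.
    """
    grid = list(range(start, end + 1, step))
    a = previous_tick(ticks_a[0], ticks_a[1], grid)
    b = previous_tick(ticks_b[0], ticks_b[1], grid)
    da = [v - u for u, v in zip(a, a[1:])]
    db = [v - u for u, v in zip(b, b[1:])]
    return pearson(da, db)


# Hypothetical toy ticks: timestamps in milliseconds, irregularly spaced.
times_a = [0, 120, 450, 900, 1300, 1800]
prices_a = [100.0, 100.1, 100.0, 100.2, 100.3, 100.1]
times_b = [50, 300, 700, 1100, 1600, 1950]
prices_b = [50.0, 50.1, 50.05, 50.2, 50.25, 50.1]

print(previous_tick(times_a, prices_a, [0, 500, 1000, 1500, 2000]))
print(ohlc_bars(times_a, prices_a, 0, 2000, 1000))
for step, start in ((250, 250), (500, 500)):
    corr = change_correlation(
        (times_a, prices_a), (times_b, prices_b), start, 2000, step
    )
    print(step, corr)
```

In the toy data, sampling series A once per second sees only the move from 100.0 to 100.2 in the first second and misses the intermediate ticks, while the OHLC bar for the same second at least retains the range. With so few points the printed correlations are not meaningful in themselves; the loop only shows where the choice of sampling period enters the calculation.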