[R-SIG-Finance] high frequency data analysis in R
Neil Tiffin
neilt at neiltiffin.com
Fri May 22 00:06:25 CEST 2009
What I have been interested in, and have just started looking at but have
not found, is what predictive value is available from changes in the
patterns in the tick data. Now, I am a complete neophyte, so please do
shoot me down.
Let me use an example, because I am not even sure of the right language.
Suppose a stock is trading with roughly equal transactions at the bid and
at the ask price* with some pattern of volume, let's say mostly small
lots. A number of things can happen at this point.
The transaction price can move up or down, the bid and ask price can
move up or down, the absolute volume traded at either the bid or ask
price* can go up or down, the bid ask spread can increase or decrease,
the block size of the trades can go up or down, the frequency of
trades can go up or down, and probably something else I missed. This seems
like a lot of information that goes away when you go to 1-minute
open, close, high, and low data.
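
To make the point about lost information concrete, here is a minimal
sketch in Python (standard library only) that groups ticks into 1-minute
bars and keeps, next to open/high/low/close, a few of the quantities
listed above: trade count, volume, mean trade size, volume traded at or
through each side of the quote, and mean spread. The tuple layout
(time, price, size, bid, ask), the function name bar_features and the
sample ticks are assumptions made up for illustration, not any data
vendor's format.

```python
from collections import defaultdict


def bar_features(ticks, bar_seconds=60):
    """Group ticks into fixed-width bars and summarise each bar.

    ticks: iterable of (time_in_seconds, price, size, bid, ask) tuples,
    assumed to be sorted by time. This record layout is hypothetical.
    Returns a dict keyed by the bar's start time in seconds.
    """
    groups = defaultdict(list)
    for t, price, size, bid, ask in ticks:
        groups[int(t // bar_seconds)].append((price, size, bid, ask))

    bars = {}
    for key in sorted(groups):
        rows = groups[key]
        prices = [r[0] for r in rows]
        volume = sum(r[1] for r in rows)
        bars[key * bar_seconds] = {
            # What a plain OHLC bar keeps:
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            # What it normally throws away:
            "n_trades": len(rows),
            "volume": volume,
            "mean_size": volume / len(rows),
            "vol_at_or_below_bid": sum(s for p, s, b, a in rows if p <= b),
            "vol_at_or_above_ask": sum(s for p, s, b, a in rows if p >= a),
            "mean_spread": sum(a - b for p, s, b, a in rows) / len(rows),
        }
    return bars


# Made-up ticks: three trades in the first minute, two in the second.
ticks = [
    (0.5, 10.00, 100, 10.00, 10.02),
    (12.0, 10.02, 200, 10.00, 10.02),
    (30.2, 10.01, 100, 10.00, 10.02),
    (61.0, 10.03, 500, 10.01, 10.03),
    (75.4, 10.03, 400, 10.02, 10.04),
]
bars = bar_features(ticks)
```

Whether any of these extra columns has predictive value is exactly the
open question; the sketch only shows that they can be carried along
instead of being discarded.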
I have been trading for some time without using any statistics so I
understand some of the manipulations that occur in this whole scheme.
So the answer that this level of analysis does not mean anything is
not what I am looking for. I understand that any analysis of this
type is likely to be specific to market conditions. If not, then we
would have the magic bullet.
My question is: can any of these changes predict what will happen in the
short term, or even that changes are afoot? There is probably some back
room someplace where this is the secret formula never to be divulged.
But I have to ask: where has this been researched, and what is publicly
available?
* Keep in mind that transactions can occur between, above, or below the
bid and ask prices.
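
As a small illustration of the footnote, the following Python sketch
labels each trade by where its price sits relative to the prevailing
quote. The function name classify_trade, the five labels and the sample
trades are assumptions for illustration; prices are given in integer
cents so that the equality tests are exact, and the sketch assumes
bid < ask.

```python
from collections import Counter


def classify_trade(price, bid, ask):
    """Label a trade by its position relative to the quote (assumes bid < ask)."""
    if price < bid:
        return "below_bid"
    if price == bid:
        return "at_bid"
    if price < ask:
        return "inside"
    if price == ask:
        return "at_ask"
    return "above_ask"


# Made-up trades as (price, bid, ask), all in cents.
trades = [
    (1000, 1000, 1002),
    (1002, 1000, 1002),
    (1001, 1000, 1002),
    (1003, 1000, 1002),
    (999, 1000, 1002),
    (1000, 1000, 1002),
]
labels = [classify_trade(p, b, a) for p, b, a in trades]
counts = Counter(labels)
```

Counting the labels over a window gives one crude way to track how the
balance between bid-side and ask-side trading shifts over time.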
On May 21, 2009, at 3:45 PM, Michael wrote:
> I want to see what statistical experiments I can run on my data.
> The very first thing that came to my mind was the "correlation" ...
> But I am not sure if the usual concept of "correlation" is directly
> applicable after I resample the data into regularly spaced data. But
> then again, another question is what's a good resampling period? Maybe
> "correlation" is sensitive to the resampling period...
> On Thu, May 21, 2009 at 1:37 PM, <markleeds at verizon.net> wrote:
>> in that case, it begs the question of why you want to regularly space
>> your data?
>> all the info is there so why reduce the amount of it by regularly
>> spacing?
>> On May 21, 2009, Michael <comtech.usa at gmail.com> wrote:
>>
>> In fact, I have the whole jump processes of best bid, and best ask, at
>> a continuous level (in the sense of time-stamped arrival data), and
>> also the jump process of the last trade price, at a continuous level
>> (in the sense of time-stamped arrival data). Any more thoughts?
>>
>>
>> On Thu, May 21, 2009 at 9:51 AM, Hae Kyung Im <hakyim at gmail.com> wrote:
>>> Regarding the approach that turns irregular data into regular data,
>>> I guess it's a complex question and how you approach it will depend on
>>> the specific problem.
>>> With your method, you would assume that the price is equal to the last
>>> traded price or something like that. If there is no trade for some
>>> time, would it make sense to say that the price is the last traded
>>> price? If you wanted to actually buy/sell at that price, it's not
>>> obvious that you'll be able to do so.
>>>
>>> Also, if you only look at the time series of instantaneous prices, you
>>> would be losing a lot of information about what happened in between
>>> the time points. It makes more sense to aggregate and keep, for
>>> example, open, high, low and close. Or some statistics on the
>>> distribution of the prices between the endpoints.
>>> If what you need to calculate is correlations, then I would look at
>>> the papers that Liviu suggested. It seems that synchronicity is
>>> critical. I heard there is an extension of TSRV to correlations.
>>>
>>> If you only need to look at univariate time series, you may be able to
>>> get away with your method more easily. It may not be statistically
>>> efficient but may give you a good enough answer in some cases.
>>>
>>> HTH
>>> Haky
>>> On Thu, May 21, 2009 at 10:38 AM, Michael <comtech.usa at gmail.com> wrote:
>>>> My data are price change arrivals, irregularly spaced. But when there
>>>> is no price change, the price stays constant. Therefore, in fact, at
>>>> any time instant, you give me a time, I can give you the price at that
>>>> very instant of time. So irregularly spaced data can be easily sampled
>>>> to be regularly spaced data.
>>>> What do you think of this approach?
>>>>
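
The sampling scheme described in the quoted message just above (carry
the last observed price forward to each point of a regular grid) can be
sketched in Python as follows. The function name previous_tick_sample,
the integer millisecond timestamps and the sample prices are assumptions
for illustration only.

```python
from bisect import bisect_right


def previous_tick_sample(times, prices, start, stop, step):
    """Sample an irregular price series on a regular grid.

    times must be sorted in increasing order. For each grid time t the
    result holds the last price observed at or before t, or None if no
    tick has arrived yet.
    """
    sampled = []
    t = start
    while t <= stop:
        i = bisect_right(times, t)  # number of ticks with time <= t
        sampled.append((t, prices[i - 1] if i > 0 else None))
        t += step
    return sampled


# Made-up ticks: arrival times in milliseconds and the price after each change.
times = [100, 250, 260, 900]
prices = [10.00, 10.01, 10.02, 10.01]

fine = previous_tick_sample(times, prices, start=0, stop=1000, step=250)
coarse = previous_tick_sample(times, prices, start=0, stop=1000, step=500)
```

On the 250 ms grid the first tick (10.00 at 100 ms) never shows up, and
on the 500 ms grid the tick at 250 ms disappears as well, which is one
simple way to see how the choice of resampling period changes what the
regular series retains.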
>>>> On Thu, May 21, 2009 at 8:21 AM, Michael <comtech.usa at gmail.com> wrote:
>>>>> Thanks Jeff.
>>>>>
>>>>> By high frequency I mean really the tick data. For example, during
>>>>> peak time, the arrival of price events could be at about hundreds to
>>>>> thousands within one second, irregularly spaced.
>>>>>
>>>>> I've heard that forcing irregularly spaced data into regularly spaced
>>>>> data (e.g. through interpolation) will lose information. How's that so?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thu, May 21, 2009 at 8:15 AM, Jeff Ryan <jeff.a.ryan at gmail.com> wrote:
>>>>>> Not my domain, but you will more than likely have to aggregate to some
>>>>>> sort of regular/homogeneous type of series for most traditional tools
>>>>>> to work.
>>>>>>
>>>>>> xts has to.period to aggregate up to a lower frequency from tick-level
>>>>>> data. Coupled with something like na.locf you can make yourself some
>>>>>> high frequency 'regular' data from 'irregular'.
>>>>>>
>>>>>> Regular and irregular of course depend on what you are looking at
>>>>>> (weekends missing in daily data can still be 'regular').
>>>>>>
>>>>>> I'd be interested in hearing thoughts from those who actually tread in
>>>>>> the high-freq domain...
>>>>>>
>>>>>> A wealth of information can be found here:
>>>>>>
>>>>>> http://www.olsen.ch/publications/working-papers/
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> On Thu, May 21, 2009 at 10:04 AM, Michael <comtech.usa at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am wondering if there are some special toolboxes to handle high
>>>>>>> frequency data in R?
>>>>>>>
>>>>>>> I have some high frequency data and was wondering what meaningful
>>>>>>> experiments can I run on these high frequency data.
>>>>>>>
>>>>>>> Not sure if normal (low frequency) financial time series textbook data
>>>>>>> analysis tools will work for high frequency data?
>>>>>>>
>>>>>>> Let's say I run a correlation between two stocks using the high
>>>>>>> frequency data, or run an ARMA model on one stock, will the results be
>>>>>>> meaningful?
>>>>>>>
>>>>>>> Could anybody point me some classroom types of treatment or lab
>>>>>>> tutorial type of document which show me what meaningful
>>>>>>> experiments/tests I can run on high frequency data?
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>>
