[R-SIG-Finance] Calculating Hasbrouck's information share and Gonzalo-Granger weights on R

Tue Aug 31 01:14:11 CEST 2010

This is why the information share may not be a useful statistic. If there is
substantial correlation between the residuals of the two markets' price
equations, the upper and lower bounds of the information share will be far
apart. The whole idea behind the information share is that it captures "who
moves first", and it is an informative statistic only when markets update
clearly in sequence and the residuals are uncorrelated. My work with Bingchen
Yan also shows that the information share is not free of transitory effects
and should be interpreted with caution.
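A minimal numerical sketch of this point follows. It is not the code in infoshare.R. It assumes a two-market system in which the common row of the long-run impact matrix (psi) and the residual covariance matrix (Omega) are already known, and it computes each market's Hasbrouck share as the squared entries of psi times the lower Cholesky factor of Omega, divided by psi' Omega psi. The helper names and the values psi = (0.6, 0.4) with unit residual variances are hypothetical.

```r
# Hypothetical helpers, not taken from infoshare.R.
# Hasbrouck information share of each market for ONE Cholesky ordering.
#   psi:   common row of the long-run impact matrix (length-2 vector)
#   Omega: 2 x 2 covariance matrix of the reduced-form residuals
hasbrouck_is <- function(psi, Omega) {
  chol_lower <- t(chol(Omega))               # lower triangular, Omega = L %*% t(L)
  contrib <- as.vector(psi %*% chol_lower)^2 # each market's contribution
  contrib / sum(contrib)                     # sum(contrib) = psi' Omega psi
}

# Lower and upper bounds: evaluate both orderings of the two markets.
is_bounds <- function(psi, Omega) {
  swap   <- c(2, 1)
  first  <- hasbrouck_is(psi, Omega)
  second <- hasbrouck_is(psi[swap], Omega[swap, swap])[swap]  # back to market order
  rbind(lower = pmin(first, second), upper = pmax(first, second))
}

# Unit residual variances with correlation rho (hypothetical values).
make_omega <- function(rho) matrix(c(1, rho, rho, 1), nrow = 2)
psi <- c(0.6, 0.4)

b0 <- is_bounds(psi, make_omega(0))    # uncorrelated residuals
b8 <- is_bounds(psi, make_omega(0.8))  # strongly correlated residuals
round(b0, 3)
round(b8, 3)
```

With uncorrelated residuals both orderings give 0.692 and 0.308, so the bounds coincide; with a residual correlation of 0.8 the same psi gives bounds of about [0.143, 0.936] for market 1 and [0.064, 0.857] for market 2.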

-----Original Message-----
From: r-sig-finance-bounces at stat.math.ethz.ch
[mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Nidhi Aggrawal
Sent: Saturday, August 28, 2010 12:56 PM
To: r-sig-finance at stat.math.ethz.ch
Subject: [R-SIG-Finance] Calculating Hasbrouck's information share and
Gonzalo-Granger weights on R

Folks,

I have been working on the question of "information share", which began with
the 1995 papers by Joel Hasbrouck and by Gonzalo and Granger, and which has
attracted interest in the years since, most recently in work by Yan & Zivot
(2010).

A function that computes the Hasbrouck information share and the
Gonzalo/Granger common factor weights is at:

http://www.mayin.org/ajayshah/tmp/infoshare.R
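For reference, here is a sketch of the Gonzalo-Granger weights in the two-market case, assuming an error-correction term p1 - p2 (cointegrating vector (1, -1)): the weights are the vector orthogonal to the adjustment coefficients alpha, normalised to sum to one. gg_weights() is a hypothetical helper written for illustration, not the function in infoshare.R, and the alpha values are made up.

```r
# Hypothetical helper, not taken from infoshare.R.
# Gonzalo-Granger common-factor weights for two markets whose VECM is
#   diff(p)_t = alpha * (p1 - p2)_{t-1} + lagged differences + error
# alpha: length-2 vector of adjustment (error-correction) coefficients.
gg_weights <- function(alpha) {
  alpha_perp <- c(alpha[2], -alpha[1])  # orthogonal to alpha
  alpha_perp / sum(alpha_perp)          # normalise so the weights sum to one
}

# Market 1 closes 10% of the price gap per period, market 2 closes 30%.
w <- gg_weights(c(-0.1, 0.3))
w  # 0.75 0.25: the market that adjusts less gets the larger weight
```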

Here's an example of running IS() on a simple futures/spot problem:

source("http://www.mayin.org/ajayshah/tmp/infoshare.R")
x <- as.matrix(read.csv(url("http://www.mayin.org/ajayshah/tmp/ex1.csv"),
                        header=TRUE))
IS(log(x))

The result of this computation is:

$original.ordering
[1] 0.0158933    0.9841067

$reversed.ordering
[1] 0.993828577  0.006171423

$gonzalo.granger
[1] 0.08537347    0.91462653

These results look reasonable: the lower and upper bounds for each market
are quite close ([0.006, 0.016] for futures and [0.98, 0.99] for spot).
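The bounds come from pairing the two orderings market by market. The sketch below assumes, as the bounds quoted above imply, that IS() returns a list whose reversed.ordering element reports the markets in reversed order (spot first), and that futures is the first column of ex1.csv. is_bounds_from_output() is a hypothetical helper, and the numbers are copied from the output above.

```r
# Hypothetical helper: turn the two Cholesky orderings into per-market bounds.
# Assumes res$reversed.ordering is reported in reversed market order.
is_bounds_from_output <- function(res) {
  orig <- res$original.ordering
  revd <- rev(res$reversed.ordering)   # put back into original market order
  rbind(lower = pmin(orig, revd), upper = pmax(orig, revd))
}

# Values copied from the ex1.csv output (futures = market 1, spot = market 2).
res1 <- list(original.ordering = c(0.0158933, 0.9841067),
             reversed.ordering = c(0.993828577, 0.006171423))
b <- is_bounds_from_output(res1)
colnames(b) <- c("futures", "spot")
round(b, 3)
```

This gives [0.006, 0.016] for futures and [0.984, 0.994] for spot. With two markets the lower bound of one market is always one minus the upper bound of the other, so a wide interval for one market implies an equally wide interval for the other.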

However, when I run the same code on a few other stocks, the gap between the
lower and upper bounds for each market turns out to be very large. A sample
dataset for five stocks across three days is at:

http://www.mayin.org/ajayshah/tmp/ex2.csv

The prices are sampled at a 5-second frequency.

As an illustration, when I run the code on a stock like BHARTIARTL
for July 1, 2009 as:

source("http://www.mayin.org/ajayshah/tmp/infoshare.R")
prices <- read.csv(url("http://www.mayin.org/ajayshah/tmp/ex2.csv"), skip=1)
x <- as.matrix(prices[1:4009, 3:4])   # BHARTIARTL, July 1, 2009
IS(log(x))

I get the following results:

$original.ordering
[1] 0.8194992 0.1805008

$reversed.ordering
[1] 0.6782834 0.3217166

$gonzalo.granger
[1] 0.5378149 0.4621851

which indicate that the bounds for market 1 are (0.32, 0.82), while those for
market 2 are (0.18, 0.68). The bounds in this case turn out to be too wide.
The same happens when I try the code on the other stocks in ex2.csv.

I know that several authors have described such phenomena in the literature,
but it makes me anxious. Does this make sense?  Or is there a mistake in
my code?

Is there an example of a data set or a simulated data set where the
answer is known, through which the correctness of the code can be
established?
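One way to get a known answer is to simulate a system in which, by construction, all price discovery happens in market 1: market 1 follows a random walk and market 2 repeats market 1's previous price plus independent noise. This design is an assumption of the sketch, chosen because the true model is then a VECM with no lagged differences (diff(p1) = e, diff(p2) = (p1 - p2) lagged one period + u), so a single OLS regression is correctly specified. The code is base R only and does not call infoshare.R; it relies on the standard result that the common row of the long-run impact matrix is proportional to the vector orthogonal to alpha, and the helper names are hypothetical. Market 1's Gonzalo-Granger weight and both of its information share bounds should come out close to one.

```r
# Simulated two-market system with a known answer (hypothetical design):
# market 1 carries all new information, market 2 copies it with a one-period lag.
set.seed(1)
n  <- 5000
e  <- rnorm(n)                     # efficient-price innovations (market 1)
u  <- rnorm(n)                     # transitory noise in market 2
p1 <- cumsum(e)
p2 <- c(0, p1[-n]) + u             # p2_t = p1_{t-1} + u_t

# VECM with error-correction term (p1 - p2)_{t-1}; no lagged differences are
# needed because this is the true model for this design.
dp    <- cbind(dp1 = diff(p1), dp2 = diff(p2))
z     <- (p1 - p2)[-n]
fit   <- lm(dp ~ z)                       # one regression per market
alpha <- unname(coef(fit)["z", ])         # adjustment coefficients, true value c(0, 1)
Omega <- unname(cov(residuals(fit)))      # residual covariance, true value diag(2)

# Gonzalo-Granger weights: alpha_perp normalised to sum to one.
alpha_perp <- c(alpha[2], -alpha[1])
gg <- alpha_perp / sum(alpha_perp)

# Hasbrouck shares for one ordering; psi is taken proportional to alpha_perp,
# which is enough because the shares do not depend on the scale of psi.
hasbrouck_is <- function(psi, Omega) {
  contrib <- as.vector(psi %*% t(chol(Omega)))^2
  contrib / sum(contrib)
}
swap      <- c(2, 1)
is_first  <- hasbrouck_is(alpha_perp, Omega)
is_second <- hasbrouck_is(alpha_perp[swap], Omega[swap, swap])[swap]

round(rbind(gg = gg, is_lower = pmin(is_first, is_second),
            is_upper = pmax(is_first, is_second)), 3)
```

Replacing u with a mix of e and fresh noise makes the two markets' residuals correlated; market 1's upper bound then stays near one while its lower bound falls to about one minus the squared residual correlation.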

More generally, I am a novice R programmer and it would be great if
you could look at my code and criticize what I have done.

Thanks,
Nidhi Aggarwal.


_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions
should go.