[R-SIG-Finance] cointegration

Paul Teetor paulteetor at yahoo.com
Tue Oct 19 16:03:01 CEST 2010


Stephen,
 
It depends what you mean by "logic".
 
If you mean statistical logic, I'll defer to Eric Zivot and Sarbo, who are far wiser than I am. I will note, however, that you are testing at a significance level of 0.05, so I expect about 5% of your test results to be misleading. In other words, for every 20 pairs tested by your batch job, I expect one will be suspect.
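
To make the 5% point concrete: if none of the pairs in a batch were truly cointegrated, a test at the 0.05 level would still flag roughly one pair in twenty. The sketch below uses made-up p-values (the vector `pvals` is purely illustrative, not output from the code in this thread) to show the expected number of false positives and one common correction from base R, `p.adjust`.

```r
# Hypothetical p-values from a batch of 20 pair tests (illustrative numbers only).
pvals <- c(0.01, 0.03, 0.04, 0.20, 0.35, 0.50, 0.62, 0.70, 0.81, 0.90,
           0.012, 0.25, 0.33, 0.47, 0.58, 0.66, 0.74, 0.88, 0.93, 0.97)
alpha   <- 0.05
n_tests <- length(pvals)

# If no pair were truly cointegrated, we would still expect this many rejections.
expected_false_pos <- alpha * n_tests

# Chance of at least one false positive, treating the tests as independent
# (an assumption: pairs that share a stock are not independent).
p_any_false_pos <- 1 - (1 - alpha)^n_tests

# The Holm adjustment controls the family-wise error rate across the batch.
p_holm <- p.adjust(pvals, method = "holm")

raw_hits  <- sum(pvals < alpha)
holm_hits <- sum(p_holm < alpha)

cat("Expected false positives under the null:", expected_false_pos, "\n")
cat("P(at least one false positive):", round(p_any_false_pos, 3), "\n")
cat("Pairs flagged before / after Holm adjustment:", raw_hits, "/", holm_hits, "\n")
```

A correction like this is conservative; it does not replace the economic filtering discussed in this message.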
 
"Spurious cointegration" is a serious problem. I suggest Googling that topic. You may be surprised by what you learn. (The irony, of course, is that cointegration was supposed to cure "spurious correlation." Oh well.)
 
If you mean financial logic, I strongly suggest not blindly risking money on your statistical test. Some filtering is required. Look for trades that make sense.
 
For example, my software reports that the stocks of MSFT and GOOG form a mean-reverting pair. But I would not trade that spread: too much idiosyncratic risk. My software also reports that Corn futures and Soybean Oil futures form a mean-reverting pair. But I would not trade that spread because the economic connection between corn and bean oil is too weak.
 
Hope that helps.
 
Paul

  _____  

From: r-sig-finance-bounces at stat.math.ethz.ch [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of Stephen Choularton
Sent: Monday, October 18, 2010 9:46 PM
To: r-sig-finance at stat.math.ethz.ch
Subject: [R-SIG-Finance] cointegration


Hi Folks

I'm using this to find cointegrated stocks on the AX.

library(xts)
library(quantmod)

# quickly re-source this file
s <- function() source('meanrev.R')

checkPairFromYahoo <- function(sym1, sym2, dateFilter='::')
{
  t.xts <- getCombined(sym1, sym2, dateFilter=dateFilter)

  cat("Date range is", format(start(t.xts)), "to", format(end(t.xts)), "\n")

  # Build linear model
  m <- buildLM(t.xts)

  # Note beta -- http://en.wikipedia.org/wiki/Beta_(finance)
  beta <- getBeta(m)
  cat("Assumed hedge ratio is", beta, "\n")

  # Build spread
  sprd <- buildSpread(t.xts, beta)

  # Test cointegration
  ht <- testCoint(sprd)
  cat("PP p-value is", as.double(ht$p.value), "\n")

  if (as.double(ht$p.value) < 0.05)
  {
    cat("###############################################################\n", sym1 ,":", sym2 ," is likely mean-reverting.\n", "###########################################################\n" )
  }
  else
  {
    #cat(sym1 ,":", sym2 ," is not mean-reverting.\n")
  }
}

getCombined <- function(sym1, sym2, dateFilter='::')
{
  # Grab historical data for both symbols
  one <- getSymbols(sym1, auto.assign=FALSE)
  two <- getSymbols(sym2, auto.assign=FALSE)

  # Give columns more usable names
  colnames(one) <- c('Open', 'High', 'Low', 'Close', 'Volume', 'Adjusted')
  colnames(two) <- c('Open', 'High', 'Low', 'Close', 'Volume', 'Adjusted')

  # Build combined object
  return(merge(one$Close, two$Close, all=FALSE)[dateFilter])
}

buildLM <- function(combined)
{
  return(lm(Close ~ Close.1 + 0, combined))
}

getBeta <- function(m)
{
  return(as.double(coef(m)[1]))
}

buildSpread <- function(combined, beta)
{
  return(combined$Close - beta*combined$Close.1)
}

testCoint <- function(sprd)
{
  return(PP.test(sprd, lshort = FALSE))
}
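
The pipeline above can be exercised offline before pointing it at Yahoo data. The sketch below is illustrative and not part of the original script: it simulates a pair that is cointegrated by construction (a random walk plus a stationary AR(1) spread; every parameter value is an assumption) and runs the same steps as buildLM, getBeta, buildSpread and testCoint.

```r
set.seed(42)
n <- 500

# Price 2 is a random walk; price 1 tracks 1.5 * price 2 plus a stationary AR(1) spread.
p2          <- 50 + cumsum(rnorm(n))
true_spread <- as.numeric(arima.sim(list(ar = 0.5), n = n))
p1          <- 1.5 * p2 + true_spread

# Regression through the origin gives the hedge ratio, as in buildLM / getBeta.
m    <- lm(p1 ~ p2 + 0)
beta <- as.double(coef(m)[1])

# Spread and Phillips-Perron test, as in buildSpread / testCoint.
sprd <- p1 - beta * p2
ht   <- PP.test(sprd, lshort = FALSE)

cat("Estimated hedge ratio:", round(beta, 3), "\n")
cat("PP p-value:", as.double(ht$p.value), "\n")
```

One caveat worth knowing: the spread is built from an estimated hedge ratio, and standard unit-root critical values are not calibrated for residuals of an estimated regression, so this two-step check tends to reject the unit root more often than the nominal level suggests. A residual-based cointegration test such as `po.test` in the tseries package may be a better fit.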

I run it on batches of stock pairs and then have a look at those which test as cointegrated, assuming my code is right (and if anyone thinks there is something wrong with it, please let me know ;-)

Just wondered if anyone simply goes with the results, or if a test of logic is required. I found, for example, that AGL (a big gas company) was cointegrated with Bunnings Warehouses (a hardware superstore chain). I can't see the reason for that. AMP (major insurer) cointegrates with AXA (another major insurer). That makes sense. It also cointegrates with Westpac (major bank): still some logic, but a bit thinner. It also cointegrates with Fortescue Metals (big iron ore operation). Not much logic there. Anyway, the question is: do you get better results by using informed judgement on these things, or do you just trust the figures?

Any comments most welcome.



Stephen Choularton Ph.D., FIoD

9999 2226
0413 545 182


for insurance go to www.netinsure.com.au
for markets go to www.organicfoodmarkets.com.au


On 19/10/2010 12:35 PM, Yihao Lu aeolus_lu wrote: 

I am running a rolling ADF test on some time series to check for mean reversion. With a short rolling window, I find the residual is not stationary at all. However, with a window longer than 5 years, I find very significant stationarity. On the other hand, I find the half-life is only around 30 days.

Can anyone give me a possible explanation or point me to a reference? Thanks
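
One possible explanation is test power. A half-life of about 30 days implies a daily AR(1) coefficient of 2^(-1/30), roughly 0.977, which is hard to tell apart from a unit root in a short window. The simulation sketch below uses `PP.test` from base R as a stand-in for the ADF test; the window lengths and the number of replications are arbitrary choices.

```r
set.seed(1)
half_life <- 30                     # days, as in the question
phi       <- 2^(-1 / half_life)     # implied daily AR(1) coefficient

# Fraction of simulated stationary series for which the unit-root null is rejected.
rejection_rate <- function(n_obs, n_rep = 40, alpha = 0.05) {
  rejections <- replicate(n_rep, {
    x <- as.numeric(arima.sim(list(ar = phi), n = n_obs))
    PP.test(x)$p.value < alpha
  })
  mean(rejections)
}

rate_short <- rejection_rate(120)    # about six months of daily data
rate_long  <- rejection_rate(2500)   # about ten years of daily data

cat("AR(1) coefficient:", round(phi, 4), "\n")
cat("Rejection rate, 120 observations: ", rate_short, "\n")
cat("Rejection rate, 2500 observations:", rate_long, "\n")
```

Every simulated series here is stationary by construction, so any failure to reject in the short window reflects the test rather than the data.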



Best,
Yihao

________________________________
Date: Tue, 19 Oct 2010 09:03:55 +1100
From: stephen at organicfoodmarkets.com.au
To: r-sig-finance at stat.math.ethz.ch
CC: bjorn.skogtro at gmail.com
Subject: Re: [R-SIG-Finance] Ornstein-Uhlenbeck

Hi

I am still trying to sort this one out. Any comments from anyone would be most welcome.

Stephen Choularton Ph.D., FIoD

On 14/10/2010 7:29 AM, Stephen Choularton wrote:

Thanks for this help.

Trying to make sense of it, so I have added some notes to the code. I have marked them #?#

Delighted if you can tell me if I am right or wrong, or add any comments or answers.

#?# This appears to be the function that is doing the 'Ornstein-Uhlenbeck
#?# process work', particularly via dcOU.
#?# I have noted in several places that I am after:
#?# 'the half-life of the decay equals ln(2)/θ'
#?# 'The half-life is given as log(2)/mean-reversion speed.'
#?# and I see theta appearing at a number of points in the code.
#?# Can you tell me why there are 3 thetas, viz. theta1, theta2, theta3, and what they do?
#?# E.g. is one of these the theta I am after?

# ex3.01.R
OU.lik <- function(theta1, theta2, theta3){
  n <- length(X)
  dt <- deltat(X)
  -sum(dcOU(X[2:n], dt, X[1:(n-1)], c(theta1,theta2,theta3), log=TRUE))
}

require(stats4)
require(sde)

#?# random number generation seed
set.seed(123)

#?# creation of a data set
X <- sde.sim(model="OU", theta=c(3,1,2), N=1000, delta=1)
#?# If I look at X it's like this:
#?# Time Series:
#?# Start = 0
#?# End = 1000
#?# Frequency = 1
#?# [1] 1.00000000 etc
#?# What sort of data object is it and how would I coerce an object with one
#?# column from a read.csv into it?
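
On the data-object question: the printout (Start, End, Frequency) is that of an ordinary `ts` time series object from base R, which is why `deltat(X)` works inside `OU.lik`. A sketch of building one from a single-column CSV follows; the column name `price` and the inline data are placeholders standing in for a real file.

```r
# Inline text stands in for a file; with real data use read.csv("yourfile.csv").
csv_text <- "price\n10.1\n10.4\n9.9\n10.2\n10.0"
df <- read.csv(text = csv_text)

# ts() attaches the time grid. deltat = 1 means one time unit per observation
# (for example one trading day); this is the value deltat(X) returns in OU.lik.
X <- ts(df$price, start = 0, deltat = 1)

print(class(X))
print(deltat(X))
```

The choice of `deltat` sets the time unit of the fitted parameters: with daily data and `deltat = 1` the mean-reversion speed is per day, while `deltat = 1/252` would express it per trading year.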





mle(OU.lik, start=list(theta1=1, theta2=0.5, theta3=1),
    method="L-BFGS-B", lower=c(-Inf,0,0)) -> fit
summary(fit)

#?# This gives:

#?# Maximum likelihood estimation

#?# Call:
#?# mle(minuslogl = OU.lik, start = list(theta1 = 1, theta2 = 0.5,
#?#     theta3 = 1), method = "L-BFGS-B", lower = c(-Inf, 0, 0))

#?# Coefficients:
#?#        Estimate Std. Error
#?# theta1 3.355322 0.28159504
#?# theta2 1.106107 0.09010627
#?# theta3 2.052815 0.07624441

#?# -2 log L: 3366.389

#?# What's this telling me?
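
In the parametrization documented in ?dcOU, the process is dX = (theta1 - theta2 * X) dt + theta3 dW. So theta2 is the mean-reversion speed, theta1/theta2 is the long-run mean, and theta3 is the volatility. The data were simulated with theta = (3, 1, 2), and the estimates above land within about one and a half standard errors of those values. The half-life asked about depends on theta2 alone; a sketch using only the printed estimates:

```r
# Estimates copied from the summary(fit) output above.
theta1 <- 3.355322
theta2 <- 1.106107
theta3 <- 2.052815

long_run_mean <- theta1 / theta2     # level the process reverts to
half_life     <- log(2) / theta2     # in the time units of deltat(X)

cat("Long-run mean:", round(long_run_mean, 3), "\n")
cat("Half-life:", round(half_life, 3), "time units\n")
```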



# ex3.01.R (cont.)
prof <- profile(fit)
par(mfrow=c(1,3))
plot(prof)
par(mfrow=c(1,1))
vcov(fit)
confint(fit)

#?# This provides me with this output using 'fit' from before:

#?# > vcov(fit)
#?#            theta1      theta2      theta3
#?# theta1 0.07929576 0.024620718 0.016634557
#?# theta2 0.02462072 0.008119141 0.005485549
#?# theta3 0.01663456 0.005485549 0.005813209
#?# > confint(fit)
#?# Profiling...
#?#            2.5 %   97.5 %
#?# theta1 2.8448980 3.960982
#?# theta2 0.9433338 1.300629
#?# theta3 1.9147136 2.216113

#?# and 'fit' is:

#?# Call:
#?# mle(minuslogl = OU.lik, start = list(theta1 = 1, theta2 = 0.5,
#?#     theta3 = 1), method = "L-BFGS-B", lower = c(-Inf, 0, 0))

#?# Coefficients:
#?#   theta1   theta2   theta3
#?# 3.355322 1.106107 2.052815

#?# plus some graphic output

#?# Again, what's this telling me?
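
As a reading aid: `vcov(fit)` is the estimated covariance matrix of the three estimates, and `confint(fit)` gives 95% intervals based on the profile likelihood. Two quick checks using only the numbers printed above: the square roots of the diagonal of `vcov` reproduce the Std. Error column, and because log(2)/theta2 is monotone in theta2, the theta2 interval maps directly to an interval for the half-life.

```r
# Diagonal of vcov(fit), copied from the output above.
v_diag  <- c(theta1 = 0.07929576, theta2 = 0.008119141, theta3 = 0.005813209)
std_err <- sqrt(v_diag)   # compare with the Std. Error column of summary(fit)

# 95% profile interval for theta2, copied from confint(fit).
theta2_ci <- c(lower = 0.9433338, upper = 1.300629)

# log(2)/theta2 is decreasing in theta2, so the endpoints swap.
half_life_ci <- c(lower = log(2) / theta2_ci[["upper"]],
                  upper = log(2) / theta2_ci[["lower"]])

print(round(std_err, 4))
print(round(half_life_ci, 3))
```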



#?# This looks like a further example?
# ex3.01.R (cont.)
set.seed(123)
X <- sde.sim(model="OU", theta=c(3,1,2), N=1000, delta=1e-3)
mle(OU.lik, start=list(theta1=1, theta2=0.5, theta3=1),
    method="L-BFGS-B", lower=c(-Inf,0,0)) -> fit2
summary(fit2)

Please excuse the length of this email (and my lack of understanding).

Hope you can help and thanks.

Stephen Choularton Ph.D., FIoD

On 13/10/2010 2:41 AM, stefano iacus wrote:

Just for completeness: the OU process is Gaussian and its transition density is known in exact form. So maximum likelihood estimation works fine, and I suggest avoiding GMM.

The sde package contains the exact transition density for this process (see ?dcOU), which you can use to build the likelihood to pass to the mle() function.

This example is taken from the "inst" directory of the sde package. For the parametrization of the model, see ?dcOU.

# ex3.01.R
OU.lik <- function(theta1, theta2, theta3){
  n <- length(X)
  dt <- deltat(X)
  -sum(dcOU(X[2:n], dt, X[1:(n-1)], c(theta1,theta2,theta3), log=TRUE))
}

require(stats4)
require(sde)
set.seed(123)
X <- sde.sim(model="OU", theta=c(3,1,2), N=1000, delta=1)
mle(OU.lik, start=list(theta1=1, theta2=0.5, theta3=1),
    method="L-BFGS-B", lower=c(-Inf,0,0)) -> fit
summary(fit)

# ex3.01.R (cont.)
prof <- profile(fit)
par(mfrow=c(1,3))
plot(prof)
par(mfrow=c(1,1))
vcov(fit)
confint(fit)

# ex3.01.R (cont.)
set.seed(123)
X <- sde.sim(model="OU", theta=c(3,1,2), N=1000, delta=1e-3)
mle(OU.lik, start=list(theta1=1, theta2=0.5, theta3=1),
    method="L-BFGS-B", lower=c(-Inf,0,0)) -> fit2
summary(fit2)
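
For readers without sde installed, the exact transition density can be written out directly. Under dX = (theta1 - theta2 * X) dt + theta3 dW, the value at time t + dt given the value x0 at time t is normal with mean theta1/theta2 + (x0 - theta1/theta2) * exp(-theta2 * dt) and variance theta3^2 * (1 - exp(-2 * theta2 * dt)) / (2 * theta2). The sketch below is a hand-rolled stand-in, not the package code: the names `ou_logdens`, `ou_negloglik` and `simulate_ou` are made up here, and `optim` is used in place of `mle`.

```r
# Log of the exact OU transition density (hypothetical helper, same
# parametrization as described in ?dcOU).
ou_logdens <- function(x, dt, x0, theta) {
  mu <- theta[1] / theta[2]
  m  <- mu + (x0 - mu) * exp(-theta[2] * dt)
  v  <- theta[3]^2 * (1 - exp(-2 * theta[2] * dt)) / (2 * theta[2])
  dnorm(x, mean = m, sd = sqrt(v), log = TRUE)
}

# Negative log-likelihood of a path sampled every dt time units.
ou_negloglik <- function(theta, X, dt) {
  n <- length(X)
  -sum(ou_logdens(X[2:n], dt, X[1:(n - 1)], theta))
}

# Exact simulation of an OU path using the same transition law.
simulate_ou <- function(n, dt, x0, theta) {
  X    <- numeric(n + 1)
  X[1] <- x0
  mu   <- theta[1] / theta[2]
  a    <- exp(-theta[2] * dt)
  s    <- sqrt(theta[3]^2 * (1 - a^2) / (2 * theta[2]))
  for (i in 1:n) X[i + 1] <- mu + (X[i] - mu) * a + s * rnorm(1)
  X
}

set.seed(123)
true_theta <- c(3, 1, 2)
start      <- c(1, 0.5, 1)
X <- simulate_ou(n = 1000, dt = 1, x0 = 1, theta = true_theta)

# Small positive lower bounds keep theta2 and theta3 away from zero,
# where the variance formula would divide by zero.
fit <- optim(start, ou_negloglik, X = X, dt = 1,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6, 1e-6))
print(round(fit$par, 3))
```

In the last example of the listing above, delta = 1e-3 with N = 1000 covers a total time span of only 1. My understanding is that the drift parameters theta1 and theta2 are poorly determined over such a short span, while theta3 remains well estimated; comparing summary(fit) with summary(fit2) should show this.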





I hope this helps out

stefano

On 12 Oct 2010, at 12:33, Bjorn Skogtro wrote:

Hi Stephen,

You could take a look at

http://sitmo.com/doc/Calibrating_the_Ornstein-Uhlenbeck_model

for the linear regression method, or take a look at the package "sde", which contains some examples using GMM (not for the Ornstein-Uhlenbeck process, though, only the CIR).

The half-life is given as log(2)/mean-reversion speed.

Do keep an eye on the partition of the time axis, e.g. what frequency you are using (daily, yearly), when interpreting the half-life.
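
The linear regression route amounts to fitting an AR(1): regress X[t+1] on X[t], read the mean-reversion speed off the slope b as -log(b)/dt, and take the half-life as log(2) divided by that speed. Below is a sketch on simulated data, including the unit conversion mentioned above. The true speed of 0.05 per day, the daily step and the 252-day year are all assumptions chosen for illustration.

```r
set.seed(7)
dt         <- 1       # one observation per trading day
true_speed <- 0.05    # per day, so the true half-life is log(2)/0.05, about 13.9 days
n          <- 5000

# The exact discretisation of an OU process is an AR(1) with
# coefficient exp(-speed * dt).
b_true <- exp(-true_speed * dt)
x      <- as.numeric(arima.sim(list(ar = b_true), n = n))

# Regress the next value on the current value.
x_now  <- x[-n]
x_next <- x[-1]
fit    <- lm(x_next ~ x_now)
b_hat  <- unname(coef(fit)["x_now"])

speed_hat       <- -log(b_hat) / dt        # per day
half_life_days  <- log(2) / speed_hat
half_life_years <- half_life_days / 252    # assuming 252 trading days per year

cat("Estimated AR(1) slope:", round(b_hat, 4), "\n")
cat("Half-life:", round(half_life_days, 1), "days, or",
    round(half_life_years, 4), "years\n")
```

The same data fed in with dt = 1/252 would give a speed per year and a half-life in years directly; the half-life itself does not change, only the unit it is reported in.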



BR,
Bjørn

------------------------------

Message: 2
Date: Tue, 12 Oct 2010 05:43:32 -0400
From: Sarbo
To: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] Ornstein-Uhlenbeck

By half-life, do you mean the speed of mean-reversion?

If so, there's a bit of algebraic tomfoolery that's required to discretise the equation and then fit the data to it. I don't have the time right now to go into all the details, but it's not hard: you can parameterise the process using simple linear regression. If you need help with that, I'll try and get back to you tonight about it.



On Tue, 2010-10-12 at 13:47 +1100, Stephen Choularton wrote:

Hi

I wonder if anyone could show me how to use this method to find the half-life of a mean-reverting process.

I am looking into pair trading and the time it takes for a cointegrated pair to revert to the norm.

--
Stephen Choularton Ph.D., FIoD



_______________________________________________
R-SIG-Finance at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.





-----------------------------------
Stefano M. Iacus
Department of Economics,
Business and Statistics
University of Milan
Via Conservatorio, 7
I-20123 Milan - Italy
Ph.: +39 02 50321 461
Fax: +39 02 50321 505
http://www.economia.unimi.it/iacus
------------------------------------------------------------------------------------
Please don't send me Word or PowerPoint attachments if not
absolutely necessary. See:
http://www.gnu.org/philosophy/no-word-attachments.html
