<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

Hi Gero<br>

<br>

To me it looks like your regression is being run on prices
(a non-stationary, integrated process) rather than on returns (a stationary
process). What you are observing is a spurious regression result; see,
e.g., <a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Unit_root">http://en.wikipedia.org/wiki/Unit_root</a> under "Granger and Newbold
(1974) called such estimates spurious regression". Basically, your
nearly perfect fit just means that yesterday's close is a very good (if not
the best) predictor of today's close level, which is not the same as
predicting the day-to-day change. Otherwise it would be very easy
to get rich...<br>

<br>
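A minimal sketch of this effect in Python (standard library only), assuming the returns are pure noise. The simulated series and the helper name ols_r2_on_lag are made up for illustration; this is not your DAX data or your model:<br>
<pre>
```python
import random


def ols_r2_on_lag(series):
    """R-squared of the OLS fit series[t] ~ a + b * series[t-1]."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # For a simple regression with intercept, R^2 is the squared correlation.
    return sxy * sxy / (sxx * syy)


random.seed(1)

# Hypothetical daily returns: independent noise, unpredictable by construction.
returns = [random.gauss(0.0, 0.01) for _ in range(4000)]

# Price level built by compounding those returns (an integrated series).
prices = [100.0]
for r in returns:
    prices.append(prices[-1] * (1.0 + r))

r2_prices = ols_r2_on_lag(prices)
r2_returns = ols_r2_on_lag(returns)

print("R^2, price on lagged price:  ", round(r2_prices, 4))
print("R^2, return on lagged return:", round(r2_returns, 4))
```
</pre>
The first R^2 should come out close to 1 and the second close to 0, even though nothing about tomorrow's return is predictable in this simulation.<br>
<br>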

Best regards<br>

Adrian<br>

<br>

<br>

<blockquote type="cite">

  <pre wrap="">Message: 8

Date: Thu, 15 Oct 2009 10:12:57 +0200

From: Gero Schwenk <a class="moz-txt-link-rfc2396E"

 href="mailto:gero.schwenk@web.de">&lt;gero.schwenk@web.de&gt;</a>

Subject: [R-SIG-Finance] Perfect out-of-sample-fit in a model

        containing a lagged dependent variable?

To: <a class="moz-txt-link-abbreviated"

 href="mailto:r-sig-finance@stat.math.ethz.ch">r-sig-finance@stat.math.ethz.ch</a>

Message-ID: <a class="moz-txt-link-rfc2396E"

 href="mailto:4AD6D989.8030709@web.de">&lt;4AD6D989.8030709@web.de&gt;</a>

Content-Type: text/plain; charset=ISO-8859-15; format=flowed

Hello there!

I'm new to quantitative finance and now experimenting with the various 

tools. While playing with day-to-day predictions for the returns on 

closing call of the DAX, I observed a really strange behavior of my 

regression models - namely an R-Square of approx 1, which gets replicated 

by a nearly perfect fit of out-of-sample predictions for a prediction 

horizon of 57 trading days. (!) (Model details at the bottom of this mail.)

I think the issue is connected to the lagged dependent variable 

included as predictor in the model. (However, the Durbin-Watson test 

indicates no autocorrelation in the series, which itself implies 

misspecification.) Excluding this term leads to models with an R-Square 

of approx 0.4, which is not satisfying, but fits my expectations given 

the ad-hoc-model. This is also replicated in terms of out-of-sample fit.

However, there remains the question of the nearly perfect out-of-sample 

fit for the model including the AR1-term. Has anybody experienced 

similar behavior? Answers would really be appreciated!

Kind regards,

Gero

#

Model details:

- Model setup: linear model:  close.DAX ~ lag(close.DAX) + 

lag(close.NYSE) + lag(close.HangSeng)

- Data is the respective returns (backshifted index data)

- Datasource: Yahoo-Finance

- Training-Dataset: 4000 days back - without the last 57 trading days

- Test-Dataset: the last 57 trading days

- Max. correlation of the model-variables: 0.59

- Augmented Dickey-Fuller-Test indicates stationarity

- Durbin-Watson-Test indicates no autocorrelation (! - contrary to 

model structure)

- CumSum-Test indicates no structural change between training- and test-data

- In the fitted linear model, the AR1-term (lag(close.DAX)) dominates 

the other parameters vastly and seemingly channels all the 

intraday-correlation between the independent variables, R<sup

 class="moz-txt-sup">2</sup> is 1. Out of 

sample fit is close to perfect

- Residuals don't really look normally distributed but generally 

generalized-pareto (extreme value) distributed, as mean residual life 

plots indicate

- Bootstrapping the model yields lots of not accessible NA regression 

coefficients, probably due to shrinking variance in the 

bootstrap-sample. (But this is also an issue with the model excluding 

the AR1-term.)

  </pre>

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

Adrian Trapletti

Steinstrasse 9b

8610 Uster

Switzerland

Phone : +41 (0) 44 9945630

Mobile : +41 (0) 76 3705631

Email : <a class="moz-txt-link-abbreviated" href="mailto:a.trapletti@swissonline.ch">a.trapletti@swissonline.ch</a>

</pre>

</body>

</html>