<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Gero<br>
<br>
To me it looks like your regression is running on prices (a
non-stationary, integrated process) rather than on returns (a stationary
process). What you are observing is a spurious regression result; see
e.g. <a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Unit_root">http://en.wikipedia.org/wiki/Unit_root</a> under "Granger and Newbold
(1974) called such estimates spurious regression". Basically, your
nearly perfect fit only means that yesterday's close is a very good (if
not the best) predictor of today's close; it says nothing about
predicting returns. Otherwise it would be very easy to get rich...<br>
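<br>
A minimal sketch of the effect in Python, assuming a simulated random walk rather than actual DAX data. The daily volatility of 0.015, the starting level of 5000, the seed and the helper names are all illustrative assumptions, not taken from your model:<br>
<pre>
```python
import math
import random


def r_squared(x, y):
    """R-squared of the simple OLS regression y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For one regressor, R-squared equals the squared sample correlation.
    return (sxy * sxy) / (sxx * syy)


def simulate_log_prices(n_days, daily_vol, seed):
    """Hypothetical random-walk log prices with i.i.d. Gaussian returns."""
    rng = random.Random(seed)
    log_prices = [math.log(5000.0)]
    for _ in range(n_days):
        log_prices.append(log_prices[-1] + rng.gauss(0.0, daily_vol))
    return log_prices


log_prices = simulate_log_prices(n_days=4000, daily_vol=0.015, seed=1)
# Log returns are the first differences of the log prices.
returns = [b - a for a, b in zip(log_prices[:-1], log_prices[1:])]

# Level regression: today's log price on yesterday's log price.
r2_levels = r_squared(log_prices[:-1], log_prices[1:])
# Return regression: today's return on yesterday's return.
r2_returns = r_squared(returns[:-1], returns[1:])

print("R^2, levels on lagged levels:  ", round(r2_levels, 4))
print("R^2, returns on lagged returns:", round(r2_returns, 4))
```
</pre>
For a series like this, the level regression should report an R-square close to 1 although the returns are unpredictable by construction, while the return regression should report one close to 0.<br>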
<br>
Best regards<br>
Adrian<br>
<br>
<br>
<blockquote type="cite">
<pre wrap="">Message: 8
Date: Thu, 15 Oct 2009 10:12:57 +0200
From: Gero Schwenk <a class="moz-txt-link-rfc2396E"
href="mailto:gero.schwenk@web.de">&lt;gero.schwenk@web.de&gt;</a>
Subject: [R-SIG-Finance] Perfect out-of-sample-fit in a model
        containing a lagged dependent variable?
To: <a class="moz-txt-link-abbreviated"
href="mailto:r-sig-finance@stat.math.ethz.ch">r-sig-finance@stat.math.ethz.ch</a>
Message-ID: <a class="moz-txt-link-rfc2396E"
href="mailto:4AD6D989.8030709@web.de">&lt;4AD6D989.8030709@web.de&gt;</a>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Hello there!
I'm new to quantitative finance and now experimenting with the various
tools. While playing with day-to-day predictions for the returns on
closing call of the DAX, I observed a really strange behavior of my
regression models - namely an R-Square of approx 1, which gets replicated
by a nearly perfect fit of out-of-sample predictions for a prediction
horizon of 57 trading days. (!) (Model details at the bottom of this mail.)
I think the issue is connected to the lagged dependent variable
included as predictor in the model. (However, the Durbin-Watson test
indicates no autocorrelation in the series, which itself implies
misspecification.) Excluding this term leads to models with an R-Square
of approx 0.4, which is not satisfying, but fits my expectations given
the ad-hoc-model. This is also replicated in terms of out-of-sample fit.
However, there remains the question of the nearly perfect out-of-sample
fit for the model including the AR1-term. Has anybody experienced
similar behavior? Answers would really be appreciated!
Kind regards,
Gero
#
Model details:
- Model setup: linear model: close.DAX ~ lag(close.DAX) +
lag(close.NYSE) + lag(close.HangSeng)
- Data is the respective returns (backshifted index data)
- Datasource: Yahoo-Finance
- Training-Dataset: 4000 days back - without the last 57 trading days
- Test-Dataset: the last 57 trading days
- Max. correlation of the model-variables: 0.59
- Augmented Dickey-Fuller-Test indicates stationarity
- Durbin-Watson-Test indicates no autocorrelation (! - contrary to
model structure)
- CumSum-Test indicates no structural change between training- and test-data
- In the fitted linear model, the AR1-term (lag(close.DAX)) dominates
the other parameters vastly and seemingly channels all the
intraday-correlation between the independent variables, R<sup
class="moz-txt-sup">2</sup> is 1. Out of
sample fit is close to perfect
- Residuals don't really look normally distributed but generally
generalized-pareto (extreme value) distributed, as mean residual life
plots indicate
- Bootstrapping the model yields lots of inaccessible NA regression
coefficients, probably due to shrinking variance in the
bootstrap-sample. (But this is also an issue with the model excluding
the AR1-term.)
</pre>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Adrian Trapletti
Steinstrasse 9b
8610 Uster
Switzerland
Phone : +41 (0) 44 9945630
Mobile : +41 (0) 76 3705631
Email : <a class="moz-txt-link-abbreviated" href="mailto:a.trapletti@swissonline.ch">a.trapletti@swissonline.ch</a>
</pre>
</body>
</html>