[R-SIG-Finance] Perfect out-of-sample-fit in a model containing a lagged dependent variable?
Gero Schwenk
gero.schwenk at web.de
Thu Oct 15 10:12:57 CEST 2009
Hello there!
I'm new to quantitative finance and am experimenting with the various
tools. While playing with day-to-day predictions for the returns on the
closing call of the DAX, I observed really strange behavior in my
regression models: an R-squared of approx. 1, which is replicated
by a nearly perfect fit of out-of-sample predictions over a prediction
horizon of 57 trading days (!). (Model details are at the bottom of this mail.)
I think the issue is connected to the lagged dependent variable
included as a predictor in the model. (However, the Durbin-Watson test
indicates no autocorrelation in the series, which itself implies
misspecification.) Excluding this term leads to models with an R-squared
of approx. 0.4, which is not satisfying but fits my expectations given
the ad hoc model. This is also replicated in terms of out-of-sample fit.
However, there remains the question of the nearly perfect out-of-sample
fit for the model including the AR1-term. Has anybody experienced
similar behavior? Answers would be much appreciated!
Kind regards,
Gero
#
Model details:
- Model setup: linear model: close.DAX ~ lag(close.DAX) +
lag(close.NYSE) + lag(close.HangSeng)
- Data are the respective returns (computed from backshifted index data)
- Datasource: Yahoo-Finance
- Training-Dataset: 4000 days back, excluding the last 57 trading days
- Test-Dataset: the last 57 trading days
- Max. correlation of the model-variables: 0.59
- Augmented Dickey-Fuller-Test indicates stationarity
- Durbin-Watson-Test indicates no autocorrelation (! - contrary to the
model structure)
- CumSum-Test indicates no structural change between training- and test-data
- In the fitted linear model, the AR1-term (lag(close.DAX)) vastly
dominates the other parameters and seemingly channels all the
intraday-correlation between the independent variables. R^2 is 1, and
the out-of-sample fit is close to perfect
- Residuals don't really look normally distributed but rather
generalized-Pareto (extreme value) distributed, as mean residual life
plots indicate
- Bootstrapping the model yields many NA (unavailable) regression
coefficients, probably due to shrinking variance in the
bootstrap-sample. (But this is also an issue with the model excluding
the AR1-term.)
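One thing worth ruling out with a setup like the one above is that the "lagged" regressor is not actually shifted. In R, base lag() applied to a plain numeric vector only changes the time-series attributes and leaves the values in place, so a term like lag(close.DAX) inside lm() can end up identical to close.DAX itself. Whether that is what happened here is an assumption, since the fitting code is not shown. The following is a minimal sketch in Python on synthetic i.i.d. returns (not the DAX data; the helper names ols_fit, r_squared and evaluate are made up for illustration). It shows that an unshifted regressor reproduces both symptoms, an R^2 of 1 in-sample and a near-perfect fit on a 57-day hold-out, while a genuinely shifted lag does not.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """R^2 of the predictions a + b*x against y (can be negative out of sample)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


random.seed(0)
# Synthetic i.i.d. "returns": by construction, yesterday says nothing about today.
returns = [random.gauss(0.0, 0.01) for _ in range(4000)]
n_test = 57  # hold-out length, matching the 57 trading days in the post


def evaluate(x, y):
    """Fit on all but the last n_test points; return (in-sample R^2, hold-out R^2)."""
    a, b = ols_fit(x[:-n_test], y[:-n_test])
    return (r_squared(x[:-n_test], y[:-n_test], a, b),
            r_squared(x[-n_test:], y[-n_test:], a, b))


# Case 1: a "lag" that does not shift anything, so regressor == response.
unshifted = evaluate(returns, returns)
# Case 2: a genuine one-day lag, pairing y[t] with y[t-1].
shifted = evaluate(returns[:-1], returns[1:])

print("unshifted (in-sample, hold-out):", unshifted)
print("shifted   (in-sample, hold-out):", shifted)
```

If the real model behaves like the unshifted case, comparing the first few values of the response and of the lagged column side by side should show whether the two are offset by one day or identical.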