[R] Out-of-sample prediction with VAR
Pfaff, Bernhard Dr.
Bernhard_Pfaff at fra.invesco.com
Mon Feb 8 09:58:55 CET 2010
Hello Peter,
judging from your code snippet:
|> ts_Y <- ts(log_residuals[1:104]); # detrended sales data
|> ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
|> training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
|>
|> ### Try VAR(3)
|> var_model <- VAR (y=training_matrix, p=3,
|> type="both", season=NULL,
|> exogen=NULL, lag.max=NULL);
you have one endogenous variable, namely ts_Y, and two exogenous
variables, namely ts_XGG and ts_XGL. However, because all three series
are passed together in 'training_matrix' as the argument y, VAR()
treats all of them as endogenous (see ?VAR for more information).
What you really want to estimate and predict is a **univariate** AR(3)
model with two exogenous variables. For this type of model VAR() is
not the right function; you could use lm() and/or dynlm() instead.
The forecasts should then be computed recursively, i.e. each predicted
value of ts_Y is fed back in as a lagged regressor for the next step.
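As a rough illustration of the lm() route, here is a minimal sketch on
simulated data. The series, the coefficient values and the helper
make_arx_frame() are made up for this example and are not taken from
your data. It also assumes that the Google Trends series enter
contemporaneously, that their holdout values are known, and that a
linear trend is wanted to mirror type="both".

```r
# Simulated stand-ins for the three series (hypothetical, not the real data)
set.seed(1)
n_total <- 155   # full sample length, as in the original post
n_train <- 104   # training sample length
p <- 3           # AR order

xgg <- rnorm(n_total)   # stands in for gtrends_global
xgl <- rnorm(n_total)   # stands in for gtrends_local
y <- numeric(n_total)   # stands in for the detrended log sales
for (t in (p + 1):n_total) {
  y[t] <- 0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.1 * y[t - 3] +
    0.8 * xgg[t] + 0.3 * xgl[t] + rnorm(1, sd = 0.1)
}

# Hypothetical helper: response, its p lags, and the exogenous regressors
make_arx_frame <- function(y, xgg, xgl, p = 3) {
  lagged <- embed(y, p + 1)              # columns: y_t, y_{t-1}, ..., y_{t-p}
  colnames(lagged) <- c("y", paste0("y_l", seq_len(p)))
  idx <- (p + 1):length(y)               # time index of each row
  data.frame(lagged, xgg = xgg[idx], xgl = xgl[idx], trend = idx)
}

# Estimate the AR(3) model with exogenous regressors on the training sample
train <- make_arx_frame(y[1:n_train], xgg[1:n_train], xgl[1:n_train], p)
fit <- lm(y ~ y_l1 + y_l2 + y_l3 + xgg + xgl + trend, data = train)

# Recursive forecasts: lags of y come from earlier forecasts,
# the exogenous regressors come from the observed holdout values
h <- n_total - n_train
hist_y <- y[1:n_train]                   # grows by one forecast per step
fc <- numeric(h)
for (i in seq_len(h)) {
  t <- n_train + i
  newdata <- data.frame(
    y_l1 = hist_y[t - 1], y_l2 = hist_y[t - 2], y_l3 = hist_y[t - 3],
    xgg = xgg[t], xgl = xgl[t], trend = t
  )
  fc[i] <- predict(fit, newdata = newdata)
  hist_y[t] <- fc[i]                     # feed the forecast back in
}

# Quick accuracy check against the simulated holdout values (RMSE)
rmse <- sqrt(mean((y[(n_train + 1):n_total] - fc)^2))
print(rmse)
```

If one-step-ahead forecasts are wanted instead, the lags in newdata
would be taken from the observed holdout values of y rather than from
hist_y.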
Best,
Bernhard
|> -----Original Message-----
|> From: r-help-bounces at r-project.org
|> [mailto:r-help-bounces at r-project.org] On Behalf Of
|> peter at linelink.nl
|> Sent: Sunday, February 07, 2010 11:37 PM
|> To: r-help at r-project.org
|> Subject: [R] Out-of-sample prediction with VAR
|>
|> Good day,
|>
|> I'm using a VAR model to forecast sales with some extra variables
|> (Google Trends data). I have divided my dataset into a training set
|> (weekly sales + vars in 2006 and 2007) and a holdout set (2008).
|> It is unclear to me how I should predict the out-of-sample data,
|> because using the predict() function in the vars package seems to
|> estimate my Google Trends vars as well. However, I want to forecast
|> the sales figures with knowledge of the actual Google Trends data.
|>
|> My questions:
|> 1. How should I do this? I currently extract the linear model
|> generated by the VAR(3) function to predict the holdout set, but
|> that seems inappropriate?
|> 2. In case I am doing it right, how is it possible that an
|> automatically fitted model with more variables actually performs
|> worse (in terms of MAPE)? Shouldn't it predict at least as well as
|> the simple AR(3) by finding that the extra variables have no added
|> value?
|>
|> My code:
|>
|> ts_Y <- ts(log_residuals[1:104]); # detrended sales data
|> ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
|> training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
|>
|> ### Try VAR(3)
|> var_model <- VAR (y=training_matrix, p=3,
|> type="both", season=NULL,
|> exogen=NULL, lag.max=NULL);
|>
|> ## Out of sample forecasting
|> var.lm = lm(var_model$varresult$ts_Y); # the generated LM
|>
|> ts_Y <- ts(log_residuals[105:155]);
|> ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);
|>
|> # Notice how I manually create the lagged values to be used in the
|> # Linear Model
|> holdout_matrix <-
|> na.omit(data.frame(ts.union(ts_Y, ts_XGG, ts_XGL,
|> ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2), ts_Y.l3 = lag(ts_Y,-3),
|> ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2), ts_XGG.l3 = lag(ts_XGG,-3),
|> ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2), ts_XGL.l3 = lag(ts_XGL,-3),
|> const=1, trend=0.0001514194 )));
|>
|> var.predict = predict(object=var_model,
|> n.ahead=52, dumvar=holdout_matrix);
|>
|> ## Assess accuracy
|> calc_mape (holdout_matrix$ts_Y, var.predict,
|> islog=T, print=T)
|>
|> Some context:
|> For my Master's thesis I'm using R to test the predictive power of
|> web metrics (such as google trends data & pageviews) in sales
|> forecasting. To properly assess this, I employ a simple AR model
|> (for time series without the extra variables) and a VAR model for
|> the predictions with the extra variables. I also develop a random
|> forest with, and without the buzz variables and see if MAPE improves.
|>
|> Many thanks in advance!
|>