[R] Out-of-sample prediction with VAR

Pfaff, Bernhard Dr. Bernhard_Pfaff at fra.invesco.com
Mon Feb 8 09:58:55 CET 2010


Hello Peter,

Judging from your code snippet:

 |>  	ts_Y <- ts(log_residuals[1:104]); # detrended sales data
 |>  	ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
 |>  	ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
 |>  	training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
 |>  
 |>  	### Try VAR(3)
 |>  		var_model <- VAR (y=training_matrix, p=3, 
 |>  type="both", season=NULL,
 |>  exogen=NULL,  lag.max=NULL);


you have one endogenous variable, namely ts_Y, and two exogenous
variables, namely ts_XGG and ts_XGL. However, the way you have set up
'training_matrix', all three variables are treated as endogenous (see
?VAR for more information), which is why predict() also forecasts the
Google Trends series.
What you really want to estimate and predict is a **univariate** AR(3)
model with two exogenous variables. For this type of model VAR() is
not the right function; you could use lm() and/or dynlm() instead.
The forecasts should then be computed recursively: each one-step
forecast of ts_Y is fed back in as a lagged value for the next step,
while the observed values of the exogenous variables are used.
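
As a rough sketch of this approach (not run against your data): the
example below fits an AR(3) with two exogenous regressors via lm() and
then forecasts recursively over the holdout period. The simulated series,
the names y, xgg, xgl, l1, l2, l3 and y_hist, and the choice to use
contemporaneous rather than lagged exogenous values are assumptions made
for illustration only.

```r
# Hypothetical stand-in data (assumption); replace with the real series.
set.seed(1)
n   <- 155
xgg <- rnorm(n)
xgl <- rnorm(n)
y   <- numeric(n)
for (t in 4:n) {
  y[t] <- 0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.1 * y[t - 3] +
    0.3 * xgg[t] + 0.2 * xgl[t] + rnorm(1, sd = 0.1)
}

p     <- 3
train <- 1:104
hold  <- 105:n

# embed() returns y[t] in column 1 and y[t-1], ..., y[t-p] in columns 2..p+1
lagged   <- embed(y[train], p + 1)
fit_data <- data.frame(
  y   = lagged[, 1],
  l1  = lagged[, 2],
  l2  = lagged[, 3],
  l3  = lagged[, 4],
  xgg = xgg[train][-(1:p)],  # align exogenous values with y[t]
  xgl = xgl[train][-(1:p)]
)

# Univariate AR(3) with two exogenous regressors (constant only, no trend)
fit <- lm(y ~ l1 + l2 + l3 + xgg + xgl, data = fit_data)

# Recursive forecast: lags come from earlier forecasts, not from the
# holdout values of y; the exogenous values are the observed ones.
y_hist <- y[train]
fc     <- numeric(length(hold))
for (i in seq_along(hold)) {
  k <- length(y_hist)
  newdata <- data.frame(
    l1  = y_hist[k],
    l2  = y_hist[k - 1],
    l3  = y_hist[k - 2],
    xgg = xgg[hold[i]],
    xgl = xgl[hold[i]]
  )
  fc[i]  <- predict(fit, newdata = newdata)
  y_hist <- c(y_hist, fc[i])
}
```

With the real data, y would correspond to log_residuals and xgg/xgl to
the two gtrends columns; a trend term or lagged exogenous values could be
added to the formula in the same way.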

Best,
Bernhard

 |>  -----Original Message-----
 |>  From: r-help-bounces at r-project.org 
 |>  [mailto:r-help-bounces at r-project.org] On Behalf Of 
 |>  peter at linelink.nl
 |>  Sent: Sunday, February 07, 2010 11:37 PM
 |>  To: r-help at r-project.org
 |>  Subject: [R] Out-of-sample prediction with VAR
 |>  
 |>  Good day,
 |>  
 |>  I'm using a VAR model to forecast sales with some extra variables
 |>  (google trends data). I have divided my dataset into a training set
 |>  (weekly sales + vars in 2006 and 2007) and a holdout set (2008).
 |>  It is unclear to me how I should predict the out-of-sample data,
 |>  because using the predict() function in the vars package seems to
 |>  estimate my google trends vars as well. However, I want to forecast
 |>  the sales figures, with knowledge of the actual google trends data.
 |>  
 |>  My questions:
 |>  1. How should I do this? I currently extract the linear model
 |>  generated by the VAR(3) function to predict the holdout set, but
 |>  that seems inappropriate?
 |>  2. In case I am doing it right, how is it possible that an
 |>  automatically fitted model with more variables actually performs
 |>  worse (in terms of MAPE)? Shouldn't it predict at least as well as
 |>  the simple AR(3) by finding that the extra variables have no added
 |>  value?
 |>  
 |>  My code:
 |>  
 |>  	ts_Y <- ts(log_residuals[1:104]); # detrended sales data
 |>  	ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
 |>  	ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
 |>  	training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
 |>  
 |>  	### Try VAR(3)
 |>  		var_model <- VAR (y=training_matrix, p=3, 
 |>  type="both", season=NULL,
 |>  exogen=NULL,  lag.max=NULL);
 |>  
 |>  	## Out of sample forecasting
 |>  		var.lm = lm(var_model$varresult$ts_Y); # the generated LM
 |>  
 |>  		ts_Y <- ts(log_residuals[105:155]);
 |>  		ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
 |>  		ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);
 |>  
 |>  		# Notice how I manually create the lagged values to be used
 |>  		# in the Linear Model
 |>  		holdout_matrix <- na.omit(data.frame(ts.union(
 |>  			ts_Y, ts_XGG, ts_XGL,
 |>  			ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2),
 |>  			ts_Y.l3 = lag(ts_Y,-3),
 |>  			ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2),
 |>  			ts_XGG.l3 = lag(ts_XGG,-3),
 |>  			ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2),
 |>  			ts_XGL.l3 = lag(ts_XGL,-3),
 |>  			const=1, trend=0.0001514194)));
 |>  
 |>  		var.predict = predict(object=var_model, 
 |>  n.ahead=52, dumvar=holdout_matrix);
 |>  
 |>  	## Assess accuracy
 |>  		calc_mape (holdout_matrix$ts_Y, var.predict, 
 |>  islog=T, print=T)
 |>  
 |>  Some context:
 |>  For my Master's thesis I'm using R to test the predictive power of
 |>  web metrics (such as google trends data & pageviews) in sales
 |>  forecasting. To properly assess this, I employ a simple AR model
 |>  (for time series without the extra variables) and a VAR model for
 |>  the predictions with the extra variables. I also develop a random
 |>  forest with, and without the buzz variables and see if MAPE improves.
 |>  
 |>  Many thanks in advance!
 |>  