[R] Anyone Familiar with Using arima function with exogenous variables?
Richard A. Bilonick
rab at nauticom.net
Mon Apr 21 16:14:17 CEST 2003
I've posted this before but have not been able to locate what I'm doing
wrong. I cannot determine how the forecast is made using the estimated
coefficients from a simple AR(2) model when there is an exogenous
variable. Does anyone know what the problem is? The help file for arima
doesn't show the model with any exogenous variables. I haven't been able
to locate any documents covering this. I put together a simple example
of an AR(2) model (no exogenous variables) and another example of an
AR(2) with one exogenous variable. In the first case it's easy to see
how the forecasts are computed. When there is an exogenous variable,
it's not clear (at leat to me) how the forecast is computed. I thought I
understood how the model is written but apparently not.
Using the LakeHuron data, fit a simple AR(2) model:
> data(LakeHuron)
> ar.lh <- arima(LakeHuron, order = c(2,0,0))
> ar.lh
Call:
arima(x = LakeHuron, order = c(2, 0, 0))
Coefficients:
ar1 ar2 intercept
1.0436 -0.2495 579.0473
s.e. 0.0983 0.1008 0.3319
sigma2 estimated as 0.4788: log likelihood = -103.63, aic = 215.27
Make a 1-step ahead forecast:
> predict(ar.lh,1)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.7896
Compute the forecast manually:
> sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1))
[1] 579.7896
This just says that the forecast for the next period (after the end of
the data) is 579.0473 + 1.0436*(579.96 - 579.0473) - 0.2495*(579.89 -
579.0473). In other words: the forecast is the intercept plus the AR
coefficients times the (previous ts values minus the intercepts).
Now add an exogenous variable (in this case, the (year - 1920):
> ar.lh <- arima(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron)-1920)
> ar.lh
Call:
arima(x = LakeHuron, order = c(2, 0, 0), xreg = time(LakeHuron) - 1920)
Coefficients:
ar1 ar2 intercept time(LakeHuron) - 1920
1.0048 -0.2913 579.0993 -0.0216
s.e. 0.0976 0.1004 0.2370 0.0081
sigma2 estimated as 0.4566: log likelihood = -101.2, aic = 212.4
The prediction is:
> predict(ar.lh,1,newxreg=53)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.3972
Now try to manually forecast when the next time period is 53 (i.e., 1973
- 1920):
> sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1,53))
[1] 578.5907
What am I doing wrong? I've tried this with numerous examples and
whenever there is an exogenous variable I cannot get the manual forecast
to agree with predict. Is it not correct to just add (-0.0216 times 53)
to the rest? I need to know how to write the model correctly. Obviously
there is something I am overlooking. R's arima function and predict
function work correctly - at least they agree with SAS for example so
I'm not doing something right.
I would really appreciate some insight here.
Rick B.
More information about the R-help
mailing list