[R] Anyone Familiar with Using arima function with exogenous variables?

Mon Apr 21 16:14:17 CEST 2003

I've posted this before but have not been able to locate what I'm doing 
wrong. I cannot determine how the forecast is made using the estimated 
coefficients from a simple AR(2) model when there is an exogenous 
variable. Does anyone know what the problem is? The help file for arima 
doesn't show the model with any exogenous variables. I haven't been able 
to locate any documents covering this. I put together a simple example 
of an AR(2) model (no exogenous variables) and another example of an 
AR(2) with one exogenous variable.  In the first case it's easy to see 
how the forecasts are computed. When there is an exogenous variable, 
it's not clear (at least to me) how the forecast is computed. I thought I 
understood how the model is written but apparently not.

Using the LakeHuron data, fit a simple AR(2) model:

 > data(LakeHuron)
 > ar.lh <- arima(LakeHuron, order = c(2,0,0))
 > ar.lh

Call:
arima(x = LakeHuron, order = c(2, 0, 0))

Coefficients:
        ar1      ar2  intercept
     1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

Make a 1-step ahead forecast:

 > predict(ar.lh,1)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.7896

Compute the forecast manually:

 > sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1))
[1] 579.7896

This just says that the forecast for the next period (after the end of 
the data) is 579.0473 + 1.0436*(579.96 - 579.0473) - 0.2495*(579.89 - 
579.0473). In other words, the forecast is the intercept plus each AR 
coefficient times (the corresponding previous value minus the intercept).
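
The same calculation can be written without typing the last two 
observations by hand. This is only a sketch: the names fit, phi, mu, 
y_last and manual are mine, and it assumes (as described above) that the 
one-step forecast of a pure AR(2) fit is the intercept plus the AR 
coefficients applied to the last two values after subtracting the 
intercept.

```r
# Refit the plain AR(2) model (no exogenous variable).
fit <- arima(LakeHuron, order = c(2, 0, 0))

phi <- coef(fit)[c("ar1", "ar2")]   # AR coefficients
mu  <- coef(fit)["intercept"]       # estimated mean of the series

# Last two observations, most recent first, so they line up with ar1, ar2.
n      <- length(LakeHuron)
y_last <- LakeHuron[c(n, n - 1)]

# Intercept plus AR coefficients times the mean-centred previous values.
manual <- unname(mu + sum(phi * (y_last - mu)))

# One-step forecast from predict() for comparison.
auto <- as.numeric(predict(fit, n.ahead = 1)$pred)

print(c(manual = manual, predict = auto))
```

The two printed numbers should agree, which is the check done by hand 
above.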

Now add an exogenous variable (in this case, year - 1920):

 > ar.lh <- arima(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron)-1920)
 > ar.lh

Call:
arima(x = LakeHuron, order = c(2, 0, 0), xreg = time(LakeHuron) - 1920)

Coefficients:
         ar1      ar2  intercept  time(LakeHuron) - 1920
      1.0048  -0.2913   579.0993                 -0.0216
s.e.  0.0976   0.1004     0.2370                  0.0081

sigma^2 estimated as 0.4566:  log likelihood = -101.2,  aic = 212.4

The prediction is:

 > predict(ar.lh,1,newxreg=53)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.3972

Now try to manually forecast when the next time period is 53 (i.e., 1973 
- 1920):

 > sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1,53))
[1] 578.5907

What am I doing wrong? I've tried this with numerous examples and 
whenever there is an exogenous variable I cannot get the manual forecast 
to agree with predict. Is it not correct to just add (-0.0216 times 53) 
to the rest? I need to know how to write the model correctly. Obviously 
there is something I am overlooking. R's arima and predict functions 
work correctly (at least they agree with SAS, for example), so I must 
be doing something wrong.

I would really appreciate some insight here.

Rick B.