# [R] Anyone Familiar with Using arima function with exogenous variables?

Richard A. Bilonick rab at nauticom.net
Mon Apr 21 16:14:17 CEST 2003

```
I've posted this before but have not been able to figure out what I'm doing
wrong. I cannot determine how the forecast is made from the estimated
coefficients of a simple AR(2) model when there is an exogenous
variable. Does anyone know what the problem is? The help file for arima
doesn't show the model with any exogenous variables, and I haven't been able
to locate any documents covering this. I put together a simple example
of an AR(2) model (no exogenous variables) and another example of an
AR(2) with one exogenous variable. In the first case it's easy to see
how the forecasts are computed. When there is an exogenous variable,
it's not clear (at least to me) how the forecast is computed. I thought I
understood how the model is written, but apparently not.

Using the LakeHuron data, fit a simple AR(2) model:

> data(LakeHuron)
> ar.lh <- arima(LakeHuron, order = c(2,0,0))
> ar.lh

Call:
arima(x = LakeHuron, order = c(2, 0, 0))

Coefficients:
         ar1      ar2  intercept
      1.0436  -0.2495   579.0473
s.e.  0.0983   0.1008     0.3319

sigma2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

> predict(ar.lh,1)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.7896

Compute the forecast manually:

> sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1))
[1] 579.7896

This just says that the forecast for the next period (after the end of
the data) is 579.0473 + 1.0436*(579.96 - 579.0473) - 0.2495*(579.89 -
579.0473), where 579.96 and 579.89 are the last two observations (1972
and 1971). In other words, the forecast is the intercept plus each AR
coefficient times the corresponding previous value minus the intercept.

Now add an exogenous variable (in this case, year - 1920):

> ar.lh <- arima(LakeHuron, order = c(2,0,0), xreg = time(LakeHuron)-1920)
> ar.lh

Call:
arima(x = LakeHuron, order = c(2, 0, 0), xreg = time(LakeHuron) - 1920)

Coefficients:
         ar1      ar2  intercept  time(LakeHuron) - 1920
      1.0048  -0.2913   579.0993                 -0.0216
s.e.  0.0976   0.1004     0.2370                  0.0081

sigma2 estimated as 0.4566:  log likelihood = -101.2,  aic = 212.4

The prediction is:

> predict(ar.lh,1,newxreg=53)[[1]]
Time Series:
Start = 1973
End = 1973
Frequency = 1
[1] 579.3972

Now try to manually forecast when the next time period is 53 (i.e., 1973
- 1920):

> sum(ar.lh$coef*c(c(579.96,579.89)-ar.lh$coef[3],1,53))
[1] 578.5907

What am I doing wrong? I've tried this with numerous examples, and
whenever there is an exogenous variable I cannot get the manual forecast
to agree with predict. Is it not correct to just add (-0.0216 times 53)
to the rest? I need to know how to write the model correctly; obviously
there is something I am overlooking. R's arima and predict functions
work correctly (at least they agree with SAS, for example), so I must
be doing something wrong.

I would really appreciate some insight here.

Rick B.

```
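The hand calculation for the plain AR(2) fit can be wrapped in a small R function and checked against predict. This is a sketch, assuming a fully observed, undifferenced series with an intercept; the helper name `ar_forecast_1` is hypothetical and not part of R. It indexes the coefficients by position (ar1, ar2, intercept), which is the order arima reports them in.

```r
# Hand-computed one-step forecast for an AR(p) model with a mean:
#   yhat[T+1] = mu + sum_i phi[i] * (y[T+1-i] - mu)
# ar_forecast_1() is a hypothetical helper written for this sketch.
ar_forecast_1 <- function(phi, mu, y) {
  p <- length(phi)
  recent <- rev(tail(as.numeric(y), p))  # y[T], y[T-1], ..., y[T-p+1]
  unname(mu + sum(phi * (recent - mu)))
}

data(LakeHuron)
fit <- arima(LakeHuron, order = c(2, 0, 0))
cf  <- coef(fit)                         # ar1, ar2, intercept (in that order)

manual    <- ar_forecast_1(cf[1:2], cf[3], LakeHuron)
predicted <- as.numeric(predict(fit, n.ahead = 1)$pred)
print(c(manual = manual, predict = predicted))
```

Using full-precision coefficients instead of the rounded printed ones removes the small rounding differences you get when typing the numbers in by hand.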
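One way to reconcile the two xreg numbers rests on an assumption: that arima fits an xreg term as a linear regression whose error term follows the ARMA model, which is how the current `?arima` help page describes it. Under that model the AR coefficients act on the regression residuals y - (intercept + beta*x), not on y - intercept. The sketch below takes that view; the helper `ar_error_forecast_1` is hypothetical, and the coefficients are again indexed by position (ar1, ar2, intercept, xreg).

```r
# Assumed model (regression with AR(p) errors):
#   y[t] = mu + beta * x[t] + e[t],  e[t] = phi[1]*e[t-1] + ... + a[t]
# One-step forecast:
#   yhat[T+1] = mu + beta * x[T+1] + sum_i phi[i] * e[T+1-i]
# ar_error_forecast_1() is a hypothetical helper written for this sketch.
ar_error_forecast_1 <- function(phi, mu, beta, y, x, x_next) {
  p <- length(phi)
  e <- as.numeric(y) - mu - beta * as.numeric(x)  # regression residuals
  recent <- rev(tail(e, p))                       # e[T], ..., e[T-p+1]
  unname(mu + beta * x_next + sum(phi * recent))
}

data(LakeHuron)
yr  <- time(LakeHuron) - 1920                     # 52 for 1972
fit <- arima(LakeHuron, order = c(2, 0, 0), xreg = yr)
cf  <- coef(fit)                                  # ar1, ar2, intercept, xreg

manual    <- ar_error_forecast_1(cf[1:2], cf[3], cf[4], LakeHuron, yr, 53)
predicted <- as.numeric(predict(fit, n.ahead = 1, newxreg = 53)$pred)
print(c(manual = manual, predict = predicted))
```

With the rounded coefficients printed above, the same formula gives 579.0993 - 0.0216*53 + 1.0048*(579.96 - 579.0993 + 0.0216*52) - 0.2913*(579.89 - 579.0993 + 0.0216*51), or about 579.397. That is close to the 579.3972 from predict. The 578.59 in the message comes from subtracting only the intercept, and not the x term for 1972 and 1971, before applying the AR coefficients.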