[R] forecast: bias in sampling from seasonal ARIMA model?

Nicolas Chapados nicolas.chapados at gmail.com
Wed Jul 6 16:49:36 CEST 2011


[Reposting, since there was problems with encodings the first time around.]

Dear all,

I stumbled upon what appears to be a troublesome issue when sampling from an
ARIMA model (from Rob Hyndman's excellent 'forecast' package) that contains
a seasonal AR component.

Here's how to reproduce the issue.  (I'm using R 2.9.2 with forecast 2.19;
see sessionInfo() below).

First some data:

> x <- c(
 0.132475,  0.143119,  0.108104,  0.247291,  0.029510, -0.119591, -0.133313,
-0.098128,  0.192698,  0.110328,  0.163671, -0.004925, -0.239209, -0.055122,
-0.051121,  0.154108,  0.008665, -0.074702,  0.066534, -0.098728, -0.068668,
 0.150935, -0.022547,  0.028625,  0.107092, -0.065396, -0.253247, -0.115240,
-0.113535, -0.064191, -0.006032,  0.039233,  0.129013, -0.068462,  0.022398,
-0.052427, -0.005586,  0.011447, -0.022667, -0.120536, -0.234398, -0.164087,
-0.177160, -0.120624, -0.025104,  0.001144, -0.193424, -0.260674, -0.036976,
-0.009590, -0.004920,  0.130545,  0.120527,  0.041121, -0.123321,  0.023836,
-0.188418,  0.015807, -0.056012,  0.000496,  0.051806, -0.067574,  0.012775,
 0.244083,  0.148857,  0.013874,  0.235252,  0.151935,  0.036986,  0.134482,
-0.003359, -0.019422,  0.086195,  0.206569,  0.123565,  0.070835, -0.183189,
-0.046513,  0.071920, -0.038360,  0.135293,  0.054746, -0.280340,  0.110638,
 0.009729,  0.115541,  0.021397,  0.097835, -0.028434, -0.218416,  0.044552,
 0.442563,  0.084317,  0.044149,  0.201100,  0.076112, -0.134955,  0.023870,
 0.077111,  0.085490,  0.023154,  0.099757, -0.026509, -0.189839,  0.026614,
 0.184916, -0.007266,  0.081276,  0.312526,  0.051199, -0.104707, -0.004206,
 0.062440,  0.126385, -0.018100,  0.092513,  0.186459, -0.170184, -0.126168,
 0.122739,  0.097495,  0.008633, -0.034519,  0.187264, -0.153409,  0.009440,
 0.150561,  0.067744,  0.045129,  0.230831, -0.079700, -0.162694, -0.044251,
-0.007663,  0.048986,  0.065724,  0.159706,  0.040067, -0.059949,  0.024810,
-0.154852,  0.018080,  0.165935,  0.203050,  0.011035, -0.232585, -0.162248,
-0.104872, -0.062516, -0.089766,  0.100304,  0.142170, -0.144969, -0.032500,
-0.002131,  0.165890,  0.107629,  0.075752,  0.119003,  0.095955,  0.039842,
 0.081208,  0.348529,  0.145694, -0.210700,  0.384966, -0.054503,  0.293329,
 0.184295,  0.368986,  0.135270,  0.124917,  0.185286, -0.252088, -0.169708,
-0.010204,  0.021934,  0.003572,  0.180148,  0.075836, -0.232065, -0.127255,
-0.147122,  0.056163,  0.067004,  0.217810,  0.074513, -0.167389,  0.172578,
-0.148127,  0.057025,  0.042623,  0.094214,  0.047004, -0.345453, -0.265104,
-0.082897,  0.052705, -0.067002,  0.191941,  0.010989, -0.298567, -0.162841,
 0.043773,  0.185459,  0.126305,  0.383101,  0.092747, -0.368453, -0.325097,
 0.029564, -0.015390,  0.013807,  0.152062, -0.047015, -0.429245, -0.097742,
 0.104502, -0.007547, -0.000245,  0.062830,  0.030093, -0.381043, -0.267704,
-0.125930, -0.032264, -0.041657,  0.040073,  0.084431, -0.276316, -0.305253,
-0.019942,  0.045390,  0.046090,  0.145700,  0.069920, -0.210079,  0.050967,
 0.042283,  0.248840,  0.007883,  0.203171,  0.050722, -0.109773, -0.110301,
-0.095433,  0.071133,  0.023793,  0.192476,  0.057746)

First, a CORRECT model, containing a seasonal MA component but no seasonal
AR component.  After estimation, I forecast for 1 time-step, and I take the
mean of sampling 10000 times from the same model:

> my.arima1 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(0,0,2), period=7), include.mean=FALSE)
> forecast(my.arima1, 1)
    Point Forecast      Lo 80     Hi 80     Lo 95     Hi 95
251    -0.03143283 -0.1882245 0.1253589 -0.271225 0.2083594
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima1, 1)) ))
[1] -0.03258454

The results ("Point Forecast" versus the output of mean()) are identical to
some sampling error.

Now the INCORRECT model arises from adding one seasonal AR component:

> my.arima2 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(1,0,2), period=7), include.mean=FALSE)
> forecast(my.arima2, 1)
    Point Forecast     Lo 80       Hi 80      Lo 95      Hi 95
251     -0.1848579 -0.322421 -0.04729492 -0.3952424 0.02552655
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima2, 1)) ))
[1] -0.05416299

For the results are substantially different (-0.18 versus -0.05), and the
latter does not change much if I take a much bigger sample.

Did anybody encounter this in the past?  Is this a bug?

For reference, here are the results of sessionInfo():
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu

locale:
C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] KernSmooth_2.23-3  digest_0.4.2       forecast_2.19      fracdiff_1.3-2
 [5] tseries_0.10-12    zoo_1.4-0          quadprog_1.4-11    glmnet_1.7
 [9] Matrix_0.999375-30 biglm_0.7          DBI_0.2-4          inline_0.3.7
[13] XML_2.6-0          timeSeries_2110.86 timeDate_2110.87   RODBC_1.3-2
[17] reshape_0.8.3      plyr_0.1.9         MASS_7.2-48        nnet_7.2-48
[21] latticeExtra_0.5-1 RColorBrewer_1.0-2 lattice_0.17-25    gsubfn_0.5-7
[25] proto_0.3-8        multicore_0.1-3    data.table_1.4.1

loaded via a namespace (and not attached):
[1] grid_2.9.2  tools_2.9.2


    Many thanks for any help or pointers!
    + Nicolas Chapados



More information about the R-help mailing list