[R] question regarding arima function and predicted values

Wed Dec 12 10:53:02 CET 2007

>Good evening!
>
>I have a question regarding the forecast package and time series analysis.
>My syntax:
>
>x<-c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244, 
>258, 255, 285, 301, 278, 279, 304, 275, 276, 313, 292, 302, 
>322, 281, 298, 305, 295, 286, 327, 286, 270, 289, 293, 287, 
>267, 267, 288, 304, 273, 264, 254, 263, 265, 278)
>library(forecast)
>arima(x, order=c(1,1,2), seasonal=list(order=c(0,1,0), period=12))->l
>auto.arima(x)->k
>sd(l$resid)
>sd(k$resid)
>predict(l,n.ahead=1)
>predict(k,n.ahead=1)
>
>1. I understand that auto.arima will find the best time series 
>model by choosing the smallest AIC, BIC and AICc among competing 
>models, but my model has a smaller AIC than the one from 
>auto.arima. However, the sd of the residuals for my model is 
>somehow bigger, as well as the se for the predicted value. 
>Why? Am I missing something? 
>Which of these two models would you choose, and why?
>

Hello Eugen,

in a nutshell, I would use neither of these models, but rather an
ARIMA(1, 0, 1), i.e. an ARMA(1, 1), fitted to log(x). Now, to your
questions. If you use the "trace = TRUE" argument in auto.arima(), you
will see that your model specification (l) is not tested. Why is this?
Because you supply a plain vector, whose frequency is 1 (see
frequency(x)). If you now look at the code of auto.arima(), it is clear
that seasonal differences are not tested for in that case.
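
As a small illustration of the frequency point, the following sketch
(base R only, using the data from the original post; the name x12 is
just a label chosen for this example) compares the frequency of the
plain vector with that of a monthly ts object:

```r
# Data from the original post
x <- c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244,
       258, 255, 285, 301, 278, 279, 304, 275, 276, 313, 292, 302,
       322, 281, 298, 305, 295, 286, 327, 286, 270, 289, 293, 287,
       267, 267, 288, 304, 273, 264, 254, 263, 265, 278)

# A plain numeric vector carries no time series attributes,
# so frequency() falls back to 1
frequency(x)

# Declaring the data as monthly (an assumption about the sampling
# interval) sets the frequency to 12
x12 <- ts(x, frequency = 12)
frequency(x12)
```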

Try this instead:

x <- ts(x, frequency = 12)
k <- auto.arima(x, D = 1, trace = TRUE)
logLik(k)
k$aic

This yields an ARIMA(1, 0, 1)(2, 1, 0)[12] as the "optimal" model
specification, which gives an even "better" result than your model l.
However, the results you report for l and k can be attributed to
over-fitting / over-differencing. If you examine your series more
closely:

plot(x)
acf(x)
pacf(x)
library(urca)
ur.kpss(x)
plot(ur.za(x))

i.e. follow the traditional identification stage of the Box-Jenkins
approach, you will find that
1) The series does not seem to be stationary with respect to its
variance, but it is not "trending".
2) ACF and PACF taper off slowly; neither shows a single spike, nor
does the PACF give any hint of seasonality.
3) Your series is stationary with a structural break.
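
One rough way to look at the over-differencing point numerically is to
compare the sample variance of the series before and after
differencing. This is only a heuristic sketch and not part of the
identification steps above; the rule of thumb assumed here is that
differencing an already stationary series tends to inflate its
variance. The name v is chosen for this example only.

```r
# Data from the original post
x <- c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244,
       258, 255, 285, 301, 278, 279, 304, 275, 276, 313, 292, 302,
       322, 281, 298, 305, 295, 286, 327, 286, 270, 289, 293, 287,
       267, 267, 288, 304, 273, 264, 254, 263, 265, 278)

# Sample variance of the level series, the first difference, and the
# first difference followed by a lag-12 (seasonal) difference, which is
# the differencing implied by the ARIMA(1,1,2)(0,1,0)[12] model l
v <- c(level        = var(x),
       diff1        = var(diff(x)),
       diff1_seas12 = var(diff(diff(x), lag = 12)))
print(v)
```

If the differenced versions show a clearly larger variance than the
level series, that is consistent with (though no proof of)
over-differencing.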

Therefore, one can use the log-transform of x for variance stabilisation
and specify an ARIMA(1, 0, 1) model, i.e. an ARMA(1, 1):

xl <- log(x)
m <- arima(xl, order=c(1, 0, 1))
m
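
Since the original question was about predicted values, here is a
minimal sketch of how a one-step-ahead forecast from the log model
could be brought back to the original scale. It assumes approximately
normal errors on the log scale and uses 1.96 as the usual 95% quantile;
the names p, point, lower and upper are chosen for this example only.
Note that exp() of the log-scale forecast is closer to a median than to
a mean forecast on the original scale.

```r
# Data from the original post
x <- c(253, 252, 275, 275, 272, 254, 272, 252, 249, 300, 244,
       258, 255, 285, 301, 278, 279, 304, 275, 276, 313, 292, 302,
       322, 281, 298, 305, 295, 286, 327, 286, 270, 289, 293, 287,
       267, 267, 288, 304, 273, 264, 254, 263, 265, 278)

# Fit the ARIMA(1, 0, 1) model to the log-transformed series
xl <- log(x)
m <- arima(xl, order = c(1, 0, 1))

# One-step-ahead forecast and its standard error on the log scale
p <- predict(m, n.ahead = 1)

# Back-transform the point forecast and an approximate 95% interval
point <- as.numeric(exp(p$pred))
lower <- as.numeric(exp(p$pred - 1.96 * p$se))
upper <- as.numeric(exp(p$pred + 1.96 * p$se))
c(lower = lower, point = point, upper = upper)
```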

Best,
Bernhard

>2. This question is more theoretical 
>
> m<-sample(c(10:20),10,replace=T)
> f<-sample(c(10:20),10,replace=T)
> t<-m+f
> s<-rbind(m,f,t)
> s
>
>Let's say I have a panel sample at my disposal and consider m to 
>be the monthly average quantity of juice consumption for the 
>male part of the sample, f the monthly average 
>quantity of juice consumption for the female part of the 
>sample, and t the average quantity of juice consumption for 
>the whole sample. For the mean of the whole sample I have a 
>confidence interval of, say, +/-2 each month (say I have a 
>sample of 2000 individuals). If I try to come up with a 
>confidence interval only for the male population (which in my 
>sample is, say, 1000), it would certainly be bigger, because I 
>now have a male sample of only 1000 for determining the mean 
>consumption for the whole male population. So my confidence 
>interval is bigger for mean male consumption than for the 
>whole sample (because N declines from 2000 to 1000). Now if I 
>tried to predict next month's consumption for both of my 
>time series (male and whole sample), the prediction would not 
>"care" that when establishing the mean consumption I used 
>first 2000 people and then 1000. Am I right?
>Imagine that each month (of the 10 that I sampled above) has 
>such a confidence interval of +/-3. Now how would a future 
>prediction incorporate this fact: that my mean 
>consumption is not measured via a census but using a sample, 
>and that the number is an estimate of the real consumption, 
>within a confidence interval?
>Is there a good reference text on incorporating the 
>confidence intervals of past values when determining future 
>values? 
>
>Thank you and have a great day!
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide 
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>