[R] Non-normal residuals.
tomreilly
tomreilly at autobox.com
Thu Nov 12 18:02:33 CET 2009
Kevin,
Kudos to you for asking a question that most do not....
I have attached an analysis of your residuals for "10 inch" as
10inchres.zip, and our own analysis of the series as 10inches.zip. Both
contain reports, with some added commentary to help you understand
this all fully.
The conclusion is that your model/methodology is not capturing the pattern in
the data properly. Worse yet, it is actually creating or "injecting"
structure into the errors. In turn, any forecast that comes out of such a
model/approach will be doomed.
I have copied the ACF/PACF from the enclosed report "details.htm" here. It shows
a "blip" at lag 3 (ACF of -.383, t-ratio of -2.48). This may be evidence of
something wrong: either a model that is overzealous or a model that has not
captured the structure. Most people aren't aware that bad modeling can itself
create such issues.
Analysis for Variable Y 10inplates-RESIDUALS

LAG     ACF   STND.      T-  CHI-SQUARE  PROBABILITY    PACF   STND.      T-
      VALUE   ERROR   RATIO                            VALUE   ERROR   RATIO
  1    .037    .154     .24          .1        .8059    .037    .154     .24
  2   -.022    .155    -.14          .1        .9597   -.023    .154    -.15
  3   -.383    .155   -2.48         7.0        .0711   -.382    .154   -2.47
  4   -.174    .176    -.99         8.5        .0750   -.175    .154   -1.13
  5    .148    .180     .82         9.6        .0877    .164    .154    1.06
  6   -.001    .183    -.01         9.6        .1429   -.179    .154   -1.16
  7   -.006    .183    -.03         9.6        .2128   -.176    .154   -1.14
  8   -.009    .183    -.05         9.6        .2944    .113    .154     .73
  9   -.011    .183    -.06         9.6        .3834   -.025    .154    -.16
 10   -.035    .183    -.19         9.7        .4694   -.222    .154   -1.44
 11   -.053    .183    -.29         9.8        .5448   -.021    .154    -.13
 12    .036    .183     .20         9.9        .6229    .118    .154     .76
 13    .013    .183     .07         9.9        .6995   -.157    .154   -1.02
 14    .080    .183     .43        10.3        .7362   -.017    .154    -.11
 15   -.132    .184    -.72        11.5        .7132   -.050    .154    -.33
 16   -.109    .186    -.59        12.4        .7165   -.192    .154   -1.25
 17   -.029    .188    -.16        12.5        .7717   -.073    .154    -.47
 18   -.018    .188    -.09        12.5        .8214   -.084    .154    -.55
 19    .157    .188     .84        14.5        .7556   -.027    .154    -.18
 20    .040    .191     .21        14.6        .7984   -.017    .154    -.11
 21    .030    .191     .16        14.7        .8384   -.032    .154    -.21
 22   -.005    .192    -.03        14.7        .8753   -.018    .154    -.12
 23    .008    .192     .04        14.7        .9053    .082    .154     .53
 24    .046    .192     .24        14.9        .9232    .039    .154     .25
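
As a rough cross-check on the ACF table above, the standard errors, t-ratios and cumulative chi-square values can be recomputed from the printed autocorrelations. The sketch below is in Python (standard library only) and rests on assumptions that are not stated in the report: that the standard errors follow Bartlett's approximation, that the chi-square column is the Ljung-Box Q statistic, and that the residual series has n = 42 observations (inferred from the lag-1 standard error, since 1/sqrt(42) is about .154). The function names are illustrative, not Autobox or R functions.

```python
import math


def bartlett_se(acf, n):
    """Standard error of r_k under Bartlett's approximation:
    sqrt((1 + 2 * sum of r_j^2 for j < k) / n)."""
    ses = []
    running = 0.0  # sum of squared autocorrelations at earlier lags
    for r in acf:
        ses.append(math.sqrt((1.0 + 2.0 * running) / n))
        running += r * r
    return ses


def ljung_box(acf, n):
    """Cumulative Ljung-Box Q statistics for lags 1..len(acf):
    Q(m) = n * (n + 2) * sum_{k=1..m} r_k^2 / (n - k)."""
    qs = []
    total = 0.0
    for k, r in enumerate(acf, start=1):
        total += r * r / (n - k)
        qs.append(n * (n + 2) * total)
    return qs


# ACF values for lags 1-5, copied from the table above.
acf = [0.037, -0.022, -0.383, -0.174, 0.148]
n = 42  # ASSUMPTION: series length inferred from the lag-1 standard error

se = bartlett_se(acf, n)
q = ljung_box(acf, n)
for k, (r, s, qk) in enumerate(zip(acf, se, q), start=1):
    print(f"lag {k}: acf={r:+.3f}  se={s:.3f}  t={r / s:+.2f}  Q={qk:.1f}")
```

Under these assumptions the recomputed values agree with the printed columns for lags 1 to 5 up to rounding, which suggests (but does not prove) that this is how the report was built. In R, `acf()`, `pacf()` and `Box.test(x, lag = k, type = "Ljung-Box")` give comparable diagnostics.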
If you refer to stat.htm in the zip file you will see the model I have pasted
here. There are two "Seasonal Pulse" interventions identified, starting
12/2007 and 1/2008. This indicates that a seasonal effect is being missed in
your model. Also note the two "level shift" interventions identified at (or
around) 5/08 and 4/09, indicating runs of residuals that sit consistently on
one side of zero. There is also an autoregressive factor at lag 3 (see the
Box-Jenkins textbook for more on ARIMA modeling). Finally, there are a few
one-time or "pulse" interventions, which reflect unusually large or small
values (e.g. 3/09) that are not being adjusted for.
FORECASTING WITH FINAL MODEL

 #  MODEL COMPONENT             LAG      COEFF   STANDARD       P       T
                              (BOP)                 ERROR   VALUE   VALUE
 1  CONSTANT                              .154   .804E-01   .0653    1.91
 2  Autoregressive-Factor # 1     3      -.711       .141   .0000   -5.04
    INPUT SERIES X1   I~P00035   2009/ 3   PULSE
 3  Omega (input) -Factor # 2     0       3.24       .320   .0000   10.13
    INPUT SERIES X2   I~S00021   2008/ 1   SEASP
 4  Omega (input) -Factor # 3     0       3.36       .353   .0000    9.53
    INPUT SERIES X3   I~L00036   2009/ 4   LEVEL
 5  Omega (input) -Factor # 4     0      -.888       .159   .0000   -5.58
    INPUT SERIES X4   I~L00025   2008/ 5   LEVEL
 6  Omega (input) -Factor # 5     0       .287       .110   .0143    2.60
    INPUT SERIES X5   I~P00036   2009/ 4   PULSE
 7  Omega (input) -Factor # 6     0      -2.71       .373   .0000   -7.27
    INPUT SERIES X6   I~P00031   2008/11   PULSE
 8  Omega (input) -Factor # 7     0      -1.44       .338   .0002   -4.26
    INPUT SERIES X7   I~S00020   2007/12   SEASP
 9  Omega (input) -Factor # 8     0      -1.21       .224   .0000   -5.40
    INPUT SERIES X8   I~P00037   2009/ 5   PULSE
10  Omega (input) -Factor # 9     0      -.838       .334   .0177   -2.51
    INPUT SERIES X9   I~P00021   2008/ 1   PULSE
11  Omega (input) -Factor # 10    0      -2.18       .452   .0000   -4.83
    INPUT SERIES X10  I~P00025   2008/ 5   PULSE
12  Omega (input) -Factor # 11    0       .648       .313   .0470    2.07
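
To make the three intervention types in the model above concrete, here is a minimal Python sketch of the indicator (dummy) regressors they are conventionally taken to mean: a pulse is 1 at a single time point, a level shift is 1 from a time point onward, and a seasonal pulse is 1 at a time point and at every seasonal period after it. These are the textbook definitions, assumed here for illustration; the exact construction Autobox uses internally is not shown in the report, and the function names and the index 5 are hypothetical.

```python
def pulse(n, start):
    """1 at time index `start`, 0 elsewhere (a one-time outlier)."""
    return [1 if t == start else 0 for t in range(n)]


def level_shift(n, start):
    """0 before `start`, 1 from `start` onward (a permanent step)."""
    return [1 if t >= start else 0 for t in range(n)]


def seasonal_pulse(n, start, period=12):
    """1 at `start` and every `period` steps after it, 0 elsewhere."""
    return [1 if t >= start and (t - start) % period == 0 else 0
            for t in range(n)]


# Hypothetical example: 30 monthly observations, intervention at index 5.
n_obs = 30
p = pulse(n_obs, 5)
ls = level_shift(n_obs, 5)
sp = seasonal_pulse(n_obs, 5, period=12)

print("pulse at:         ", [t for t, v in enumerate(p) if v])
print("level shift from: ", ls.index(1))
print("seasonal pulse at:", [t for t, v in enumerate(sp) if v])
```

Each such column would enter the model as an input series with its own coefficient (the "Omega" rows above), so a missed level shift or seasonal pulse ends up in the residuals instead.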
Here is our model for 10 inch plates using the historical data. Autobox
identified a model with autoregressive factors at lag 1 and at the seasonal
lag 12. Note that seasonal pulses at November and December again appear in
the model, along with two pulse interventions.
 #  MODEL COMPONENT             LAG      COEFF   STANDARD       P       T
                              (BOP)                 ERROR   VALUE   VALUE
 1  CONSTANT                              119.       72.9   .1113    1.63
 2  Autoregressive-Factor # 1     1       .941   .557E-01   .0000   16.90
 3  Autoregressive-Factor # 2    12      -.738       .220   .0019   -3.35
    INPUT SERIES X1   I~P00035   2009/ 3   PULSE
 4  Omega (input) -Factor # 3     0   .110E+04       109.   .0000   10.12
    INPUT SERIES X2   I~S00020   2007/12   SEASP
 5  Omega (input) -Factor # 4     0      -645.       71.6   .0000   -9.01
    INPUT SERIES X3   I~S00019   2007/11   SEASP
 6  Omega (input) -Factor # 5     0      -342.       64.4   .0000   -5.31
    INPUT SERIES X4   I~P00033   2009/ 1   PULSE
 7  Omega (input) -Factor # 6     0       297.       122.   .0197    2.44
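
One way to read the two autoregressive factors in the table above is as a product of polynomials in the backshift operator B. The Python sketch below expands that product under two assumptions that the report does not state explicitly: that "Factor # 1" and "Factor # 2" multiply (the usual Box-Jenkins multiplicative form), and that a coefficient phi at lag k enters as (1 - phi * B^k). The helper `poly_mul` is illustrative only.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B, each given as {power: coefficient}."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0.0) + x * y
    return out


# ASSUMED sign convention: coefficient phi at lag k enters as (1 - phi * B^k).
factor1 = {0: 1.0, 1: -0.941}   # 1 - .941 B      (lag 1, coefficient  .941)
factor2 = {0: 1.0, 12: 0.738}   # 1 + .738 B^12   (lag 12, coefficient -.738)

combined = poly_mul(factor1, factor2)
for power in sorted(combined):
    print(f"B^{power}: {combined[power]:+.3f}")
```

If those assumptions hold, the combined operator has terms at lags 1, 12 and 13, so each value depends on the previous month, the same month a year earlier, and the month before that.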
With all of this said, you have some very difficult time series. Using
simple and free methods may not give you what you are looking for. Autobox
is completely automatic like R, but has the ability to recognize and adjust
for 4 types of interventions. If you don't adjust the model for these
interventions then the "fit" will be off, as we have seen in this case
study.
Contact me or go to our website to learn more about us.
Tom Reilly
Vice President of Sales
Automatic Forecasting Systems
215-675-0652
http://www.autobox.com
tomreilly at autobox.com
skype:tomreilly at autobox.com
Here is Kevin's original post......
This is kind of a general question about methodology more than anything. But
I was looking for some advice. I have fit a time-series model and feel
pretty confident that I have taken this model (exponential smoothing) as far
as it will go. In other words, looking at the data and the fitted curves, I
think it is as close as I can get. But when I plot the residuals and form a
QQ-plot it seems that the residuals are not "normal". From the QQ-plot there
is some factor influencing the series that cannot be attributed to
"normal random" fluctuation. I can run 'tsdiag' to determine basically
whether the residuals are normal and random, but what if they are not? What
would be the next set of 'R' commands that I might run to find this
influence?
Any suggestions?
Kevin
rkevinburton wrote:
>
> Hello,
>
> I asked a question about the most likely process to follow if, after a
> time-series fit is performed, the residuals are found to be non-normal. One
> person responded and offered to help if I supplied a sample data set.
> Unfortunately, now that I have a sample, I have lost the email address. If
> you are that person or have some ideas please email me back at
> rkevinburton at charter.net.
>
> Thank you.
>
> Kevin
>
http://old.nabble.com/file/p26322376/10inches.zip 10inches.zip
http://old.nabble.com/file/p26322376/10inchres.zip 10inchres.zip