[R] Autocorrelation in linear models
Arni Magnusson
arnima at hafro.is
Thu Mar 17 00:48:11 CET 2011
I have been reading about autocorrelation in linear models over the last
couple of days, and I have to say the more I read, the more confused I
get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for
guidance.
Most authors are mainly worried about autocorrelation in the residuals,
but some authors are also worried about autocorrelation within Y and
within X vectors before any model is fitted. Would you test for
autocorrelation both in the data and in the residuals?
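To make that question concrete, here is a self-contained sketch with simulated data (x1, x2 and y are made up, not my actual measurements), looking at lag-1 autocorrelation both in the raw series and in the residuals:

```r
## Simulated example: autocorrelated covariates, then check lag-1
## autocorrelation in the data and in the residuals of the fitted model.
set.seed(1)
n  <- 20
x1 <- cumsum(rnorm(n))               # random walk -> autocorrelated
x2 <- cumsum(rnorm(n))
y  <- 2 + 0.5*x1 - 0.3*x2 + rnorm(n)
fm <- lm(y ~ x1 + x2)

acf(y, plot = FALSE)$acf[2]              # lag-1 autocorrelation in the data
acf(residuals(fm), plot = FALSE)$acf[2]  # lag-1 autocorrelation in residuals
```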
If we limit our worries to the residuals, it looks like we have a variety
of tests for lag=1:
stats::cor.test(residuals(fm)[-n], residuals(fm)[-1])  # lag-1 correlation
stats::Box.test(residuals(fm))                         # Box-Pierce
lmtest::dwtest(fm, alternative="two.sided")            # Durbin-Watson
lmtest::bgtest(fm, type="F")                           # Breusch-Godfrey
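For reference, the four tests can be run side by side on a toy model like this (simulated data with built-in residual autocorrelation; assumes the lmtest package is installed):

```r
## Fit a simple lm to data whose errors follow an AR(1) process,
## then apply all four lag-1 tests to the same fitted model.
set.seed(1)
n  <- 20
x  <- seq_len(n)
y  <- 0.1 * x + as.numeric(arima.sim(list(ar = 0.6), n))
fm <- lm(y ~ x)
r  <- residuals(fm)

cor.test(r[-n], r[-1])$p.value                       # lag-1 correlation
Box.test(r)$p.value                                  # Box-Pierce
lmtest::dwtest(fm, alternative = "two.sided")$p.value  # Durbin-Watson
lmtest::bgtest(fm, type = "F")$p.value               # Breusch-Godfrey
```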
In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have
significant _positive_ autocorrelation within Y and within both X vectors,
but _negative_ autocorrelation in the residuals. The residual
autocorrelation is not quite significant, with the p-values
0.070 (cor.test)
0.064 (Box.test)
0.125 (dwtest)
0.077 (bgtest)
from the tests above. I seem to remember some authors saying that the
Durbin-Watson test has less power than some alternative tests, as
reflected here. The difference in p-values is substantial, so choosing
which test to use could in many cases make a big difference for the
subsequent analysis and conclusions. Three of them (cor.test, Box.test,
bgtest) can also test lags>1. Which test would you recommend? I imagine
the basic cor.test is somehow inappropriate for this; the other tests
wouldn't have been invented otherwise, right?
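For completeness, the lags>1 versions look like this (same toy model as a stand-in for my data; k = 3 is an arbitrary choice):

```r
## Testing beyond lag 1: Box.test takes a lag argument, bgtest an order
## argument, and cor.test can be applied at a single lag k by hand.
set.seed(1)
n  <- 20
x  <- seq_len(n)
y  <- 0.1 * x + as.numeric(arima.sim(list(ar = 0.6), n))
fm <- lm(y ~ x)
r  <- residuals(fm)
k  <- 3

Box.test(r, lag = k, type = "Ljung-Box")     # joint test of lags 1..k
lmtest::bgtest(fm, order = k)                # Breusch-Godfrey up to order k
cor.test(r[seq_len(n - k)], r[-seq_len(k)])  # correlation at lag k only
```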
The car::dwt(fm) gives p-values that fluctuate by a factor of 2 from run
to run, unless I run a very long simulation, in which case the p-value is
similar to the one from lmtest::dwtest, at least in my case.
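The fluctuation is because car::dwt bootstraps its p-value; a larger reps should stabilise it. A sketch on a built-in dataset (assumes the car package is installed):

```r
## car::dwt simulates its p-value, so it varies between runs;
## increasing reps (default 1000) reduces that Monte Carlo noise.
library(car)
fm <- lm(dist ~ speed, data = cars)
dwt(fm)                # reps = 1000 by default; p-value varies run to run
dwt(fm, reps = 50000)  # longer simulation, more stable p-value
```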
Finally, one question regarding remedies. If there was significant
_positive_ autocorrelation in the residuals, some authors suggest
remedying this by deflating the df (fewer effective df in the data) and
redoing the t-tests of the regression coefficients, rejecting fewer null
hypotheses. Does that mean if the residuals are _negatively_ correlated
then I should inflate the df (more effective df in the data) and reject
more null hypotheses?
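The adjustment I have in mind is the usual effective-sample-size rule of thumb for lag-1 autocorrelation rho, n_eff = n*(1-rho)/(1+rho), which would indeed go the other way for negative rho (the rho value here is hypothetical, just to show the direction):

```r
## Effective sample size under lag-1 autocorrelation rho:
##   n_eff = n * (1 - rho) / (1 + rho)
## Positive rho deflates the df, negative rho inflates them.
n   <- 20
rho <- -0.3                        # hypothetical negative autocorrelation
n_eff <- n * (1 - rho) / (1 + rho)
n_eff                              # about 37, i.e. more than n = 20
```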
That's four question marks. I'd greatly appreciate guidance on any of
them.
Thanks in advance,
Arni