[R-sig-eco] Autoregressive modelling (Gavin Simpson)

Wed Nov 24 13:55:22 CET 2010

Date: Tue, 23 Nov 2010 13:23:30 +0000
From: Gavin Simpson <gavin.simpson at ucl.ac.uk>
To: Saskia Otto <Saskia.Otto at uni-hamburg.de>
Cc: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] Autoregressive modelling

On Sat, 2010-11-20 at 11:36 +0100, Saskia Otto wrote:
>  >  Dear Frank,
>
>  >  thanks for your suggestions! I still have some questions:
>
>  I thought that autocorrelated / autoregressive residuals inflate
>  p-values, thus insignificant variables become significant, not the
>  other way around?
>  Did I get it right?

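Kind of - it is the standard errors of the estimated coefficients that
are underestimated in the presence of dependence (e.g. autocorrelation)
in the residuals. The test statistics are then too large and the
p-values too small, which is how insignificant variables can end up
looking significant.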
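
A rough way to see this effect is a small simulation. The sketch below
is illustrative only: the series length, the AR(1) coefficient of 0.7 and
the trend slope are all assumed values, not anything from this thread. It
compares the standard error that lm() reports for a slope with how much
that slope actually varies from one simulated data set to the next.

```r
# Simulated illustration: y has a linear trend plus AR(1) errors.
set.seed(1)
n    <- 100          # length of each series (assumed)
phi  <- 0.7          # AR(1) coefficient of the errors (assumed)
nsim <- 500          # number of simulated data sets
tt   <- seq_len(n)

one_fit <- function() {
  e   <- as.numeric(arima.sim(model = list(ar = phi), n = n))
  y   <- 0.05 * tt + e
  fit <- lm(y ~ tt)
  c(slope = unname(coef(fit)["tt"]),
    se    = summary(fit)$coefficients["tt", "Std. Error"])
}

sims <- replicate(nsim, one_fit())   # 2 x nsim matrix, rows "slope" and "se"

reported_se <- mean(sims["se", ])    # what lm() reports on average
actual_sd   <- sd(sims["slope", ])   # how much the slope really varies

c(reported = reported_se, actual = actual_sd)
```

If the errors were independent the two numbers would be close; with
positively autocorrelated errors the reported value should come out
clearly smaller than the actual spread.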

>  You suggest to use in the second step a GLS or
>  mixed model (where I included AR1 correlation structure), where only
>  those variables are included that have been significant in the first
>  step?
>  I tried both ways: your 2-step approach as well as the GLS/mixed model
>  as in your second step but with a full model (including all
>  covariates) and then do the model selection. The results were the
>  same. So why is it not ok to use a full GLS/mixed model followed by
>  the model selection in the first place?

Whether you detrend or not (before fitting a model) is an important
consideration - statistician colleagues of mine have told me *not* to
detrend as you are throwing away information (amongst other reasons).
Instead, model the trend explicitly. Of course, you have to posit a
valid reason for the relationship between the response and your
covariates to guard against spurious regressions - where you get a
significant covariate because both it and the response have a trend but
there is no mechanistic reason to presume that the covariate is
controlling the response.
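
The spurious-regression problem, and what modelling the trend explicitly
does about it, can be sketched with simulated data. Everything below is
assumed for illustration (the slopes, the noise level, the use of a
simple linear trend): x and y are generated without reference to each
other, and share nothing but the fact that each has a trend in time.

```r
set.seed(2)
n    <- 100
nsim <- 500
tt   <- seq_len(n)

one_run <- function() {
  x <- 0.05 * tt + rnorm(n)   # covariate with its own trend
  y <- 0.03 * tt + rnorm(n)   # response with a trend, generated without using x
  p_naive <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  p_trend <- summary(lm(y ~ x + tt))$coefficients["x", "Pr(>|t|)"]
  c(naive = p_naive, with_trend = p_trend)
}

p <- replicate(nsim, one_run())      # 2 x nsim matrix of p-values for x

# Proportion of simulations in which x is "significant" at the 5% level
rejection_rate <- rowMeans(p < 0.05)
rejection_rate
```

Without the trend term, x should look significant in nearly every run
even though it has no effect on y; once the trend is in the model the
rejection rate should fall back to about the nominal 5%. This says
nothing about whether a real covariate is mechanistically plausible,
which still has to be argued separately.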

>  I still do not understand the difference between an AR1 model where
>  other covariates are included as well (e.g. by using the arima()
>  function) and a model where I included an AR1 correlation structure
>  (by using e.g. gls() or lme() )

Zuur et al. [1] suggest a different approach, along the lines of i)
fitting the full model, ii) then fitting something for the
autocorrelation in the residuals of this full model, then iii) having
included ii), refining the fitted model by getting rid of insignificant
covariates etc.

**
My experience with this is that if you include both a trend component
and an auto-regressive correlation structure on the residuals in the
same model, AND you estimate them together (at the same time), then they
are going to fight with each other over which one gets the information.
Hence the suggestion to:
1. Fit a model without correlation
2. Get an impression of the strength of the correlation
3. Refit the model while keeping the autoregressive parameter(s) fixed.

It is a bit dodgy I guess... well, pragmatic. Note that I would only do
this with these AR and ARMA type structures, and likewise for spatial
correlation structures. Things like a random intercept (and the
associated correlation structure) are much easier to work with.

Alain Zuur
**
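
The three steps in the note above might look roughly like this with
nlme. This is only one possible reading of them, on simulated data: the
data frame d, its columns, and the choice of the lag-1 residual
autocorrelation as the "impression" of the correlation strength are all
assumptions made for the sketch.

```r
library(nlme)

set.seed(3)
n  <- 120
tt <- seq_len(n)
x  <- rnorm(n)
e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
d  <- data.frame(tt = tt, x = x, y = 1 + 0.02 * tt + 0.5 * x + e)

# Step 1: fit the model without any correlation structure
m0 <- lm(y ~ tt + x, data = d)

# Step 2: get an impression of the strength of the correlation
#         (here: the lag-1 autocorrelation of the residuals)
r       <- as.numeric(residuals(m0))
rho_hat <- acf(r, plot = FALSE)$acf[2]

# Step 3: refit, holding the AR(1) parameter fixed at that value
#         instead of estimating it together with the trend
m1 <- gls(y ~ tt + x, data = d,
          correlation = corAR1(value = rho_hat, form = ~ tt, fixed = TRUE))

summary(m1)$tTable
```

Because the AR(1) parameter is fixed, the standard errors in the final
table take no account of the uncertainty in rho_hat, which is part of
what makes the approach pragmatic rather than rigorous.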

I might modify this a bit (maybe Zuur et al already suggest this?), by
thinking about what model I want to fit, what is plausible, and fit
that. Then check the residuals for lack of independence. If residuals
are dependent, fit a model that allows for autocorrelation in residuals
directly by specifying a simple process for the covariance matrix (AR or
ARMA say), such as via GLS.
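
As a sketch of that workflow, again on simulated data with assumed
variable names and an assumed AR(1) error process: fit the model you
want, look at the residuals, and if they are dependent refit with the
AR(1) parameter estimated alongside the coefficients.

```r
library(nlme)

set.seed(4)
n  <- 120
tt <- seq_len(n)
x  <- rnorm(n)
e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
d  <- data.frame(tt = tt, x = x, y = 1 + 0.02 * tt + 0.5 * x + e)

# The model we want, fitted assuming independent errors
m_ind <- gls(y ~ tt + x, data = d, method = "REML")

# Check the residuals for lack of independence
res <- as.numeric(residuals(m_ind))
acf(res, plot = FALSE)$acf[2:4]                  # autocorrelation at lags 1 to 3
lb <- Box.test(res, lag = 5, type = "Ljung-Box")
lb

# Same mean model, but with AR(1) errors estimated by gls()
m_ar1 <- gls(y ~ tt + x, data = d, method = "REML",
             correlation = corAR1(form = ~ tt))

# Identical fixed effects and both fitted by REML, so the fits are comparable
anova(m_ind, m_ar1)

# Estimated AR(1) parameter on its natural (-1, 1) scale
phi_hat <- coef(m_ar1$modelStruct$corStruct, unconstrained = FALSE)
phi_hat
```

An ARMA error process could be tried in the same way by swapping
corAR1() for corARMA() with the orders p and q set as needed.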

Alternatively, we can make use of sandwich estimators for the covariance
matrix. Recall that it is the standard errors of the coefficients that
are too small. These standard errors come from the model covariance
matrix, which is essentially a plug-in: several of the assumptions of
OLS arise because it assumes a particular form for that matrix. We can
instead estimate a different covariance matrix that accounts for
correlations between residuals. To do so, fit an AR or ARMA process to
the model residuals and use the estimated parameters to form a new
covariance matrix, from which we can get standard errors.

This latter approach is very flexible because it can be applied to lots
of modelling situations, but you have to do all the heavy lifting as, in
many cases, you will have to estimate the model for the residuals
yourself, and then compute all the standard errors and tests on
coefficients yourself.
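
To give a feel for the heavy lifting involved, here is a minimal
by-hand sketch in base R. It assumes simulated data, an AR(1) model for
the residuals, and a normal approximation for the tests; none of those
choices come from the thread, and a real analysis would need to justify
each of them.

```r
set.seed(5)
n  <- 150
tt <- seq_len(n)
x  <- rnorm(n)
e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y  <- 1 + 0.02 * tt + 0.5 * x + e

fit <- lm(y ~ tt + x)
X   <- model.matrix(fit)
r   <- as.numeric(residuals(fit))

# Model for the residuals: AR(1), no mean term because OLS residuals
# already average zero when the model has an intercept
ar_fit <- arima(r, order = c(1, 0, 0), include.mean = FALSE)
phi    <- unname(coef(ar_fit)["ar1"])
s2_inn <- ar_fit$sigma2                 # innovation variance

# Covariance matrix implied by the AR(1) fit:
# cov(e_i, e_j) = s2_inn * phi^|i - j| / (1 - phi^2)
lagmat <- abs(outer(tt, tt, "-"))
Sigma  <- s2_inn / (1 - phi^2) * phi^lagmat

# Sandwich form: (X'X)^-1  X' Sigma X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- t(X) %*% Sigma %*% X
V_new <- bread %*% meat %*% bread

se_ols <- sqrt(diag(vcov(fit)))
se_new <- sqrt(diag(V_new))
z_new  <- coef(fit) / se_new
p_new  <- 2 * pnorm(-abs(z_new))        # normal approximation (an assumption)

cbind(estimate = coef(fit), se_ols, se_new, z_new, p_new)
```

The coefficient estimates are the OLS ones throughout; only the standard
errors and the tests built on them change.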

[1] Zuur et al. (2009) Mixed Effects Models and Extensions in Ecology
with R. Springer.

An alternative book that I very much recommend, but which is not quite
published yet, is Chandler and Scott (2011) Statistical methods for
trend detection and analysis in the environmental sciences. John Wiley
and Sons. This book covers what I discuss above and a whole lot more.

HTH

G