[R-sig-eco] Autoregressive modelling

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Nov 23 14:23:30 CET 2010


On Sat, 2010-11-20 at 11:36 +0100, Saskia Otto wrote:
> > Dear Frank,
> 
> > thanks for your suggestions! I still have some questions:
> 
> I thought that autocorrelated / autoregressive residuals inflate
> p-values, thus insignificant variables become significant, not the other
> way around?
> Did I get it right?

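Kind of - it is the standard errors of the estimated coefficients that
are underestimated in the presence of dependence (e.g. autocorrelation)
in the residuals. With positive autocorrelation the test statistics are
then too large and the p-values too small, so covariates tend to look
more significant than they really are.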
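
A small simulation can illustrate this. The sketch below is only an
illustration under assumed settings (AR(1) errors and an AR(1) covariate,
both with coefficient 0.7, n = 50, and a true slope of zero); none of
these values come from Saskia's data.

```r
set.seed(123)
nsim <- 500   # number of simulated data sets (assumed)
n    <- 50    # length of each series (assumed)

sim <- replicate(nsim, {
  x <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # autocorrelated covariate
  e <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # AR(1) errors
  y <- 1 + 0 * x + e                                 # true slope is zero
  s <- summary(lm(y ~ x))$coefficients
  c(est = s["x", "Estimate"],
    se  = s["x", "Std. Error"],
    p   = s["x", "Pr(>|t|)"])
})

mean(sim["est", ])        # close to zero: the estimate itself is not biased
sd(sim["est", ])          # actual sampling variability of the slope
mean(sim["se", ])         # average standard error reported by lm()
mean(sim["p", ] < 0.05)   # how often a null covariate is declared significant
```

If the reported standard errors were right, the last line would be close
to 0.05 and the second and third lines would roughly agree.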

> You suggest to use in the second step a GLS or  
> mixed model (where I included AR1 correlation structure), where only  
> those variables are included that have been significant in the first  
> step?
> I tried both ways: your 2-step approach as well as the GLS/mixed model  
> as in your second step but with a full model (including all  
> covariates) and then do the model selection. The results were the
> same. So why is it not ok to use a full GLS/mixed model followed by  
> the model selection in the first place?

Whether you detrend or not (before fitting a model) is an important
consideration - statistician colleagues of mine have told me *not* to
detrend as you are throwing away information (amongst other reasons).
Instead, model the trend explicitly. Of course, you have to posit a
valid reason for the relationship between the response and your
covariates to guard against spurious regressions - where you get a
significant covariate because both it and the response have a trend but
there is no mechanistic reason to presume that the covariate is
controlling the response.

> I still do not understand the difference between an AR1 model where  
> other covariates are included as well (e.g. by using the arima()  
> function) and a model where I included an AR1 correlation structure  
> (by using e.g. gls() or lme() )

Zuur et al [1] suggest a different approach, along the lines of i)
fitting the full model, ii) then fitting something for the
autocorrelation in the residuals of this full model, then iii) having
included ii), refining the fitted model by getting rid of insignificant
covariates etc.

I might modify this a bit (maybe Zuur et al already suggest this?), by
thinking about what model I want to fit, what is plausible, and fit
that. Then check the residuals for lack of independence. If residuals
are dependent, fit a model that allows for autocorrelation in residuals
directly by specifying a simple process for the covariance matrix (AR or
ARMA say), such as via GLS.
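
A minimal sketch of that workflow with gls() from nlme, using simulated
data. The data frame `dat`, the variables `year`, `cov1` and `abun`, and
the AR(1) coefficient of 0.7 are all made up for illustration.

```r
library(nlme)  # recommended package, ships with R

set.seed(42)
n    <- 60
year <- seq_len(n)                                    # integer time index
cov1 <- rnorm(n)                                      # hypothetical covariate
e    <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # AR(1) errors
abun <- 2 + 0.5 * cov1 + e                            # hypothetical response
dat  <- data.frame(year, cov1, abun)

# 1) fit the model you think is plausible, assuming independent residuals
m0 <- gls(abun ~ cov1, data = dat, method = "ML")

# 2) check the residuals for lack of independence
acf(resid(m0), plot = FALSE)

# 3) refit, allowing AR(1) residuals ordered by year
m1 <- gls(abun ~ cov1, data = dat,
          correlation = corAR1(form = ~ year), method = "ML")

# estimated AR(1) parameter and a comparison of the two fits
phi_hat <- coef(m1$modelStruct$corStruct, unconstrained = FALSE)
phi_hat
anova(m0, m1)
summary(m1)$tTable   # coefficients with standard errors that allow for AR(1)
```

Here `form = ~ year` treats the whole series as a single group ordered by
year. As far as I understand nlme, a form such as `~ 1 | Year` would
instead make each year its own group, which leaves no within-group
correlation to estimate when there is one observation per year.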

Alternatively, we can make use of sandwich estimators for the covariance
matrix. Recall that it is the standard errors of the coefficients that
are too small. These standard errors come from the model covariance
matrix. This covariance matrix is essentially a plug-in (several of the
assumptions of OLS essentially arise because it assumes a particular
form for the covariance matrix) and we can estimate a different
covariance matrix that accounts for correlations between residuals, by
estimating the parameters of an AR or ARMA process fitted to the model
residuals, and use those parameters to form a new covariance matrix,
from which we can get standard errors.

This latter approach is very flexible because it can be applied to lots
of modelling situations, but you have to do all the heavy lifting as, in
many cases, you will have to estimate the model for the residuals
yourself, and then compute all the standard errors and tests on
coefficients yourself.
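
To give a flavour of the heavy lifting, here is a rough base R sketch of
the plug-in idea for an lm() fit with AR(1) residuals. The simulated data,
the variable names and the choice of AR(1) are assumptions for
illustration, and the t-based p-values at the end are only approximate.

```r
set.seed(1)
n <- 80
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))  # hypothetical covariate
e <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # AR(1) errors
y <- 1 + 0.5 * x + e

fit <- lm(y ~ x)
res <- resid(fit)

# estimate an AR(1) process for the residuals
phi <- coef(arima(res, order = c(1, 0, 0), include.mean = FALSE))[["ar1"]]

# build the implied covariance matrix of the errors: sigma^2 * phi^|i - j|
X      <- model.matrix(fit)
sigma2 <- sum(res^2) / df.residual(fit)
R      <- phi^abs(outer(seq_len(n), seq_len(n), "-"))

# sandwich: (X'X)^-1 X' Sigma X (X'X)^-1
bread <- solve(crossprod(X))
V     <- bread %*% (t(X) %*% (sigma2 * R) %*% X) %*% bread

se_naive <- sqrt(diag(vcov(fit)))   # standard errors assuming independence
se_ar1   <- sqrt(diag(V))           # standard errors allowing for AR(1)
t_ar1    <- coef(fit) / se_ar1
p_ar1    <- 2 * pt(abs(t_ar1), df = df.residual(fit), lower.tail = FALSE)

cbind(estimate = coef(fit), se_naive, se_ar1, p_ar1)
```

The coefficient estimates are unchanged from lm(); only the standard
errors and the tests built on them differ.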

[1] Zuur et al 2009 Mixed Effects Models and Extensions in Ecology with
R. Springer.

Another book I very much recommend, though it is not quite published
yet, is Chandler and Scott (2011) Statistical Methods for Trend
Detection and Analysis in the Environmental Sciences. John Wiley and
Sons. This book covers what I discuss above and a whole lot more.

HTH

G

> Is there any book where I can find out more about this matter?
> 
> Thanks again,
> Saskia
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Thu, 11 Nov 2010 16:26:21 +0200
> > From: Frank Berninger <frankberninger at gmail.com>
> > To: r-sig-ecology at r-project.org
> > Subject: [R-sig-eco] Autoregressive modelling
> > Message-ID: <4CDBFD0D.7030806 at gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Dear Saskia,
> >
> > 1) Autoregressive residuals (for stationary AR processes) do not bias
> > your estimates but will bias your confidence intervals. (Unfortunately
> > mostly making significant variables insignificant). If the AR model for
> > your residuals is not stationary you are in statistical trouble.
> >
> > 2) I think that there are different ways of proceeding. The most
> > conservative way would be to do a two stage process: (A) first you
> > identify your model (using detrended dependent and independent
> > variables). The variables that are significant are thereafter used in
> > the second stage. (B) You build the model using the variables found in
> > A. This can be done using gls (nlme package) or mixed models that
> > model the autocovariance structure.
> >
> > Some people do not do A but use a gls model directly. I think in the
> > case of strong trends in the data (both X and Y) A could be insightful.
> > On the other hand the identification might fail for variables that
> > increase evenly with a strong time trend without much variation in the
> > rate of increase.
> >
> > Frank
> >
> >
> >
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Wed, 10 Nov 2010 13:56:33 +0100
> >> From: Saskia Otto<Saskia.Otto at uni-hamburg.de>
> >> To: r-sig-ecology at r-project.org
> >> Subject: [R-sig-eco] handling autocorrelation in regression data
> >> Message-ID:<4CDA9681.4080405 at uni-hamburg.de>
> >> Content-Type: text/plain
> >>
> >> Dear list members,
> >>
> >> to identify the drivers of a temporal trend in fish abundance, I applied
> >> a linear regression including several covariates in the full model:
> >> Abun_t ~ b_0 + b_1 *cov1_t + b_2 *cov2_t + ... + e_t , where t
> >> represents the year.
> >>
> >> The problem is that the residuals of the full model show strong
> >> autocorrelation (order 1 autoregressive (AR(1)) errors), thus violating
> >> the independence assumptions.
> >> My question is now: what can I do alternatively?
> >> Since I want to model explicitly the trend, I do not want to detrend the
> >> time series.
> >> Should I fit an autoregressive time series model where I include other
> >> covariates as well:
> >> Abun_t ~ b_0 + b_1 *Abund_t-1 + b_2 *cov1_t + b_3 *cov2_t + ... + e_t ?
> >> Can I do this using the lm() function: lm(Abun_t ~ Abund_t-1 + cov1_t +
> >> cov2_t + ...)
> >> or do I have to use the function arima(): arima(Abund_t, order =
> >> c(1,0,0), xreg = cbind(cov1,cov2,...) )  ?
> >>
> >> What is the difference between an AR1 model (by using for instance the
> >> arima function where I define only the AR order) and the gls() function
> >> where I include an AR1 correlation structure:
> >> gls(Abun_t ~ cov1_t + cov2_t + ..., correlation = corAR1(form =~ 1 |
> >> Year) )  ?
> >>
> >> To me it seems quite similar, but I only came across literature that
> >> dealt with either one of these approaches.
> >>
> >> I would be glad if someone of you could help me out and get things
> >> clearer on this issue.
> >>
> >> Thanks a lot in advance!
> >> Saskia
> >>
> >>
> >
> >
> 
> 
> Saskia A. Otto
> 
> Mail: Saskia.Otto at uni-hamburg.de
> Phone: +49(0)40-42838 6648
> Hamburg University
> Institute for Hydrobiology and Fisheries Science
> Grosse Elbstrasse 133
> 22767 Hamburg
> Germany
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


