[R-sig-eco] Autoregressive modelling (Gavin Simpson)

Wed Nov 24 17:42:53 CET 2010

> Yes, indeed. I've been battling with some palaeoceanographic data of a
> PhD student in our group. Sometimes these models work well and we can
> partition into trend + autocorrelated noise. Other times, you wait a
> week for the model to converge (yep lots of data!) and it either
> interpolates the points and has an effectively zero correlation
> parameter or it fits an almost flat, straight line through the data and
> a large, well bounded away from zero, correlation parameter. (I'm using
> gamm() models.)
>

Gavin,

In my experience, cross-validation in the GAMM is causing the main
trouble here. It seems to work better to go for the option that S-PLUS
used to take: set the degrees of freedom to 4 and stick to that (unless
the residuals show patterns).

For large data sets, filling in the correlation matrix may be time 
consuming. Imagine calculating

rho^{Time difference}

if {Time difference} is very large and you have lots of observations.
This is the AR-1 correlation structure. Better to set it to 0 for
relatively large values of the time difference (or spatial difference).
I guess these numerical optimisation routines will have a hard time
filling in a big correlation matrix with values of 0.000000***, and then
optimising it over rho. There is a nice example in Wood (2006) in which
he limits the AR1 correlation to 12 months in a longer time series
(Cairo temperature). This sometimes speeds up the calculation process
considerably.
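
To make the point about tiny correlations concrete, here is a minimal sketch in Python (standard library only) that fills in an AR-1 correlation matrix with entries rho^|time difference| and, optionally, sets entries beyond a chosen lag to 0. The function name ar1_corr and its max_lag argument are made up for this illustration; they are not nlme or mgcv functions, and this is not how those packages build the matrix internally.

```python
# Sketch only: ar1_corr and max_lag are hypothetical names for illustration.

def ar1_corr(times, rho, max_lag=None):
    """Return the AR-1 correlation matrix with entries rho ** |t_i - t_j|.

    If max_lag is given, entries whose time difference exceeds max_lag
    are set to 0 instead of the (tiny) AR-1 value.
    """
    n = len(times)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lag = abs(times[i] - times[j])
            if max_lag is None or lag <= max_lag:
                corr[i][j] = rho ** lag
    return corr


# Assumed toy setting: 200 equally spaced observations, rho = 0.5.
times = list(range(200))
full = ar1_corr(times, rho=0.5)
cut = ar1_corr(times, rho=0.5, max_lag=12)

# How many entries survive the cut-off, and how big is the largest one dropped?
nonzero_cut = sum(1 for row in cut for v in row if v != 0.0)
largest_dropped = max(
    abs(f - c) for row_f, row_c in zip(full, cut) for f, c in zip(row_f, row_c)
)
print(nonzero_cut, "of", len(times) ** 2, "entries kept")
print("largest dropped correlation:", largest_dropped)
```

One caveat on this sketch: a hard cut-off like this is not guaranteed to leave a valid (positive definite) correlation matrix. Restricting the correlation to within blocks of a grouping variable is one way to limit its range while keeping the matrix valid.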

Alain

>> It is a bit dodgy I guess... well, pragmatic. Note: I would only do
>> this with these AR and ARMA type structures, and the same for these
>> spatial correlation structures. Things like a random intercept (and
>> the associated correlation structure) are much easier to work with.
> For the palaeo data I've been working with, I think I've given up
> trying to fit models with correlation structures directly. Instead I'm
> looking at fitting the model I want (with some constraint on how
> wiggly/smooth my fitted trend should be - i.e. I limit the df on the
> spline used for my trend), then estimating a covariance matrix from the
> residuals to use like a sandwich estimator and plugging that in rather
> than the assumed covariance matrix. Well, at least before I switch back
> to trying to get my head round DLMs...
>
> Cheers Alain,
>
> All the best,
>
> G
>
>> Alain Zuur
>>
>> I might modify this a bit (maybe Zuur et al already suggest this?), by
>> thinking about what model I want to fit, what is plausible, and fit
>> that. Then check the residuals for lack of independence. If residuals
>> are dependent, fit a model that allows for autocorrelation in residuals
>> directly by specifying a simple process for the covariance matrix (AR or
>> ARMA say), such as via GLS.
>>
>> Alternatively, we can make use of sandwich estimators for the covariance
>> matrix. Recall that it is the standard errors of the coefficients that
>> are too small. These standard errors come from the model covariance
>> matrix. This covariance matrix is essentially a plug-in (several of the
>> assumptions of OLS essentially arise because it assumes a particular
>> form for the covariance matrix) and we can estimate a different
>> covariance matrix that accounts for correlations between residuals, by
>> estimating the parameters of an AR or ARMA process fitted to the model
>> residuals, and use those parameters to form a new covariance matrix,
>> from which we can get standard errors.
>>
>> This latter approach is very flexible because it can be applied to lots
>> of modelling situations, but you have to do all the heavy lifting as, in
>> many cases, you will have to estimate the model for the residuals
>> yourself, and then compute all the standard errors and tests on
>> coefficients yourself.
>>
>> [1] Zuur et al 2009 Mixed Effects Models and Extensions in Ecology with
>> R. Springer.
>>
>> An alternative book I very much recommend, but which is not yet quite
>> published, is Chandler and Scott (2011) Statistical methods for trend
>> detection and analysis in the environmental sciences. John Wiley and
>> Sons. This book covers what I discuss above and a whole lot more.
>>
>> HTH
>>
>> G
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
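
The plug-in idea described in the quoted message above (fit the model, estimate an AR process from the residuals, form a new covariance matrix, recompute the standard errors) can be sketched roughly as follows. This is a minimal Python illustration (standard library only) under strong assumptions that are not in the thread: a straight-line trend fitted by OLS, equally spaced observations, an AR(1) process for the residuals, and simulated data. The function names are made up for the example.

```python
import random

# Sketch only: all names below are hypothetical, and the data are simulated.

def ols_line(t, y):
    """Ordinary least squares fit of y = a + b * t; returns (a, b, residuals)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    a = y_bar - b * t_bar
    resid = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, resid


def lag1_autocorr(resid):
    """Simple lag-1 autocorrelation estimate of the (mean-zero) residuals."""
    num = sum(resid[i] * resid[i - 1] for i in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den


def slope_se(t, resid, rho):
    """Standard error of the OLS slope when Cov(e_i, e_j) = sigma^2 * rho^|i-j|.

    Assumes equally spaced observations, so the lag is the index difference.
    With rho = 0 this reduces to the usual OLS standard error.
    """
    n = len(t)
    t_bar = sum(t) / n
    c = [ti - t_bar for ti in t]
    sxx = sum(ci * ci for ci in c)
    sigma2 = sum(r * r for r in resid) / (n - 2)
    # The slope is sum(c_i * y_i) / sxx, so its variance is
    # sigma2 * sum_ij c_i * c_j * rho^|i-j| / sxx^2.
    quad = sum(c[i] * c[j] * rho ** abs(i - j) for i in range(n) for j in range(n))
    return (sigma2 * quad) ** 0.5 / sxx


# Simulated example: linear trend plus AR(1) noise (assumed phi = 0.6).
random.seed(1)
n = 200
t = list(range(n))
noise = [random.gauss(0.0, 1.0)]
for _ in range(n - 1):
    noise.append(0.6 * noise[-1] + random.gauss(0.0, 1.0))
y = [1.0 + 0.02 * ti + ei for ti, ei in zip(t, noise)]

a_hat, b_hat, resid = ols_line(t, y)
rho_hat = lag1_autocorr(resid)
se_ols = slope_se(t, resid, 0.0)      # assumes independent residuals
se_ar1 = slope_se(t, resid, rho_hat)  # plug-in AR(1) covariance
print("slope:", b_hat, "rho_hat:", rho_hat)
print("SE (independence):", se_ols, "SE (AR(1) plug-in):", se_ar1)
```

With positively autocorrelated residuals the plug-in standard error comes out larger than the OLS one, which is the sense in which the OLS standard errors are too small. For a spline trend or an ARMA residual model the same steps apply, but the covariance matrix and the algebra have to be worked out for that model.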

-- 

Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7

2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9

3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3

Other books: http://www.highstat.com/books.htm

Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com