[R-sig-eco] Autoregressive modelling (Gavin Simpson)

Wed Nov 24 18:06:04 CET 2010

On Wed, 2010-11-24 at 12:42 -0400, Highland Statistics wrote:
> > Yes, indeed. I've been battling with some palaeoceanographic data of a
> > PhD student in our group. Sometimes these models work well and we can
> > partition into trend + autocorrelated noise. Other times, you wait a
> > week for the model to converge (yep lots of data!) and it either
> > interpolates the points and has an effectively zero correlation
> > parameter or it fits an almost flat, straight line through the data and
> > a large, well bounded away from zero, correlation parameter. (I'm using
> > gamm() models.)
> >

Thanks Alain,

> Gavin,
> 
> In my experience...cross-validation in the GAMM is causing the main 
> trouble here. It seems to work better to go for the option that S-PLUS 
> used to do...set the degrees of freedom to 4...and stick to that (unless 
> the residuals show patterns).

Simon Wood communicated to me some time back that using method = "ML" or
"REML" gave much better behaving p-values etc in the fitted model, so
I've long given up using GCV/UBRE for model fitting. But ML and REML are
slower...

Setting some reasonably low DF for the spline is working quite well, and
using the adaptive splines in recent mgcv packages allows me to spend my
wiggliness allowance where it is needed, not spread out everywhere.

> For large data sets, filling in the correlation matrix may be time 
> consuming. Imagine calculating
> 
> rho^{Time difference}
> 
> if {Time difference} is very large..and you ahve lots of observations. 

I think this is where the code is getting bogged down - I have
irregularly spaced data, so have been using a corCAR() structure. There
are 7000+ observations. inverting that matrix at each iteration of the
algorithm is asking to be punished with a long wait.

> This is the AR-1 correlation structure. Better set it to 0 for relative 
> large values of the time difference (or spatial difference). I guess 
> these numerical optimisation routines will have a hard time filling in 
> a  big correlation matrix with values of 0.000000***, and then 
> optimising it over rho.  There is a nice example in Wood (2006) in which 
> he limits the AR1 correlation to 12 months in a longer time series 
> (Cairo temperature). This sometimes speeds up the calculation process 
> considerable.

In that example, Simon has a logical variable that he can use to
restrict the correlations within. This isn't quite so simple with the
palaeo data I am currently looking at.

Although not tried yet - busy elsewhere - I wouldn't have thought
estimating the exponential variogram on the residuals would take too
long and then filling a matrix with the correct rho^{tdiff} should be
quite quick; at least I'd only have to do it once at the end of the
model fitting.

Being subjective in fitting models is something I am coming to terms
with!

Thanks for your thoughts,

G

> 
> Alain
> 
> 
> >> It is a bit dodgy I guess..well..pragmatic. Note....I would only do
> >> this with these AR and ARMA
> >> type structures. And the same for these spatial correlation
> >> structures. Things like a random intercept
> >> (and the associated correlation structure) is must easier to work
> >> with.
> > For the palaeo data I've been working with, I think I've given up with
> > trying to fit models with correlation structures directly. Instead I
> > looking at fitting the model I want (with some constraint on how
> > wiggly/smooth my fitted trend should be - i.e. I limit the df on the
> > spline used for my trend), then estimate a covariance matrix from the
> > residuals to use like a sandwich estimator and plug that in rather than
> > assumed covariance matrix. Well, at least before I switch back to trying
> > to get my head round DLMs...
> >
> > Cheers Alain,
> >
> > All the best,
> >
> > G
> >
> >> Alain Zuur
> >> **
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> I might modify this a bit (maybe Zuur et al already suggest this?), by
> >> thinking about what model I want to fit, what is plausible, and fit
> >> that. Then check the residuals for lack of independence. If residuals
> >> are dependent, fit a model that allows for autocorrelation in residuals
> >> directly by specifying a simple process for the covariance matrix (AR or
> >> ARMA say), such as via GLS.
> >>
> >> Alternatively, we can make use of sandwich estimators for the covariance
> >> matrix. Recall that it is the standard errors of the coefficients that
> >> are too small. These standard errors come from the model covariance
> >> matrix. This covariance matrix is essentially a plug-in (several of the
> >> assumptions of OLS essentially arise because it assumes a particular
> >> form for the covariance matrix) and we can estimate a different
> >> covariance matrix that accounts for correlations between residuals, by
> >> estimating the parameters of an AR or ARMA process fitted to the model
> >> residuals, and use those parameters to form a new covariance matrix,
> >> from which we can get standard errors.
> >>
> >> This latter approach is very flexible because it can be applied to lots
> >> of modelling situations, but you have to do all the heavy lifting as, in
> >> many cases, you will have to estimate the model for the residuals
> >> yourself, and then compute all the standard errors and tests on
> >> coefficients yourself.
> >>
> >> [1] Zuur et al 2009 Mixed Effects Models and Extensions in Ecology with
> >> R. Springer.
> >>
> >> An alternative book I very much recommend, but is not yet quite
> >> published is Chandler and Scott (2011) Statistical methods for trend
> >> detection and analysis in the environmental sciences. John Wiley and
> >> Sons. This book covers what I discuss above and a whole lot more.
> >>
> >> HTH
> >>
> >> G
> >>
> >> _______________________________________________
> >> R-sig-ecology mailing list
> >> R-sig-ecology at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%