[R-sig-ME] time series with autcorrelated errors - matrix not invertible

Sat Jul 30 16:22:20 CEST 2016

On 16-07-29 06:12 PM, Steven Orzack wrote:
>  I am analyzing time series of a statistic (coefficient of variation or
> CV) generated by a Monte Carlo simulation of population dynamics. A
> given simulation might generate thousands of such time series.
> 
> For any given time series, the sequential estimates of CV are correlated
> with one another (because they are based on partially overlapping sets
> of the population numbers). Accordingly, I am using gls models with
> autocorrelated errors to analyze each time series.
> 
> Although the stochastic process generating all sample paths is the same,
> any given sample path might consist of a time series of CV for which
> AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
> by ar() (see below).
> 
> In practice, I have seen time series for which AR(0), AR(1), AR(2)
> generate the smallest AIC value and so always just using a specific
> order (e.g., AR(1)) does not seem appropriate. This is especially so
> because I am developing methods that would apply to time series for
> which the generating dynamics are not known.
> 
> For any given sample path y, the code is
> 
> #assess order of AR model using ar. use order with smallest AIC
> arorder <- ar(moving_statistics$CV, order.max = 5)$order
> 
> #evaluate model with constant and time
> if (arorder < 1) {
> summary(int_model <- lm(CV ~ 1, data = moving_statistics))
> summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
> 
> #extract statistics for sample path y
> pvalue_cv$pvalue[y] <- anova(int_model,time_model)$Pr
> pvalue_cv$slope_time[y] <- time_model$coefficients[2]
> } else {
> summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
> corARMA(p=arorder), method="ML"))
> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
> correlation = corARMA(p=arorder), method="ML"))
> #extract statistics for sample path y
> pvalue_cv$pvalue[y] <- anova(int_model,time_model)$p[2]
> pvalue_cv$slope_time[y] <- time_model$coefficients[2]
> }
> 
> 
> When I run the simulation, there are some sample paths for which AR(2)
> has the lowest AIC and so this is the model called, as in
> 
> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
> correlation = corARMA(p=2), method="ML"))
> 
> 
> This call appears to always generate this error message
> 
> Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
>   Coefficient matrix not invertible
> 
> Here are my questions:
> 
> 1. how does ar generate an AIC value for p =2 even though the
> coefficient matrix is not invertible and so the model fit fails?

  gls and ar are using different algorithms.  Furthermore, I think ar()
is assuming a constant mean whereas your gls() has a linear effect of
time, so they're actually trying to fit different models.

> 2. Is there a way to actually fit such a model, say, by adjusting
> tolerances?

  Don't know.  I would start by debugging my way through the gls code to
say where it is actually breaking.  I don't think I have much to say
other than what I already said in the SO thread you reference below.
> 
> 3. a related thread
> (http://stackoverflow.com/questions/6706143/error-with-gls-function-in-nlme-package-in-r)
> 
> 
> indicates that there are too many parameters being estimated given the
> length of the time series.
> 
> Is there a numerical criterion that can be queried before the gls call
> so that I can test for invertibility and fit, say, a lower order model
> (e.g., AR(1)), so as to avoid an error?

  You could certainly wrap this attempt in a try() or tryCatch() clause
so that it wouldn't break your simulation run and you could then fall
back on AR(1) ...

> 
> I cannot figure out from the code for gls what actually generates the
> error message. A pointer to the specific numerical criterion that
> generates this error would be very much appreciated.

  Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:

   F77_CALL(dqrdc2) (coef, &P, &P, &P,  &sqrt_eps, &i, qraux, pivot, w\
ork);
            if (i < P)
                error(_("Coefficient matrix not invertible" ));

https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f

says that dqrdc2 is a Householder transformation, but you'd really have
to figure this out for yourself ...

  the i parameter corresponds to a k argument inside the FORTRAN
function: "k contains the number of columns of x judged to be linearly
independent." so it looks like you end up with a rank-deficient matrix
at some point.
> 
> In the stackoverflow thread, this is mentioned:
> 
> The general rule of thumb is that you should have at least 10 times as
> many data points as parameters, and that's for standard fixed
> effect/regression parameters. (Generally variance structure parameters
> such as AR parameters are even a bit harder/require a bit more data than
> regression parameters to estimate.)
> 
> Where in the literature is this rule discussed?

  I got it from Frank Harrell's _Regression Modeling Strategies_ book.

> 4. If the only option is to fit a lower order model of error, say,
> AR(1), instead of AR(2), what kind of bias does this generate in the
> analysis?
> 

   Don't know.  Since you're running simulations, you could find out ...
> 
> 
> Many thanks in advance,
> 
> S.
>