# [R-sig-ME] time series with autcorrelated errors - matrix not invertible

Ben Bolker bbolker at gmail.com
Sat Jul 30 16:22:20 CEST 2016

```
On 16-07-29 06:12 PM, Steven Orzack wrote:
>  I am analyzing time series of a statistic (coefficient of variation or
> CV) generated by a Monte Carlo simulation of population dynamics. A
> given simulation might generate thousands of such time series.
>
> For any given time series, the sequential estimates of CV are correlated
> with one another (because they are based on partially overlapping sets
> of the population numbers). Accordingly, I am using gls models with
> autocorrelated errors to analyze each time series.
>
> Although the stochastic process generating all sample paths is the same,
> any given sample path might consist of a time series of CV for which
> AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
> by ar() (see below).
>
> In practice, I have seen time series for which AR(0), AR(1), AR(2)
> generate the smallest AIC value and so always just using a specific
> order (e.g., AR(1)) does not seem appropriate. This is especially so
> because I am developing methods that would apply to time series for
> which the generating dynamics are not known.
>
> For any given sample path y, the code is
>
> #assess order of AR model using ar. use order with smallest AIC
> arorder <- ar(moving_statistics\$CV, order.max = 5)\$order
>
> #evaluate model with constant and time
> if (arorder < 1) {
> summary(int_model <- lm(CV ~ 1, data = moving_statistics))
> summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
>
> #extract statistics for sample path y
> pvalue_cv\$pvalue[y] <- anova(int_model,time_model)\$Pr
> pvalue_cv\$slope_time[y] <- time_model\$coefficients
> } else {
> summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
> corARMA(p=arorder), method="ML"))
> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
> correlation = corARMA(p=arorder), method="ML"))
> #extract statistics for sample path y
> pvalue_cv\$pvalue[y] <- anova(int_model,time_model)\$p
> pvalue_cv\$slope_time[y] <- time_model\$coefficients
> }
>
>
> When I run the simulation, there are some sample paths for which AR(2)
> has the lowest AIC and so this is the model called, as in
>
> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
> correlation = corARMA(p=2), method="ML"))
>
>
> This call appears to always generate this error message
>
> Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
>   Coefficient matrix not invertible
>
> Here are my questions:
>
> 1. how does ar generate an AIC value for p =2 even though the
> coefficient matrix is not invertible and so the model fit fails?

gls and ar are using different algorithms.  Furthermore, I think ar()
is assuming a constant mean whereas your gls() has a linear effect of
time, so they're actually trying to fit different models.

> 2. Is there a way to actually fit such a model, say, by adjusting
> tolerances?

Don't know.  I would start by debugging my way through the gls code to
say where it is actually breaking.  I don't think I have much to say
other than what I already said in the SO thread you reference below.
>
> 3. a related thread
> (http://stackoverflow.com/questions/6706143/error-with-gls-function-in-nlme-package-in-r)
>
>
> indicates that there are too many parameters being estimated given the
> length of the time series.
>
> Is there a numerical criterion that can be queried before the gls call
> so that I can test for invertibility and fit, say, a lower order model
> (e.g., AR(1)), so as to avoid an error?

You could certainly wrap this attempt in a try() or tryCatch() clause
so that it wouldn't break your simulation run and you could then fall
back on AR(1) ...

>
> I cannot figure out from the code for gls what actually generates the
> error message. A pointer to the specific numerical criterion that
> generates this error would be very much appreciated.

Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:

F77_CALL(dqrdc2) (coef, &P, &P, &P,  &sqrt_eps, &i, qraux, pivot, w\
ork);
if (i < P)
error(_("Coefficient matrix not invertible" ));

https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f

says that dqrdc2 is a Householder transformation, but you'd really have
to figure this out for yourself ...

the i parameter corresponds to a k argument inside the FORTRAN
function: "k contains the number of columns of x judged to be linearly
independent." so it looks like you end up with a rank-deficient matrix
at some point.
>
> In the stackoverflow thread, this is mentioned:
>
> The general rule of thumb is that you should have at least 10 times as
> many data points as parameters, and that's for standard fixed
> effect/regression parameters. (Generally variance structure parameters
> such as AR parameters are even a bit harder/require a bit more data than
> regression parameters to estimate.)
>
> Where in the literature is this rule discussed?

I got it from Frank Harrell's _Regression Modeling Strategies_ book.

> 4. If the only option is to fit a lower order model of error, say,
> AR(1), instead of AR(2), what kind of bias does this generate in the
> analysis?
>

Don't know.  Since you're running simulations, you could find out ...
>
>
> Many thanks in advance,
>
> S.
>

```