[R-sig-ME] time series with autcorrelated errors - matrix not invertible

Steven Orzack orzack at freshpond.org
Sun Jul 31 18:27:28 CEST 2016

  Ben, Many thanks for the informative reply. I will work on this in the 
next few days and update. Once upon a time, I knew about Householder 
transformations. I will try to dust off that part of my brain.....

I am especially interested in the fact that ar and gls are using 
different algorithms. If you could expand on that, it would very 
helpful. I will also investigate.

any information and thoughts are most welcome.


On 16-07-29 06:12 PM, Steven Orzack wrote:

 > I am analyzing time series of a statistic (coefficient of variation or
 > CV) generated by a Monte Carlo simulation of population dynamics. A
 > given simulation might generate thousands of such time series.
 > For any given time series, the sequential estimates of CV are correlated
 > with one another (because they are based on partially overlapping sets
 > of the population numbers). Accordingly, I am using gls models with
 > autocorrelated errors to analyze each time series.
 > Although the stochastic process generating all sample paths is the same,
 > any given sample path might consist of a time series of CV for which
 > AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
 > by ar() (see below).
 > In practice, I have seen time series for which AR(0), AR(1), AR(2)
 > generate the smallest AIC value and so always just using a specific
 > order (e.g., AR(1)) does not seem appropriate. This is especially so
 > because I am developing methods that would apply to time series for
 > which the generating dynamics are not known.
 > For any given sample path y, the code is
 > #assess order of AR model using ar. use order with smallest AIC
 > arorder <- ar(moving_statistics$CV, order.max = 5)$order
 > #evaluate model with constant and time
 > if (arorder < 1) {
 > summary(int_model <- lm(CV ~ 1, data = moving_statistics))
 > summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
 > #extract statistics for sample path y
 > pvalue_cv$pvalue[y] <- anova(int_model,time_model)$Pr
 > pvalue_cv$slope_time[y] <- time_model$coefficients[2]
 > } else {
 > summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
 > corARMA(p=arorder), method="ML"))
 > summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
 > correlation = corARMA(p=arorder), method="ML"))
 > #extract statistics for sample path y
 > pvalue_cv$pvalue[y] <- anova(int_model,time_model)$p[2]
 > pvalue_cv$slope_time[y] <- time_model$coefficients[2]
 > }
 > When I run the simulation, there are some sample paths for which AR(2)
 > has the lowest AIC and so this is the model called, as in
 > summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
 > correlation = corARMA(p=2), method="ML"))
 > This call appears to always generate this error message
 > Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
 >   Coefficient matrix not invertible
 > Here are my questions:
 > 1. how does ar generate an AIC value for p =2 even though the
 > coefficient matrix is not invertible and so the model fit fails?

   gls and ar are using different algorithms.  Furthermore, I think ar()
is assuming a constant mean whereas your gls() has a linear effect of
time, so they're actually trying to fit different models.

 > 2. Is there a way to actually fit such a model, say, by adjusting
 > tolerances?

   Don't know.  I would start by debugging my way through the gls code to
say where it is actually breaking.  I don't think I have much to say
other than what I already said in the SO thread you reference below.
 > 3. a related thread
 > indicates that there are too many parameters being estimated given the
 > length of the time series.
 > Is there a numerical criterion that can be queried before the gls call
 > so that I can test for invertibility and fit, say, a lower order model
 > (e.g., AR(1)), so as to avoid an error?

   You could certainly wrap this attempt in a try() or tryCatch() clause
so that it wouldn't break your simulation run and you could then fall
back on AR(1) ...

 > I cannot figure out from the code for gls what actually generates the
 > error message. A pointer to the specific numerical criterion that
 > generates this error would be very much appreciated.

   Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:

    F77_CALL(dqrdc2) (coef, &P, &P, &P,  &sqrt_eps, &i, qraux, pivot, w\
             if (i < P)
                 error(_("Coefficient matrix not invertible" ));


says that dqrdc2 is a Householder transformation, but you'd really have
to figure this out for yourself ...

   the i parameter corresponds to a k argument inside the FORTRAN
function: "k contains the number of columns of x judged to be linearly
independent." so it looks like you end up with a rank-deficient matrix
at some point.
 > In the stackoverflow thread, this is mentioned:
 > The general rule of thumb is that you should have at least 10 times as
 > many data points as parameters, and that's for standard fixed
 > effect/regression parameters. (Generally variance structure parameters
 > such as AR parameters are even a bit harder/require a bit more data than
 > regression parameters to estimate.)
 > Where in the literature is this rule discussed?

   I got it from Frank Harrell's _Regression Modeling Strategies_ book.

 > 4. If the only option is to fit a lower order model of error, say,
 > AR(1), instead of AR(2), what kind of bias does this generate in the
 > analysis?

    Don't know.  Since you're running simulations, you could find out ...
 > Many thanks in advance,
 > S.

Steven Orzack
Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA 02140
617 864-4307


More information about the R-sig-mixed-models mailing list