[R-sig-ME] time series with autocorrelated errors - matrix not invertible

Steven Orzack orzack at freshpond.org
Mon Aug 22 17:15:27 CEST 2016


   I have updated the code and implemented tryCatch in order to avoid the 
error (non-invertible matrix), which otherwise halts the simulation. 
Thanks again to Ben Bolker for the suggestion. FWIW, the manual page 
documentation and examples for tryCatch are, ahem, a bit opaque, but in 
the end it is straightforward to code.

On the possibility that this will help others, here is what I wrote:


library(nlme)   # for gls() and corARMA()

## Fit the gls model with AR(ind) errors and return the AIC of that fit,
## or 0 if the fit fails (e.g. "Coefficient matrix not invertible") or warns.
try_AIC <- function(ind) {
    tryCatch(
        summary(gls(CV ~ 1, data = data_try,
                    correlation = corARMA(p = ind), method = "ML"))$AIC,
        error   = function(e) 0,
        warning = function(w) 0
    )
}

This fits the gls model with AR errors of a given order (ind) and returns 
the AIC for that fit. If the matrix cannot be inverted and the gls model 
cannot be fit, the return value is 0 (the AIC values for models that do 
fit are negative, so a failed fit is never selected as the minimum).

In the main simulation, the code where the fitting occurs is

## scan AR models of order 0 to 4
AIClist <- rep(0, 5)
AIClist[1] <- summary(gls(CV ~ 1, data = data_try,
                          correlation = NULL, method = "ML"))$AIC   # AR(0)
for (autor in 1:4) {
    AIClist[autor + 1] <- try_AIC(autor)
}
## determine which AR order has the smallest AIC
arorder <- match(min(AIClist), AIClist) - 1

arorder then holds the order of the AR model with the minimum AIC.
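
An equivalent but slightly different sketch (not the code I actually 
ran): return NA for failed fits, so that the choice does not rely on the 
fitted AIC values being negative, and let which.min() skip the NAs.

safe_AIC <- function(p) {
    tryCatch(summary(gls(CV ~ 1, data = data_try,
                         correlation = corARMA(p = p), method = "ML"))$AIC,
             error = function(e) NA_real_)
}
AIClist <- c(summary(gls(CV ~ 1, data = data_try, method = "ML"))$AIC,
             vapply(1:4, safe_AIC, numeric(1)))
arorder <- which.min(AIClist) - 1   # which.min() ignores NA entries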

Now that I have solved this, I have a more strategic question about the 
overall structure of the analysis. Please see the next email!



  > I am analyzing time series of a statistic (coefficient of variation or
  > CV) generated by a Monte Carlo simulation of population dynamics. A
  > given simulation might generate thousands of such time series.
  >
  > For any given time series, the sequential estimates of CV are correlated
  > with one another (because they are based on partially overlapping sets
  > of the population numbers). Accordingly, I am using gls models with
  > autocorrelated errors to analyze each time series.
  >
  > Although the stochastic process generating all sample paths is the same,
  > any given sample path might consist of a time series of CV for which
  > AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
  > by ar() (see below).
  >
  > In practice, I have seen time series for which AR(0), AR(1), AR(2)
  > generate the smallest AIC value and so always just using a specific
  > order (e.g., AR(1)) does not seem appropriate. This is especially so
  > because I am developing methods that would apply to time series for
  > which the generating dynamics are not known.
  >
  > For any given sample path y, the code is
  >
  > #assess order of AR model using ar. use order with smallest AIC
  > arorder <- ar(moving_statistics$CV, order.max = 5)$order
  >
  > #evaluate model with constant and time
  > if (arorder < 1) {
  > summary(int_model <- lm(CV ~ 1, data = moving_statistics))
  > summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
  >
  > #extract statistics for sample path y
  > pvalue_cv$pvalue[y] <- anova(int_model,time_model)$Pr
  > pvalue_cv$slope_time[y] <- time_model$coefficients[2]
  > } else {
  > summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
  > corARMA(p=arorder), method="ML"))
  > summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
  > correlation = corARMA(p=arorder), method="ML"))
  > #extract statistics for sample path y
  > pvalue_cv$pvalue[y] <- anova(int_model,time_model)$p[2]
  > pvalue_cv$slope_time[y] <- time_model$coefficients[2]
  > }
  >
  >
  > When I run the simulation, there are some sample paths for which AR(2)
  > has the lowest AIC and so this is the model called, as in
  >
  > summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
  > correlation = corARMA(p=2), method="ML"))
  >
  >
  > This call appears to always generate this error message
  >
  > Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
  >   Coefficient matrix not invertible
  >
  > Here are my questions:
  >
  > 1. how does ar generate an AIC value for p =2 even though the
  > coefficient matrix is not invertible and so the model fit fails?

    gls and ar are using different algorithms.  Furthermore, I think ar()
is assuming a constant mean whereas your gls() has a linear effect of
time, so they're actually trying to fit different models.
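
    For what it's worth, a minimal sketch of the two different fits being 
compared (the series here is simulated AR(2) noise, so the data and the 
chosen orders are purely illustrative):

library(nlme)
set.seed(1)
d <- data.frame(time = 1:100,
                CV   = as.numeric(arima.sim(list(ar = c(0.5, 0.3)), n = 100)))

## ar() models the series itself: constant mean, Yule-Walker estimates by
## default, order chosen by AIC
ar(d$CV, order.max = 5)$order

## gls() models a regression (here with a time trend) whose errors follow
## an ARMA process, fitted by maximum likelihood
time_model <- gls(CV ~ 1 + time, data = d,
                  correlation = corARMA(p = 2), method = "ML")
summary(time_model)$AIC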


  > 2. Is there a way to actually fit such a model, say, by adjusting
  > tolerances?

    Don't know.  I would start by debugging my way through the gls code to
see where it is actually breaking.  I don't think I have much to say
other than what I already said in the SO thread you reference below.
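
    (One generic way to do that with base-R tools is to drop into a
browser when the error is raised, or to step through the call:

library(nlme)
options(error = recover)   # open a browser on the call stack at the error
## or step through gls() interactively:
debug(gls)
gls(CV ~ 1, data = data_try, correlation = corARMA(p = 2), method = "ML")
undebug(gls)

Note that the error itself is raised from compiled code (see below), so
the browser shows the R-level call that reached it rather than the C
routine.)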
  >
  > 3. a related thread
  > 
(http://stackoverflow.com/questions/6706143/error-with-gls-function-in-nlme-package-in-r)
  >
  >
  > indicates that there are too many parameters being estimated given the
  > length of the time series.
  >
  > Is there a numerical criterion that can be queried before the gls call
  > so that I can test for invertibility and fit, say, a lower order model
  > (e.g., AR(1)), so as to avoid an error?

    You could certainly wrap this attempt in a try() or tryCatch() clause
so that it wouldn't break your simulation run and you could then fall
back on AR(1) ...
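
    A minimal sketch of that fall-back idea (fit_with_fallback is an
illustrative name, not anything in nlme):

library(nlme)
## try AR(p); on failure drop the order by one and retry, down to AR(0)
fit_with_fallback <- function(p) {
    while (p >= 1) {
        fit <- tryCatch(
            gls(CV ~ 1, data = data_try,
                correlation = corARMA(p = p), method = "ML"),
            error = function(e) NULL)
        if (!is.null(fit)) return(fit)
        p <- p - 1
    }
    gls(CV ~ 1, data = data_try, method = "ML")   # AR(0): uncorrelated errors
}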

  >
  > I cannot figure out from the code for gls what actually generates the
  > error message. A pointer to the specific numerical criterion that
  > generates this error would be very much appreciated.

    Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:

     F77_CALL(dqrdc2) (coef, &P, &P, &P, &sqrt_eps, &i, qraux, pivot, work);
     if (i < P)
         error(_("Coefficient matrix not invertible"));


https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f

says that dqrdc2 computes a QR decomposition via Householder
transformations, but you'd really have to figure this out for yourself ...

    The i parameter corresponds to the k argument inside the Fortran
routine: "k contains the number of columns of x judged to be linearly
independent."  So it looks like you end up with a rank-deficient matrix
at some point.
  >
  > In the stackoverflow thread, this is mentioned:
  >
  > The general rule of thumb is that you should have at least 10 times as
  > many data points as parameters, and that's for standard fixed
  > effect/regression parameters. (Generally variance structure parameters
  > such as AR parameters are even a bit harder/require a bit more data than
  > regression parameters to estimate.)
  >
  > Where in the literature is this rule discussed?

    I got it from Frank Harrell's _Regression Modeling Strategies_ book.
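
    As a rough worked example of that rule: CV ~ 1 + time has two
regression coefficients, so it suggests at least ~20 observations per
series, and the AR coefficients (plus the residual variance) push the
practical requirement higher still.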

  > 4. If the only option is to fit a lower order model of error, say,
  > AR(1), instead of AR(2), what kind of bias does this generate in the
  > analysis?
  >

     Don't know.  Since you're running simulations, you could find out ...
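
     One way such a check might look (purely illustrative: the
data-generating process below is a linear trend plus AR(2) noise with
arbitrary coefficients, not the CV series from your simulation):

library(nlme)
set.seed(2)
nsim  <- 200
n     <- 100
slope <- matrix(NA_real_, nsim, 2, dimnames = list(NULL, c("AR1", "AR2")))
for (s in seq_len(nsim)) {
    d <- data.frame(time = 1:n,
                    CV   = 0.01 * (1:n) +
                           as.numeric(arima.sim(list(ar = c(0.5, 0.3)), n = n)))
    f1 <- try(gls(CV ~ time, data = d, correlation = corARMA(p = 1),
                  method = "ML"), silent = TRUE)
    f2 <- try(gls(CV ~ time, data = d, correlation = corARMA(p = 2),
                  method = "ML"), silent = TRUE)
    if (!inherits(f1, "try-error")) slope[s, "AR1"] <- coef(f1)["time"]
    if (!inherits(f2, "try-error")) slope[s, "AR2"] <- coef(f2)["time"]
}
## average error in the estimated time trend under each assumed error structure
colMeans(slope - 0.01, na.rm = TRUE)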
  >
  >
  > Many thanks in advance,
  >
  > S.
  >

-- 
Steven Orzack
Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA 02140
617 864-4307

www.freshpond.org


