[R-sig-ME] time series with autocorrelated errors - matrix not invertible
Steven Orzack
orzack at freshpond.org
Sun Jul 31 18:27:28 CEST 2016
Ben, Many thanks for the informative reply. I will work on this in the
next few days and update. Once upon a time, I knew about Householder
transformations. I will try to dust off that part of my brain.....
I am especially interested in the fact that ar and gls are using
different algorithms. If you could expand on that, it would be very
helpful. I will also investigate.
Any information and thoughts are most welcome.
S.
On 16-07-29 06:12 PM, Steven Orzack wrote:
> I am analyzing time series of a statistic (coefficient of variation or
> CV) generated by a Monte Carlo simulation of population dynamics. A
> given simulation might generate thousands of such time series.
>
> For any given time series, the sequential estimates of CV are correlated
> with one another (because they are based on partially overlapping sets
> of the population numbers). Accordingly, I am using gls models with
> autocorrelated errors to analyze each time series.
>
> Although the stochastic process generating all sample paths is the same,
> any given sample path might consist of a time series of CV for which
> AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
> by ar() (see below).
>
> In practice, I have seen time series for which AR(0), AR(1), or AR(2)
> generates the smallest AIC value, so always using a specific
> order (e.g., AR(1)) does not seem appropriate. This is especially so
> because I am developing methods that would apply to time series for
> which the generating dynamics are not known.
>
> For any given sample path y, the code is
>
> # assess the order of the AR model using ar(); use the order with the smallest AIC
> arorder <- ar(moving_statistics$CV, order.max = 5)$order
>
> # evaluate models with constant only and with time (gls/corARMA need library(nlme))
> if (arorder < 1) {
>   summary(int_model  <- lm(CV ~ 1, data = moving_statistics))
>   summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
>
>   # extract statistics for sample path y
>   pvalue_cv$pvalue[y]     <- anova(int_model, time_model)$Pr[2]
>   pvalue_cv$slope_time[y] <- time_model$coefficients[2]
> } else {
>   summary(int_model  <- gls(CV ~ 1, data = moving_statistics,
>                             correlation = corARMA(p = arorder), method = "ML"))
>   summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
>                             correlation = corARMA(p = arorder), method = "ML"))
>
>   # extract statistics for sample path y
>   pvalue_cv$pvalue[y]     <- anova(int_model, time_model)$p[2]
>   pvalue_cv$slope_time[y] <- time_model$coefficients[2]
> }
>
>
> When I run the simulation, there are some sample paths for which AR(2)
> has the lowest AIC and so this is the model called, as in
>
> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
> correlation = corARMA(p=2), method="ML"))
>
>
> This call appears to always generate this error message
>
> Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
> Coefficient matrix not invertible
>
> Here are my questions:
>
> 1. how does ar generate an AIC value for p =2 even though the
> coefficient matrix is not invertible and so the model fit fails?
gls and ar are using different algorithms. Furthermore, I think ar()
is assuming a constant mean whereas your gls() has a linear effect of
time, so they're actually trying to fit different models.
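One rough way to see this (just a sketch using the moving_statistics
data frame from your code, not something I've run on your output):
ar() demeans the series by default (constant mean only), so its order
selection can differ from the order that suits the detrended series
implied by gls(CV ~ 1 + time, ...).

    # order chosen on the raw series (constant-mean AR, as ar() assumes)
    ar(moving_statistics$CV, order.max = 5)$order
    # order chosen on the series after removing a linear time trend
    ar(residuals(lm(CV ~ time, data = moving_statistics)),
       order.max = 5)$order

If the two disagree, that's the model mismatch at work.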
> 2. Is there a way to actually fit such a model, say, by adjusting
> tolerances?
Don't know. I would start by debugging my way through the gls code to
see where it is actually breaking. I don't think I have much to say
other than what I already said in the SO thread you reference below.
>
> 3. a related thread
>
> (http://stackoverflow.com/questions/6706143/error-with-gls-function-in-nlme-package-in-r)
>
>
> indicates that there are too many parameters being estimated given the
> length of the time series.
>
> Is there a numerical criterion that can be queried before the gls call
> so that I can test for invertibility and fit, say, a lower order model
> (e.g., AR(1)), so as to avoid an error?
You could certainly wrap this attempt in a try() or tryCatch() clause
so that it wouldn't break your simulation run and you could then fall
back on AR(1) ...
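Something along these lines (a minimal sketch, not tested against your
data; fit_time_model is just a made-up helper name):

    library(nlme)

    fit_time_model <- function(data, p) {
      ## try the order chosen by ar(); on any error (e.g. "Coefficient
      ## matrix not invertible") fall back to AR(1)
      tryCatch(
        gls(CV ~ 1 + time, data = data,
            correlation = corARMA(p = p), method = "ML"),
        error = function(e) {
          message("AR(", p, ") fit failed (", conditionMessage(e),
                  "); falling back to AR(1)")
          gls(CV ~ 1 + time, data = data,
              correlation = corARMA(p = 1), method = "ML")
        }
      )
    }

    ## e.g.  time_model <- fit_time_model(moving_statistics, p = arorder)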
>
> I cannot figure out from the code for gls what actually generates the
> error message. A pointer to the specific numerical criterion that
> generates this error would be very much appreciated.
Unpacking the nlme source code and searching for the error message
finds it on line 557 of src/corStruct.c:
    F77_CALL(dqrdc2) (coef, &P, &P, &P, &sqrt_eps, &i, qraux, pivot, work);
    if (i < P)
        error(_("Coefficient matrix not invertible"));
https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f
says that dqrdc2 is a Householder transformation, but you'd really have
to figure this out for yourself ...
The i parameter corresponds to the k argument inside the FORTRAN
function: "k contains the number of columns of x judged to be linearly
independent." So it looks like you end up with a rank-deficient matrix
at some point.
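If you want to see what "judged to be linearly independent" means in R
terms (this doesn't reproduce the corARMA failure, it just illustrates
the rank check): qr() with the default, non-LAPACK method uses the same
dqrdc2 routine and reports that count as $rank, e.g.

    M <- cbind(1, 1:5, 2 * (1:5))   # third column is a multiple of the second
    q <- qr(M)
    q$rank            # 2, i.e. fewer than ncol(M) = 3
    q$rank < ncol(M)  # TRUE: the condition that triggers the error above

In corStruct.c the matrix being factored appears to be built from the
ARMA coefficient transformation rather than your design matrix, so you
would have to step through with a debugger to see which values make it
rank-deficient.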
>
> In the stackoverflow thread, this is mentioned:
>
> The general rule of thumb is that you should have at least 10 times as
> many data points as parameters, and that's for standard fixed
> effect/regression parameters. (Generally variance structure parameters
> such as AR parameters are even a bit harder/require a bit more data than
> regression parameters to estimate.)
>
> Where in the literature is this rule discussed?
I got it from Frank Harrell's _Regression Modeling Strategies_ book.
> 4. If the only option is to fit a lower order model of error, say,
> AR(1), instead of AR(2), what kind of bias does this generate in the
> analysis?
>
Don't know. Since you're running simulations, you could find out ...
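A rough sketch of that kind of check (an assumed setup, not your actual
simulation): simulate series with AR(2) errors around a known trend, fit
both error models, and compare the slope estimates over many replicates.

    library(nlme)
    set.seed(1)
    n   <- 200
    dat <- data.frame(time = seq_len(n))
    ## AR(2) errors around a weak linear trend (true slope = 0.01)
    dat$CV <- 0.01 * dat$time +
      as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = n))

    fit1 <- gls(CV ~ time, data = dat, correlation = corARMA(p = 1), method = "ML")
    fit2 <- gls(CV ~ time, data = dat, correlation = corARMA(p = 2), method = "ML")

    ## repeat over many simulated paths and compare the distributions of
    ## the slope estimates / test statistics under the two error models
    c(AR1 = coef(fit1)[["time"]], AR2 = coef(fit2)[["time"]])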
>
>
> Many thanks in advance,
>
> S.
>
--
Steven Orzack
Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA 02140
617 864-4307
www.freshpond.org