[R-sig-ME] time series with autcorrelated errors - matrix not invertible

Mon Aug 1 17:16:59 CEST 2016

On 16-07-31 12:27 PM, Steven Orzack wrote:
>  Ben, Many thanks for the informative reply. I will work on this in the
> next few days and update. Once upon a time, I knew about Householder
> transformations. I will try to dust off that part of my brain.....
> 
> I am especially interested in the fact that ar and gls are using
> different algorithms. If you could expand on that, it would very
> helpful. I will also investigate.
> 
> any information and thoughts are most welcome.

  gls uses generalized least squares -- described in detail in Pinheiro
and Bates 2000 -- basically, nonlinear optimization over the correlation
parameter, minimizing the sums of squares of the 'whitened' residuals.
   For ar, see ?ar: the default solves the Yule-Walker equations (see
ar.yw) ...

  As I hinted, if I had to do this I would (1) build my own brute-force
AR solver (probably based on nlme::corARMA), (2) use try() to catch the
cases where AR(2) failed and revert to AR(1)

> 
> 
> 
> S.
> 
> On 16-07-29 06:12 PM, Steven Orzack wrote:
> 
>> I am analyzing time series of a statistic (coefficient of variation or
>> CV) generated by a Monte Carlo simulation of population dynamics. A
>> given simulation might generate thousands of such time series.
>>
>> For any given time series, the sequential estimates of CV are correlated
>> with one another (because they are based on partially overlapping sets
>> of the population numbers). Accordingly, I am using gls models with
>> autocorrelated errors to analyze each time series.
>>
>> Although the stochastic process generating all sample paths is the same,
>> any given sample path might consist of a time series of CV for which
>> AR(0), AR(1), AR(2), etc. generates the smallest AIC value, as produced
>> by ar() (see below).
>>
>> In practice, I have seen time series for which AR(0), AR(1), AR(2)
>> generate the smallest AIC value and so always just using a specific
>> order (e.g., AR(1)) does not seem appropriate. This is especially so
>> because I am developing methods that would apply to time series for
>> which the generating dynamics are not known.
>>
>> For any given sample path y, the code is
>>
>> #assess order of AR model using ar. use order with smallest AIC
>> arorder <- ar(moving_statistics$CV, order.max = 5)$order
>>
>> #evaluate model with constant and time
>> if (arorder < 1) {
>> summary(int_model <- lm(CV ~ 1, data = moving_statistics))
>> summary(time_model <- lm(CV ~ 1 + time, data = moving_statistics))
>>
>> #extract statistics for sample path y
>> pvalue_cv$pvalue[y] <- anova(int_model,time_model)$Pr
>> pvalue_cv$slope_time[y] <- time_model$coefficients[2]
>> } else {
>> summary(int_model <- gls(CV ~ 1, data = moving_statistics, correlation =
>> corARMA(p=arorder), method="ML"))
>> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
>> correlation = corARMA(p=arorder), method="ML"))
>> #extract statistics for sample path y
>> pvalue_cv$pvalue[y] <- anova(int_model,time_model)$p[2]
>> pvalue_cv$slope_time[y] <- time_model$coefficients[2]
>> }
>>
>>
>> When I run the simulation, there are some sample paths for which AR(2)
>> has the lowest AIC and so this is the model called, as in
>>
>> summary(time_model <- gls(CV ~ 1 + time, data = moving_statistics,
>> correlation = corARMA(p=2), method="ML"))
>>
>>
>> This call appears to always generate this error message
>>
>> Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) :
>>   Coefficient matrix not invertible
>>
>> Here are my questions:
>>
>> 1. how does ar generate an AIC value for p =2 even though the
>> coefficient matrix is not invertible and so the model fit fails?
> 
>   gls and ar are using different algorithms.  Furthermore, I think ar()
> is assuming a constant mean whereas your gls() has a linear effect of
> time, so they're actually trying to fit different models.
> 
> 
>> 2. Is there a way to actually fit such a model, say, by adjusting
>> tolerances?
> 
>   Don't know.  I would start by debugging my way through the gls code to
> say where it is actually breaking.  I don't think I have much to say
> other than what I already said in the SO thread you reference below.
>>
>> 3. a related thread
>>
> (http://stackoverflow.com/questions/6706143/error-with-gls-function-in-nlme-package-in-r)
> 
>>
>>
>> indicates that there are too many parameters being estimated given the
>> length of the time series.
>>
>> Is there a numerical criterion that can be queried before the gls call
>> so that I can test for invertibility and fit, say, a lower order model
>> (e.g., AR(1)), so as to avoid an error?
> 
>   You could certainly wrap this attempt in a try() or tryCatch() clause
> so that it wouldn't break your simulation run and you could then fall
> back on AR(1) ...
> 
>>
>> I cannot figure out from the code for gls what actually generates the
>> error message. A pointer to the specific numerical criterion that
>> generates this error would be very much appreciated.
> 
>   Unpacking the nlme source code and searching for the error message
> finds it on line 557 of src/corStruct.c:
> 
>    F77_CALL(dqrdc2) (coef, &P, &P, &P,  &sqrt_eps, &i, qraux, pivot, w\
> ork);
>             if (i < P)
>                 error(_("Coefficient matrix not invertible" ));
> 
> 
> https://svn.r-project.org/R/trunk/src/appl/dqrdc2.f
> 
> says that dqrdc2 is a Householder transformation, but you'd really have
> to figure this out for yourself ...
> 
>   the i parameter corresponds to a k argument inside the FORTRAN
> function: "k contains the number of columns of x judged to be linearly
> independent." so it looks like you end up with a rank-deficient matrix
> at some point.
>>
>> In the stackoverflow thread, this is mentioned:
>>
>> The general rule of thumb is that you should have at least 10 times as
>> many data points as parameters, and that's for standard fixed
>> effect/regression parameters. (Generally variance structure parameters
>> such as AR parameters are even a bit harder/require a bit more data than
>> regression parameters to estimate.)
>>
>> Where in the literature is this rule discussed?
> 
>   I got it from Frank Harrell's _Regression Modeling Strategies_ book.
> 
>> 4. If the only option is to fit a lower order model of error, say,
>> AR(1), instead of AR(2), what kind of bias does this generate in the
>> analysis?
>>
> 
>    Don't know.  Since you're running simulations, you could find out ...
>>
>>
>> Many thanks in advance,
>>
>> S.
>>
>