[R-sig-Geo] error in spautolm spdep and model selection?

Sun Jan 18 19:55:42 CET 2009

On Sun, Jan 18, 2009 at 2:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> On Sat, 17 Jan 2009, Michael Denslow wrote:
>
>> Dear r-sig-geo,
>>
>> I am attempting to compare the results from an lm() regression with an
>> appropriate spatial regression.

As Roger mentioned, without spatially autocorrelated residuals the
spatial regression is not the 'appropriate' model.

>> I have run into a problem when using the
>> spautolm function in spdep. When I include a polynomial term in the model I
>> get the following error message:
>>
>>> SAR <- spautolm(log10ns2a ~ log10a2a + floras2a$Field.Year +
>>
>> floras2a$mid.elev +  I(floras2a$mid.elev^2),
>> listw = nb72km.w)

Note that it's safer to use the 'data' argument rather than using '$'
within model formulas

>>
>> Error in solve.default(t(X) %*% as.matrix(IlW %*% X), tol = tol.solve) :
>>  system is computationally singular: reciprocal condition number =
>> 2.07505e-17
>>
>> I am not sure how to interpret this. The function seems to work fine if I
>> remove the polynomial term.
>
> For spautolm, this is a new report, and I'll try to find out how to make the
> error message clearer. What is happening is the when the squared term is
> included, X' (I - \lambda W)' \Omega (I - \lambda W) X is singular. My guess
> would be that:
>
> SAR <- errorsarlm(log10ns2a ~ log10a2a + floras2a$Field.Year +
> floras2a$mid.elev +  I(floras2a$mid.elev^2), listw = nb72km.w)
>
> might very well fail too - almost certainly floras2a$mid.elev and
> I(floras2a$mid.elev^2) are highly correlated. spautolm does check whether
> the X variables are aliased, but using QR, rather than the less robust
> solve.default(). Can you try changing the scale of mid.elev, which will then
> change the scale of its square?

Also, orthogonal polynomial terms can be created automatically using
'poly' within model formulas

>
> If you believe that the variable scaling is as it should be, and want to
> proceed, set the tol.solve= argument in spautolm to a value less than
> 2.07505e-17, such as 1e-17. In principle, the internal function should not
> use solve.default(), but this is the first report of it causing trouble
> since spautolm was introduced in late 2005.
>

It does seem suspicious that just adding a squared term causes
singularity.  I wonder if there's something else going on with the
model matrix or weights

>>
>> Perhaps to larger question is which is the appropriate SAR/CAR model to
>> use for my data. I realize that this is not an R question but any help would
>> be most appreciated! My data has some spatial autocorrelation in the
>> predictor variables. The residuals of the lm() model are not autocorrelated.
>> Also the elevation variable is strongly autocorrelated and directional in my
>> data set. As I understand it this would rule out CAR as an option.

Tobler's law guarantees that your spatial predictors will be
autocorrelated.  If spatial variables were random the world would be a
very strange place indeed!

We don't know enough about your study to offer statistical advice
(e.g. I assume it's regional/lattice/areal data?), and this really
isn't the place for consulting, but I'll reiterate what Roger said in
different terms.

The problem with spatially autocorrelated residuals in an lm model is
that you are not meeting the assumption of independence; points close
together contain redundant information and you overestimate the
degrees of freedom available for formal inference.  This not only
provides anti-conservative SEs, but when GLS estimators are used (as I
believe they are under the hood of spautolm), it also changes the
estimated coefficients.  If there is no autocorrelation in the
residuals this suggests the conditional mean part of the model has
already explained the spatial structure in the response and you don't
have lack of independence (at least not due to spatial
autocorrelation).  I've never seen a spatial regression where I
believed there was truly no spatial autocorrelation in the residuals,
but formal inference is full of approximations and if we can get close
to no autocorrelation then we can get close to an optimal estimator.
The tricky part is looking for the spatial structure in the residuals
-- it can come in many forms.

hth,

Kingsford Jones

>
> The models you are fitting are trying to harvest spatial structure in the
> residual. If there isn't any spatial structure to harvest, lm() is quite
> good enough. Spatial pattern in the X variables is not a problem, especially
> if it matches the spatial patterning in the response. Do you have a
> reference for your claims above, they sound like heresay to me? In this
> case, the only interesting spatial patterning is in the residuals - if they
> are clean, and you beleive that your model is well specified, there is no
> spatial story left.
>
> Best wishes,
>
> Roger
>
> PS. access to your data set might be helpful in inserting a better error
> message into spautolm.
>
>>
>> Thank you in advance for any help you can provide,
>> Michael
>>
>>
>>
>>
>> Michael Denslow
>>
>> I.W. Carpenter Jr. Herbarium [BOON]
>> Appalachian State University
>> Boone, North Carolina U.S.A.
>>
>> -- AND --
>>
>> Communications Manager
>> Southeastern Regional Network of Expertise and Collections
>> sernec.org
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>