[R-sig-Geo] spautolm - sanity check
Brian Oney
zenlines at gmail.com
Wed Feb 9 21:03:51 CET 2011
Hello All,
I have gone through ASDAR and hope that I have properly applied it's
content.
I have water color data sampled from polygons, which lie at the mouths
of a rivers around an island. Due to ocean currents, bathymetry, and
wind etc., I have the need to account for spatial autocorrelation for
those polygons which are near to each other. I am trying to build a
statistical model to explain the correlations between land cover and
water quality.
Therefore:
### Coordinates of the polygons
coords <- coordinates(outlets)
### k=2 - assume that each outlet point could be/is possibly only
affecting the two nearest outlets
outlets.nb <- knn2nb(knearneigh(coords,k=2),row.names=outlets$BASIN_ID)
### We assume that SA is related to distance between outlets
### Calculate distance (in meters, the units of our projection)
distance.nb <- nbdists(outlets.nb, coords)
### According to ASDAR (Page 254) this shouldn't be a problem because
these distances are not on a grid but are asymmetrically placed in space,
### but we do ignore the dynamics physical processes that might
transport water (currents, wind, bathymetry etc.) and assume equal
diffusion/dispersal in all directions.
### Invert the distances: the bigger the distance the smaller the
weight. Weights are proportional to the inverse distance
invert.dist <- lapply(distance.nb, function(x) 1/x)
(outlets.lw <- nb2listw(outlets.nb, glist=invert.dist, style='B'))
OK, is this a correct structure?
It may be better to just base it on distance (dnearneigh) i.e. assume
that beyond 50km distance a nearest neighbor has no influence.
Any comments, suggestions, insults?
## Give it a whirl.
moran.test(response, listw=outlets.lw, randomisation=F)
Moran's I test under normality
data: response
weights: outlets.lw
Moran I statistic standard deviate = 2.8826, p-value = 0.001972
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.48914375 -0.02325581 0.03159666
### Tweaking suggests that heteroscedasticity is present as well the
data not being normally distributed. Then a boxcox transformation
suggests a 1/sqrt(response) transformation but it is difficult to
interpret so I opt for a log transformation.
> moran.test(log(response), listw=outlets.lw, randomisation=F)
Moran's I test under normality
data: log(response)
weights: outlets.lw
Moran I statistic standard deviate = 2.0702, p-value = 0.01922
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.34473683 -0.02325581 0.03159666
Moran's I changed quite a bit here. This is due to non-normality, right?
It appears the log takes most of the heteroscedasticity out and
transforms the distribution to almost normally ditributed (seen via
plot(model)).
### Continuing
### after narrowing down the explanatory variables while removing the
'worst' of those that are collinear (rho>0.5).
sp.model <- spautolm(log(response) ~., data=data, listw=outlets.lw)
I used the SAR default.
Also, without narrowing my explanatory variables down to a bare minimum,
I usually run into the error:
Error in solve.default(crossprod(X, as.matrix(IlW %*% X)), tol =
tol.solve) :
system is computationally singular: reciprocal condition number =
2.85163e-019
Which still exists even when the same non-spatial (just lm) model
contains no singularities. I get this often when adding an interaction
term to the model.
I am worried that the spatial structure has been misspecified, when
looking at the other post which addresses this same error:
http://r-sig-geo.2731867.n2.nabble.com/error-in-spautolm-spdep-and-model-selection-td2766159.html
> sp.model <- spautolm(form, data=borneo.data,listw=outlets.lw)
> summary(sp.model,Nagelkerke=T)
Call: spautolm(formula = form, data = borneo.data, listw = outlets.lw)
Residuals:
Min 1Q Median 3Q Max
-1.1164557 -0.3936249 0.0074938 0.3652388 1.4859714
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.2321e+00 1.6446e+00 -5.6135 1.983e-08
p_bare 1.2761e+00 2.5148e-01 5.0743 3.889e-07
mean_rfact 1.0769e-03 2.1112e-04 5.1010 3.378e-07
km_roads 8.0991e-05 3.3864e-05 2.3917 0.016772
p_bare:mean_rfact -1.5911e-04 3.1192e-05 -5.1009 3.381e-07
p_bare:km_roads -1.5490e-05 5.9758e-06 -2.5922 0.009537
Lambda: 3040.2 LR test value: 0.94862 p-value: 0.33007
Log likelihood: -44.52488
ML residual variance (sigma squared): 0.43704, (sigma: 0.66109)
Number of observations: 44
Number of parameters estimated: 8
AIC: 105.05
Nagelkerke pseudo-R-squared: 0.54572
So there seems to be little spatial autocorrelation in the residuals.
### This is how it looks in non-spatial manner.
summary(model);AIC(model)
Call:
lm(formula = response ~ p_bare + mean_rfact + km_roads +
p_bare:mean_rfact +
p_bare:km_roads, data = borneo.data)
Residuals:
Min 1Q Median 3Q Max
-0.7111 -0.3342 -0.1216 0.3065 2.0646
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.067e+00 1.378e+00 -2.951 0.00539 **
p_bare 6.878e-01 2.142e-01 3.210 0.00270 **
mean_rfact 6.132e-04 1.766e-04 3.472 0.00130 **
km_roads 6.282e-05 2.863e-05 2.194 0.03439 *
p_bare:mean_rfact -8.611e-05 2.660e-05 -3.237 0.00251 **
p_bare:km_roads -1.091e-05 5.067e-06 -2.154 0.03767 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.559 on 38 degrees of freedom
Multiple R-squared: 0.4248, Adjusted R-squared: 0.3491
F-statistic: 5.613 on 5 and 38 DF, p-value: 0.0005723
## and then...
lm.morantest(model.log,listw=outlets.lw)### Moran's I is big
Global Moran's I for regression residuals
data:
model: lm(formula = log(response) ~ p_bare + mean_rfact + km_roads +
p_bare:mean_rfact + p_bare:km_roads, data = borneo.data)
weights: outlets.lw
Moran I statistic standard deviate = 1.1192, p-value = 0.1315
alternative hypothesis: greater
sample estimates:
Observed Moran's I Expectation Variance
0.15521190 -0.04488877 0.03196742
So SA may or may not exist. It doesn't appear significant here, however,
the R2 values increase in the spatial model and the AIC drops. The
spatial model seems better, but is it overkill?
Thanks for the comments, suggestions, or insults!
Regards,
Brian
More information about the R-sig-Geo
mailing list