[R-sig-Geo] dnearneigh, knearneigh and SAR - not removing spatial autocorrelation

Roger Bivand Roger.Bivand at nhh.no
Thu Apr 28 16:55:30 CEST 2011

```
On Wed, 27 Apr 2011, Chris Mcowen wrote:

> Hi Roger,
>
> Thanks for this. As previously mentioned, I am relatively new to this; I
> started with a GLS method but then realised it may not be the most
> effective, so I decided to try SAR methods.
>
>> What definition of neighbour are you using in the correlogram?
>

You have not said where the correlogram is coming from. Please say whether
it is from pgirmess, or (probably) from ncf? Once you establish that,
you'll know more about the spatial process, which obviously isn't the one
you are modelling.

>
> I call the SAR as follows :- sem.nb1.5.w<-errorsarlm(ols2,
> listw=nb1.5.w, zero.policy=F, na.action=na.omit)
>
> Then extract the residuals: - res.sem.nb1.5.w <- residuals(sem.nb1.5.w)
>
> Then use these to construct the correlogram: -
> cor.sem.nb1.5.w<-correlog(data$X, data$Y, z=residuals(sem.nb1.5.w),
> na.rm=T, increment=1, resamp=1)
>
>
>> The correlogram obviously doesn't care about no-neighbour observations,
>> so perhaps do the same?
>
> How do I do this? If I set dmax too low I get "Empty neighbour sets
> found", so the model won't run. I feel I am setting dmax artificially,
> based on the fact that it works rather than on any scientific rationale.
>
> I tried a method used in your example:
>
>> coords <- coordinates(columbus)
>> rn <- sapply(slot(columbus, "polygons"), function(x) slot(x, "ID"))
>> k1 <- knn2nb(knearneigh(coords))
>> all.linked <- max(unlist(nbdists(k1, coords)))
>> col.nb.0.all <- dnearneigh(coords, 0, all.linked, row.names=rn)
>
> to join everything on a nearest-neighbour basis, but that resulted in
> little difference in the correlogram and AIC.
>
> So my two questions are:
>
> Do I have to link every neighbour? If not, how do I run the model
> without getting the error, and how do I decide which neighbours not to

You do not have to, and if you do not want to, use the zero.policy=
argument to permit analysis with weights objects including no-neighbour
observations. It all depends on your model of the spatial process. If say
biology tells you that interaction beyond 100m is unlikely, use 100m.
Alternatively, choose a graph-based definition of neighbours, or possibly
k-nearest, if you have a reason for choosing k.
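
# A minimal sketch of the zero.policy= route (an illustration, not code from
# the thread). Assumptions: `coords`, `data` and the fitted lm object `ols2`
# are as in Chris's messages and `data` has no missing values; the 500 km
# cutoff is only a placeholder and should come from the model of the spatial
# process; errorsarlm() was part of spdep in 2011 and is provided by the
# spatialreg package in current releases.
library(spdep)
library(spatialreg)

# Distance-based neighbours; distances are in km when longlat = TRUE
nb.d <- dnearneigh(coords, 0, 500, longlat = TRUE)

# Count the regions that end up with no neighbours at this cutoff
table(card(nb.d) == 0)

# Row-standardised weights; zero.policy = TRUE permits empty neighbour sets
lw.d <- nb2listw(nb.d, style = "W", zero.policy = TRUE)

# Spatial error model fitted with the same policy
sem.d <- errorsarlm(formula(ols2), data = data, listw = lw.d,
                    zero.policy = TRUE)
summary(sem.d)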

>
> Will it make a large difference if the correlogram doesn't change much?
> Does this mean the model is not "dealing" with the correlation
> effectively, and how do I improve this?

By thinking about what the correlogram function is doing in expressing the
spatial process it thinks it sees - it isn't magic, and both
representations are reasonable (autocorrelation present with the
representation in the correlogram function, and autocorrelation absent
with your distance based representation and row-standardised weights).
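
# A sketch of testing the residuals under the same neighbour representation
# that was used to fit the model (an illustration, not code from the thread).
# Assumptions: `ols2`, `sem.nb1.5.w` and `nb1.5.w` are the objects from
# Chris's message. The second call is best read as a descriptive check,
# since moran.test() does not account for the residuals coming from a
# fitted spatial model.
library(spdep)

# Moran's I for the OLS residuals, using the model's row-standardised weights
lm.morantest(ols2, listw = nb1.5.w)

# The same weights applied to the SAR error model residuals
moran.test(residuals(sem.nb1.5.w), listw = nb1.5.w)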

Roger

>
> Thanks again and sorry for my relatively naive questions!
>
> Chris
>
>
> On 27 Apr 2011, at 21:16, Roger Bivand wrote:
>
> On Wed, 27 Apr 2011, Chris Mcowen wrote:
>
>> Dear list,
>>
>> I was wondering if somebody could possibly offer me some advice?
>>
>> I have a dataset in which spatial autocorrelation is present, as is visible from the correlogram, so I tried to remove it using a SAR model. My data are for 458 regions of differing size, for which I have long-lat coordinates.
>>
>>
>> coords<-cbind(data$X,data$Y)
>> coords<-as.matrix(coords)
>>
>> The first approach was to use dnearneigh to set up the neighbourhood. I am very new to this and was having problems, as regions would often appear with no links (see below), so I increased dmax until this no longer occurred. This may be an incorrect method?
>>
>>> dnearneigh(coords, 0, 1500, row.names = NULL, longlat = TRUE)
>> Neighbour list object:
>> Number of regions: 458
>> Number of nonzero links: 8990
>> Percentage nonzero weights: 4.285769
>> Average number of links: 19.62882
>> 2 regions with no links:
>> 235 236
>>
>> Results in - Empty neighbour sets found
>>
>>> dnearneigh(coords, 0, 2000, row.names = NULL, longlat = TRUE)
>> Neighbour list object:
>> Number of regions: 458
>> Number of nonzero links: 14200
>> Percentage nonzero weights: 6.769512
>> Average number of links: 31.00437
>>
>> I then converted this to a weight matrix and used in my SAR.
>>
>> nb1.5 <- dnearneigh(coords, 0, 2000, row.names = NULL, longlat = TRUE)
>> nb1.5.w<-nb2listw(nb1.5, glist=NULL, style="W", zero.policy=FALSE)
>>
>> However, looking at the correlogram and the AIC (below), it seems not to have made a huge difference.
>>
>> AIC: -2581.8, (AIC for lm: -2574)
>
> With relatively large numbers of neighbours, you smooth more. What definition of neighbour are you using in the correlogram? Why not just use the same? The correlogram obviously doesn't care about no-neighbour observations, so perhaps do the same? If you want to link every observation in, why not then down-weight distant neighbours using inverse distance weights - see ?nbdists.
>
> The test neighbour definition and the weights used for model fitting do not match. Further, you don't say whether your correlogram is for the OLS model residuals or just the response variable. If the latter, the explanatory variables may co-vary in space with the response, so the residuals are in fact not spatially autocorrelated.
>
> Hope this helps,
>
> Roger
>
>>
>> So I tried defining my neighbourhood using the knearneigh function, but that made very little difference.
>>
>> test <- knearneigh(coords, k=1, longlat = NULL, RANN=TRUE)
>> knn2nb(test, row.names = NULL, sym = FALSE)
>> k1 <- knn2nb(knearneigh(coords))
>> col.nb.0.all <- dnearneigh(coords, 0, all.linked, row.names=rn)
>>
>> AIC: -2581.8, (AIC for lm: -2574)
>>
>> Is there something I am doing wrong, or is there a step I am missing?
>>
>> Any help would be gratefully received.
>>
>> Chris
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of