[R-sig-Geo] Different Moran's I correlograms using correlog{ncf} and sp.correlogram{spdep}

Thu Jan 29 16:17:13 CET 2009

On Fri, 30 Jan 2009, Xingli Giam wrote:

> Dear Roger,
>
> So what you trying to say about correlog{ncf} is that after the bands 
> are computed, all the correlations between data points whose distance 
> are within the bin are computed. So I suppose it is a binary kind of 
> specification, if pairwise distances are in the bin, the correlations 
> will be computed and all these correlations are considered equally in 
> computing Moran's I.
>
> So should the correlogram obtained by correlog{ncf} same as the one 
> obtained by sp.correlogram(spdep), when the neighbourhood upperlimit is 
> same as the increment level of the former.

No, because correlog{ncf} bins the distances, but sp.correlogram{spdep} 
adds higher order lags - here the second lag of i (with first order 
neighbours j) is the set of all points k that are neighbours of points j 
and neither i nor in j. It depends on graph edge counts, not distance as 
such. They aren't comparable, really. If you want to do it by hand, check 
the distances reported by correlog() (they may not be what you think), 
take the bin boundaries, plug them into multiple calls to dnearneigh(), 
and go from there.

Roger

>
> E.g.,
>
> #Correlogram from correlog{ncf}, where slm1600 is the chosen spatial model
>
> correlog.200b <- correlog(dat$x_long, dat$y_lat, residuals(mod1), na.rm=T,
> increment=200, resamp=99, latlon=T)
>
> # Plotting the first 50 bins
> plot(correlog.200b$correlation[1:50], type="b", pch=16, cex=1.5, lwd=1.5,
> col="red", xlab="distance (10^2 km)", ylab="Moran's I", cex.lab=2,
> cex.axis=1.5, ylim=c(-1,1)); abline(h=0)
>
> #should be the same as
>
> nb200 <- dnearneigh(as.matrix(dat$x_long, dat$y_lat), 0, 200, longlat=T)
> ME200.listw <- nb2listw(nb200, style="B", zero.policy=T)
>
> correl<-sp.correlogram(nb200, residuals(mod1), order = 50, method = "I", style
> = "W", randomisation = TRUE, zero.policy = TRUE, spChk=NULL)
>
> plot(correl1600),
>
> But the results of Moran's I are very different.
>
> Here are the results:
>
> For sp.correlogram(),
> Spatial correlogram for residuals(mod1)
> method: Moran's I
>      estimate expectation    variance standard deviate Pr(I) two sided
> 1   0.06882699 -0.00123153  0.00021800           4.7449       2.086e-06 ***
> 2   0.03190221 -0.00127877  0.00027178           2.0127       0.0441461 *
> 3   0.02914748 -0.00133511  0.00028490           1.8059       0.0709262 .
> 4   0.04458328 -0.00137363  0.00029717           2.6659       0.0076772 **
> 5   0.05503366 -0.00139665  0.00031539           3.1775       0.0014854 **
> 6   0.02696152 -0.00143062  0.00032685           1.5704       0.1163121
> 7   0.06049141 -0.00144718  0.00036181           3.2563       0.0011289 **
> 8   0.02526273 -0.00146413  0.00037732           1.3759       0.1688468
> 9   0.02901564 -0.00147493  0.00039208           1.5399       0.1235966
> 10  0.06624244 -0.00148810  0.00039586           3.4042       0.0006637 ***
> 11  0.04189714 -0.00151057  0.00041899           2.1206       0.0339535 *
> 12 -0.00321915 -0.00154560  0.00045316          -0.0786       0.9373374
> .........and so on
>
> For correlog():
> $correlation (just first 4 bins)
> 0.381911078  0.356233456  0.322664149  0.304275005
>
> How should we reconcile the differences?
>
> Many thanks,
> Xingli
>
>
>
>
> Quoting Roger Bivand <Roger.Bivand at nhh.no>:
>
>> On Thu, 29 Jan 2009, Xingli Giam wrote:
>>
>>> Hello List,
>>>
>>> I have been bothered by this problem for the entire day, hope someone
>>> can give me a hand here. Thanks in advance for reading! It could be
>>> really simple and basic so hope you guys bear with it.
>>>
>>> We often plot Moran's I spatial correlograms that contrast glm residuals
>>> and spatial model residuals (after incorporating spatial autocorrelation
>>> (SAC) via SEVM, autocov reg, etc) over lag distance classes.
>>
>> A spatial correlogram is not a well-defined concept. One view is that the
>> inter-pair distances should be binned, and the output computed for the
>> pairs of observations in each bin - equivalent to defining widening rings
>> of neighbours for each observation. There are issues that arise here when
>> some observations have no neighbours in a given bin - should one adjust
>> the values used to calculate E(I) and var(I)? This is in correlog() in
>> ncf, and to a certain extent in pgirmess.
>>
>> Another view is that the first order set of neighbours should be raised to
>> higher orders irrespective of how the set was defined: i and j are first
>> order neighbours, i and k are second order neighbours if j and k are first
>> order neighbours. This isn't quite matrix powering, because if you are
>> already a first order neighbour, you can't be a second order as well
>> (discussed in detail in Cliff & Ord 1981, pp. 118-122). See also ?nblag to
>> see what is going on. correlog() in pgirmess uses nclass.Sturges() to
>> choose the number of bins, but does not fully protect against bins with
>> no-neighbour observations.
>>
>>>
>>> After deciding the most appropriate neighbourhood distance (1600 km -
>>> mine's a global dataset), I created a list of weights via nb2listw that
>>> is an inverse function of distance (1/distance). I then performed SEVM
>>> using the list of weights, using the brute force ME() algorithm to
>>> extract eigenvectors that reduce SAC.
>>>
>>> I plan to report the Moran's I value given from moran.test() on both my
>>> (non- spatial) GLM and Spatial GLM to prove that the spatial model has
>>> reduced autocorrelation.
>>>
>>> In addition to that, I plan to show the correlograms of the model
>>> residuals both non-spatial and spatial GLMs.
>>>
>>> I would like to ask which function is a better choice for showing the
>>> correlograms. Should I use correlog{ncf} or sp.correlogram{spdep}. From
>>> what I know, calculating Moran's I require definition of a neighbourhood
>>> and a specific weighting scheme. Does anyone know what is the default
>>> neighbourhood and weighting scheme of correlog()? Is it - if I specify
>>> an increment of 100km, then, all points within 0-100, and then 100-200,
>>> etc, are considered neighbours with a binary weighting scheme for each
>>> distance class?
>>
>> The ncf correlog() uses many nxn matrices, and computes the bands by:
>>
>> dkl <- ceiling(dmat/increment)
>>
>> where dmat is a lower triangle distance matrix, increment is the band
>> width, and dkl has the same form as dmat (it says which distance band the
>> moran pairs in the lower triangle crossproduct matrix used belong to).
>>
>> As you can see, the source code is the documentation here.
>>
>>>
>>> For sp.correlogram, can I specify the nb object defined by min:0km,
>>> max:100km, and use a lag order of, for example 50, to plot the
>>> correlogram of my spatial model (which has a neighbourhood distance of
>>> 1600 km)?
>>>
>>> For both functions, it seems that there is no way to incorporate my
>>> inverse- distance weighting scheme that I used in my spatial models. Is
>>> this of practical concern if I intend to plot my correlograms?
>>>
>>> Lastly I have a side question regarding correlograms. When we see
>>> distances of, e.g., 100, 200, 300, etc (the increment or lag distance is
>>> 100), on the X-axis of a spatial correlogram, does the point at 200
>>> refer to correlations between neighbours of distance 100-200km apart? Or
>>> does it include the first 100km as well (i.e., correlations between
>>> neighbours of distance 0-200km apart)?
>>
>> In general, the pairs of distances within that band (or lag order). But
>> check the code to be sure.
>>
>> Roger
>>
>>>
>>> Sorry for the long post. Many thanks for reading, I really appreciate
>>> it. I know some are really basic questions but I can't find any answers
>>> to my questions on google or searching through the list.
>>>
>>> Many thanks,
>>> Xingli
>>>
>>>
>>
>> --
>> Roger Bivand
>> Economic Geography Section, Department of Economics, Norwegian School of
>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no