[R-sig-Geo] nearest neighbour list from distance matrix, etc.

Sat Jun 6 18:49:39 CEST 2009

On Tue, 26 May 2009, Wakefield, Ewan wrote:

> Hi everyone,
>
> I have been constructing simple linear models of seabird colony size as 
> a function of habitat availability. Given that my data obviously contain 
> a spatial component, I would like to check whether both my response and 
> residuals exhibit any spatial auto-correlation. I understand that I can 
> test for this using Moran's I, calculated for pairs of colonies 
> separated into different distance classes. However, this presents a 
> number of problems:
>
> 1. All of the distance based methods I have come across for defining 
> nearest neighbours require a matrix of point coordinates. The species I 
> am working with do not fly over land, so the Euclidean or great-circle 
> distance between colonies is not really appropriate. Hence, I have 
> computed a matrix of the 'at sea' distance between all colonies. Is 
> there any way to pass this to a function that defines nearest 
> neighbours?

No, you cannot pass it to, for example, dnearneigh() in spdep. You would 
need to construct an "nb" neighbours object manually, something like:

library(spdep)
example(columbus)
col_d <- as.matrix(dist(coordinates(columbus)))
# create a distance matrix
nb_d062 <- dnearneigh(coordinates(columbus), 0, 0.62)
# to set a benchmark example using the same coordinates
col_d062 <- apply(col_d, 1, function(x) which(x <= 0.62 & x > 0))
# extract the column indices for the same criteria, dropping self
any(sapply(col_d062, length) < 1)
which(sapply(col_d062, length) < 1)
# check that every entity has at least one neighbour (the empty case is
# not treated here, but an empty vector needs replacing with the integer
# value 0)
col_d062a <- lapply(col_d062, function(x) {names(x) <- NULL; x})
names(col_d062a) <- NULL
# remove index names
class(col_d062a) <- "nb"
attr(col_d062a, "region.id") <- as.character(1:nrow(col_d))
col_d062a
n.comp.nb(col_d062a)$nc
nb_d062
n.comp.nb(nb_d062)$nc

The neighbour objects are the same, both with two components. This case is 
for a single distance band; you could use different intervals in the 
selection command for alternative bands.
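
As a rough sketch of how the steps above might be wrapped up for any 
distance matrix and any band, here is a small helper in base R. The name 
nb_from_dist() and the four-colony matrix sea_d are made up for 
illustration; they are not part of spdep. The helper also covers the 
empty-neighbour case by inserting the integer 0, and it only sets the 
class and "region.id" attribute, as in the example above.

```r
# Hypothetical helper (not an spdep function): build an "nb"-style
# neighbour list from a square distance matrix for the band (lower, upper].
nb_from_dist <- function(d, lower = 0, upper) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == ncol(d))
  n <- nrow(d)
  nb <- lapply(seq_len(n), function(i) {
    idx <- which(d[i, ] > lower & d[i, ] <= upper)
    idx <- unname(idx[idx != i])       # drop self and any index names
    if (length(idx) == 0L) 0L else as.integer(idx)  # 0L marks "no neighbours"
  })
  class(nb) <- "nb"
  attr(nb, "region.id") <- as.character(seq_len(n))
  nb
}

# Made-up 'at sea' distances between four colonies, arbitrary units
sea_d <- matrix(c(0, 1, 4, 9,
                  1, 0, 2, 8,
                  4, 2, 0, 7,
                  9, 8, 7, 0), nrow = 4, byrow = TRUE)

nb_0_3 <- nb_from_dist(sea_d, lower = 0, upper = 3)  # first band
nb_3_8 <- nb_from_dist(sea_d, lower = 3, upper = 8)  # second band
```

With spdep loaded, objects built this way should be usable in the same way 
as col_d062a above, but it is worth comparing against dnearneigh() on a 
case with coordinates first, as was done there.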

>
> 2. I only have data for 48 colonies and they are clustered in space. As 
> such, I suspect I will end up with either many distance classes with 
> none or very few pairs of colonies in them or just one or two distance 
> classes with a larger number of pairs of colonies. Is there any rule of 
> thumb for how many data are required for a reasonable estimate of 
> Moran's I? Indeed, is Moran's I even appropriate in this case?

All observations ought to be included, but can form multiple graph 
components, so you most likely do not have to worry about this problem.

With regard to Moran correlograms, it could well be that a band including 
only a couple of pairs of observations will make the estimated variance of 
the statistic very large, so it will not be significant. Note that 
correlograms also suffer from multiple comparisons, so adjustments may be 
required anyway. If only one or two bands have many pairs, why not go with 
those (or a single band and avoid adjustment)?
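
To see which bands hold enough pairs before settling on one or two, the 
pairs can be counted directly from the distance matrix. The sketch below 
uses simulated clustered points and Euclidean distances purely as a 
stand-in for the real 'at sea' matrix; the 5-unit band width and the 
p-values at the end are placeholders, not results.

```r
set.seed(1)

# Simulated stand-in for 48 clustered colonies: three clusters of 16 points
centres <- cbind(x = c(0, 10, 30), y = c(0, 5, 0))
xy <- centres[rep(1:3, each = 16), ] + matrix(rnorm(96), ncol = 2)
d <- as.matrix(dist(xy))   # replace with the real 'at sea' distance matrix

# Count each pair once by taking the upper triangle only
pair_d <- d[upper.tri(d)]

# Hypothetical 5-unit distance bands covering the full range
breaks <- seq(0, max(pair_d) + 5, by = 5)
pairs_per_band <- table(cut(pair_d, breaks = breaks))
pairs_per_band

# If several bands are tested, the p-values can be adjusted for multiple
# comparisons, here with Holm's method. Placeholder values, not results.
p_raw <- c(0.01, 0.04, 0.30)
p_adj <- p.adjust(p_raw, method = "holm")
```

Bands with very few pairs in this table are the ones likely to give a 
large variance for the statistic.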

Hope this helps,

Roger

>
> Sorry not to include any code, but that doesn't seem appropriate to my 
> question.
>
> Kind regards,
>
> Ewan Wakefield (PhD Student)
> British Antarctic Survey
> High Cross
> Madingley Road
> Cambridge
> CB3 0ET
> UK
>
> tel. +44(0)1223 221215
> website www.antarctica.ac.uk
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no